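Nucleic Acid Sequence Analysis and Structure Prediction
Lecturer: Zhang Jun, Department of Cell Biology and Genetics
Section 1. Data representation of nucleic acid sequences
1. A string is an ordered arrangement of symbols (characters) drawn from a finite set, here {A, T, G, C}. "Sequence" and "string" denote the same concept. Example: s = ATTGCATATG. The length of s is written |s|, and the character at position i of s is written s_i, 1 ≤ i ≤ |s|. In particular, the string of length 0 is called the empty string and is denoted ε.
2. Substring and subsequence are not the same concept.
Substring and superstring: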
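Example: s = ATGCGGTA, t = TGCGG. t is a substring of s (its characters occur in s consecutively), and s is a superstring of t.
Subsequence and supersequence: s = ATGCGGTA, t = TGTA. t is a subsequence of s (its characters occur in s in the same order, but not necessarily consecutively), and s is a supersequence of t.
Interval notation: the substring of s from position i to position j is written s[i, j].
Example: s = ATGCGGTACGTATACG, u = CG. u occurs in s as the intervals s[i, i+1] with i = 4, 9 and 15.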
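The substring/subsequence distinction and interval lookup can be checked directly. A minimal Python sketch (the function names are illustrative, not from the lecture); intervals are reported 1-based to match the s[i, j] notation:

```python
def is_substring(t: str, s: str) -> bool:
    """t is a substring of s: its characters occur in s consecutively."""
    return t in s


def is_subsequence(t: str, s: str) -> bool:
    """t is a subsequence of s: its characters occur in s in order, gaps allowed."""
    remaining = iter(s)
    # `c in remaining` consumes the iterator up to the first match of c,
    # so each character of t must be found after the previous one
    return all(c in remaining for c in t)


def intervals(u: str, s: str) -> list[tuple[int, int]]:
    """All intervals (i, j), 1-based and inclusive, such that s[i, j] = u."""
    m = len(u)
    return [(i + 1, i + m) for i in range(len(s) - m + 1) if s[i:i + m] == u]


s = "ATGCGGTA"
print(is_substring("TGCGG", s), is_subsequence("TGCGG", s))  # True True
print(is_substring("TGTA", s), is_subsequence("TGTA", s))    # False True
print(intervals("CG", "ATGCGGTACGTATACG"))                   # [(4, 5), (9, 10), (15, 16)]
```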
3. Concatenation: the concatenation of strings u and w is written uw. Example: s = ATGCGGTA, t = TGCGG; then st = ATGCGGTATGCGG and ts = TGCGGATGCGGTA. If s = AT, then sss = ATATAT, written s^3.
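Prefix: t is a prefix of s if there is a string u with s = tu; prefix(s, k) denotes the prefix of length k.
Example: s = ATGCGGTAGC; prefix(s, 3) = ATG; prefix(s, 0) = ε.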
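Suffix: t is a suffix of s if there is a string u with s = ut; suffix(s, k) denotes the suffix of length k.
Example: s = ATGCGGTAGC; suffix(s, 3) = AGC, suffix(s, 2) = GC, suffix(s, 0) = ε.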
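A small sketch of prefix(s, k) and suffix(s, k) as defined above, plus concatenation and powers, which Python expresses with + and *. The helper names are illustrative:

```python
def prefix(s: str, k: int) -> str:
    """prefix(s, k): the first k characters of s; prefix(s, 0) is the empty string."""
    if not 0 <= k <= len(s):
        raise ValueError("k must satisfy 0 <= k <= |s|")
    return s[:k]


def suffix(s: str, k: int) -> str:
    """suffix(s, k): the last k characters of s; suffix(s, 0) is the empty string."""
    if not 0 <= k <= len(s):
        raise ValueError("k must satisfy 0 <= k <= |s|")
    # slice from len(s) - k rather than -k, because s[-0:] would return all of s
    return s[len(s) - k:]


s = "ATGCGGTAGC"
print(prefix(s, 3), repr(prefix(s, 0)))  # ATG ''
print(suffix(s, 3), suffix(s, 2))        # AGC GC
# Concatenation is +, and the power s^3 is repetition
print("ATGCGGTA" + "TGCGG", "AT" * 3)    # ATGCGGTATGCGG ATATAT
```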
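Killer agent: a special string, written κ here, whose length is |κ| = -1. Placing κ next to a string deletes the adjacent character.
Example: s = ATGCGGTAGC; κs = TGCGGTAGC and sκ = ATGCGGTAG.
An expression such as ATGCκGGTAG is ambiguous: (ATGCκ)GGTAG = ATGGGTAG, whereas ATGC(κGGTAG) = ATGCGTAG, so parentheses are required.
Concatenation is associative, stu = (st)u = s(tu), provided that |s| ≠ -1, |t| ≠ -1 and |u| ≠ -1.
For any s and t, |st| = |s| + |t|.
With killer agents, substrings, prefixes and suffixes can all be written as concatenations:
s[i, j] = κ^(i-1) s κ^(|s|-j)
prefix(s, k) = s κ^(|s|-k)
suffix(s, k) = κ^(|s|-k) s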
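Killer agents can be simulated by treating each κ as a deletion applied to its neighbour. In this sketch the hypothetical helpers kill_left and kill_right stand for κ^times s and s κ^times (assuming 0 ≤ times ≤ |s|); it reproduces the ambiguity example and the interval, prefix and suffix formulas:

```python
def kill_left(s: str, times: int = 1) -> str:
    """kappa^times s: each killer agent on the left deletes the first character."""
    return s[times:]


def kill_right(s: str, times: int = 1) -> str:
    """s kappa^times: each killer agent on the right deletes the last character.

    Assumes 0 <= times <= len(s).
    """
    return s[:len(s) - times]


def interval(s: str, i: int, j: int) -> str:
    """s[i, j] (1-based, inclusive) built as kappa^(i-1) s kappa^(|s|-j)."""
    return kill_left(kill_right(s, len(s) - j), i - 1)


s = "ATGCGGTAGC"
print(kill_left(s), kill_right(s))   # TGCGGTAGC ATGCGGTAG
# The two groupings of ATGC kappa GGTAG give different strings
print(kill_right("ATGC") + "GGTAG")  # ATGGGTAG
print("ATGC" + kill_left("GGTAG"))   # ATGCGTAG
print(interval(s, 3, 5))             # GCG
# prefix(s, 3) = s kappa^(|s|-3) and suffix(s, 3) = kappa^(|s|-3) s
print(kill_right(s, len(s) - 3), kill_left(s, len(s) - 3))  # ATG AGC
```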
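Homology and similarity
Homology: homologous genes are either orthologous or paralogous. Orthologs are homologous genes in different species (e.g., a1 in species I and a1 in species II); paralogs are homologous genes in the same species (e.g., a1 and a2 in species I).
Similarity: a quantitative measure of how closely two sequences match, whereas homology is the qualitative statement that they share a common ancestor.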
Alignment
Example: s = GACGGATTAG, t = GATCGGAATAG.
Alignment 1:
GACGGATTAG-
GATCGGAATAG
Alignment 2:
GA-CGGATTAG
GATCGGAATAG
4. DNA sequences are written over the alphabet {A, C, G, T}; ambiguous positions use the IUPAC nucleotide codes:
Symbol  Meaning           Description
G       G                 Guanine
A       A                 Adenine
T       T                 Thymine
C       C                 Cytosine
R       G or A            Purine
Y       T or C            Pyrimidine
M       A or C            Amino
K       G or T            Keto
S       G or C            Strong interaction (3 H bonds)
W       A or T            Weak interaction (2 H bonds)
H       A or C or T       not-G
B       G or T or C       not-A
V       G or C or A       not-T (not-U)
D       G or A or T       not-C
N       G or A or T or C  Any
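1. Global alignment
Example: s = ATTGCATATG, t = ATTGATATC. Inserting a gap (space) into t gives the alignment
s = ATTGCATATG
t = ATTG-ATATC
Scoring: each match scores 1, each gap scores -2, and each mismatch scores -1.
For sequences s and t, the similarity is the best score over all alignments i: sim(s, t) = max{score_i}.
Two alignments of the example and their scores:
s = ATTGCATATG, t = ATTG-ATATC: 8 × 1 + (-2) + (-1) = 5
s = ATTGCATATG, t = -ATTGATATC: 5 × 1 + (-2) + 4 × (-1) = -1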
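The scores above can be recomputed mechanically. A sketch assuming the scoring scheme just given (match 1, mismatch -1, gap -2), with each alignment written as two equal-length rows and '-' marking a gap; the function name is illustrative:

```python
def alignment_score(row_s: str, row_t: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Score an alignment given as two equal-length rows, with '-' as a gap."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_s, row_t):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("ATTGCATATG", "ATTG-ATATC"))  # 5
print(alignment_score("ATTGCATATG", "-ATTGATATC"))  # -1
```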
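2. Local alignment
A local alignment of s and t aligns a substring of s with a substring of t.
Example: s = AATTGCATATG, t = ATTGT; positions 2 to 5 of s (ATTG) match positions 1 to 4 of t.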
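Local alignment scores are usually computed with the Smith-Waterman dynamic program, which the lecture does not name; the sketch below assumes it, together with the scoring scheme of the global example (match 1, mismatch -1, gap -2):

```python
def local_alignment(s: str, t: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> tuple[int, tuple[int, int]]:
    """Smith-Waterman: best local score and the (i, j) where it ends (1-based)."""
    H = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    best, end = 0, (0, 0)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # the 0 lets a local alignment start afresh at any cell
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    return best, end


print(local_alignment("AATTGCATATG", "ATTGT"))  # (4, (5, 4))
```

For the example it reports score 4, ending at position 5 of s and position 4 of t, which is the ATTG match described above.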
3. Example: two different alignments of s = AATTGCATATG and t = ATTGT, differing in where the gaps are placed:
s = AATTGCATATG    s = AATTGCATATG
t = -ATTGT-----    t = A-TTG--T---
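Similarity: the similarity of two sequences, sim(s, t), is the maximum score over all alignments i of s and t: sim(s, t) = max{score_i}.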
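sim(s, t) can be computed without listing every alignment by the Needleman-Wunsch dynamic program. This is an assumed choice of algorithm (the text gives only the definition), using the scoring scheme of the global alignment example:

```python
def similarity(s: str, t: str, match: int = 1, mismatch: int = -1,
               gap: int = -2) -> int:
    """sim(s, t): best global alignment score (Needleman-Wunsch)."""
    n, m = len(s), len(t)
    # D[i][j] is the best score for aligning s[:i] with t[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap  # s[:i] against gaps only
    for j in range(1, m + 1):
        D[0][j] = j * gap  # t[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = D[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            D[i][j] = max(diag, D[i - 1][j] + gap, D[i][j - 1] + gap)
    return D[n][m]


print(similarity("ATTGCATATG", "ATTGATATC"))  # 5
```

For s = ATTGCATATG and t = ATTGATATC it returns 5, the score of the gapped alignment ATTG-ATATC in the global alignment example.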
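Distance: s can be transformed into t by a series of edit operations (insertions, deletions, substitutions). Example: s = TTA, t = AGCTT:
TTA → AGTTA (insert A and G) → AGCTA (replace T by C) → AGCTT (replace A by T)
Each operation has a cost; the distance between s and t is the minimum total cost over all such series i: dist(s, t) = min{cost_i}.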
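The minimum-cost series of edit operations can be found by dynamic programming (the Levenshtein scheme). The costs here are assumptions, 1 per substitution and 1 per insertion or deletion, and both are parameters:

```python
def edit_distance(s: str, t: str, sub_cost: int = 1, indel_cost: int = 1) -> int:
    """dist(s, t): minimum total cost of edits turning s into t."""
    n, m = len(s), len(t)
    # D[i][j] is the cheapest way to turn s[:i] into t[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel_cost  # delete all of s[:i]
    for j in range(1, m + 1):
        D[0][j] = j * indel_cost  # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else sub_cost)
            D[i][j] = min(sub, D[i - 1][j] + indel_cost, D[i][j - 1] + indel_cost)
    return D[n][m]


print(edit_distance("TTA", "AGCTT"))  # 4
```

With these unit costs, dist(TTA, AGCTT) = 4, the same as the four operations in the example.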
ACCGACAATATGCATA
ATAGGTATAACAGTCA
ACCGACAATATGCATA
ACTGACAATATGGATA
[Figure, panels (a) and (b): 108 × 108 sequence comparison involving Homo sapiens and Pongo pygmaeus]
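DNA fragment assembly
Sequencing reads a long DNA molecule as many short, overlapping fragments; fragment assembly reconstructs the original sequence from these fragments.
Example fragments: ATTGGGCA; CGATT; TGGGCAGA. Laid out by their overlaps:
--ATTGGGCA--
CGATT-------
----TGGGCAGA
Consensus sequence: CGATTGGGCAGA
Formal models of the problem: shortest common superstring, reconstruction, and multicontig.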
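One common heuristic for the shortest common superstring model is greedy merging: repeatedly join the two fragments with the longest suffix-prefix overlap. The sketch below assumes error-free fragments from a single strand; because the shortest common superstring problem is NP-hard, greedy merging is not guaranteed to find the optimum:

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    """Greedy approximation to the shortest common superstring."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # A fragment contained in another adds no information
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j and (k := overlap(a, b)) > best_k:
                    best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


print(greedy_superstring(["ATTGGGCA", "CGATT", "TGGGCAGA"]))  # CGATTGGGCAGA
```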
Functional sites in DNA
Functional sites in DNA include promoters, terminator sequences, and splice sites.
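Prediction methods are built from a training set and evaluated on a control set.
Accuracy is measured by sensitivity (Sn) and specificity (Sp), computed from the numbers of true positives (Tp), true negatives (Tn), false negatives (Fn) and false positives (Fp):
Sn = Tp / (Tp + Fn)
Sp = Tn / (Tn + Fp)
(In gene-prediction work, Sp is often defined instead as Tp / (Tp + Fp).)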
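The two measures in code, using the Sn = Tp / (Tp + Fn) and Sp = Tn / (Tn + Fp) forms; the counts in the usage line are hypothetical:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sn = Tp / (Tp + Fn): fraction of real sites that are predicted."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Sp = Tn / (Tn + Fp): fraction of non-sites correctly rejected."""
    return tn / (tn + fp)


# Hypothetical counts from a control set, for illustration only
print(sensitivity(tp=90, fn=10), specificity(tn=80, fp=20))  # 0.9 0.8
```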
Terms: functional site, functional sequence, motif, signal; functional region.
A common consensus: NTATN
Example: consensus patterns for the five sequences TTATG, ATATA, TACGC, TTGTC, TCCAC.
Consensus 1: TNSNC (TACGC, TTGTC, TCCAC)
Consensus 2: NTATN (TTATG, ATATA)
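A consensus pattern can be checked against sequences by expanding each IUPAC symbol into its base set, as in the IUPAC code table. A minimal sketch; the helper names are illustrative, and the rules for building the consensus are not implemented here:

```python
# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def matches(pattern: str, seq: str) -> bool:
    """True if seq has the same length as pattern and every base is allowed."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )


def find_sites(pattern: str, seq: str) -> list[int]:
    """1-based start positions where the consensus pattern matches seq."""
    k = len(pattern)
    return [i + 1 for i in range(len(seq) - k + 1) if matches(pattern, seq[i:i + k])]


group1 = ["TACGC", "TTGTC", "TCCAC"]
group2 = ["TTATG", "ATATA"]
print(all(matches("TNSNC", s) for s in group1))  # True
print(all(matches("NTATN", s) for s in group2))  # True
print(matches("TNSNC", "TTATG"))                 # False
```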
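Weight matrix
A weight matrix M for sites of length n is a 4 × n matrix: M(a, j) is the weight of base a ∈ {A, T, G, C} at position j.
The score of a sequence s = a_1 a_2 … a_n is W_s = M(a_1, 1) + M(a_2, 2) + … + M(a_n, n).
Example: S = ATTGCA, W_S = 1 + 6 + 14 - 5 + 8 + 19 = 43.
With a threshold T, s is predicted to be a site if W_s ≥ T and not a site if W_s < T.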
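Scoring with a weight matrix is a sum of lookups. The sketch stores only the six entries that the ATTGCA example uses (a real matrix has 4 × n entries), and the threshold T = 40 is hypothetical:

```python
def weight_score(M: dict[tuple[str, int], float], s: str) -> float:
    """W_s = sum over positions j of M(a_j, j), with 1-based positions."""
    return sum(M[(a, j)] for j, a in enumerate(s, start=1))


# Only the six entries used in the ATTGCA example; a full matrix has 4 x n entries
M_example = {("A", 1): 1, ("T", 2): 6, ("T", 3): 14,
             ("G", 4): -5, ("C", 5): 8, ("A", 6): 19}

W = weight_score(M_example, "ATTGCA")
print(W)  # 43
T = 40  # hypothetical threshold
print("site" if W >= T else "not a site")  # site
```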
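Training M from a positive set A+ (known sites) and a negative set A- (non-sites):
1. Initialize M.
2. Repeat steps 3 to 6.
3. Take a sequence S_i; if S_i ∈ A+, go to step 4; if S_i ∈ A-, go to step 5.
4. If W(S_i) < T, add 1 to each entry of M that corresponds to a base of S_i; go to step 6.
5. If W(S_i) ≥ T, subtract 1 from each entry of M that corresponds to a base of S_i; go to step 6.
6. If every sequence in A+ and A- is classified correctly, go to step 7; otherwise return to step 2.
7. Output M.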
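A sketch of this training procedure as a perceptron-style loop. Assumptions not fixed by the text: M starts at zero, each round visits A+ and then A- in a fixed order, positions are 0-based, and a round limit guards against data that no matrix can separate. The toy sequences are hypothetical:

```python
BASES = "ATGC"


def score(M: list[dict[str, int]], s: str) -> int:
    """W(s): sum of M[j][a] over the bases a of s (0-based position j)."""
    return sum(M[j][a] for j, a in enumerate(s))


def train_weight_matrix(positives, negatives, T, max_rounds=100):
    """Perceptron-style training of a weight matrix (sketch)."""
    n = len(positives[0])
    M = [{b: 0 for b in BASES} for _ in range(n)]  # step 1: assume M starts at 0
    for _ in range(max_rounds):                     # step 2: repeat
        changed = False
        for s in positives:                         # step 4: site scored below T
            if score(M, s) < T:
                for j, a in enumerate(s):
                    M[j][a] += 1
                changed = True
        for s in negatives:                         # step 5: non-site scored at or above T
            if score(M, s) >= T:
                for j, a in enumerate(s):
                    M[j][a] -= 1
                changed = True
        if not changed:                             # step 6: all classified correctly
            return M                                # step 7: output M
    raise RuntimeError("no separating matrix found within max_rounds")


pos, neg = ["TATA", "TATT"], ["GCGC", "CCGG"]  # hypothetical toy data
M = train_weight_matrix(pos, neg, T=1)
print([score(M, s) for s in pos], [score(M, s) for s in neg])  # [4, 3] [0, 0]
```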
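Gene finding in DNA
ORF (open reading frame): a run of codons that starts with a start codon (ATG) and ends at the first in-frame stop codon.
Three of the 64 codons are stop codons, so in a random sequence a stop codon is expected roughly once every 64/3 ≈ 21 codons; an ORF much longer than this is therefore a candidate coding region.
Double-stranded DNA has six possible reading frames, three on each strand, and ORFs are searched for in all six.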
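A sketch of an ORF scan on one strand, assuming the standard stop codons TAA, TAG and TGA and taking an ORF to run from ATG through the first in-frame stop; the other three frames come from running it on the reverse complement. Coordinates are 0-based and end-exclusive, and the function name is illustrative:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ORFs in the three forward frames of seq."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # walk codon by codon to the first in-frame stop
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of the sequence
            if (j - i) // 3 >= min_codons:  # codons before the stop
                orfs.append((i, j + 3, frame))
            i = j + 3  # resume scanning after the stop codon
    return orfs


print(find_orfs("CCATGAAATAGCC"))  # [(2, 11, 2)]
```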
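Codon-usage frame test: let f_abc be the frequency of codon abc in known coding sequences. For a sequence a_1 b_1 c_1 a_2 b_2 c_2 … a_n b_n c_n a_(n+1) b_(n+1), each of the three reading frames contains n codons:
frame 1: a_1b_1c_1, a_2b_2c_2, …, a_nb_nc_n
frame 2: b_1c_1a_2, b_2c_2a_3, …, b_nc_na_(n+1)
frame 3: c_1a_2b_2, c_2a_3b_3, …, c_na_(n+1)b_(n+1)
P_i is the product of the codon frequencies along frame i (for example, P_1 = f_(a_1b_1c_1) × f_(a_2b_2c_2) × … × f_(a_nb_nc_n)), and the probability that frame i is the coding frame is P_i / (P_1 + P_2 + P_3).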
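The frame probabilities P_i / (P_1 + P_2 + P_3) in code. Working in logarithms avoids underflow for long sequences. The codon table and the default frequency for unlisted codons are hypothetical:

```python
import math


def frame_probabilities(seq: str, codon_freq: dict[str, float],
                        default: float = 1e-4) -> list[float]:
    """Probability that each of the three frames of seq is the coding frame.

    seq is a1 b1 c1 ... an bn cn a(n+1) b(n+1), so every frame holds n codons.
    """
    n = (len(seq) - 2) // 3
    if n < 1:
        raise ValueError("sequence too short")
    log_p = []
    for offset in range(3):
        codons = (seq[offset + 3 * k: offset + 3 * k + 3] for k in range(n))
        # log P_i = sum of log codon frequencies along frame i
        log_p.append(sum(math.log(codon_freq.get(c, default)) for c in codons))
    top = max(log_p)
    weights = [math.exp(x - top) for x in log_p]  # P_i rescaled by a common factor
    total = sum(weights)
    return [w / total for w in weights]


freq = {"GCC": 0.5}  # hypothetical table; unlisted codons get `default`
probs = frame_probabilities("GCCGCCGCCGC", freq, default=0.01)
print([round(p, 4) for p in probs])  # [1.0, 0.0, 0.0]
```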
The accuracy of gene prediction is measured by sensitivity (Sn) and specificity (Sp).
Methods based on ESTs (expressed sequence tags).
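Gene structure: a gene consists of exons and introns, e_1, i_1, …, i_(n-1), e_n. The coding sequence starts with ATG in e_1 and ends with a stop codon (e.g., UAG) in e_n. Each intron starts with the donor site gt and ends with the acceptor site ag.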
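Gene A is modeled as the series i_0, e_1, i_1, …, e_n, i_n, where each i_j (0 ≤ j ≤ n) is a non-coding segment and each e_l (1 ≤ l ≤ n) is an exon; i_0 and i_n are the flanking regions before the first exon and after the last.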
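Gene prediction as a path problem: candidate exons in the DNA sequence become nodes of a graph, with an added source and sink, and a predicted gene structure i_0, e_1, …, e_n, i_n corresponds to a path from the source to the sink.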
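The source-to-sink search can be sketched as a best-path dynamic program over candidate exons. This simplification assumes that two exons are compatible whenever they do not overlap (ignoring reading frame and splice signals) and uses hypothetical exon scores:

```python
def best_exon_chain(exons):
    """Highest-scoring chain of non-overlapping candidate exons.

    exons: (start, end, score) tuples. A chain is a source-to-sink path
    through the graph whose edges join compatible (non-overlapping) exons.
    """
    if not exons:
        return 0, []
    exons = sorted(exons, key=lambda e: e[1])
    best = [e[2] for e in exons]  # best[k]: best chain score ending with exon k
    prev = [-1] * len(exons)      # predecessor of exon k on that chain
    for k, (start, _, w) in enumerate(exons):
        for j in range(k):
            if exons[j][1] < start and best[j] + w > best[k]:
                best[k], prev[k] = best[j] + w, j
    k = max(range(len(exons)), key=best.__getitem__)
    score = best[k]
    chain = []
    while k != -1:  # walk back from the sink side to the source side
        chain.append(exons[k])
        k = prev[k]
    return score, chain[::-1]


candidates = [(1, 10, 5), (5, 20, 3), (12, 30, 4), (25, 40, 6)]  # hypothetical
print(best_exon_chain(candidates))  # (11, [(1, 10, 5), (25, 40, 6)])
```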
Online resources for DNA and RNA analysis
Access types: HP; ES (e-mail server); WS (web server); CL/EX; SC
RNA structure prediction
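The GCG package (Genetics Computer Group)
GCG contains about 140 programs. Its nucleic acid databases include GenBank and EMBL; its protein databases include PIR, SWISS-PROT and SP-TrEMBL.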
1. Pairwise comparison: Gap, BestFit, FrameAlign, Compare, DotPlot, GapShow, ProfileGap
2. Multiple comparison: PileUp, HmmerAlign, PlotSimilarity, Pretty, PrettyBox, MEME, HmmerBuild, HmmerCalibrate, ProfileMake, ProfileGap, Overlap, NoOverlap, OldDistances
3. Database reference searching: LookUp, StringSearch, Names
4. Database sequence searching: BLAST, NetBLAST, FastA, SSearch, TFastA/TFastX/FastX, FrameSearch, MotifSearch, HmmerSearch, ProfileSearch, ProfileSegments, FindPatterns, Motifs, WordSearch, HmmerPfam, Segments
5. DNA/RNA secondary structure: Mfold, PlotFold, StemLoop
6. Evolution: PAUPSearch, PAUPDisplay, Distances, Diverge
7. Fragment assembly: GelStart, GelEnter, GelMerge, GelAssemble, GelView, GelDisassemble
8. Gene finding and pattern recognition: TestCode, CodonPreference, Frames, Repeat, Composition, CodonFrequency, Correspond
9. Mapping: Map, MapPlot, MapSort, PeptideMap, PlasmidMap, PeptideSort
10. Primer selection: Prime, PrimePair, MeltTemp
11. Protein analysis: ProfileScan, CoilScan, HTHScan, SPScan, Isoelectric, PepPlot, PeptideStructure, PlotStructure
12. Utilities: Reverse, Shuffle, Corrupt, SampleDataSet, GCGToBLAST