36
Chap. 2. Data Searches and Pairwise Alignment

Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Chap. 2. Data Searches and Pairwise Alignment

Page 2: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Dot plot

http://www.vivo.colostate.edu/molkit/dnadot/

dot plot은가로세로축에염기서열또는아미노산서열을배열하고서로같은부분에점을찍어나타낸것이다. 특정염기서열또는아미노산서열의내부구조를파악하기위해서 dot plot을사용한다. 주로긴서열은일정단위 window 구간으로잘라서로비교한다(예를들어 100bp 구간으로잘라 95%이상같으면동일한것으로취급하여점을찍음).

Page 3: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Sequence A a b c d e

a b d c ebSequence B

Dot Plot of A X B

ab

cd

e

a b d c ebSequence B

Sequence A

Page 4: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Q: 오른쪽의엽록체유전체는 interted repeat unit (IRa and Irb)에의해Large single copy region (LSC)과 small single copy region (SSC)으로나뉘어진다. 화살표부분을끊어선형 DNA를만든다면자신의서열에대한자신의서열을 dot plot 하면어떤모습일까?

Page 5: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Tandem repeat sequence!

Guess the sequence composition of sequences A based on the following dot plot!

Sequence A

Sequence A

Page 6: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Dot plot 를이용한연구의실제예:대사초(Carex siderosticta)의 전체엽록체유전체를 그림과같이밝혀냈다 (195,251 bp). 이엽록체유전체의를 self dot plot 및대사초엽록체유전체에대한근연 분류군의엽록체유전체를 dot plot 함으로서유전체의구조변이를파악할수있다.

Carex siderostictachloroplast genome

195,251 bp0 kb

20 kb

40 k

b60 kb

80 kb

100 kb

120 kb

140

kb

160 kb

180 kb

psbA

trnK‐UUU*matK

rps2

rpoC2

rpoC1*

rpoB

trnC‐G

CApetN

psbM

trnD‐GUC

trnY‐G

UAtrnE‐U

UCpsbDpsbC

trnS‐U

GA

psbZtrnG‐UCC

trnfM

‐CAU

trnV‐GA

CrbcL

atpBatpE

trnM‐CAU

trnV

‐UAC

*

ndhC

ndhK

ndhJ

trnF‐GAA

trnL‐UAA

*trnT

‐UGU

atpI

atpH

atpF*

atpA

trnS‐GCU

psbIpsbK

rps11

petD*

petB*psbH

psbN

psbT

psbB

psaApsaBrps14trnQ‐UUGrps4psaIycf4cemApetA

psbJ psbLpsbFpsbE

petLpetG

trnW‐CCAtrnP‐UGG

psaJrpl33

rpl20

3?rps12*

ycf3*trnS‐GGA rpl36rps16*rps8rpl14rpl16*

rpl22 rps19

trnH‐GUG

trnH‐GUG

rpl2*

trnI‐CAU

rps7

ndhB*

trnL‐CAA

3?rps12*rps7ndhB*trnL‐CAA

rrn16

trnI‐GAU*

ycf68*

trnA‐UG

C*rrn

23rrn

4.5

rrn5 trnR‐ACG

trnN

‐GUU

trnN‐GUU

rps15

ndhH

ndhA

*nd

hInd

hG Fhdn

r pl32t rnL‐U

AGccsA

ndhDpsaC

ndhEndhG

ndhIndhA*ndhHrps15

SSC

trnfM

‐CAU

trnR

‐UCU

trnG

‐GCC

*

5?rps12

rps19rpl2*trnI‐CAU

rrn16

trnI‐GAU*

ycf68*trnA‐UGC*

rrn23

rrn4.5

rrn5trnR‐ACG

Photosynthesis

Ribosomal RNA

Biosynthesis of cofactor

Uncharacterized protein

Energy metabolism

Transcription & translation

Transporter

Transfer RNA

Intron

Chloroplast genome of

Carex siderosticta195,251 bp

Page 7: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

짧은선들은무엇을나타낼까???

비슷한시퀀스의중복!

대사초엽록체유전체의 self dot plot

Page 8: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

- window size is 100 bp- minimum similarity is 75% in the dot plot.

중앙의대각선선이단절되어있다. 이것은무엇을의미할까?

두시퀀스사이에gap/insertion 이존재

부들

대사초

대사초와부들의엽록체유전체 dot plot

검정색선은동일방향으로일치. 빨간색선은 inversion 이일어난것

빨간색선과검정색선의차이는?

중앙에서한참벗어난부위에있는선이의미하는것은?

translocation!

Page 9: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

UGENE program에의한 dot plot 실습

Page 10: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Recognition of WGD (Whole Genome Duplications) usingself-dot plot of a genome

http://amborella.net/2010Bioinformatics/Week04-Bowers%20et%20al%202003%20Nature%20Genome%20dup.pdf

Arabidopsis thaliana

2003년 Arabidopsis전체유전체가밝혀진이후이를 dot plot을이용하여분석하여매우흥미로운사실을밝혀냈다. 다섯개의염색체를일렬로배열한후자신에대한자신의 dot plot을해본결과서열의중간중간에 + 기울기의, 그리고 –기울기의선들을발견하였다. 이것은전체유전체의많은부분이서로비슷한구간이존재함을의미한다. 예를들어서염색체 3번의뒷부분은염색체2번의뒷부분과거의같은염기서열을갖고있다(α11). 그리고염색체 2번의중간부위는염색체 1번의첫부분과거꾸로된매우유사한부위를갖고있다(α2). 이렇게중복이일어난모든부위(노란박스)를모두합쳐보면전체유전체부위의약60~70%에해당하는부위가중복이되었음을알수있다. 이것은Arabidopsis의진화의역사에서한지점에서 whole genome duplication (WGD; 전체유전체중복현상)이일어나고이후부분적으로치환, 결실, 중복, 전이등이일어났음을암시한다. 노란박스로표시된중복부위(α1~α27)을일렬로배열한후다시자신에대한자신의 dot plot을해보면다시이들내에서의중복구간을인식할수있다(β로표시된노란 box). 이들 β box의염기서열들을다시일렬로배열한후 dot plot을하면또다시중복부위를나타내는 box들을인식할수있었다. 이논문에서는전체유전체의단순한 dot plot에의해피지식물이나타난이후 Arabidopsis까지의진화역사동안에적어도 3회이상의WGD 현상이일어났음을보여주고있다.

Page 11: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• 두개의 DNA 서열을정렬한 3가지결과• A match• A mismatch• A gap + A insertionATACGGA ATACGGA ATACGGAATACGGA GTACGGA A_ACGGA

두 DNA 서열의정렬

Page 12: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

DOROTHYCROWFOOTHODKINDOROTHY-------HODKIN

GAP or indel

Page 13: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• Optimal alignment를찾아내기는매우힘듬.

AGAT_G AGATG A_TACG __ATACG

• 무엇이 optimal alignment일까? • 또다른 alignment들이있을까?

Page 14: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Simple alignment• Match score and penalty

Match score = 1Mismatch score = 0 이라고가정할때위의정렬서열들의 score 들은?

이방법과마찬가지로어떤 “조건”을주어각각다르게이루어진정렬들을 “평가” 할수있다.

Page 15: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• 여러종류의정렬서열들이있다면어떤것이가장좋은것(optimal)인지판단할기준이필요함.

• Let’s say if seq1 = ‘_’ or seq2 = ‘_’ : gap penalty• If no gaps and seq1 = seq2 : match score• If no gaps and seq1 not equal seq2 : mismatch score

• 만약Match score = 1Mismatch score = 0Gap penalty = -1 이라고가정하면…

좀더복잡하게하여…

Page 16: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Gaps in alignment

Simple gap penalty:Assuming match score = 1, mismatch score = 0, gap penalty = -1라고가정할때위의정렬서열들의 score 들은?

Page 17: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• 위의가정으로두 sequence를비교해보자.• AATCTATA 와 AAGATAAATCTATA AATCTATA AATCTATAAAG-AT-A AA-G-ATA AA--GATA

Page 18: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• Score를보면첫번째 alignment는 optimal 한것이아닌것으로보임.

• 두번째세번째 alignment와같이 score 가같은경우에는어떤것을선택해야하나????

• 우리는하나의 nucleotide의 insertion이나deletion이여러개의 insertion이나deletion이일어날확률보다더높다는것을알고있음.

• Origination penalty – gap의시작• Length penalty – gap이이어지는것

Page 19: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Origination of gaps• Insertion vs. deletion (indel) events

= one step event in evolution• Origination penalty (open gap penalty) higher penalty value

• Length penalty (gap extension penalty) smaller penalty value

Gap origination penalty= -2, length penalty= -1, Match score= 1, mismatch score= 0

Page 20: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• origination penalty를만들기보다는 length penalty를더많이만드는것이더좋다고생각됨.

• 예를들어 origination penalty = -2Length penalty = -1 라가정하면

AATCTATA AATCTATA AATCTATAAAG-AT-A AA-G-ATA AA--GATA

Page 21: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Scoring matrices – nucleotide(정렬점수표)

•진화에있어서보존적인치환이일어날가능성이더많다. •일치와불일치의 score를다르게해줄수도있고, •모든불일치가다같은 penalty를갖는것이아니라정도를달리해주는경우도있음 weighted score

transitiontransversion

Page 22: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Amino acid scoring matrix를만드는두가지방법

1) 아미노산들사이의화학적,물리적유사도에기반하여가중치를줌.너무나다양한물리적, 화학적유전적 factor들이존재하기때문에객관적인표현이매우힘듬

2) 실제관찰에의해아미노산간의치환빈도를조사하여경험적인값으로적용

Scoring matrices – amino acid residues

Page 23: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Scoring matrices – amino acid residues PAM (point accepted mutation) matrix –매우유사한단백질서열들을정렬했을때의관찰된치환을계산한값들(상대적 mutation 빈도)로만듬

PAM unit – 100개의 residue당하나의치환이일어나기위해소요되는시간

작은수치의 PAM matrix는보다가까운 protein들의비교에이용됨.

일반적으로 PAM250 을이용 BLOSUM matrix – gap이없는유사한 protein sequence들에대하여통계학적인 clustering method를적용후, 군집간의치환율을계산하여만듬

높은수치의 BLOSUM matrix가가까운 protein 서열들의비교에이용됨.

일반적으로 BLOSUM 62을이용.약 62%의 similarity를갖는 protein sequence들의데이터로만들어낸 matrix

Page 24: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Similarity: Physico-Chemical Properties of Amino Acids

Page 25: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

BLOSUM62A 4R -1 5 N -2 0 6D -2 -2 1 6C 0 -3 -3 -3 9Q -1 1 0 0 -3 5E -1 0 0 2 -4 2 5G 0 -2 0 -1 -3 -2 -2 6H -2 0 1 -1 -3 0 0 -2 8I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1

A R N D C Q E G H I L K M F P S T W Y V X

BLOSUM62

Common amino acids have low weights

Rare amino acids have high weights

Positive for chemically similar substitution

Page 26: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

PAM-1 (Point Accepted Mutation) matrix 만들기1. Multiple sequence alignment를만듬.

ACGCTAFKIGCGCTAFKIACGCTAFKLGCGCTGFKIGCGCTLFKIASGCTAFKLACACTAFKL

2. Matrix로부터 phylogenetic tree를만듬.(계통수를만드는과정은 chap 4, 5에…)

이렇게진화한것을알수있음

3. 각각의 amino acid type에대하여다른 amino acid로바뀐빈도(substitution Frequency; Fij)를조사. 여기서는변화의방향성이없는것으로간주함. 즉AG와 GA는같은빈도임.

FG,A는 AG GA 모두 count.So, FG,A=3

ACGCTAFKI

GCGCTAFKI ACGCTAFKL

GCGCTGFKI GCGCTLFKI ASGCTAFKL ACACTAFKL

4. 각각의 amino acid에대하여 relative mutability (mi) 를모두구함. A 에관계된 mutation = 4전체 mutation의수 = 6-양방향 mutation이니 2를곱합- Frequency of occurrence (mj): 전체

matrix에서 amino acid의수는 63개이고A는 10개 10/63=0.159.

- scaling factor =100 즉 100을곱함(PAM1는 100 residue당 1개의변화를나타냄)

mA = [4 / (6X2)] X0.159 X 100 = 0.0209

5. Mutation probability (Mij) 계산Mij = MG,A = (0.0209 X 3) / 4 = 0.0156

6. 실제 PAM matrix에들어갈수치(Rij)는RG,A = log(Mji / mj)

= log(0.0156 / 0.159) ≒ -1.01

7. 같은 amino acid간의변화는Mjj=1-mj 를계산하여이를 6번에대입하여Rjj를구함즉 Rjj = log(Mjj / mj)

= log((1-MA,A) / mA) ※MA,A =(0.0209X2) / 3

= log((1-0.0139) / 0.0209) ≒ 1.67

Page 27: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k
Page 28: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• A matrix is a grid used to analyze the optimal alignment of two DNA fragments

Page 29: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• A matrix is a grid used to analyze the optimal alignment of two DNA fragments

Page 30: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Algorithms for searching the best alignment• 모든경우를다따지는것은매우힘든일임.• Dynamic programming –최종결론을얻기위하여하나의문제를작은문제들로나누어부분해결을한것을이어붙이는방법

• 100nt와 95nt의두 sequence를정렬하려고한다면

모든가능한 alignment들은approximately 55 million!

• 이러한문제를해결하기위해 sequence를작은부분들로나눔.

Page 31: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

Needleman and WunschAlgorithm

CACGACGA

-CACGACGA

CACGA-CGA

1번 position의정렬순서가결정되었으므로이를제외한나머지만갖고정렬가능.이러한방식으로큰문제를작게나누어해결함.

Page 32: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• Matrix를만들고첫번째줄과열에 0, -1, -2, -3…의숫자를써넣음 (gap penalty의의미)

• (2,2) 위치에대하여1) 수직축에위치한서열에공백감점(-1)더한값2)수평축에위치한서열에공백감점(-1)더한값3)(2,2)에위치한염기들을정렬하였을때일치가점(1) 또는불일치감점(0)과 (1,1)의값을더한값

1), 2), 3)을비교하여가장큰수를선택• 마찬가지로계산하여전체 matrix를채움

Page 33: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k
Page 34: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k
Page 35: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k

• Matrix를완성한후가장아래좌측으로부터• 이점수를만든것은위인가좌인가상좌인가? • 이점수를만든위치로화살표를만듬.• 수직화실표는위의서열에 gap이있는것임.• 수평화실표는좌의서열에 gap이있는것임. • Final alignmentAC--TCGACAGTAG

Page 36: Chap. 2. Data Searches and Pairwise Alignmentamborella.net/2014-Bioinformatics/Week06-Chap2.pdf · 2014. 10. 16. · 2 0 k b 4 0 6 k b 0 p k b 8 0 k b 1 0 k b 1 2 0 k b 1 4 0 b k