
Bioinformatics Course, December 2013

Hidden Markov Models and Their Extensions

In the name of God


Page 2

Sharif University of Technology

HMM Concept

Observable Markov model: state = weather condition

[Figure: Markov chain over states S1 to S5, with a state series q1, ..., q6 and the corresponding observation series o1, ..., o6]

In a regular Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters.
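Because every state is observed, the probability of a whole state series follows from the initial probabilities and the transition probabilities alone. The sketch below assumes the notation π (initial distribution) and A (transition matrix) used for λ = (A, B, π) in the HMM model; the three-state weather chain and its numbers are hypothetical.

```python
def markov_chain_sequence_prob(pi, A, states):
    """P(q_1, ..., q_T) = pi[q_1] * product over t of A[q_{t-1}][q_t] for an observable chain."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]  # one transition factor per step
    return p


# Hypothetical weather chain: 0 = sunny, 1 = cloudy, 2 = rainy (illustrative numbers only).
pi = [0.5, 0.3, 0.2]
A = [[0.7, 0.2, 0.1],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
print(markov_chain_sequence_prob(pi, A, [0, 0, 1, 2]))  # 0.5 * 0.7 * 0.2 * 0.3
```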

Page 3

HMM Concept

Hidden Markov model: state = atmospheric pressure

[Figure: Markov chain over hidden states S1 to S4, with a hidden state series q1, ..., q6 and the visible observation series o1, ..., o6]

In a hidden Markov model, the state is not directly visible, but variables influenced by the state are. Each state has a probability distribution over the possible output tokens, so the sequence of tokens generated by an HMM gives some information about the sequence of states.
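The generative view described above can be made concrete by sampling: the hidden path evolves as a Markov chain, and each visited state emits one token from its own output distribution. This is a minimal sketch that assumes integer-indexed states and symbols and the λ = (A, B, π) notation defined with the HMM model; `sample_hmm` is a hypothetical helper.

```python
import random


def sample_hmm(pi, A, B, T, seed=None):
    """Draw a hidden state path and an observation sequence of length T from (A, B, pi).

    pi[i] = P(q_1 = S_i), A[i][j] = a_ij, B[j][k] = b_j(k); states and symbols are indices.
    """
    rng = random.Random(seed)
    N, M = len(pi), len(B[0])
    states, obs = [], []
    q = rng.choices(range(N), weights=pi)[0]              # q_1 ~ pi
    for t in range(T):
        if t > 0:
            q = rng.choices(range(N), weights=A[q])[0]    # q_t ~ row q_{t-1} of A
        states.append(q)
        obs.append(rng.choices(range(M), weights=B[q])[0])  # o_t ~ b_{q_t}(.)
    return states, obs


states, obs = sample_hmm([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]], 10, seed=1)
print(states, obs)  # only obs would be visible to an observer
```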

Page 4

[Figure: hidden states S1 to S4; a hidden state series q1, ..., q6 emitting observations o1, ..., o6 drawn from the symbols v1, ..., v5]

HMM Model

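$$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \qquad 1 \le i, j \le N$$

$$b_j(k) = P[v_k \text{ at } t \mid q_t = S_j], \qquad 1 \le j \le N,\; 1 \le k \le M$$

$$\pi_i = P[q_1 = S_i], \qquad 1 \le i \le N$$

$$\lambda = (A, B, \pi)$$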
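In code, λ = (A, B, π) is usually stored as two matrices and a vector whose rows are probability distributions. The NumPy sketch below (`check_hmm` is a hypothetical name) only validates those constraints, a cheap safeguard before running the evaluation, decoding, or learning algorithms.

```python
import numpy as np


def check_hmm(A, B, pi, tol=1e-9):
    """Validate lambda = (A, B, pi) and return (N, M).

    A[i, j] = a_ij (N x N), B[j, k] = b_j(k) (N x M), pi[i] = pi_i (length N).
    Every row of A and B, and pi itself, must be a probability distribution.
    """
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    N = pi.shape[0]
    if A.shape != (N, N) or B.ndim != 2 or B.shape[0] != N:
        raise ValueError("expected A: N x N, B: N x M, pi: length N")
    for name, P in (("A", A), ("B", B), ("pi", pi[None, :])):
        if (P < 0).any() or not np.allclose(P.sum(axis=1), 1.0, atol=tol):
            raise ValueError(f"rows of {name} must be non-negative and sum to 1")
    return N, B.shape[1]


print(check_hmm([[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]], [0.6, 0.4]))
```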

Page 5

HMM Evaluation

Problem 1: Given an observation sequence and a model, compute the probability of the observation sequence, $P(O \mid \lambda)$.

[Figure: model λ = (A, B, π) generating a hidden state series q1, ..., q6 and observations o1, ..., o6 drawn from the symbols v1, ..., v5]
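Problem 1 can be stated directly as a sum over every possible hidden path, $P(O \mid \lambda) = \sum_Q \pi_{q_1} b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t)$. The brute-force sketch below evaluates that sum literally; it costs on the order of $T \cdot N^T$ operations, which is why the forward and backward recursions matter. The toy parameters and the helper name are hypothetical.

```python
from itertools import product


def evaluate_brute_force(A, B, pi, obs):
    """P(O | lambda) by enumerating all N**T hidden state paths (exponential; for checking only)."""
    N, total = len(pi), 0.0
    for path in product(range(N), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Hypothetical 2-state, 3-symbol model.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
print(evaluate_brute_force(A, B, pi, [0, 1, 2]))
```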

Page 6

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i), \qquad 1 \le t \le T$$

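Backward:

$$\beta_T(m) = 1$$

$$\beta_t(m) = P(o_{t+1:T} \mid q_t = S_m, \lambda) = \sum_{n=1}^{N} a_{mn}\, b_n(o_{t+1})\, \beta_{t+1}(n)$$

Forward:

$$\alpha_1(m) = \pi_m\, b_m(o_1)$$

$$\alpha_t(n) = P(o_{1:t}, q_t = S_n \mid \lambda) = \sum_{m=1}^{N} \alpha_{t-1}(m)\, a_{mn}\, b_n(o_t)$$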

HMM Forward & Backward

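[Figure: trellis steps. Forward: state S_m at time t-1 to state S_n at time t, emitting O_t. Backward: state S_m at time t to state S_n at time t+1, emitting O_{t+1}.]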

Complexity: $O(N^2 T)$

$$P(O \mid \lambda) = \sum_{m=1}^{N} P(O_{1:T}, q_T = S_m \mid \lambda) = \sum_{m=1}^{N} \alpha_T(m)$$
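A direct NumPy transcription of the two recursions follows. It is a sketch under the assumptions that `A`, `B`, `pi` are arrays indexed as on the HMM Model slide (rows of `B` are states, columns are symbols) and that observations are integer symbol indices. It uses raw probabilities, so long sequences will underflow; practical implementations rescale α and β at each step or work in log space. The toy model is hypothetical.

```python
import numpy as np


def forward(A, B, pi, obs):
    """alpha[t, n] = P(o_1..o_t, q_t = S_n | lambda), with t counted from 0."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # alpha_1(m) = pi_m b_m(o_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]    # sum_m alpha_{t-1}(m) a_mn, times b_n(o_t)
    return alpha


def backward(A, B, obs):
    """beta[t, m] = P(o_{t+1}..o_T | q_t = S_m, lambda), with t counted from 0."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                               # beta_T(m) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])   # sum_n a_mn b_n(o_{t+1}) beta_{t+1}(n)
    return beta


A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2]
alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
likelihood = alpha[-1].sum()                             # sum_m alpha_T(m)
# The same value comes out of sum_i alpha_t(i) beta_t(i) at every t:
print(likelihood, [(alpha[t] * beta[t]).sum() for t in range(len(obs))])
```

Each time step costs one N x N matrix-vector product, which gives the $O(N^2 T)$ total.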

Page 7

HMM Decoding / Classification / Inference

Problem 2: Given an observation sequence and a model, find the state sequence that most likely produced the given observations.

Viterbi

[Figure: model λ = (A, B, π) generating a hidden state series q1, ..., q6 and observations o1, ..., o6 drawn from the symbols v1, ..., v5]

Forward recursion, for comparison:

$$\alpha_t(n) = \sum_{m=1}^{N} \alpha_{t-1}(m)\, a_{mn}\, b_n(o_t)$$

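[Figure: trellis step from state S_m at time t-1 to state S_n at time t, emitting O_t]

Recursion:

$$\delta_1(n) = \pi_n\, b_n(o_1), \qquad \delta_t(n) = \max_{m} \{\delta_{t-1}(m)\, a_{mn}\}\, b_n(o_t)$$

Backtracking:

$$\psi_t(n) = \arg\max_{m} \{\delta_{t-1}(m)\, a_{mn}\}$$

$$q_T^* = \arg\max_{m} \delta_T(m), \qquad q_{t-1}^* = \psi_t(q_t^*)$$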

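$$Q^* = \arg\max_Q P(Q \mid O, \lambda) = \arg\max_Q \frac{P(O, Q \mid \lambda)}{P(O \mid \lambda)} = \arg\max_Q P(O, Q \mid \lambda)$$

since $P(O \mid \lambda)$ does not depend on $Q$.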
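The recursion and backtracking above translate almost line by line into code. This sketch works with log probabilities, so the products become sums and long sequences do not underflow. It assumes `A`, `B`, `pi` are NumPy arrays with `B[state, symbol]` and that observations are integer symbol indices; the toy model is hypothetical.

```python
import numpy as np


def viterbi(A, B, pi, obs):
    """Most likely path Q* = argmax_Q P(O, Q | lambda) and its log probability."""
    with np.errstate(divide="ignore"):            # log(0) -> -inf is acceptable here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = logpi + logB[:, obs[0]]            # delta_1(n) = pi_n b_n(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA     # scores[m, n] = delta_{t-1}(m) + log a_mn
        psi[t] = scores.argmax(axis=0)            # psi_t(n): best predecessor m
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]              # q*_T
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))        # q*_{t-1} = psi_t(q*_t)
    path.reverse()
    return path, float(delta[-1].max())


A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
path, log_p = viterbi(A, B, pi, [0, 1, 2])
print(path, np.exp(log_p))
```

Replacing `max` with a sum in the recursion gives back the forward algorithm, which is the only structural difference between decoding and evaluation.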

Page 8

HMM Learning

[Figure: model λ = (A, B, π) generating a hidden state series q1, ..., q6 and observations o1, ..., o6 drawn from the symbols v1, ..., v5]

Problem 3: Given an observation sequence, estimate the model parameters, whether or not the state sequence is known.

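$$\xi_t(m, n) = P(q_t = S_m, q_{t+1} = S_n \mid O, \lambda) = \frac{\alpha_t(m)\, a_{mn}\, b_n(o_{t+1})\, \beta_{t+1}(n)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$

$$\gamma_t(n) = P(q_t = S_n \mid O, \lambda) = \sum_{m=1}^{N} \xi_t(n, m)$$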

[Figure: transition from state S_m at time t to state S_n at time t+1]

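$$\bar{a}_{mn} = \frac{\sum_{t=1}^{T-1} \xi_t(m, n)}{\sum_{t=1}^{T-1} \gamma_t(m)}$$

$$\bar{b}_n(k) = \frac{\sum_{t=1,\; o_t = v_k}^{T} \gamma_t(n)}{\sum_{t=1}^{T} \gamma_t(n)}$$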

Expectation-Maximization (Baum-Welch)

$$\bar{\pi}_n = \gamma_1(n)$$
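One EM iteration combines the quantities above: the E-step computes ξ and γ from the forward and backward variables, and the M-step applies the three re-estimation formulas. The sketch below handles a single observation sequence with unscaled probabilities (adequate only for short toy sequences), repeats the forward and backward recursions so that it runs on its own, and uses hypothetical names (`forward_backward`, `baum_welch_step`). It computes γ as α_t(n) β_t(n) / P(O | λ), which equals Σ_m ξ_t(n, m) for t < T and also covers t = T. Each iteration should not decrease P(O | λ).

```python
import numpy as np


def forward_backward(A, B, pi, obs):
    """Unscaled forward (alpha) and backward (beta) variables for one sequence."""
    T, N = len(obs), len(pi)
    alpha, beta = np.zeros((T, N)), np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta


def baum_welch_step(A, B, pi, obs):
    """One EM re-estimation of lambda = (A, B, pi); also returns P(O | old lambda)."""
    obs = np.asarray(obs)
    alpha, beta = forward_backward(A, B, pi, obs)
    likelihood = alpha[-1].sum()
    # E-step. xi[t, m, n] = alpha_t(m) a_mn b_n(o_{t+1}) beta_{t+1}(n) / P(O | lambda)
    emit_next = (B[:, obs[1:]].T * beta[1:])[:, None, :]      # b_n(o_{t+1}) beta_{t+1}(n)
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_next / likelihood
    gamma = alpha * beta / likelihood                          # gamma_t(n) = P(q_t = S_n | O, lambda)
    # M-step: the three re-estimation formulas.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B, dtype=float)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)              # sum of gamma_t(n) over t with o_t = v_k
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, likelihood


# Hypothetical starting point and training sequence.
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 2, 2, 1, 0, 0, 2]
for it in range(5):
    A, B, pi, likelihood = baum_welch_step(A, B, pi, obs)
    print(it, likelihood)  # likelihood of the parameters before this update
```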

Page 9

Application: Protein Structure

Page 10

HMM Variants: Profile HMM

• Constructing a profile HMM
  • each consensus column can be in one of 3 states: match, insert, or delete
  • the number of states depends on the length of the alignment

• A typical profile HMM architecture
  • squares represent match states
  • diamonds represent insert states
  • circles represent delete states
  • arrows represent transitions

• transition between match states: $a_{M_j M_{j+1}}$
• transition from a match state to an insert state: $a_{M_j I_j}$
• transition within an insert state: $a_{I_j I_j}$
• transition from a match state to a delete state: $a_{M_j D_{j+1}}$
• transition between delete states: $a_{D_j D_{j+1}}$
• emission of symbol $a$ at state $s$: $e_s(a)$
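The architecture can be generated programmatically for a model with L match columns. This sketch assumes a common layout that the slide does not spell out: Begin and End act as M_0 and M_{L+1}, there are insert states I_0 to I_L and delete states D_1 to D_L, and, besides the five transition types listed above, I_j to M_{j+1} and D_j to M_{j+1} are added so every path can reach End; delete-to-insert and insert-to-delete transitions are omitted. `profile_hmm_edges` is a hypothetical helper.

```python
def profile_hmm_edges(L):
    """Return (states, edges) for a profile HMM with L match columns (hypothetical layout)."""
    def match(j):
        # Begin plays the role of M_0 and End the role of M_{L+1}
        return "B" if j == 0 else "E" if j == L + 1 else f"M{j}"

    states = (["B"] + [f"M{j}" for j in range(1, L + 1)]
              + [f"I{j}" for j in range(L + 1)]
              + [f"D{j}" for j in range(1, L + 1)] + ["E"])
    edges = []
    for j in range(L + 1):
        edges.append((match(j), match(j + 1)))        # a_{M_j M_{j+1}}
        edges.append((match(j), f"I{j}"))             # a_{M_j I_j}
        if j < L:
            edges.append((match(j), f"D{j + 1}"))     # a_{M_j D_{j+1}}
        edges.append((f"I{j}", f"I{j}"))              # a_{I_j I_j}
        edges.append((f"I{j}", match(j + 1)))         # leave the insert state
        if j >= 1:
            edges.append((f"D{j}", match(j + 1)))     # leave the delete chain
            if j < L:
                edges.append((f"D{j}", f"D{j + 1}"))  # a_{D_j D_{j+1}}
    return states, edges


states, edges = profile_hmm_edges(3)
print(len(states), len(edges))  # 3L + 3 states and 7L + 3 edges in this layout
```

The state count grows linearly with the number of consensus columns, matching the remark that the number of states depends on the alignment length.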

Page 11

HMM Variants

There exist a large number of HMM variants that modify and extend the basic model to meet the needs of various applications.

• Adding silent states to the model to represent the absence of certain symbols that are expected at specific locations.

• Making the states emit two aligned symbols, instead of a single symbol, so that the resulting HMM simultaneously generates two related symbol sequences.

• Making the probabilities at certain states depend on some of the previous emissions, to describe more complex symbol correlations.

Page 12

Profile HMM: Example (CpG islands)

Page 13

Profile HMM: Concept and Model

• Stochastic methods to model multiple sequence alignments of proteins and DNA sequences

• Potential application domains:
  • protein families could be modeled as an HMM or a group of HMMs
  • constructing a profile HMM
  • new protein sequences could be aligned with stored models to detect remote homology
  • aligning a sequence with a stored profile HMM
  • aligning two or more protein family profile HMMs to detect homology
  • finding statistical similarities between two profile HMM models

• Related tasks:
  • protein classification
  • motif detection
  • finding multiple sequence alignments
  • scoring a sequence against a profile HMM
  • comparing two profile HMMs

Page 14

Profile HMM: Example (Problem 2)

Page 15

Profile HMM: Example (Problem 1)

Page 16

Profile HMM: Applications

• Comparing two multiple sequence alignments or sequence profiles, instead of comparing a single sequence against a multiple alignment or a profile.

• Comparing sequence profiles can help detect remote homologues. For example:
  • COACH compares sequence alignments by building a profile HMM from one alignment and aligning the other alignment to the constructed profile HMM.
  • HHsearch generalizes the traditional pairwise sequence alignment algorithm to find the alignment of two profile HMMs.
  • PRC (profile comparer) provides a tool for scoring and aligning profile HMMs produced by popular software tools.

• Profile HMMs can also model sequences of protein secondary structure symbols: helix (H), strand (E), and coil (C).

• A feature-based profile HMM has been proposed to improve remote protein homology detection. Instead of amino acids, these HMMs emit 'features' that capture the biochemical properties of the protein family of interest.

Page 17

Pair HMM: Concept and Model

• The optimal state sequence y* can be found using dynamic programming, by a simple modification of the Viterbi algorithm.

• The computational complexity of the resulting alignment algorithm is only $O(L_x L_z)$, where $L_x$ and $L_z$ are the lengths of the two sequences x and z.
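The modified Viterbi recursion fills three (L_x + 1) x (L_z + 1) tables, one per state, which is where the $O(L_x L_z)$ cost comes from. The sketch below assumes a common three-state layout (M emits an aligned pair, X emits a symbol of x against a gap, Y emits a symbol of z against a gap) with gap-open probability `delta`, gap-extend probability `eps`, and end probability `tau`; the DNA emission model, the default values, and the name `pair_hmm_viterbi` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log(p):
    return math.log(p) if p > 0 else NEG_INF


def pair_hmm_viterbi(x, z, delta=0.2, eps=0.1, tau=0.01, p_same=0.9):
    """Most probable alignment of x and z under a 3-state pair HMM (states M, X, Y)."""
    def log_match(a, b):
        # hypothetical emission model: p_same shared by the 4 identical DNA pairs,
        # 1 - p_same shared by the 12 mismatched pairs
        return log(p_same / 4) if a == b else log((1 - p_same) / 12)

    log_gap = log(0.25)  # X and Y emit one symbol uniformly (assumption)
    to_m = {"M": log(1 - 2 * delta - tau), "X": log(1 - eps - tau), "Y": log(1 - eps - tau)}
    n, m = len(x), len(z)
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0  # begin state, treated like M at (0, 0)

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:   # M consumes x_i and z_j
                prev = {s: V[s][i - 1][j - 1] + to_m[s] for s in "MXY"}
                best = max(prev, key=prev.get)
                V["M"][i][j], back["M"][i][j] = log_match(x[i - 1], z[j - 1]) + prev[best], best
            if i > 0:             # X consumes x_i against a gap
                prev = {"M": V["M"][i - 1][j] + log(delta), "X": V["X"][i - 1][j] + log(eps)}
                best = max(prev, key=prev.get)
                V["X"][i][j], back["X"][i][j] = log_gap + prev[best], best
            if j > 0:             # Y consumes z_j against a gap
                prev = {"M": V["M"][i][j - 1] + log(delta), "Y": V["Y"][i][j - 1] + log(eps)}
                best = max(prev, key=prev.get)
                V["Y"][i][j], back["Y"][i][j] = log_gap + prev[best], best

    state = max("MXY", key=lambda s: V[s][n][m])
    score = V[state][n][m] + log(tau)              # transition into the end state
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:                          # trace back through the stored pointers
        prev_state = back[state][i][j]
        top.append(x[i - 1] if state in ("M", "X") else "-")
        bottom.append(z[j - 1] if state in ("M", "Y") else "-")
        if state in ("M", "X"):
            i -= 1
        if state in ("M", "Y"):
            j -= 1
        state = prev_state
    return score, "".join(reversed(top)), "".join(reversed(bottom))


print(pair_hmm_viterbi("ACGT", "AGT"))
```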

Page 18

Pair HMM: Applications

• Finding pairwise alignments of protein and DNA sequences: finding the optimal sequence alignment, computing the overall alignment probability, and estimating the reliability of individual alignment regions.

• Many multiple sequence alignment (MSA) algorithms also use pair HMMs. The most widely adopted strategy for constructing a multiple alignment is progressive alignment, in which sequences are assembled into one large multiple alignment through consecutive pairwise alignment steps according to a guide tree.

• Gene prediction. For example, in a method called Pairagon+N-SCAN_EST, a pair HMM is first used to find accurate alignments of cDNA sequences to a given genome, and these alignments are then combined with a gene prediction algorithm for accurate genome annotation.

• Comparing two DNA sequences and jointly analyzing their gene structures.

• Aligning more complex structures, such as trees.

Page 19

HSMM (Hidden Semi-Markov Model)

$$\lambda = (A, B, D, \pi)$$

where $D$ holds the state-duration distributions.

[Figure: a hidden state series in which the k-th state lasts for a duration d_k (k = 1, ..., 6) and emits the observation segment o_{t_{k-1}+1:t_k}, from o_{1:t_1} to o_{t_5+1:t_6}]
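In an HSMM each visited state draws an explicit duration d_k from D and then emits a whole segment of d_k observations, instead of one observation per time step. The sketch below samples segments under that reading. The integer indexing, the representation of D as one list of duration probabilities per state, and the name `sample_hsmm` are hypothetical; A would typically have a zero diagonal, because staying in a state is already modeled by D.

```python
import random


def sample_hsmm(pi, A, B, D, n_segments, seed=None):
    """Sample (state, duration, emitted symbols) segments from lambda = (A, B, D, pi).

    D[j][d - 1] = P(duration = d | state j)   (hypothetical representation)
    """
    rng = random.Random(seed)
    N, M = len(pi), len(B[0])
    segments = []
    q = rng.choices(range(N), weights=pi)[0]
    for k in range(n_segments):
        if k > 0:
            q = rng.choices(range(N), weights=A[q])[0]           # jump to the next state
        d = rng.choices(range(1, len(D[q]) + 1), weights=D[q])[0]  # explicit duration d_k
        symbols = [rng.choices(range(M), weights=B[q])[0] for _ in range(d)]
        segments.append((q, d, symbols))
    return segments


# Hypothetical 2-state model: state 0 lasts 1 to 3 steps, state 1 lasts 1 to 2 steps.
segments = sample_hsmm(pi=[0.5, 0.5], A=[[0.0, 1.0], [1.0, 0.0]],
                       B=[[0.8, 0.2], [0.3, 0.7]], D=[[0.2, 0.3, 0.5], [0.6, 0.4]],
                       n_segments=4, seed=0)
observations = [o for _, _, seg in segments for o in seg]  # o_{1:t_1}, o_{t_1+1:t_2}, ...
print(segments)
```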

Page 20

Coupling: Concept & Model

[Figure: coupled Markov chains over states S1 to S5]

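$$P\big(S_t^{(c)} \mid S_{t-1}^{(1)}, S_{t-1}^{(2)}, \ldots, S_{t-1}^{(C)}\big) = \prod_{d=1}^{C} P\big(S_t^{(c)} \mid S_{t-1}^{(d)}\big)$$

Brand, [3]

$$\lambda^{(c)} = (A^{(c)}, B^{(c)}, \pi^{(c)})$$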

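For two or more chains, the right-hand side of the coupling equation multiplies one pairwise transition table per chain. The sketch below computes that product for chain c. A product of conditional distributions does not in general sum to 1, so this version renormalizes it; that step is an assumption of the sketch, not something stated on the slide. `pair_trans`, `coupled_transition`, and the numbers are hypothetical.

```python
import numpy as np


def coupled_transition(pair_trans, prev_states, c):
    """Distribution of S_t^(c) given the previous states of all C chains.

    pair_trans[d][c][i, j] = P(S_t^(c) = j | S_{t-1}^(d) = i)   (hypothetical layout)
    """
    scores = np.ones(pair_trans[0][c].shape[1])
    for d, s_prev in enumerate(prev_states):
        scores = scores * pair_trans[d][c][s_prev]   # factor P(S_t^(c) | S_{t-1}^(d))
    return scores / scores.sum()                     # renormalize (assumption)


# Two hypothetical chains with two states each.
P = [[np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.6, 0.4], [0.5, 0.5]])],
     [np.array([[0.5, 0.5], [0.3, 0.7]]), np.array([[0.7, 0.3], [0.1, 0.9]])]]
print(coupled_transition(P, prev_states=(0, 1), c=0))
```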

Page 21

CHSMM: Ancestors Diagram

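• Baum 1966: HMM, $\lambda = (A, B, \pi)$
• Ferguson 1980: HSMM, $\lambda = (A, B, D, \pi)$
• Brand 1996: coupled HMM, $\lambda^{(c)} = (A^{(c)}, B^{(c)}, \pi^{(c)})$
• Natarajan 2007: CHSMM, $\lambda^{(c)} = (A^{(c)}, B^{(c)}, D^{(c)}, \pi^{(c)})$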