Bioinformatics Course
December 2013
Hidden Markov Model and Its Generalizations
In the Name of God
Sharif University of Technology
HMM Concept
Observable Markov Model: state = weather condition
[Figure: Markov chain over states S1 to S5. State series q1..q6 = S3, S4, S1, S2, S5, S2; observation series o1..o6, where each observation is the state itself.]
In a regular Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters.
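As a minimal sketch of this idea, the probability of a fully observed state sequence is just the initial probability times the product of the transition probabilities along the path. The three weather states and all numbers below are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Hypothetical 3-state weather chain; names and numbers are illustrative only.
states = ["sunny", "cloudy", "rainy"]
pi = np.array([0.5, 0.3, 0.2])      # pi[i] = P(q_1 = i)
A = np.array([[0.7, 0.2, 0.1],      # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

def sequence_probability(path, pi, A):
    """Probability of a directly observed state sequence under a Markov chain."""
    prob = pi[path[0]]
    for prev, cur in zip(path[:-1], path[1:]):
        prob *= A[prev, cur]
    return prob

path = [0, 0, 1, 2]                 # sunny, sunny, cloudy, rainy
p = sequence_probability(path, pi, A)   # 0.5 * 0.7 * 0.2 * 0.3
```

Because the states are visible, no summation over alternative paths is needed; that changes once the states are hidden.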
HMM Concept
Hidden Markov Model: state = atmospheric pressure
[Figure: Markov chain over states S1 to S4. State series q1..q6 = S4, S3, S2, S2, S1, S2 (hidden); observation series o1..o6.]
In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.
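HMM Model
[Figure: hidden states q1..q6 = S4, S3, S2, S2, S1, S2 on a chain over S1 to S4, emitting observations o1..o6 = v3, v4, v1, v2, v5, v2.]
State transition probabilities: $a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i, j \le N$
Observation probabilities: $b_j(k) = P[v_k \text{ at } t \mid q_t = S_j], \quad 1 \le j \le N, \; 1 \le k \le M$
Initial state probabilities: $\pi_i = P[q_1 = S_i], \quad 1 \le i \le N$
Model: $\lambda = (A, B, \pi)$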
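The model definition can be illustrated with a small generative sketch: the hidden chain follows A, and each state emits a symbol according to its row of B. The parameter values, the array layout, and the helper name `sample_hmm` are assumptions made for illustration.

```python
import numpy as np

# Illustrative model lambda = (A, B, pi) with N = 2 states and M = 3 symbols.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])     # B[j, k] = P(v_k at t | q_t = S_j)
pi = np.array([0.5, 0.5])           # pi[i] = P(q_1 = S_i)

def sample_hmm(A, B, pi, T, rng):
    """Draw a hidden state sequence and the observation sequence it emits."""
    N, M = B.shape
    states = np.empty(T, dtype=int)
    obs = np.empty(T, dtype=int)
    states[0] = rng.choice(N, p=pi)
    obs[0] = rng.choice(M, p=B[states[0]])
    for t in range(1, T):
        states[t] = rng.choice(N, p=A[states[t - 1]])
        obs[t] = rng.choice(M, p=B[states[t]])
    return states, obs

rng = np.random.default_rng(0)
states, obs = sample_hmm(A, B, pi, T=6, rng=rng)
```

Only `obs` would be available to an observer; `states` is what the three problems that follow try to reason about.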
HMM Evaluation
Problem 1: Given an observation sequence and a model, compute the probability of the observation sequence
[Figure: hidden states q1..q6 = S4, S3, S2, S2, S1, S2 emitting observations o1..o6 = v3, v4, v1, v2, v5, v2, under the model $\lambda = (A, B, \pi)$.]
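HMM Forward & Backward
Forward (state S_n at time t-1 moves to S_m at time t, which emits o_t):
$\alpha_1(m) = \pi_m \, b_m(o_1)$
$\alpha_t(m) = P(o_{1:t}, q_t = S_m \mid \lambda) = \Big[ \sum_{n=1}^{N} \alpha_{t-1}(n) \, a_{nm} \Big] b_m(o_t)$
Backward (state S_m at time t moves to S_n at time t+1, which emits o_{t+1}):
$\beta_T(m) = 1$
$\beta_t(m) = P(o_{t+1:T} \mid q_t = S_m, \lambda) = \sum_{n=1}^{N} a_{mn} \, b_n(o_{t+1}) \, \beta_{t+1}(n)$
Likelihood:
$P(O \mid \lambda) = \sum_{m=1}^{N} P(O_{1:T}, q_T = S_m \mid \lambda) = \sum_{m=1}^{N} \alpha_T(m)$
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i) \, \beta_t(i)$ for any $t$
Complexity: $O(N^2 T)$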
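The forward and backward recursions can be sketched in a few lines of NumPy. This is an unscaled version for short sequences, assuming the array layout `A[n, m]`, `B[m, k]` and integer-coded observations; a practical implementation would rescale or work in log space to avoid underflow. The example parameters are illustrative.

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, m] = P(o_1..o_t, q_t = S_m | lambda), with t 0-indexed."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # sum over n of alpha_{t-1}(n) a_{nm}, then multiply by b_m(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, m] = P(o_{t+1}..o_T | q_t = S_m, lambda), with t 0-indexed."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        # sum over n of a_{mn} b_n(o_{t+1}) beta_{t+1}(n)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Illustrative parameters (assumed, not from the slides).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])
obs = [0, 2, 1]

alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)
likelihood = alpha[-1].sum()        # P(O | lambda)
```

Each time step costs one N-by-N matrix-vector product, which is where the N²T complexity comes from; summing `alpha[t] * beta[t]` at any `t` gives the same likelihood.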
HMM Decoding / Classification / Inference
Problem 2: Given an observation sequence and a model, find the state sequence most likely to have produced the observations.
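Viterbi
[Figure: hidden states q1..q6 = S4, S3, S2, S2, S1, S2 emitting observations o1..o6 = v3, v4, v1, v2, v5, v2, under the model $\lambda = (A, B, \pi)$; trellis step from S_n at time t-1 to S_m at time t, which emits o_t.]
$Q^* = \arg\max_Q P(Q \mid O, \lambda) = \arg\max_Q \frac{P(O, Q \mid \lambda)}{P(O \mid \lambda)} = \arg\max_Q P(O, Q \mid \lambda)$
Forward recursion, for comparison: $\alpha_t(m) = \sum_{n=1}^{N} \alpha_{t-1}(n) \, a_{nm} \, b_m(o_t)$
Recursion: $\delta_t(m) = \max_n \{ \delta_{t-1}(n) \, a_{nm} \} \, b_m(o_t)$
Backtracking: $\psi_t(m) = \arg\max_n \{ \delta_{t-1}(n) \, a_{nm} \}$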
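The recursion and backtracking steps can be sketched as follows: the sum of the forward recursion is replaced by a max, and the argmax is stored so the best path can be traced back from the end. The array layout and example parameters are illustrative assumptions; probabilities are left unscaled, which is only safe for short sequences.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the most probable state path and its joint probability P(O, Q* | lambda)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))             # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)    # best predecessor of each state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[n, m] = delta_{t-1}(n) a_{nm}
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtracking from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())

# Illustrative parameters (assumed, not from the slides).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])

path, best_prob = viterbi(A, B, pi, [0, 0, 2, 2])
```

With these numbers the decoder switches from the first state to the second exactly where the observations change from symbol 0 to symbol 2.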
HMM Learning
[Figure: hidden states q1..q6 = S4, S3, S2, S2, S1, S2 emitting observations o1..o6 = v3, v4, v1, v2, v5, v2, under the model $\lambda = (A, B, \pi)$.]
Problem 3: Given an observation sequence, estimate the parameters of the model, whether or not the state sequence is known.
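Expectation/Maximization
[Figure: trellis step from S_n at time t to S_m at time t+1.]
$\xi_t(n, m) = P(q_t = S_n, q_{t+1} = S_m \mid O, \lambda) = \frac{\alpha_t(n) \, a_{nm} \, b_m(o_{t+1}) \, \beta_{t+1}(m)}{\sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_t(n) \, a_{nm} \, b_m(o_{t+1}) \, \beta_{t+1}(m)}$
$\gamma_t(n) = P(q_t = S_n \mid O, \lambda) = \sum_{m=1}^{N} \xi_t(n, m)$
Re-estimation:
$a_{nm} = \frac{\sum_{t=1}^{T-1} \xi_t(n, m)}{\sum_{t=1}^{T-1} \gamma_t(n)}$
$b_n(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(n)}{\sum_{t=1}^{T} \gamma_t(n)}$
$\pi_n = \gamma_1(n)$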
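One iteration of this expectation/maximization scheme can be sketched as below, for the case where the state sequence is unknown. It is an unscaled, single-sequence version under assumed array layouts; the function name `baum_welch_step` and the example values are hypothetical. The final-time value of gamma is taken as alpha times beta over the likelihood, which agrees with the sum over xi wherever xi is defined.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM re-estimation step; returns (new_A, new_B, new_pi, likelihood)."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    # E-step: forward and backward variables.
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    # xi[t, n, m] = alpha_t(n) a_{nm} b_m(o_{t+1}) beta_{t+1}(m) / P(O | lambda)
    emit_next = B[:, obs[1:]].T * beta[1:]
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_next[:, None, :] / likelihood
    gamma = alpha * beta / likelihood
    # M-step: expected counts turned into probabilities.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    counts = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B = counts / gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, likelihood

# Illustrative starting point (assumed, not from the slides).
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 2, 2, 1, 0, 2]

new_A, new_B, new_pi, ll0 = baum_welch_step(A, B, pi, obs)
```

Calling the function repeatedly on its own output should never decrease the likelihood, which is the usual convergence check for this procedure.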
Application: Protein Structure
HMM Variants: Profile HMM
• Constructing a profile HMM
  • each consensus column can exist in 3 states
  • match, insert and delete states
  • number of states depends upon the length of the alignment
• A typical profile HMM architecture
  • squares represent match states
  • diamonds represent insert states
  • circles represent delete states
  • arrows represent transitions
  • transition between match states: $a_{M_j M_{j+1}}$
  • transition from match state to insert state: $a_{M_j I_j}$
  • transition within insert state: $a_{I_j I_j}$
  • transition from match state to delete state: $a_{M_j D_{j+1}}$
  • transition within delete state: $a_{D_j D_{j+1}}$
  • emission of symbol $a$ at state $s$: $e_s(a)$
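To make the dependence on alignment length concrete, the sketch below enumerates the states and allowed transitions for a profile HMM with L consensus columns. It assumes the commonly used layout with Begin and End states, insert states I0..IL, and all match/insert/delete transitions between neighbouring positions; the slides do not spell out these details, and the function name is hypothetical.

```python
def profile_hmm_topology(L):
    """List the states and allowed transitions of a profile HMM with L consensus columns."""
    def match(j):
        # Begin and End play the role of match states at positions 0 and L + 1.
        if j == 0:
            return "Begin"
        if j == L + 1:
            return "End"
        return f"M{j}"

    states = (["Begin"]
              + [f"M{j}" for j in range(1, L + 1)]
              + [f"I{j}" for j in range(L + 1)]
              + [f"D{j}" for j in range(1, L + 1)]
              + ["End"])

    transitions = []
    for j in range(L + 1):
        sources = [match(j), f"I{j}"] + ([f"D{j}"] if j >= 1 else [])
        # Each source may move to the next match, stay in or enter insert j,
        # or move to the next delete state when one exists.
        targets = [match(j + 1), f"I{j}"] + ([f"D{j + 1}"] if j < L else [])
        transitions += [(s, t) for s in sources for t in targets]
    return states, transitions

states, transitions = profile_hmm_topology(3)
```

Under these assumptions both the number of states and the number of transitions grow linearly with the number of consensus columns.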
HMM Variants
There exist a large number of HMM variants that modify and extend the basic model to meet the needs of various applications.
• Adding silent states to the model to represent the absence of certain symbols that are expected to be present at specific locations
• Making the states emit two aligned symbols, instead of a single symbol, so that the resulting HMM simultaneously generates two related symbol sequences
• Making the probabilities at certain states dependent on part of the previous emissions, to describe more complex symbol correlations
Profile HMM: Example (CpG islands)
• protein classification
• motif detection
• finding multiple sequence alignments
• scoring a sequence against a profile HMM
• comparing two profile HMMs
Profile HMM: Concept and Model
• Stochastic methods to model multiple sequence alignments (proteins and DNA sequences)
• Potential application domains:
  • protein families could be modeled as an HMM or a group of HMMs
    • constructing a profile HMM
  • new protein sequences could be aligned with stored models to detect remote homology
    • aligning a sequence with a stored profile HMM
  • align two or more protein family profile HMMs to detect homology
    • finding statistical similarities between two profile HMM models
Profile HMM: Example (Problem 2)
Profile HMM: Example (Problem 1)
Profile HMM: Applications
• Comparing two multiple sequence alignments or sequence profiles, instead of comparing a single sequence against a multiple alignment or a profile.
• Comparing sequence profiles can be beneficial for detecting remote homologues. For example:
  • COACH allows us to compare sequence alignments, by building a profile-HMM from one alignment and aligning the other alignment to the constructed profile-HMM.
  • HHsearch generalizes the traditional pairwise sequence alignment algorithm for finding the alignment of two profile-HMMs.
  • PRC (profile comparer) provides a tool for scoring and aligning profile-HMMs produced by popular software tools.
• Modeling sequences of protein secondary structure symbols: helix (H), strand (E), and coil (C).
• A feature-based profile-HMM was proposed to improve the performance of remote protein homology detection. Instead of emitting amino acids, emissions of these HMMs are based on 'features' that capture the biochemical properties of the protein family of interest.
Pair HMM: Concept and Model
• The optimal state sequence y* can be found using dynamic programming, by a simple modification of the Viterbi algorithm.
• The computational complexity of the resulting alignment algorithm is only $O(L_x L_z)$, where $L_x$ and $L_z$ are the lengths of the two sequences.
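The modified Viterbi algorithm can be sketched for a simple three-state pair HMM with a match state M (emits an aligned pair), a state X (emits a symbol of x against a gap), and a state Y (emits a symbol of z against a gap). The gap-open probability `delta`, gap-extend probability `eps`, the uniform DNA emission model, and the omission of an explicit end state are all assumptions made for illustration. The sketch returns only the best log-probability; recovering y* would additionally need back-pointers.

```python
import math

def pair_hmm_viterbi(x, z, delta=0.2, eps=0.4, p_match=0.8):
    """Log-probability of the best alignment path of x and z (DNA strings)."""
    log = math.log
    NEG = float("-inf")
    q = 0.25                                   # emission of one symbol against a gap

    def p(a, b):
        # Joint emission of an aligned pair; sums to 1 over the 16 DNA pairs.
        return p_match / 4 if a == b else (1 - p_match) / 12

    Lx, Lz = len(x), len(z)
    vM = [[NEG] * (Lz + 1) for _ in range(Lx + 1)]
    vX = [[NEG] * (Lz + 1) for _ in range(Lx + 1)]
    vY = [[NEG] * (Lz + 1) for _ in range(Lx + 1)]
    vM[0][0] = 0.0                             # start in M with probability 1

    for i in range(Lx + 1):                    # two nested loops: O(Lx * Lz) cells
        for j in range(Lz + 1):
            if i > 0 and j > 0:
                vM[i][j] = log(p(x[i - 1], z[j - 1])) + max(
                    log(1 - 2 * delta) + vM[i - 1][j - 1],
                    log(1 - eps) + vX[i - 1][j - 1],
                    log(1 - eps) + vY[i - 1][j - 1])
            if i > 0:
                vX[i][j] = log(q) + max(log(delta) + vM[i - 1][j],
                                        log(eps) + vX[i - 1][j])
            if j > 0:
                vY[i][j] = log(q) + max(log(delta) + vM[i][j - 1],
                                        log(eps) + vY[i][j - 1])
    return max(vM[Lx][Lz], vX[Lx][Lz], vY[Lx][Lz])

score_same = pair_hmm_viterbi("ACGT", "ACGT")
score_diff = pair_hmm_viterbi("ACGT", "TGCA")
```

Each cell of the three tables is filled once with a constant amount of work, which gives the quadratic cost quoted above.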
Pair HMM: Applications
• Finding pairwise alignments of protein and DNA sequences: finding the optimal sequence alignment, computing the overall alignment probability, and estimating the reliability of the individual alignment regions.
• Many multiple sequence alignment (MSA) algorithms also make use of pair-HMMs. The most widely adopted strategy for constructing a multiple alignment is the progressive alignment approach, where sequences are assembled into one large multiple alignment through consecutive pairwise alignment steps according to a guide tree.
• Gene prediction. For example, in a method called Pairagon+N-SCAN_EST, a pair-HMM is first used to find accurate alignments of cDNA sequences to a given genome, and these alignments are combined with a gene prediction algorithm for accurate genome annotation.
• Compare two DNA sequences and jointly analyze their gene structures.
• Aligning more complex structures, such as trees.
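HSMM Model
$\lambda = (A, B, D, \pi)$
[Figure: state sequence S4, S3, S2, S2, S1, S2 with state durations $d_1, \dots, d_6$; the segments emit the observation blocks $o_{1:t_1}, o_{t_1+1:t_2}, o_{t_2+1:t_3}, o_{t_3+1:t_4}, o_{t_4+1:t_5}, o_{t_5+1:t_6}$.]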
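The role of the duration component D can be sketched generatively: each visited state first draws how long it lasts, then emits that many observations before the chain moves on. The layout `D[state, d - 1] = P(duration = d | state)`, the zero diagonal in A, and all numbers are assumptions made for illustration.

```python
import numpy as np

def sample_hsmm(A, B, D, pi, n_segments, rng):
    """Sample (state, duration) segments and the observations they emit."""
    N, M = B.shape
    segments, obs = [], []
    state = int(rng.choice(N, p=pi))
    for _ in range(n_segments):
        d = int(rng.choice(D.shape[1], p=D[state])) + 1   # durations 1..Dmax
        obs.extend(int(rng.choice(M, p=B[state])) for _ in range(d))
        segments.append((state, d))
        state = int(rng.choice(N, p=A[state]))            # no self-transitions
    return segments, obs

# Illustrative parameters; A has a zero diagonal because D models state persistence.
A = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.6, 0.4, 0.0]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
D = np.array([[0.1, 0.3, 0.6],
              [0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2]])
pi = np.array([0.4, 0.3, 0.3])

rng = np.random.default_rng(1)
segments, obs = sample_hsmm(A, B, D, pi, n_segments=6, rng=rng)
```

In a plain HMM the time spent in a state is implicitly geometric through the self-transition probability; here it follows whatever distribution the corresponding row of D specifies.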
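Coupling: Concept & Model
[Figure: several Markov chains, each over its own set of states, with transitions coupled across chains.]
$P(S_t^{(c)} \mid S_{t-1}^{(1)}, S_{t-1}^{(2)}, \dots, S_{t-1}^{(C)}) = \prod_{d=1}^{C} P(S_t^{(c)} \mid S_{t-1}^{(d)})$ (Brand, [3])
$\lambda = (A^{(c)}, B^{(c)}, \pi^{(c)})$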
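The factorized coupling can be sketched as an element-wise product of pairwise conditionals, one per chain. The nested-list layout `trans[d][c]`, the two-chain example, and the optional renormalization at the end are assumptions for illustration; the product itself need not sum to one.

```python
import numpy as np

def coupled_transition(trans, prev_states, c):
    """Product over chains d of P(S_t^(c) = j | S_{t-1}^(d) = prev_states[d]), for every j.

    trans[d][c][i, j] is assumed to hold P(S_t^(c) = j | S_{t-1}^(d) = i).
    """
    probs = np.ones(trans[0][c].shape[1])
    for d, prev in enumerate(prev_states):
        probs = probs * trans[d][c][prev]
    return probs

# Two coupled chains with two states each; all numbers are illustrative.
T00 = np.array([[0.8, 0.2], [0.3, 0.7]])   # chain 0 -> chain 0
T01 = np.array([[0.5, 0.5], [0.1, 0.9]])   # chain 0 -> chain 1
T10 = np.array([[0.6, 0.4], [0.5, 0.5]])   # chain 1 -> chain 0
T11 = np.array([[0.9, 0.1], [0.4, 0.6]])   # chain 1 -> chain 1
trans = [[T00, T01], [T10, T11]]

unnormalized = coupled_transition(trans, prev_states=(0, 1), c=0)
normalized = unnormalized / unnormalized.sum()   # renormalization is an added assumption
```

The appeal of the factorization is that the number of parameters grows with the number of chain pairs rather than with the size of the joint state space.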
CHSMM: Ancestors Diagram
• HMM, $\lambda = (A, B, \pi)$: Baum 1966
• HSMM, $\lambda = (A, B, D, \pi)$: Ferguson 1980
• CHMM, $\lambda = (A^c, B^c, \pi^c)$: Brand 1996
• CHSMM, $\lambda = (A^c, B^c, D^c, \pi^c)$: Natarajan 2007