1
Chapter 5 Character-Based Methods of Phylogenetics
Department of Computer Science and Information Engineering, National Chi Nan University
黃光璿 (HUANG, Guan-Shieng), 2004/04/05
2
5.1 Parsimony
Mutations are exceedingly rare events.
The more unlikely events a model invokes, the less likely the model is to be correct.
The explanation that requires the fewest mutations to account for a state is the most likely to be correct.
3
Ockham's Razor
the philosophic rule that entities should not be multiplied unnecessarily
4
5
6
5.1.1 Informative and Uninformative Sites
7
8
5.1.1 Informative and Uninformative Sites
informative sites: have information to construct a tree
uninformative sites: have no information
(in the sense of the parsimony principle)
9
uninformative
10
uninformative
11
informative
12
informative
13
For a position to be informative, it must contain at least two different nucleotides, and each of these nucleotides must be present at least twice.
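The rule above can be checked mechanically. The following is a minimal sketch, assuming each alignment column is given as a string with one nucleotide per sequence; the name `is_informative` and the example alignment are illustrative, and gaps or ambiguity codes are not treated specially.

```python
from collections import Counter

def is_informative(column):
    """Return True if at least two different nucleotides each occur at least twice."""
    counts = Counter(column)
    repeated = [base for base, n in counts.items() if n >= 2]
    return len(repeated) >= 2

# Hypothetical 4-sequence alignment, one string per sequence.
alignment = ["GGCA", "GACT", "GATA", "GGTT"]
columns = ["".join(col) for col in zip(*alignment)]   # read the alignment column by column
informative = [is_informative(col) for col in columns]
```

Here the first column (`GGGG`) is uninformative because only one nucleotide appears, while the other three columns each contain two nucleotides that occur twice.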
14
informative sites
synapomorphy: supports the internal branches (true)
homoplasy: acquired as a result of parallel evolution or convergence (false)
example: eyes in humans, flies, and mollusks
15
5.1.2 Unweighted Parsimony
Every possible tree is considered individually for each informative site.
The tree with the minimum overall cost is reported.
16
17
There are several problems:
The number of alternative unrooted trees increases dramatically.
Calculating the number of substitutions invoked by each alternative tree is difficult.
18
The second problem can be solved by assigning a set of nucleotides to each internal node:
intersection: if the intersection of the two sets of its children is not empty
union: if it is empty.
The number of unions is the minimum number of substitutions.
For an uninformative site, it is the number of different nucleotides minus one.
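The intersection/union rule above (commonly attributed to Fitch) can be sketched as follows. This is a minimal illustration under stated assumptions: a rooted binary tree is written as nested tuples, leaves are integer indices into the column, and the function name `fitch` is hypothetical.

```python
def fitch(tree, column):
    """Return (candidate state set, number of unions) for one alignment column."""
    if isinstance(tree, int):                 # leaf: index of a sequence
        return {column[tree]}, 0
    left_set, left_unions = fitch(tree[0], column)
    right_set, right_unions = fitch(tree[1], column)
    common = left_set & right_set
    if common:                                # intersection is not empty: keep it
        return common, left_unions + right_unions
    # intersection is empty: take the union and count one substitution
    return left_set | right_set, left_unions + right_unions + 1

tree = ((0, 1), (2, 3))                       # hypothetical tree on 4 sequences
_, cost_supporting = fitch(tree, "AATT")      # site that agrees with the tree
_, cost_conflicting = fitch(tree, "ATAT")     # site that conflicts with the tree
_, cost_uninformative = fitch(tree, "AACG")   # uninformative site
```

For this tree the three sites cost 1, 2, and 2 substitutions. The last value equals the number of different nucleotides minus one (3 - 1), as stated for uninformative sites.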
19
/* the uth position in the kth sequence */
20
5.1.4 Weighted Parsimony
Not all mutations are equivalent.
Some sequences (e.g., non-coding sequences) are more prone to indels than others.
Functional importance differs from gene to gene.
Subtle substitution biases usually vary between genes and between species.
Weights (scoring matrices) can be added to reflect these differences.
21
22
23
24
25
Calculating the optimal costs
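The slide's own procedure is not preserved in this transcript. One standard way to compute the optimal cost under a scoring matrix is Sankoff's dynamic program, sketched below as an assumption rather than as the method shown on the slide. The tree encoding (nested tuples with integer leaves), the function names, and the weights (transition 1, transversion 2) are all illustrative.

```python
INF = float("inf")
BASES = "ACGT"

def substitution_costs(transition=1, transversion=2):
    """Build a hypothetical scoring matrix: transitions cheaper than transversions."""
    purines = {"A", "G"}
    table = {}
    for a in BASES:
        table[a] = {}
        for b in BASES:
            if a == b:
                table[a][b] = 0
            elif (a in purines) == (b in purines):
                table[a][b] = transition
            else:
                table[a][b] = transversion
    return table

def sankoff(tree, column, cost):
    """Return {base: minimum cost of the subtree when its root carries that base}."""
    if isinstance(tree, int):                  # leaf: only the observed base is allowed
        return {b: (0 if b == column[tree] else INF) for b in BASES}
    left = sankoff(tree[0], column, cost)
    right = sankoff(tree[1], column, cost)
    return {
        b: min(cost[b][x] + left[x] for x in BASES)
           + min(cost[b][y] + right[y] for y in BASES)
        for b in BASES
    }

weighted = substitution_costs(1, 2)
unweighted = substitution_costs(1, 1)
tree_a = ((0, 1), (2, 3))
tree_b = ((0, 2), (1, 3))
weighted_a = min(sankoff(tree_a, "AGCT", weighted).values())
weighted_b = min(sankoff(tree_b, "AGCT", weighted).values())
unweighted_a = min(sankoff(tree_a, "AGCT", unweighted).values())
unweighted_b = min(sankoff(tree_b, "AGCT", unweighted).values())
```

With unit weights both trees cost 3 for the column `AGCT`, so the site cannot separate them. With the illustrative weights, `tree_a` costs 4 and `tree_b` costs 5, so weighting can make an otherwise uninformative site discriminate between trees. Recovering the internal-node states would need an additional traceback pass, which this sketch omits.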
26
Finding the internal nodes
27
5.2 Inferred Ancestral Sequences
Can be derived while constructing the tree.
No missing links!
How are the samples chosen? The sampling may be biased.
28
5.3 Strategies for Faster Searches
The number of different phylogenetic trees grows enormously with the number of sequences.
10 sequences: about 2 million trees for an exhaustive search
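The figure of roughly 2 million can be reproduced from the standard count of unrooted bifurcating trees on n sequences, 3 x 5 x ... x (2n - 5). A minimal sketch, with the function name `num_unrooted_trees` chosen for illustration:

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees on n labelled sequences (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    for factor in range(3, 2 * n - 4, 2):     # 3, 5, ..., 2n - 5
        count *= factor
    return count

trees_for_10 = num_unrooted_trees(10)
trees_for_20 = num_unrooted_trees(20)
```

For 10 sequences this gives 2,027,025 trees. For 20 sequences the count exceeds 10^20, which is why exhaustive search stops being practical so quickly.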
29
5.3.1 Branch and Bound
Proposed by Hendy & Penny in 1982.
L: an upper bound (for a minimization problem) obtained from random search or by heuristics (e.g., UPGMA)
Incrementally grow a tree. (branch)
Prune any branch whose cost is already greater than L. (bound)
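The branch and bound steps above can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm in full: trees are nested tuples with integer leaves, sequence 0 is fixed next to the root so that each unrooted topology is visited once, costs are unweighted (intersection/union counting), and all function names are hypothetical. Pruning is safe because adding a sequence to a partial tree can never lower its cost.

```python
def site_cost(tree, column):
    """Return (state set, substitutions) for one column by intersection/union."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_set, left_cost = site_cost(tree[0], column)
    right_set, right_cost = site_cost(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_cost(tree, columns):
    return sum(site_cost(tree, col)[1] for col in columns)

def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)                          # edge above this subtree
    if not isinstance(tree, int):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)

def branch_and_bound(sequences, upper_bound=float("inf")):
    """Return (cost, tree) of a most parsimonious tree with cost <= upper_bound."""
    columns = list(zip(*sequences))
    n = len(sequences)
    best = {"cost": upper_bound, "tree": None}

    def grow(subtree, next_leaf):
        tree = (0, subtree)
        cost = tree_cost(tree, columns)         # cost of the partial tree
        if cost > best["cost"]:
            return                              # bound: already worse than L
        if next_leaf == n:                      # all sequences placed
            if best["tree"] is None or cost < best["cost"]:
                best["cost"], best["tree"] = cost, tree
            return
        for bigger in insertions(subtree, next_leaf):
            grow(bigger, next_leaf + 1)         # branch: add the next sequence

    grow((1, 2), 3)
    return best["cost"], best["tree"]

sequences = ["AAG", "AAC", "TTG", "TTC"]        # hypothetical alignment
best_cost, best_tree = branch_and_bound(sequences)
```

For this toy alignment the search returns cost 4 with sequences 0 and 1 grouped against 2 and 3. Each time a complete tree improves on the bound, L tightens and later partial trees are pruned earlier.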
30
31
Properties
complete search
efficient compared with exhaustive search
20 sequences are doable.
32
5.3.2 Heuristic Searches
local search
Alternative trees are not all independent of each other.
branch swapping (Fig. 5.5)
Properties
not complete, may miss the optimal solution
fast and efficient
may get stuck in a local minimum
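One simple form of branch swapping is nearest-neighbor interchange: across each internal branch, exchange one subtree with a subtree on the other side. The sketch below is an assumption about how such a local search could look, not a transcription of Fig. 5.5. Trees are nested tuples of the form `(0, subtree)` with integer leaves, `cost_of` is any scoring function supplied by the caller, and `toy_cost` is a made-up stand-in for a parsimony score.

```python
def swapped_subtrees(subtree):
    """Yield subtrees that differ from `subtree` by one nearest-neighbor interchange."""
    if isinstance(subtree, int):
        return
    left, right = subtree
    if not isinstance(left, int):               # swap `right` with a child of `left`
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if not isinstance(right, int):              # swap `left` with a child of `right`
        a, b = right
        yield (a, (left, b))
        yield (b, (left, a))
    for new_left in swapped_subtrees(left):     # interchanges deeper in the tree
        yield (new_left, right)
    for new_right in swapped_subtrees(right):
        yield (left, new_right)

def neighbors(tree):
    """Neighbors of the tree (anchor leaf, subtree), keeping the anchor leaf fixed."""
    anchor, subtree = tree
    for swapped in swapped_subtrees(subtree):
        yield (anchor, swapped)

def hill_climb(start, cost_of):
    """Move to a cheaper neighbor until none exists; return (cost, tree)."""
    tree, cost = start, cost_of(start)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(tree):
            candidate_cost = cost_of(candidate)
            if candidate_cost < cost:
                tree, cost, improved = candidate, candidate_cost, True
                break
    return cost, tree

def toy_cost(tree):
    """Hypothetical score: 0 if leaves 2 and 3 are siblings, else 1."""
    return 0 if tree[1] in (((2, 3), 1), ((3, 2), 1), (1, (2, 3)), (1, (3, 2))) else 1

final_cost, final_tree = hill_climb((0, ((1, 2), 3)), toy_cost)
```

The search only ever inspects trees one swap away from the current one, which is what makes it fast and also why it can stop at a local minimum.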
33
34
5.4 Consensus Trees
Problem
Parsimony approaches may yield more than one tree.
consensus tree
an agreement or a summary of these trees
agree: bifurcation
not agree: multi-furcation
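A strict consensus keeps only the groupings found in every tree; wherever the trees disagree, the branches collapse into a multi-furcation. The following is a minimal sketch, assuming rooted trees written as nested tuples with string leaf labels. The function names and example trees are illustrative, and the sketch reports the shared groups rather than drawing the consensus tree.

```python
def clades(tree):
    """Return (leaf set, set of leaf groups under each internal node)."""
    if not isinstance(tree, tuple):             # leaf label
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found

def strict_consensus_clades(trees):
    """Return the groups that appear in every one of the given trees."""
    shared = None
    for tree in trees:
        _, found = clades(tree)
        shared = found if shared is None else shared & found
    return shared

# Two hypothetical equally parsimonious trees that disagree inside {A, B, C}.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
shared = strict_consensus_clades([tree_1, tree_2])
```

Here the groups {A, B, C} and {D, E} survive, but neither {A, B} nor {A, C} does, so the consensus shows A, B, and C as a three-way multi-furcation.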
35
36
5.5 Tree Confidence
How much confidence can be attached to the overall tree and its component parts?
How much more likely is one tree to be correct than a particular or randomly chosen alternative tree?
37
5.5.1 Bootstrap Tests
1. Randomly choose columns (with replacement) to combine into a new alignment of the same size.
2. Reconstruct the tree for the new sample.
3. Repeat (1) and (2) many times.
4. Form a consensus of the sampled trees with respect to the tested one.
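The four steps above can be sketched as follows. This is a minimal illustration: `clades_of` stands for whatever tree-building method is being tested and is supplied by the caller, and `closest_pair_clade` is a deliberately crude, hypothetical stand-in used only to make the example runnable.

```python
import random
from itertools import combinations

def resample_columns(sequences, rng):
    """Step 1: draw columns with replacement to form an alignment of the same size."""
    length = len(sequences[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in sequences]

def bootstrap_support(sequences, clades_of, tested_clades, replicates=1000, seed=0):
    """Steps 2-4: rebuild the tree per sample and count how often each group recurs."""
    rng = random.Random(seed)
    hits = {clade: 0 for clade in tested_clades}
    for _ in range(replicates):
        sample_clades = clades_of(resample_columns(sequences, rng))
        for clade in tested_clades:
            if clade in sample_clades:
                hits[clade] += 1
    return {clade: hits[clade] / replicates for clade in tested_clades}

def closest_pair_clade(sequences):
    """Toy stand-in for a tree builder: group the two most similar sequences."""
    def distance(pair):
        i, j = pair
        return sum(a != b for a, b in zip(sequences[i], sequences[j]))
    i, j = min(combinations(range(len(sequences)), 2), key=distance)
    return {frozenset((i, j))}

sequences = ["AAAA", "AAAA", "TTTT", "CCCC"]    # hypothetical alignment
tested = [frozenset((0, 1))]
support = bootstrap_support(sequences, closest_pair_clade, tested, replicates=200)
```

The returned fraction for each group is its bootstrap value. In this toy case sequences 0 and 1 are identical in every resample, so their grouping is recovered every time.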
38
39
40
41
42
Caution
Tests based on fewer than several hundred iterations are not reliable.
Bootstrap tests underestimate the confidence level at high values and overestimate it at low values.
Some results may appear to be statistically significant by chance simply because so many groupings are being considered.
43
Strategy
doing thousands of iterations
using a correction method to adjust for estimation biases
collapsing branches to multi-furcations
What happens if a tree-building algorithm always produces the same tree?
44
5.5.2 Parametric Tests (???)
What is the limit of the parsimony principle?
especially for distant sequences
the most parsimonious tree vs. a particular alternative (this can be used to estimate the significance of the built tree)
45
H. Kishino & M. Hasegawa (1989)
Assume that informative sites within an alignment are both independent and equivalent.
D: difference in the minimum number of substitutions invoked by two trees
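Under the independence assumption above, one common way to judge whether D is significant is a paired t statistic over the per-site differences. The sketch below shows that reading; whether it matches the exact formulation on the slides is an assumption. The inputs are per-site substitution counts for the two trees, and the function name `paired_site_test` and the example numbers are hypothetical.

```python
import math
import statistics

def paired_site_test(site_costs_a, site_costs_b):
    """Return (D, t): total cost difference and a paired t statistic over sites."""
    diffs = [a - b for a, b in zip(site_costs_a, site_costs_b)]
    n = len(diffs)
    total = sum(diffs)                          # D: difference in substitutions
    spread = statistics.stdev(diffs)            # sample standard deviation
    if spread == 0:
        raise ValueError("all per-site differences are equal; t is undefined")
    t = statistics.mean(diffs) / (spread / math.sqrt(n))
    return total, t

# Hypothetical per-site substitution counts for two competing trees.
costs_best = [1, 1, 2, 1]
costs_alternative = [2, 1, 3, 2]
D, t = paired_site_test(costs_best, costs_alternative)
```

The statistic would then be compared with a t distribution with n - 1 degrees of freedom, where n is the number of informative sites. A real alignment would have far more than the four sites used here.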
46
5.6 Comparison of Phylogenetic Methods
If two different methods construct the same tree, the tree is very likely to be correct.
47
5.7 Molecular Phylogenies
Implications
medicine: drug treatment
agriculture: disease resistance factors
conservation: identification of extinct species
48
5.7.1 The Tree of Life
Carl Woese and his colleagues (1970s)
16S rRNA (which all organisms possess)
49
5.7.2 Human Origins
mtDNA
The mean difference between two human populations is about 0.33%.
The greatest differences are found in Africa, not across the different continents!
out-of-Africa theory
mtDNA and the Y chromosome are consistent with this hypothesis
50
They concluded that mitochondrial Eve and Y-chromosome Adam lived about 200,000 years ago.
51
52
References and figure sources
1. Fundamental Concepts of Bioinformatics, Dan E. Krane and Michael L. Raymer, Benjamin/Cummings, 2003.
2. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Cambridge University Press, 1998.
3. Biology, Sylvia S. Mader, 8th edition, McGraw-Hill, 2003.