Chapter 5 Character–Based Methods of Phylogenetics

Department of Computer Science and Information Engineering, National Chi Nan University
HUANG, Guan-Shieng (黃光璿)
2004/04/05

5.1 Parsimony

Mutations are exceedingly rare events.
The more unlikely events a model invokes, the less likely the model is to be correct.

The explanation that requires the fewest mutations to account for an observed state is the most likely to be correct.

Ockham's Razor

the philosophic rule that entities should not be multiplied unnecessarily

5.1.1 Informative and Uninformative Sites

informative sites
  have information that can be used to construct a tree
uninformative sites
  have no such information

in the sense of the parsimony principle.

[Figures: two example sites labelled uninformative, followed by two example sites labelled informative.]

To be informative, a position must contain at least two different nucleotides, and each of these nucleotides must be present at least twice.
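The rule above can be checked mechanically. A minimal sketch, assuming the alignment is given as equal-length strings with no gap handling; the function names and the example alignment are hypothetical:

```python
from collections import Counter

def is_informative(column):
    """True if at least two different nucleotides each occur at least twice."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def informative_sites(seqs):
    """Return the 0-based indices of the informative columns of an alignment."""
    return [i for i, column in enumerate(zip(*seqs)) if is_informative(column)]

# Hypothetical four-sequence alignment, one string per sequence.
alignment = ["AAGT", "AAGT", "ACTT", "ACTC"]
print(informative_sites(alignment))  # prints [1, 2]
```

Column 0 is invariant and column 3 has a nucleotide that occurs only once, so neither is informative.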

informative sites
  synapomorphy: supports the internal branches (true)
  homoplasy: acquired as a result of parallel evolution or convergence (false)
    example: eyes in humans, flies, and mollusks

5.1.2 Unweighted Parsimony

Every possible tree is considered individually for each informative site.

The tree (or trees) with the minimum overall cost is reported.

There are several problems:
  The number of alternative unrooted trees increases dramatically with the number of sequences.
  Calculating the number of substitutions invoked by each alternative tree is difficult.

The second problem can be solved by working from the leaves toward the root and assigning each internal node a set of nucleotides:
  intersection: if the intersection of the two sets of its children is not empty
  union: if it is empty.

The number of unions is the minimum number of substitutions.

For an uninformative site, it is the number of different nucleotides minus one.
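The intersection/union rule can be sketched as a short recursion. This is an illustration under assumptions not in the slides: a tree is a nested pair of Python tuples, a leaf is the integer index of its sequence, and the names `fitch` and `parsimony_cost` are hypothetical.

```python
def fitch(tree, column):
    """Return (nucleotide set, number of unions) for one site of a binary tree."""
    if isinstance(tree, int):                 # leaf: index of a sequence
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:                                # intersection is not empty
        return common, left_cost + right_cost
    # intersection is empty: take the union and count one substitution
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_cost(tree, seqs):
    """Sum the per-site substitution counts over all columns of the alignment."""
    return sum(fitch(tree, column)[1] for column in zip(*seqs))

tree = ((0, 1), (2, 3))          # hypothetical tree on four sequences
print(fitch(tree, "AAGG")[1])    # prints 1
print(fitch(tree, "AGAG")[1])    # prints 2
print(fitch(tree, "AAAG")[1])    # prints 1 (uninformative: 2 nucleotides - 1)
```

The same site costs one substitution when the tree groups the matching sequences together and two when it does not, which is how informative sites discriminate between trees.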

[Algorithm listing not recovered; its only surviving comment: /* the u-th position in the k-th sequence */]

5.1.4 Weighted Parsimony

Not all mutations are equivalent.
  Some sequences (e.g., non-coding sequences) are more prone to indels than others.
  Functional importance differs from gene to gene.
  Subtle substitution biases usually vary between genes and between species.

Weights (scoring matrices) can be added to reflect these differences.

Calculating the optimal costs

Finding the internal nodes
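The figures for these two steps are not recoverable, so the following is only a sketch of one standard way to carry them out: a dynamic program commonly attributed to Sankoff. The tuple tree representation, the function names, and the weights (transition 1, transversion 2) are all assumptions made for illustration.

```python
from math import inf

STATES = "ACGT"
PURINES = {"A", "G"}

def substitution_cost(a, b):
    """Hypothetical weights: 0 for no change, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2

COST = {a: {b: substitution_cost(a, b) for b in STATES} for a in STATES}

def sankoff(tree, column, cost):
    """Return {state: minimal cost of the subtree when its root has that state}."""
    if isinstance(tree, int):                 # leaf: only the observed state is free
        return {s: (0 if s == column[tree] else inf) for s in STATES}
    left = sankoff(tree[0], column, cost)
    right = sankoff(tree[1], column, cost)
    return {
        s: min(cost[s][t] + left[t] for t in STATES)
        + min(cost[s][t] + right[t] for t in STATES)
        for s in STATES
    }

def assign(tree, column, cost, parent_state=None):
    """Label every node with an optimal state, working from the root down."""
    table = sankoff(tree, column, cost)       # recomputed per node for brevity
    if parent_state is None:
        state = min(STATES, key=lambda s: table[s])
    else:
        state = min(STATES, key=lambda s: cost[parent_state][s] + table[s])
    if isinstance(tree, int):
        return state
    return (state,
            assign(tree[0], column, cost, state),
            assign(tree[1], column, cost, state))

tree = ((0, 1), (2, 3))
print(min(sankoff(tree, "AAGG", COST).values()))  # prints 1
print(assign(tree, "AAGG", COST))
```

The optimal cost of a site is the smallest entry in the root's table; the second pass then picks, for each node, a state consistent with that optimum. Recomputing the table at every node keeps the sketch short at the price of extra work.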

5.2 Inferred Ancestral Sequences

Can be derived while constructing the tree.

No missing link!
How are the samples chosen? The sampling may be biased.

5.3 Strategies for Faster Searches

The number of different phylogenetic trees grows enormously with the number of sequences.
  10 sequences: about 2 million unrooted trees for an exhaustive search
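The figure of about 2 million can be reproduced from the standard counting formulas for binary trees on n labelled sequences: (2n-5)!! unrooted trees and (2n-3)!! rooted trees. A small sketch, with hypothetical function names:

```python
def odd_double_factorial(k):
    """Product k * (k - 2) * ... * 3 * 1 for odd k; 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Number of unrooted binary trees on n >= 3 labelled sequences."""
    return odd_double_factorial(2 * n - 5)

def rooted_trees(n):
    """Number of rooted binary trees on n >= 2 labelled sequences."""
    return odd_double_factorial(2 * n - 3)

print(unrooted_trees(10))  # prints 2027025
print(rooted_trees(10))    # prints 34459425
```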

5.3.1 Branch and Bound

Proposed by Hendy & Penny in 1982.

L: an upper bound (for a minimization problem) obtained from a random search or by heuristics (e.g., UPGMA)
Incrementally grow a tree. (branch)
Prune any branch whose cost is already greater than L. (bound)
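The branch and bound steps can be sketched for unweighted parsimony as follows. This is an illustration, not the published algorithm: trees are nested tuples with sequence 0 fixed at the root so that each unrooted tree is visited once, sequences are added in index order, and the names `insertions` and `branch_and_bound` are hypothetical. The bound is valid because adding a sequence to a partial tree can never lower its parsimony cost.

```python
from math import inf

def fitch(tree, column):
    """Return (nucleotide set, substitution count) for one site."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_cost(tree, seqs):
    return sum(fitch(tree, column)[1] for column in zip(*seqs))

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one edge of tree."""
    yield (tree, leaf)                        # edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def branch_and_bound(seqs, upper_bound=inf):
    """Return (minimum cost, one optimal tree) for at least three sequences.

    upper_bound plays the role of L; if it is below the true optimum,
    the returned tree is None.
    """
    n = len(seqs)
    best = [upper_bound, None]

    def grow(subtree, next_leaf):
        cost = tree_cost((0, subtree), seqs)
        if cost > best[0]:                    # bound: prune this partial tree
            return
        if next_leaf == n:                    # complete tree
            if cost < best[0] or best[1] is None:
                best[0], best[1] = cost, (0, subtree)
            return
        for bigger in insertions(subtree, next_leaf):   # branch
            grow(bigger, next_leaf + 1)

    grow((1, 2), 3)
    return best[0], best[1]

seqs = ["AAGT", "AAGC", "ACGT", "GCTT", "GCTC"]   # hypothetical alignment
print(branch_and_bound(seqs))
```

With the default bound of infinity, the first complete tree found supplies L; passing the cost of a heuristic tree as `upper_bound` lets pruning start earlier.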

Properties
  complete search
  efficient relative to exhaustive search

About 20 sequences are doable.

5.3.2 Heuristic Searches

local search
  Alternative trees are not all independent of each other.
  branch swapping (Fig. 5.5)

Properties
  not complete: may miss the optimal solution
  fast and efficient
  may stop at a local minimum
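A local search of this kind can be sketched as hill climbing over nearest-neighbour interchanges, one common form of branch swapping. Whether this matches Fig. 5.5 is an assumption, as are the tuple tree representation (sequence 0 fixed at the root) and the names `nni_neighbors` and `hill_climb`.

```python
def fitch(tree, column):
    """Return (nucleotide set, substitution count) for one site."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_cost(tree, seqs):
    return sum(fitch(tree, column)[1] for column in zip(*seqs))

def _nni(node):
    """Yield rearrangements that swap a grandchild with its parent's sibling."""
    if isinstance(node, int):
        return
    left, right = node
    if isinstance(left, tuple):
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if isinstance(right, tuple):
        a, b = right
        yield ((left, a), b)
        yield ((left, b), a)
    yield from ((t, right) for t in _nni(left))
    yield from ((left, t) for t in _nni(right))

def nni_neighbors(tree):
    """Neighbours of a tree written as (leaf, subtree) with the leaf at the root."""
    root_leaf, subtree = tree
    for t in _nni(subtree):
        yield (root_leaf, t)

def hill_climb(start, seqs):
    """Move to a cheaper neighbour until none exists; may end in a local minimum."""
    best, best_cost = start, tree_cost(start, seqs)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(best):
            cost = tree_cost(candidate, seqs)
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
                break
    return best_cost, best

seqs = ["AAAA", "CCCC", "AAAA", "CCCC"]      # hypothetical alignment
print(hill_climb((0, ((1, 2), 3)), seqs))    # prints (4, (0, ((1, 3), 2)))
```

Each step inspects only the neighbours of the current tree, which is why the search is fast but offers no guarantee of reaching the global optimum.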

5.4 Consensus Trees

Problem
  Parsimony approaches may yield more than one tree.

consensus tree
  an agreement or a summary of these trees
  where the trees agree: bifurcation
  where they do not agree: multi-furcation
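One way to make "agree" concrete is to compare the groupings (clades) each tree contains and keep only those present in every tree, which gives a strict consensus. The sketch below assumes rooted trees written as nested tuples of integer leaves; the function names are hypothetical, and building the multi-furcating tree from the shared clades is left out.

```python
def leaf_set(tree, found):
    """Return the leaves under tree, recording each internal node's leaf set."""
    if isinstance(tree, int):
        return frozenset([tree])
    leaves = leaf_set(tree[0], found) | leaf_set(tree[1], found)
    found.add(leaves)
    return leaves

def clades(tree):
    """Return the set of clades (as frozensets of leaves) of a rooted tree."""
    found = set()
    leaf_set(tree, found)
    return found

def strict_consensus_clades(trees):
    """Return the clades shared by all trees."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared

# Two hypothetical equally parsimonious trees on five sequences.
t1 = (((0, 1), 2), (3, 4))
t2 = (((0, 1), (3, 4)), 2)
print(sorted(sorted(c) for c in strict_consensus_clades([t1, t2])))
# prints [[0, 1], [0, 1, 2, 3, 4], [3, 4]]
```

Here both trees agree on the groups {0, 1} and {3, 4} but disagree on where sequence 2 attaches, so the consensus would show those two groups and sequence 2 joined at a single multi-furcation.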

5.5 Tree Confidence

How much confidence can be attached to the overall tree and its component parts?

How much more likely is one tree to be correct than a particular or randomly chosen alternative tree?

5.5.1 Bootstrap Tests

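1. Randomly choose columns, with replacement, to combine into a new alignment of the same size.
2. Reconstruct the tree for the new sample.
3. Repeat (1) and (2) many times.
4. Summarize the sampled trees with respect to the tested one: for each grouping in the tested tree, record how often it appears among the sampled trees.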
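The resampling steps can be sketched as follows. The tree-building method is deliberately left as a caller-supplied function, here called `build_clades`, which is a hypothetical name for any routine that returns the groupings of the tree it builds from an alignment; `bootstrap_alignment` and `bootstrap_support` are likewise illustrative names.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Resample columns with replacement into an alignment of the same size."""
    length = len(seqs[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in seqs]

def bootstrap_support(seqs, build_clades, replicates=1000, seed=0):
    """Return {grouping: fraction of replicate trees that contain it}."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(replicates):
        sample = bootstrap_alignment(seqs, rng)
        for clade in build_clades(sample):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: count / replicates for clade, count in counts.items()}
```

A grouping from the tested tree that is missing from the returned dictionary appeared in none of the replicates. Fixing the seed makes a run repeatable; it is a convenience of this sketch, not part of the method.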

Caution
  Tests based on fewer than several hundred iterations are not reliable.
  Bootstrap tests tend to underestimate the confidence level at high values and overestimate it at low values.
  Some results may appear to be statistically significant by chance, simply because so many groupings are being considered.

Strategy
  doing thousands of iterations
  using a correction method to adjust for estimation biases
  collapsing branches to multi-furcations

What happens if a tree-building algorithm always produces the same tree?

5.5.2 Parametric Tests (???)

What is the limit of the parsimony principle?
  especially for distant sequences
  the most parsimonious tree vs. a particular alternative (this can be used to estimate the significance of the built tree)

H. Kishino & M. Hasegawa (1989)
  Assume that informative sites within an alignment are both independent and equivalent.
  D: the difference in the minimum number of substitutions invoked by two trees
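Under the independence and equivalence assumption, D can be treated as a sum of per-site differences whose spread estimates its standard error. The sketch below follows one common formulation of this comparison; the exact estimator is an assumption here, not taken from the slides, and `kh_statistic` is a hypothetical name. Its inputs are the per-site substitution counts of the two trees over the same sites.

```python
from math import sqrt

def kh_statistic(site_costs_a, site_costs_b):
    """Return (D, estimated standard error of D) for two trees.

    Needs at least two sites.
    """
    diffs = [a - b for a, b in zip(site_costs_a, site_costs_b)]
    n = len(diffs)
    total = sum(diffs)                        # D
    mean = total / n
    squared = sum((d - mean) ** 2 for d in diffs)
    return total, sqrt(n * squared / (n - 1))

# Hypothetical per-site costs for two trees over four sites.
print(kh_statistic([1, 2, 1, 1], [1, 1, 1, 1]))  # prints (1, 1.0)
```

A value of D that is small compared with its standard error suggests the two trees are not significantly different under this assumption.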

5.6 Comparison of Phylogenetic Methods

If two different methods construct the same tree, that tree is very likely to be correct.

5.7 Molecular Phylogenies

Implications
  medicine: drug treatment
  agriculture: disease resistance factors
  conservation: identification of extinct species

5.7.1 The Tree of Life

Carl Woese and his colleagues (1970s)
  16S rRNA (possessed by all organisms)

5.7.2 Human Origins

mtDNA
  The mean difference between two human populations is about 0.33%.
  The greatest differences are found within Africa, not across the different continents!

out-of-Africa theory
  mtDNA and the Y chromosome are consistent with this hypothesis

They concluded that mitochondrial Eve and Y-chromosome Adam lived about 200,000 years ago.

References and Figure Sources

1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings, 2003.

2. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.

3. Sylvia S. Mader, Biology, 8th edition, McGraw-Hill, 2003.