
Page 1: Parsimony is Computationally Intensive The number of possible trees increases exponentially with the number of species, making exhaustive searches impractical

Parsimony is Computationally Intensive

• The number of possible trees increases explosively (faster than exponentially) with the number of species, making exhaustive searches impractical for many data sets.

• A heuristic search is needed to find the best tree without evaluating all possible trees.

• Example: tree bisection and reconnection (TBR)
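To see why exhaustive search breaks down, the sketch below counts trees with the standard formula for fully bifurcating trees with n labelled taxa: (2n-5)!! unrooted trees and (2n-3)!! rooted trees. The function names are illustrative only.

```python
def num_unrooted_trees(n: int) -> int:
    """Number of unrooted, fully bifurcating trees for n labelled taxa: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # exactly one unrooted tree for 3 taxa
    for k in range(4, n + 1):
        # taxon k can be attached to any of the 2k-5 branches of a tree with k-1 taxa
        count *= 2 * k - 5
    return count


def num_rooted_trees(n: int) -> int:
    """Rooted trees for n taxa equal unrooted trees for n+1 taxa: (2n-3)!!."""
    return num_unrooted_trees(n + 1)


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For 10 taxa there are already 2,027,025 unrooted trees, which is why heuristics such as TBR (cut a branch, then reconnect the two resulting subtrees in every possible way) rearrange a current tree instead of enumerating them all.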


What if there is a large amount of homoplasy in the data?

• Sequence data may have multiple, “hidden” substitutions.

• Use a model of evolution to correct for different rates of substitution, unequal base frequencies, or other parameters.

• Maximum-likelihood phylogenetic analysis: $L = P(D \mid T, M)$

[Figure: plot of base-pair differences between pairs of mammalian species for a representative gene.]

[Figure: example sequences Seq 1 AGCGAG and Seq 2 GCGGAC, with substitution histories at one site (Seq 1: C → A; Seq 2: C → T → C → A).]
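The "hidden" substitutions above can be made concrete with a small simulation. This is a sketch under assumed conditions: a random 1,000-site ancestor, and substitutions that hit a uniformly random site and switch to one of the three other bases with equal probability (Jukes-Cantor-like). The function names are hypothetical.

```python
import random

BASES = "ACGT"


def evolve(seq: str, n_subs: int, rng: random.Random) -> str:
    """Apply n_subs substitutions; each hits a uniformly random site and changes it
    to one of the three other bases with equal probability."""
    s = list(seq)
    for _ in range(n_subs):
        i = rng.randrange(len(s))
        s[i] = rng.choice([b for b in BASES if b != s[i]])
    return "".join(s)


def observed_differences(a: str, b: str) -> int:
    """Count sites that differ between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)
ancestor = "".join(rng.choice(BASES) for _ in range(1000))
for n_subs in (100, 500, 1000, 2000):
    descendant = evolve(ancestor, n_subs, rng)
    # observed differences can never exceed the true number of substitutions
    print(n_subs, observed_differences(ancestor, descendant))
```

As the true number of substitutions grows, the observed count falls further behind it and levels off toward about 75% of sites, because repeated and reversed changes at the same site are invisible in a simple comparison.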


Example: Model of sequence evolution

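• Simplest model = Jukes-Cantor: assumes all substitutions are equally likely (each of A, G, C and T changes to each of the other three bases at the same rate).

Example: What is the total number of substitutions?

AGGTCG CATTGC CCCGAT CTCTTG ATCGGG

AGATCG CAACGC CCGGAC TTCTTA ATCGGG

Total observed = 7; p = 7/30 ≈ 0.23

$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right) = -\frac{3}{4}\ln\left(1 - \frac{4}{3}(0.23)\right) \approx 0.2747$

Total expected ≈ 0.2747 × 30 ≈ 8.24

[Figure: sequence difference (y-axis) plotted against time (x-axis), comparing the expected difference with the observed difference, with the correction marked between the two curves.]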
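The worked example can be checked in a few lines. This is a minimal sketch of the Jukes-Cantor correction applied to the two 30-site sequences above (spaces removed); the function names are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ (the observed difference p)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p: float) -> float:
    """Expected substitutions per site: K = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: correction undefined (sequences saturated)")
    return -0.75 * math.log(1 - 4 * p / 3)


seq1 = "AGGTCGCATTGCCCCGATCTCTTGATCGGG"
seq2 = "AGATCGCAACGCCCGGACTTCTTAATCGGG"
p = p_distance(seq1, seq2)      # 7 differences out of 30 sites
K = jukes_cantor(p)
print(p, K, K * len(seq1))      # observed proportion, corrected K, expected total
```

With the unrounded p = 7/30 this gives K ≈ 0.28 and about 8.4 expected substitutions; the values 0.2747 and 8.24 come from rounding p to 0.23 first.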


Phylogenetic Inference Using Maximum Likelihood

• A model of sequence evolution, together with estimates of its parameters, lets us place probabilities on different types of substitutional change.

• Likelihood is a statement about the data, not the tree: it is the probability of the data given a tree and a model of evolution.

Seq 1 ATATC

Seq 2 CTAGC

The likelihood (i.e., the probability of observing the data) is a sum over all possible assignments of nucleotides to the internal nodes.

$L = P(D \mid T, M)$
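The "sum over internal-node assignments" can be sketched by brute force. This assumes the Jukes-Cantor model (equal base frequencies of 1/4 at the root), a rooted three-taxon tree ((s1, s2)x, s3)root, hypothetical branch lengths, and a hypothetical third sequence s3 added to the two shown above. Real programs use Felsenstein's pruning algorithm rather than enumerating every assignment.

```python
import itertools
import math

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor probability that base i becomes base j along a branch of
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(tips: dict, t: dict) -> float:
    """Likelihood of one alignment column on the rooted tree ((s1, s2)x, s3)root,
    summing over all 16 assignments of bases to the internal nodes root and x."""
    total = 0.0
    for root, x in itertools.product(BASES, repeat=2):
        total += (0.25                               # P(root base), equal frequencies
                  * jc_prob(root, x, t["x"])         # root -> internal node x
                  * jc_prob(x, tips["s1"], t["s1"])
                  * jc_prob(x, tips["s2"], t["s2"])
                  * jc_prob(root, tips["s3"], t["s3"]))
    return total


def log_likelihood(alignment: dict, t: dict) -> float:
    """Assuming sites evolve independently, ln L is the sum of per-site ln L."""
    n_sites = len(alignment["s1"])
    return sum(math.log(site_likelihood({name: seq[k] for name, seq in alignment.items()}, t))
               for k in range(n_sites))


alignment = {"s1": "ATATC", "s2": "CTAGC", "s3": "ATAGC"}   # s3 is hypothetical
branches = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "x": 0.05}     # hypothetical lengths
print(log_likelihood(alignment, branches))
```

Summing the site likelihood over all 64 possible tip patterns gives 1, which is a quick sanity check that the model defines a proper probability distribution over the data.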


Phylogenetic Inference Using Maximum Likelihood

• Calculate the likelihood for each base position in the sequence and combine across all positions: assuming sites evolve independently, the overall likelihood is the product of the site likelihoods (in practice, the sum of their logarithms).

• The ML tree is the tree that produces the highest likelihood.

• Evaluates not only the branching structure of the tree but also its branch lengths, using tree-searching strategies similar to those used in parsimony analysis.

– This is important because, in a model-based approach, mutational change is more probable along longer branches than along shorter ones.

• Can be extremely computationally intensive.
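Estimating a branch length by maximum likelihood can be sketched for the simplest case: two sequences separated by a total branch length t under Jukes-Cantor. The golden-section search is one simple optimizer chosen for illustration, not the method any particular program uses; the function names are hypothetical.

```python
import math


def pair_log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """ln L of two aligned sequences separated by branch length t under
    Jukes-Cantor: each site contributes ln[1/4 * P(base1 -> base2 over t)]."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return sum(math.log(0.25 * (same if a == b else diff)) for a, b in zip(seq1, seq2))


def ml_branch_length(seq1: str, seq2: str, lo: float = 1e-8, hi: float = 10.0,
                     iters: int = 200) -> float:
    """Golden-section search for the t that maximises the likelihood
    (the likelihood is unimodal in t for this two-sequence case)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if pair_log_likelihood(seq1, seq2, c) > pair_log_likelihood(seq1, seq2, d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
    return (a + b) / 2


t_hat = ml_branch_length("ATATC", "CTAGC")
print(t_hat)
```

For two sequences the ML branch length coincides with the Jukes-Cantor distance K; here p = 2/5, so t_hat ≈ 0.572. On a real tree every branch length is optimized jointly, for every topology visited by the search, which is where the computational cost comes from.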


Phylogenetic Inference Using Maximum Likelihood

• Important point about ML: the model you choose can have a large impact on the resulting ML tree.

• If you flip a coin and get heads, what is the likelihood of that outcome?

– If it is a two-sided, fair coin (your model), the likelihood is 0.5.

– If it is a two-headed coin (your model), the likelihood is 1.0.
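The coin example extends naturally to several flips. This sketch treats each model as a single probability of heads and assumes independent flips; the same data get different likelihoods under different models, just as an alignment does under different substitution models.

```python
def likelihood(flips: str, p_heads: float) -> float:
    """Probability of an observed sequence of independent flips ('H'/'T')
    under a model in which each flip lands heads with probability p_heads."""
    result = 1.0
    for f in flips:
        result *= p_heads if f == "H" else 1.0 - p_heads
    return result


models = {"fair coin": 0.5, "two-headed coin": 1.0}
for data in ("H", "HHH", "HHT"):
    for name, p in models.items():
        print(data, name, likelihood(data, p))
```

A single tail drives the two-headed model's likelihood to zero, while a run of heads favors it over the fair coin.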


Assessing the Robustness Of Trees

• We can use a number of methods to assess the robustness of particular branches in our trees:

– Bootstrapping (jackknifing, decay index)

• Bootstrapping:

• Multiple new data sets are made by resampling from the original data set.

– Bootstrapping: sampling is done with replacement.

• The resampled data sets are subjected to phylogenetic analysis.

• The proportion of times a clade appears in the trees across all replicate data sets is called its bootstrap proportion.
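The bootstrap procedure can be sketched end to end. This assumes alignment columns (sites) are the units resampled, and uses UPGMA on p-distances only as a simple stand-in tree-building method; any parsimony, ML, or distance method could take its place. The toy alignment and function names are hypothetical.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_clades(seqs: dict) -> set:
    """Cluster taxa by UPGMA on p-distances and return the clades it forms,
    each as a frozenset of taxon names."""
    size = {frozenset([n]): 1 for n in seqs}
    dist = {frozenset([frozenset([a]), frozenset([b])]): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    clades = set()
    while len(size) > 1:
        ci, cj = tuple(min(dist, key=dist.get))      # closest pair of clusters
        merged = ci | cj
        clades.add(merged)
        ni, nj = size.pop(ci), size.pop(cj)
        for ck in size:                               # average-linkage distances
            dist[frozenset([merged, ck])] = (ni * dist[frozenset([ci, ck])]
                                             + nj * dist[frozenset([cj, ck])]) / (ni + nj)
        dist = {k: v for k, v in dist.items() if ci not in k and cj not in k}
        size[merged] = ni + nj
    return clades


def bootstrap_proportions(seqs: dict, n_reps: int = 100, seed: int = 1) -> dict:
    """Resample columns WITH replacement, rebuild the tree for each replicate,
    and report the fraction of replicates containing each clade."""
    rng = random.Random(seed)
    n_sites = len(next(iter(seqs.values())))
    counts = {}
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicate = {name: "".join(s[c] for c in cols) for name, s in seqs.items()}
        for clade in upgma_clades(replicate):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: c / n_reps for clade, c in counts.items()}


toy = {"A": "ACGTACGTAA", "B": "ACGTACGTAT", "C": "TGCAACGTCC", "D": "TGCAACGTCG"}
for clade, prop in sorted(bootstrap_proportions(toy).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), prop)
```

Each replicate has the same number of sites as the original, but some columns appear several times and others not at all, so clades backed by many characters tend to reappear while weakly supported ones come and go.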


Taken from Baldauf, S. L. Phylogeny for the faint of heart: a tutorial. Trends in Genetics 19:345-351.


Bootstrapping

• Clades with high bootstrap values are considered better supported by the data than clades with lower values.

– 70% or greater is good, but many phylogeneticists consider only branches with ≥90% to be strongly supported.


• Can be performed with any type of phylogenetic analysis: parsimony, ML, or distance-based.

• It is important to emphasize that a bootstrap value does not give the probability that a particular clade is true; it shows only how well the clade is supported by the particular data set.


Molecular Clocks

• The mutation rate for some genes may be relatively constant across species.

• This idea is based on neutral theory (introduced later in the course): for neutral mutations, the rate of nucleotide or amino acid substitution equals the mutation rate.

• In applying a molecular clock, you generally assume that the mutation rate for a gene does not differ among species.


Molecular Clocks

1) Construct a tree (Outgroup, Species 1, Species 2, Species 3, Species 4).

2) Date a node in the tree: a fossil of Species 4 is ~1 MY old, so Species 3 and 4 must have diverged at least 1 MY ago.

3) Calculate divergence: Species 3 and 4 show 2% sequence divergence.

4) Calculate a rate: R = 2% / 1 MY.


Molecular Clocks

5) Apply the rate to other nodes in the tree (Outgroup, Species 1–4); the figure shows node ages of 1 MY, 2 MY and 5 MY.

• Best applied when dates are available for multiple nodes.

• Calibrations can use solid geological information as well as fossil information.

• Be aware of possible non-clock-like behavior of genes.
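The clock arithmetic in steps 4 and 5 can be sketched as follows. The divergence values for "node A" (4%) and "node B" (10%) are hypothetical, chosen so the results match the figure's 2 MY and 5 MY. The pooled rate, a least-squares slope through the origin, is only one possible way to combine several calibrated nodes.

```python
def clock_rate(divergence_pct: float, age_my: float) -> float:
    """Rate of sequence divergence (% per million years) from one dated node."""
    return divergence_pct / age_my


def pooled_rate(calibrations: list) -> float:
    """Combine several (divergence %, age MY) calibrations: least-squares slope of
    divergence on age forced through the origin, R = sum(d*t) / sum(t*t)."""
    return sum(d * t for d, t in calibrations) / sum(t * t for _, t in calibrations)


def node_age(divergence_pct: float, rate: float) -> float:
    """Age (MY) implied for a node with the given divergence under a constant rate."""
    return divergence_pct / rate


R = clock_rate(2.0, 1.0)   # slide: 2% divergence between Species 3 and 4, 1 MY fossil
for node, d in {"Species 3 / 4": 2.0, "node A": 4.0, "node B": 10.0}.items():
    print(node, node_age(d, R), "MY")
```

Because the fossil gives only a minimum age for the Species 3/4 split, a rate calibrated from it is best read as an upper bound, and the ages derived from it as minimum ages.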


Phylogeny of North American black basses (Near et al., 2003. Evolution 57:1610–1621.)

Previous hypothesis: speciation within the genus Micropterus occurred during the Pleistocene.

Micropterus has a very good fossil record.

Calibration of a molecular clock and calculation of divergence times among species reveal that most species diverged well before the Pleistocene.


Species Delimitation in Rapidly Radiating Systems

• Accumulation of species diversity over short periods of time.

• Adaptive radiations

• Often of very recent origin

• Difficult to resolve monophyletic species-level lineages.

Salzburger, W. and A. Meyer. 2004. Naturwissenschaften 91:277-290.


Species Delimitation in Rapidly Radiating Systems

Lineage sorting and the retention of ancestral alleles or allelic lineages (species trees vs. gene trees)


Species Delimitation in Rapidly Radiating Systems

Lineage sorting and the retention of ancestral alleles or allelic lineages

Darwin’s Finches

East African Cichlid Fish

Takahashi et al. 2001. Mol. Biol. Evol. 18:2057-2066.

Moran and Kornfield. 1993. Mol. Biol. Evol. 10:1015-1029.

Sato et al. 1999. PNAS. 96:5101-5106.


Species Delimitation in Rapidly Radiating Systems

Limited reproductive isolation leads to hybridization and introgression


Gerald and Buff Corsi © California Academy of Sciences

Shaffer & McKnight 1996 Evolution 50:417-433

A. californiense

Ambystoma tigrinum species complex


An early study found that A. ordinarium was not a monophyletic group.


Indeed, more data show extensive mtDNA non-monophyly with respect to A. ordinarium.


Nuclear Genes Summary

• 4 genes yield A. ordinarium monophyly.

• 3 genes yield A. ordinarium paraphyly (2 are nearly monophyletic).

• 1 gene yields A. ordinarium polyphyly.

• Nuclear data strongly suggest that A. ordinarium is a monophyletic lineage.


MtDNA Polyphyly

• mtDNA genealogy offers a strong contrast to the nuclear gene trees.

• mtDNA should achieve monophyly faster than nuclear loci, because its effective population size is roughly one-quarter that of a nuclear locus.

• What explains this discrepancy?


Signatures of Rapid Lineage Diversification

• Short internal branches

• Phylogenetic discordance among loci

Poe, S., and A. L. Chubb. 2004. Evolution 58:404-415.
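Why short internal branches produce discordance among loci can be sketched with a standard coalescent result for three species: if the internal branch of the species tree ((A,B),C) is T coalescent units long, a gene tree matches the species tree with probability 1 - (2/3)e^(-T). The conversion to coalescent units assumes diploid nuclear loci scale by 2N_e generations and mtDNA (haploid, maternally inherited, equal sex ratio) by N_e/2. The population size and branch lengths below are hypothetical.

```python
import math


def p_concordant(T: float) -> float:
    """Probability that a gene tree matches a rooted three-species tree ((A,B),C)
    when the internal branch is T coalescent units long: 1 - (2/3) e^(-T).
    Each of the two discordant topologies has probability (1/3) e^(-T)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


def coalescent_units(generations: float, n_e: float, locus: str = "nuclear") -> float:
    """Convert an internal branch length in generations to coalescent units:
    nuclear loci scale by 2 * n_e, mtDNA by n_e / 2 (assumed equal sex ratio)."""
    scale = 2 * n_e if locus == "nuclear" else n_e / 2
    return generations / scale


for gens in (1_000, 10_000, 100_000):   # hypothetical branch lengths, N_e = 10,000
    nuc = p_concordant(coalescent_units(gens, 10_000, "nuclear"))
    mt = p_concordant(coalescent_units(gens, 10_000, "mtDNA"))
    print(gens, round(nuc, 3), round(mt, 3))
```

As T approaches zero, concordance drops toward 1/3, so rapid radiations yield conflicting gene trees. For the same branch, mtDNA is expected to sort faster than nuclear loci, which is why mtDNA-specific non-monophyly calls for an explanation beyond lineage sorting alone.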


Shared and minimally divergent mtDNA haplotypes strongly indicate recent hybrid introgression.

A. dumerilii