Lec15 phylogeny (3-24-2016)

Lecture 15: Phylogeny

Objectives
- Define phylogeny
- What is a phylogenetic tree, and what does it tell you?
- How do you read a tree? What are the different parts?
- How do you construct a tree?
- How do you root the tree of life?
- What are some alternatives to SSU rRNA analysis?

Phylogeny
- Inferred evolutionary relationships among species, based on similarities and differences in their physical or genetic characteristics
- Phylogeny may also refer to a phylogenetic tree, the illustration of these relationships

Last time we looked at some wrong trees, including Haeckel's 3-kingdom tree and Whittaker's 5-kingdom tree. Why are these problematic? They are subjective and qualitative. PCR and sequencing make it possible to understand how organisms are related in an objective and quantitative way.

What is evolution?

Descent with modification

Definition of evolution = descent with modification from a common ancestor. The term became popular when Charles Darwin used it in his book The Descent of Man. Since the term was used in conjunction with his idea that humans were related ancestrally to other primates, "descent with modification" has become synonymous with the human lineage on the tree of life.

- Individual species split into two or more daughter species
- Concept of vertical inheritance
- Common ancestor at basal nodes
- Evolution only occurs when there is a change in gene frequency within a population over time

Reading a tree

A. Elements of a tree: root, nodes, branches and tips
B. Same tree as in A, but rotated 90 degrees so that evolutionary time progresses from left to right

To read a tree, you need to know its parts. The tips are the extant organisms whose relationships you are trying to discern. The nodes connecting the tips each represent a hypothetical common ancestor of the organisms in that clade. Branch lengths correspond to the number of differences between sequences, which is also an expression of time if you assume that the rate of evolution is the same across the tree. This is mostly a good assumption. The separation of tips has no meaning, and the tree's branches can rotate freely around each node.
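The point that branches can rotate freely can be checked with a small sketch. It assumes a rooted tree written as nested Python tuples with tip names as strings; the organism names and the `canonical` helper are hypothetical, chosen only for illustration.

```python
def canonical(tree):
    """Return the tree with the children of every node put in a fixed order.

    Two drawings that differ only by rotating branches around nodes
    end up with the same canonical form.
    """
    if isinstance(tree, str):          # a tip
        return tree
    children = [canonical(child) for child in tree]
    return tuple(sorted(children, key=repr))


# Hypothetical four-tip trees.
tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("rat", "mouse"), ("chimp", "human"))   # tree_a with branches rotated
tree_c = (("human", "mouse"), ("chimp", "rat"))   # a genuinely different grouping

print(canonical(tree_a) == canonical(tree_b))
print(canonical(tree_a) == canonical(tree_c))
```

Only the grouping of tips into clades is compared here; branch lengths are ignored.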

Unequal rates of evolution

- Similarity between organisms is not necessarily equal to evolutionary relationship.
- Which one evolved faster? 3 evolved faster than 2.
- Which is most similar to 2? Why? 2 is more similar to 1 than to 3.
- However, 2 and 3 share a common ancestor, B.
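A small numerical sketch of this situation follows. The branch lengths are invented for illustration (the figure's actual values are not in the text), and the root label `A` is an assumption; only tips 1, 2, 3 and ancestor B come from the slide.

```python
# Each entry maps a node or tip to (parent, branch length to parent).
# Lengths are hypothetical substitution counts; "A" is an assumed root label.
parent = {
    "B": ("A", 1.0),
    "1": ("A", 1.0),
    "2": ("B", 1.0),
    "3": ("B", 6.0),   # tip 3 sits on a long branch: it evolved faster
}


def path_to_root(name):
    """Return {ancestor: distance from name to that ancestor}, including name itself."""
    distance = 0.0
    seen = {name: 0.0}
    while name in parent:
        name, length = parent[name]
        distance += length
        seen[name] = distance
    return seen


def tip_distance(x, y):
    """Sum of branch lengths along the path joining x and y."""
    from_x, from_y = path_to_root(x), path_to_root(y)
    # The most recent common ancestor gives the shortest summed path.
    return min(from_x[n] + from_y[n] for n in from_x if n in from_y)


print(tip_distance("2", "1"))
print(tip_distance("2", "3"))
```

With these assumed lengths, tip 2 is closer to tip 1 than to tip 3, even though 2 and 3 are each other's closest relatives through B.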

Derived vs Ancestral Trait

- A derived trait is one that was NOT present in the common ancestor.
- Ancestral (or primitive) traits are characters that WERE present in a common ancestor.
- These terms are relative because they depend on which common ancestor you are referring to; every node is the last common ancestor of all descendants of that group.

Phylogenetic groups

- Monophyletic: a group derived from a single common ancestor that includes all of that ancestor's descendants
- Paraphyletic: a group that evolved from a single ancestral species but does not contain all the descendants of that ancestor
- Polyphyletic: a taxonomic group with origins in several different lines of descent; think of the prefix poly- as suggesting many common ancestors
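As a rough illustration of the monophyly definition, the sketch below tests whether a set of tips is exactly the set of descendants of some node. The nested-tuple tree format, the helper names, and the three-domain topology (Archaea and Eukarya as sister groups) are assumptions made for the example.

```python
def tip_sets(tree):
    """Return (tips under this node, list of tip sets for every node below and including it)."""
    if isinstance(tree, str):
        only = frozenset([tree])
        return only, [only]
    tips, collected = frozenset(), []
    for child in tree:
        child_tips, child_sets = tip_sets(child)
        tips = tips | child_tips
        collected.extend(child_sets)
    collected.append(tips)
    return tips, collected


def is_monophyletic(tree, group):
    """True if some node has exactly `group` as its full set of descendant tips."""
    return frozenset(group) in tip_sets(tree)[1]


# Assumed topology for illustration only.
tree_of_life = ("Bacteria", ("Archaea", "Eukarya"))

print(is_monophyletic(tree_of_life, {"Archaea", "Eukarya"}))
print(is_monophyletic(tree_of_life, {"Bacteria", "Archaea"}))
```

The check does not distinguish paraphyletic from polyphyletic groups; it only reports whether a group is monophyletic on the given tree.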

What about prokaryotes?

- The term prokaryote defines organisms by what they do not have (nuclei)
- Are nuclei an ancestral trait? NO.
- Prokaryotes are not a monophyletic group
- Norm Pace says, "I believe it is critical to shake loose from the prokaryote/eukaryote concept. It is outdated, a guesswork solution to an articulation of biological diversity and an incorrect model for the course of evolution."

Constructing a phylogenetic tree
Assume you have chosen which species to analyze
(1) Decide which gene to use (SSU rRNA gene)

SSU ribosomal RNA gene
- Short, only ~1500 base pairs
- Information-dense because it is a non-coding, structural RNA
- Essential for life, so probably not horizontally transferred
- Multiple copies per genome
- Cannot resolve close relationships

We like 16S, BUT there are alternatives.

Phylogeny with any gene
- Other RNAs
  - LSU and ITS (aka rRNA spacer) are more popular for fungi; better fine-scale resolution
  - Sequence length is variable, unlike SSU
- Protein genes or sequences
- Concatenated genes: a collection of ~100 single-copy housekeeping genes

Phylogeny with other markers
- Fatty acid methyl ester (FAME) analysis measures membrane fatty acids
- Specific to some monophyletic groups
- Includes paraphyletic groupings
- Can vary by stress conditions

Constructing a phylogenetic tree
(2) Determine the gene sequences
(3) Use sequence alignment to identify homologous residues, measure sequence similarity and make a distance matrix

- The Jukes & Cantor method relates sequence similarity to evolutionary distance
- If all sequences are the same, distance is zero
- Distance increases as sequence similarity decreases, but not linearly: between very similar sequences, one or two base differences do not change the distance much
- The lowest sequence similarity is about 0.25 because all sequences are about 25% similar by chance; there are 4 bases in DNA, so the chance that one base will match another is 1 in 4
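The relationship described above is usually written as d = -(3/4) ln(1 - (4/3)p), where p is the fraction of aligned positions that differ. The sketch below is a minimal illustration of that standard Jukes-Cantor formula; the function names and the choice to skip gap columns are assumptions, not something stated in the slides.

```python
import math


def p_distance(seq1, seq2):
    """Fraction of aligned positions that differ; columns containing a gap are skipped."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3).

    The distance grows without bound as p approaches 0.75,
    that is, as similarity drops toward the 25% expected by chance.
    """
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Distance stays close to p when sequences are similar and climbs steeply near p = 0.75.
for p in (0.0, 0.01, 0.1, 0.3, 0.6, 0.74):
    print(p, round(jukes_cantor(p), 3))
```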

Sequence alignment

Constructing a phylogenetic tree
(4) Perform phylogenetic analysis, which usually means constructing a tree
Neighbor-joining method

How can you tell what the branch lengths are? In other words, you need to place the node u. You know how far apart a and b are from each other. You also know how far a is from something else, say c, so measure b from c and you can estimate where node u should be.
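The reasoning above can be sketched with the three-point formula: if a, b and c all connect through node u, the three pairwise distances fix the three branch lengths. This is a simplified illustration with made-up distances; full neighbor-joining uses averages over all the other taxa rather than a single reference c.

```python
def place_node(d_ab, d_ac, d_bc):
    """Return branch lengths (a to u, b to u, c to u) for three tips joined at node u.

    Derived from d_ab = a_u + b_u, d_ac = a_u + c_u, d_bc = b_u + c_u.
    """
    a_u = (d_ab + d_ac - d_bc) / 2
    b_u = (d_ab + d_bc - d_ac) / 2
    c_u = (d_ac + d_bc - d_ab) / 2
    return a_u, b_u, c_u


# Hypothetical pairwise distances.
a_u, b_u, c_u = place_node(d_ab=5.0, d_ac=9.0, d_bc=8.0)
print(a_u, b_u, c_u)
```

Here a and b are 5 apart, but a is farther from c than b is, so u must sit closer to b than to a.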

Tree Construction Complexities
- Which substitution model to use?
- GC bias
- Long-branch attraction
- Tree algorithms besides Neighbor Joining
- Bootstrapping

Substitution models
- Two-parameter models only care about whether a substitution is a transition or a transversion
- Transition: purine to purine or pyrimidine to pyrimidine
- Transversion: purine to pyrimidine or vice versa
- Transitions are much more common than transversions, so these are weighted differently in deciding what distance to assign to a mismatch
- Six-parameter models consider different types of transitions and transversions, weighting each change differently
- Gaps are also tricky; for example, adjacent gaps are not unrelated
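One common two-parameter model is Kimura's (K2P), which is used here as an assumed example since the slides do not name a specific model. The sketch classifies each mismatch and then applies the K2P formula d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function names and test sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(a, b):
    """Label an aligned pair of bases as 'match', 'transition' or 'transversion'."""
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance; columns with gaps or ambiguous bases are skipped.

    Only defined while 1 - 2P - Q and 1 - 2Q stay positive.
    """
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in BASES and b in BASES]
    n = len(pairs)
    P = sum(classify(a, b) == "transition" for a, b in pairs) / n
    Q = sum(classify(a, b) == "transversion" for a, b in pairs) / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Hypothetical aligned sequences: one transition (A/G) and one transversion (A/C).
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```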

GC bias
- Thermophiles tend to prefer GC over AT
- To solve, ignore transitions and base the tree only on transversions
- High GC gram positives = Actinobacteria
- Low GC gram positives = Firmicutes

Long-branch attraction
- Artificial clustering of long branches together, or of short branches
- Caused by different rates of evolution for different tips
- Very long branches are usually the result of bad sequence or poor alignment

Tree algorithms
- Neighbor-joining starts with a radial (star) tree and joins neighbors
- Fitch starts with two sequences and adds the next closest relatives
- Parsimony makes a bunch of trees and finds the one that is the simplest, usually based on the fewest mutations
- Maximum likelihood trees are generally regarded as the best and are the most computationally intensive; they are based on probability
- Bayesian inference starts with a random tree structure and random parameters, then iterates until an optimal tree is found
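To make the parsimony idea concrete, the sketch below counts the minimum number of changes each candidate tree needs, using Fitch's small-parsimony algorithm (a different method from the distance-based "Fitch" approach in the list above). The taxa `w`, `x`, `y`, `z`, the three alignment columns, and the nested-tuple tree format are invented for illustration, and only strictly bifurcating trees are handled.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment column."""
    if isinstance(tree, str):                  # a tip: its observed base, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                                 # children can agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, columns):
    """Total minimum number of changes over all alignment columns."""
    return sum(fitch(tree, column)[1] for column in columns)


# Hypothetical data: each dict is one alignment column (tip name -> base).
columns = [
    {"w": "A", "x": "A", "y": "G", "z": "G"},
    {"w": "C", "x": "C", "y": "T", "z": "T"},
    {"w": "A", "x": "G", "y": "A", "z": "G"},
]
candidates = [
    (("w", "x"), ("y", "z")),
    (("w", "y"), ("x", "z")),
    (("w", "z"), ("x", "y")),
]
best = min(candidates, key=lambda t: parsimony_score(t, columns))
print(best, parsimony_score(best, columns))
```

With four taxa there are only three possible groupings to score; the number of candidate trees grows very quickly with more taxa, which is why real programs search heuristically.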

Bootstrapping
- A measure of confidence in the branches of your tree, given your sequence alignment
- Numbers are from 0-100, with 100 being perfect confidence
- Random sampling (of alignment columns) with replacement to create new trees
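A minimal sketch of the resampling step follows. Alignment columns are drawn with replacement, and each replicate is re-analyzed. To keep the example short, "building a tree" is replaced by a crude stand-in (finding the closest pair of sequences by mismatch count); the sequences, names, and helper functions are all hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; every sequence keeps its length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def mismatches(s, t):
    return sum(a != b for a, b in zip(s, t))


def closest_pair(alignment):
    """Stand-in for tree building: the two sequences with the fewest mismatches."""
    names = sorted(alignment)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    return min(pairs, key=lambda p: mismatches(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percentage (0-100) of replicates in which `pair` is recovered as the closest pair."""
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = sum(closest_pair(bootstrap_alignment(alignment, rng)) == target
               for _ in range(replicates))
    return 100 * hits / replicates


# Hypothetical aligned sequences.
example = {"a": "ACGTACGTAC", "b": "ACGTACGTTC", "c": "ACCTAGGTTA"}
print(bootstrap_support(example, ("a", "b")))
```

A real analysis would rebuild a full tree from each replicate and report, for every branch, the percentage of replicate trees that contain it.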

Rooting the tree of life

How do you root a tree? You need an outgroup. This is an organism that you know is only very distantly related to the organisms you are analyzing. What if you want to root the tree of life? In other words, what if you want to put all the organisms of life in the tree?

Sequence homology
Homologous genes have a shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs).

Root the tree of life using paralogs
Duplication of the elongation factors occurred prior to divergence (paralogs), so all species have both EFs. One gene tree can be rooted with the other gene, and both trees yield the same relationship and are rooted in the same location.

The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages. Thus, it should be possible to root a universal phylogeny based on either protein using the second protein as an outgroup. This approach was also used with the regulatory and catalytic subunits of the proton ATPases. All phylogenetic methods used strongly place the root of the universal tree between two highly distinct groups, the archaeons/eukaryotes and the eubacteria. A combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC38819/

Protein-based models of evolution

- A database of proteins from 420 modern organisms was searched for structures that were common to all
- 5 to 11 per cent were universal, conserved enough to have originated in LUCA (the last universal common ancestor)
- LUCA had enzymes to break down and extract energy from nutrients, and some protein-making equipment
- LUCA lacked the enzymes for making and reading DNA molecules
Kim and Caetano-Anollés, BMC Evolutionary Biology 2011

The root moves depending on whether you use nucleic acids or protein!

- Sequence-based rooting of the tree of life puts the root within the Bacteria.
- Traditional, or 'canonical', bacterial rooting of the tree of life is derived from analyses of the sequences of ancient gene paralogs (e.g., ATPases, aaRSs, elongation factors)
- Proteomic analyses of many proteins put the root of the tree of life within the Archaea.
- Archaeal rooting has been observed for phylogenetic analyses of tRNA, 5S, RNase P...

Last universal common ancestor

also known as LUCA

- One cannot rely on nucleotide gene sequences because these would have mutated beyond recognition
- Amino acid sequences change more slowly because silent (synonymous) mutations leave the amino acid sequence unchanged
- The tertiary folded structure of a protein is even more strongly conserved than the secondary structure
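The point about amino acid sequences changing more slowly can be illustrated with a tiny sketch. It uses four entries from the standard genetic code; the six-base sequences and the `translate` helper are made up for the example.

```python
# A few codons from the standard genetic code (DNA alphabet); not a full table.
CODONS = {"GCT": "Ala", "GCC": "Ala", "GTT": "Val", "GAT": "Asp"}


def translate(dna):
    """Translate a DNA string codon by codon using the partial table above."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


original = "GCTGAT"
silent = "GCCGAT"      # third-position change: GCT -> GCC, still alanine
missense = "GTTGAT"    # second-position change: GCT -> GTT, alanine becomes valine

print(translate(original))
print(translate(silent))
print(translate(missense))
```

Both mutants differ from the original at the nucleotide level, but only one of them changes the protein.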

Objectives
- Define phylogeny
- What is a phylogenetic tree, and what does it tell you?
- How do you read a tree? What are the different parts?
- How do you construct a tree?
- How do you root the tree of life?
- What are some alternatives to SSU rRNA analysis?