kristen-deangelis
Lecture 15: Phylogeny
Objectives
Define phylogeny
What is a phylogenetic tree, and what does it tell you?
How do you read a tree? What are the different parts?
How do you construct a tree?
How do you root the tree of life?
What are some alternatives to SSU rRNA analysis?
Phylogeny
Inferred evolutionary relationships among species, based on similarities and differences in their physical or genetic characteristics
Phylogeny may also refer to a phylogenetic tree, the illustration of these relationships
Last time we looked at some wrong trees, including Haeckel's three-kingdom tree and Whittaker's five-kingdom tree. Why are these problematic? They are subjective and qualitative. PCR and sequencing make it possible to understand how organisms are related in an objective and quantitative way.
What is evolution?
Descent with modification
Definition of evolution = descent with modification from a common ancestor. The phrase was popularized by Charles Darwin, who used it in On the Origin of Species. His later book The Descent of Man applied the idea to human ancestry, arguing that humans are related ancestrally to other primates, and the phrase has since become closely associated with the human lineage on the tree of life.
Individual species split into two or more daughter species
Concept of vertical inheritance
Common ancestor at basal nodes
Evolution only occurs when there is a change in gene frequency within a population over time
Reading a tree
A. Elements of a tree: root, nodes, branches and tips
B. Same tree as in A, but rotated 90 degrees, so that evolutionary time progresses from left to right.
To read a tree, you need to know the parts. The tips are the extant organisms whose relationships you are trying to discern. The nodes connecting the tips each represent a hypothetical common ancestor of the organisms in that clade. Branch lengths correspond to the number of differences between sequences, which is also an expression of time if you assume that the rate of evolution is the same across the tree. This is a mostly good assumption. The separation of tips has no meaning, and the tree's branches can rotate freely around each node.
Unequal rates of evolution
Similarity between organisms is not necessarily equal to evolutionary relationship.
Which one evolved faster? 3 evolved faster than 2.
Which is most similar to 2? Why? 2 is more similar to 1 than to 3. However, 2 and 3 share a common ancestor, B.
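The point about unequal rates can be checked with a little arithmetic. The sketch below uses made-up branch lengths (they are not read from the lecture figure) on a tree where root A splits into tip 1 and node B, and node B splits into tips 2 and 3. The names `branch`, `path_to_root` and `tree_distance` are illustrative only.

```python
# Hypothetical branch lengths, keyed by (parent, child).
# These numbers are assumptions for illustration, not values from the slide.
branch = {
    ("A", "1"): 1.0,
    ("A", "B"): 1.0,
    ("B", "2"): 1.0,
    ("B", "3"): 5.0,  # tip 3 evolved faster, so its branch is long
}


def path_to_root(tip):
    """Return the (parent, child) edges on the path from a tip up to the root."""
    parent = {child: par for (par, child) in branch}
    edges = []
    node = tip
    while node in parent:
        edges.append((parent[node], node))
        node = parent[node]
    return edges


def tree_distance(x, y):
    """Sum the branch lengths on the path connecting two tips."""
    edges_x = set(path_to_root(x))
    edges_y = set(path_to_root(y))
    # Edges shared by both paths lie above the common ancestor and cancel out.
    return sum(branch[edge] for edge in edges_x ^ edges_y)


print("distance 1-2:", tree_distance("1", "2"))
print("distance 2-3:", tree_distance("2", "3"))
```

With these assumed lengths, tip 2 is closer to tip 1 than to tip 3 in total branch length, even though 2 and 3 are each other's closest relatives through node B.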
Derived vs Ancestral Trait
A derived trait is one that was NOT present in the common ancestor.
Ancestral (or primitive) traits are characters that WERE present in a common ancestor.
These terms are relative because they depend on which common ancestor you are referring to; every node is the last common ancestor of all descendants in that group.
Phylogenetic groups
Monophyletic: a group derived from a single common ancestor that includes all descendants of that ancestor.
Paraphyletic: a group that has evolved from a single ancestral species but does not contain all the descendants of that ancestor.
Polyphyletic: a taxonomic group with origins in several different lines of descent. Think of the prefix poly- as suggesting many common ancestors.
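The monophyly definition can be tested mechanically: a set of tips is monophyletic if some node in the tree has exactly that set as its descendants. The sketch below assumes a toy tree written as nested tuples; the tree and the names `tips` and `is_monophyletic` are hypothetical. It does not try to separate paraphyletic from polyphyletic groups, which depends on which ancestor the group is defined by.

```python
# Hypothetical toy tree: each tuple is an internal node, each string is a tip.
tree = ((("a", "b"), "c"), ("d", "e"))


def tips(node):
    """Return the set of tip names found under a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= tips(child)
    return found


def is_monophyletic(node, group):
    """True if some node's complete set of descendant tips equals the group."""
    group = set(group)
    if tips(node) == group:
        return True
    if isinstance(node, str):
        return False
    return any(is_monophyletic(child, group) for child in node)


print(is_monophyletic(tree, {"a", "b", "c"}))  # ancestor plus all of its descendants
print(is_monophyletic(tree, {"a", "c"}))       # leaves out descendant "b"
```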
What about prokaryotes?
The term prokaryote defines organisms by what they do not have (nuclei).
Are nuclei an ancestral trait? NO.
Prokaryotes are not a monophyletic group.
Norm Pace says, "I believe it is critical to shake loose from the prokaryote/eukaryote concept. It is outdated, a guesswork solution to an articulation of biological diversity and an incorrect model for the course of evolution."
Constructing a phylogenetic tree
Assume you have chosen which species to analyze
(1) Decide which gene to use (SSU rRNA gene)

SSU ribosomal RNA gene
Short, only about 1,500 base pairs
Information-dense because it is a non-coding, structural RNA
Essential for life, so probably not horizontally transferred
Multiple copies per genome
Cannot resolve close relationships
We like 16S, BUT there are alternatives.

Phylogeny with any gene
Other RNAs
LSU and ITS (aka rRNA spacer) are more popular for fungi and give better fine-scale resolution
Sequence length is variable, unlike SSU
Protein genes or sequences
Concatenated genes: a collection of ~100 single-copy housekeeping genes

Phylogeny with other markers
Fatty acid methyl ester (FAME) analysis measures membrane fatty acids
Specific to some monophyletic groups
Includes paraphyletic groupings
Can vary by stress conditions
Constructing a phylogenetic tree
(2) Determine the gene sequences
(3) Use sequence alignment to identify homologous residues, measure sequence similarity and make a distance matrix

The Jukes & Cantor method relates sequence similarity to evolutionary distance
If two sequences are the same, the distance is zero
Distances increase as sequence similarity decreases; the relationship is nearly linear at high similarity, so a difference of one or two bases does not change the distance much
The lowest sequence similarity is about 0.25 because any two sequences are about 25% similar by chance; there are 4 bases in DNA, so the chance that one base will match another is 1 in 4
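Step (3) can be sketched in a few lines. The example below assumes the sequences are already aligned and uses three short made-up sequences; it computes the fraction of differing sites p and then the standard Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p). The names `p_distance`, `jukes_cantor`, `aligned` and `matrix` are illustrative, not from the lecture.

```python
import math


def p_distance(seq1, seq2):
    """Fraction of aligned positions that differ, skipping gap columns.

    Assumes the two sequences share at least one ungapped column.
    """
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    if p >= 0.75:
        # Similarity has fallen to the 25% expected by chance; distance is undefined.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAT",
    "c": "ACGAACTTAT",
}

# Distance matrix as a dictionary keyed by (name, name).
names = sorted(aligned)
matrix = {
    (x, y): jukes_cantor(p_distance(aligned[x], aligned[y]))
    for x in names
    for y in names
}

for x in names:
    print(x, [round(matrix[(x, y)], 3) for y in names])
```

At small p the corrected distance is close to p itself, and it grows without bound as p approaches 0.75, which is the 25% chance-similarity floor described above.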
Sequence alignment
Constructing a phylogenetic tree
(4) Perform phylogenetic analysis, which usually means constructing a tree
Neighbor-joining method

How can you tell what the branch lengths are? In other words, you need to place the node u.
You know how far apart a and b are from each other.
You know how far a is from something else, say c, so measure b from c and you can estimate where node u should be.
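Placing node u from the three pairwise distances is a small piece of algebra: if a, b and c all connect through u, then d(a,u) = (d_ab + d_ac - d_bc) / 2. The sketch below works that out for made-up distances. It covers only the three-taxon case; full neighbor-joining averages over all other taxa rather than a single c. The function name `place_node` is hypothetical.

```python
def place_node(d_ab, d_ac, d_bc):
    """Return the branch lengths (a-u, b-u, c-u) for three tips joined at node u."""
    len_a = (d_ab + d_ac - d_bc) / 2.0
    len_b = d_ab - len_a  # the a-b path runs a -> u -> b
    len_c = d_ac - len_a  # the a-c path runs a -> u -> c
    return len_a, len_b, len_c


# Hypothetical pairwise distances, for illustration only.
len_a, len_b, len_c = place_node(d_ab=3.0, d_ac=5.0, d_bc=6.0)
print(len_a, len_b, len_c)

# The three branch lengths reproduce the original pairwise distances.
print(len_a + len_b, len_a + len_c, len_b + len_c)
```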
Tree Construction Complexities
Which substitution model to use?
GC bias
Long-branch attraction
Tree algorithms besides neighbor-joining
Bootstrapping
Substitution models
Two-parameter models only care about whether a substitution is a transition or a transversion
Transition: purine to purine or pyrimidine to pyrimidine
Transversion: purine to pyrimidine or vice versa
Transitions are much more common than transversions, so the two are weighted differently in deciding what distance to assign to a mismatch
Six-parameter models consider different types of transitions and transversions, weighting each change differently
Gaps are also tricky; for example, adjacent gaps are not independent of each other
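As one concrete instance of a two-parameter model, the sketch below classifies each mismatch as a transition or transversion and applies Kimura's two-parameter formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. Kimura's model is chosen here as an assumption; the lecture does not name a specific two-parameter model. The function names and example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Label a pair of aligned bases as match, transition or transversion."""
    if a == b:
        return "match"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous characters are skipped.
    Assumes the sequences are not so divergent that the logarithms are undefined.
    """
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    labels = [classify(a, b) for a, b in pairs]
    p = labels.count("transition") / len(pairs)    # proportion of transitions
    q = labels.count("transversion") / len(pairs)  # proportion of transversions
    return -0.5 * math.log(1.0 - 2.0 * p - q) - 0.25 * math.log(1.0 - 2.0 * q)


# Hypothetical sequences: one transition (C/T) in ten aligned sites.
print(classify("A", "G"), classify("A", "C"))
print(round(kimura_2p("ACGTACGTAC", "ACGTACGTAT"), 4))
```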
GC bias
Thermophiles tend to prefer GC over AT
To solve, ignore transitions and base the tree only on transversions
High-GC Gram-positives = Actinobacteria
Low-GC Gram-positives = Firmicutes
Long-branch attraction
Artificial clustering of long branches together, or of short branches
Caused by different rates of evolution for different tips
Very long branches are usually due to bad sequence or poor alignment
Tree algorithms
Neighbor-joining starts with a radial (star) tree and joins neighbors
Fitch starts with two sequences and adds the next closest relatives
Parsimony makes a bunch of trees and finds the simplest one, usually the one requiring the fewest mutations
Maximum likelihood trees are the best and most computationally intensive; they are based on probability
Bayesian inference starts with a random tree structure and random parameters, then iterates until an optimal tree is found
Bootstrapping
A measure of confidence in the branches of your tree, given your sequence alignment
Numbers are from 0-100, with 100 being perfect confidence
Random sampling of alignment columns with replacement to create new trees
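The resampling step can be sketched as follows. Alignment columns are drawn with replacement to make pseudo-replicate alignments, and support is the percentage of replicates that recover a given grouping. To stay short, the sketch replaces real tree building with a stand-in, `closest_pair`, which simply reports the two sequences with the fewest mismatches; that simplification, the function names and the example alignment are all assumptions for illustration.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the rows in register."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for a tree builder: the pair of names with the fewest mismatches."""
    names = sorted(alignment)
    best = None
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            mismatches = sum(a != b for a, b in zip(alignment[x], alignment[y]))
            if best is None or mismatches < best[0]:
                best = (mismatches, (x, y))
    return best[1]


def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percentage (0-100) of pseudo-replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical alignment, for illustration only.
alignment = {"a": "ACGTACGTAC", "b": "ACGTACGTAT", "c": "ACGAACTTAT"}
print(bootstrap_support(alignment, ("a", "b")))
```

In practice each pseudo-replicate would be passed to the same tree-building method used for the original alignment, and support would be tallied per branch.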
Rooting the tree of life
How do you root a tree? You need an outgroup. This is an organism that you know is only distantly related to the organisms in your tree. What if you want to root the tree of life? In other words, what if you want to put all the organisms of life in the tree? Then no organism is left over to serve as an outgroup.
Sequence homology
Homologous genes have a shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs).
Root the tree of life using paralogs
Duplication of the elongation factors occurred prior to divergence (paralogs), so all species have both EFs. One gene tree can be rooted with the other gene, and both trees yield the same relationships and are rooted in the same location.
The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages. Thus, it should be possible to root a universal phylogeny based on either protein using the second protein as an outgroup. This approach was also used with the regulatory and catalytic subunits of the proton ATPases. All phylogenetic methods used strongly place the root of the universal tree between two highly distinct groups, the archaeons/eukaryotes and the eubacteria. A combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC38819/
Protein-based models of evolution
A database of proteins from 420 modern organisms was searched for structures that were common to all
5 to 11 per cent were universal, conserved enough to have originated in LUCA
LUCA had enzymes to break down and extract energy from nutrients, and some protein-making equipment
LUCA lacked the enzymes for making and reading DNA molecules
Kim and Caetano-Anollés, BMC Evolutionary Biology 2011
The root moves depending on whether you use nucleic acids or protein!
Sequence-based rooting of the tree of life puts the root within the Bacteria.
Traditional, or 'canonical', bacterial rooting of the tree of life is derived from analyses of the sequences of ancient gene paralogs (e.g., ATPases, aaRSs, elongation factors)
Proteomic analyses of many proteins put the root of the tree of life within the Archaea.
Archaeal rooting has been observed for phylogenetic analyses of tRNA, 5S, RNase P...
Last universal common ancestor
also known as LUCA
One cannot rely on nucleotide gene sequences because these would have mutated beyond recognition
Amino acid sequences change more slowly because silent (synonymous) mutations leave the amino acid sequence unchanged
The tertiary folded structure of a protein is even more strongly conserved than the secondary structure