8/6/2019 prezentare filogenie
Phylogenetics
Todd Scheetz
March 23, 2004
Introduction
Common Terms
General Processes
Types of phylogenetic analyses
PHYLIP and PAUP
Common Terms
Phylogenetics - Assessment of the evolutionary relationships between species, typically using the sequence of a common molecule.
Dendrogram - Tree-based diagram of phylogenetic structure.
Clade - A group of organisms whose members share homologous features derived from a common ancestor.
Taxon (pl. taxa) - A category or group, such as a phylum, order, or species.
Introduction
Phylogenetics is an attempt to determine how the sequences might have been derived during evolution.
It can be done with either nucleotide or amino acid sequences.
Branching within the tree indicates the apparent relationship between two sequences. Very similar sequences should be next to each other in the tree.
[Figure: example tree with leaves human, mouse, and fly]
Trees
Rooted vs. unrooted trees
Example
GAATC    GAGTT
GA(A/G)T(C/T)
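One way to read the example, assuming it shows two five-base sequences (GAATC and GAGTT) and a summary of the sites where they agree or differ, is sketched below. The function name `ambiguity_summary` is hypothetical and not part of the original slides.

```python
def ambiguity_summary(seq_a, seq_b):
    """Summarise two equal-length sequences site by site.

    Sites where the sequences agree keep that base; sites where they
    differ are written as (X/Y), listing both observed bases.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    parts = []
    for a, b in zip(seq_a, seq_b):
        if a == b:
            parts.append(a)
        else:
            parts.append("(" + "/".join(sorted((a, b))) + ")")
    return "".join(parts)


summary = ambiguity_summary("GAATC", "GAGTT")
print(summary)
```

Under this reading, the two ambiguous sites are the ones at which a substitution must have occurred somewhere between the two sequences.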
General Process
The basic process of phylogenetic analysis is:
1. Alignment
2. Determining the substitution model
3. Tree building
4. Tree evaluation
What sequence to use?
Before performing the analysis, there is a more fundamental issue to be addressed:
What sequences to use?
Guidelines
1. Universally present in all organisms to be studied, with good conservation of sequence among many of the species.
2. Divergent enough to allow grouping the species into a taxonomic classification.
Types of Sequences
Homologs - Sequences that have a common origin.
Orthologs - Homologs derived from speciation.
Paralogs - Homologs derived from a common ancestral gene that underwent duplication and subsequent divergence.
Xenologs - Homologs resulting from horizontal transfer across species.
Alignment
The first step in performing a phylogenetic analysis is to align the sequences. Each column within a multiple sequence alignment is referred to as a site.
Because the residues at each site are effectively assumed to be homologous (to share a common ancestor), the sites themselves represent a priori phylogenetic conclusions.
Two major steps:
1. selection of the alignment procedure
2. extracting the phylogenetic data set from the alignment
ARABI GCGCCC ---CAAGCCTTCT-GGCCG---- AGGGCACGTCT
LYCOP GCGCCC ---GAAGCCATTT-GGCCG---- AGGGC......
Taxus GGCCCG ---GAG-C---TC-GGCCG---- AGGGC......
HETER GGCCCC TTT--GGT-ATT----CCGA--- AGG-C...C..
VOLVA GGCCTC TTT--GGCCATT----CCGA--- AGAGC.T.C..
Alignment Procedure
Computer dependence - unrealistic to do the alignment by hand.
Phylogenetic criteria - does the alignment proceed based upon a tree? Ex.: clustalw utilizes neighbor joining during sequence alignment.
Alignment parameter estimation - should vary dynamically depending on evolutionary distance.
Aligned features - secondary structure -- requires manual intervention.
Mathematical optimization - some programs optimize according to a statistical model, but this may have unknown effects on further phylogenetic analysis.
Alignment -- Extracting Phylogenetic Info
The difficulty here, as we will see on the following page, is that of length-variable sequences (or alignments), which lead to:
alignment ambiguities
indels
Options for handling indels:
1. remove sites with indels (but this loses phylogenetic signal)
2. assign a penalty of 0 to indels (but this is an incorrect interpretation)
3. treat each gap as an additional character
4. treat a gap as a new character (but only count the first indel in a series)
Often necessary to use alignment surgery.
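The first option, removing every site that contains an indel, can be sketched as follows. This is a minimal illustration under assumptions: the function name `strip_gapped_sites`, the toy alignment, and the choice of `-` and `?` as gap symbols are not from the slides.

```python
def strip_gapped_sites(alignment, gap_chars="-?"):
    """Return a copy of the alignment without any column that contains a gap.

    `alignment` maps a sequence name to its aligned sequence; all
    sequences must have the same length.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) walks the alignment column by column (site by site).
    kept = [col for col in zip(*rows) if not any(c in gap_chars for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Hypothetical toy alignment with gaps at two sites.
toy = {
    "seq1": "GCGCCC-CAAG",
    "seq2": "GCGCCCTGAAG",
    "seq3": "GGCCCG-GA-G",
}
stripped = strip_gapped_sites(toy)
for name, seq in stripped.items():
    print(name, seq)
```

Two of the eleven sites are dropped here, which illustrates the trade-off noted above: the result is unambiguous, but any signal carried by the gapped sites is lost.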
Alignment Procedure
ARABI GCGCCC ---CAAGCCTTCT-GGCCG---- AGGGCACGTCT
LYCOP .....C ---GAAGCCATTT-GGCCG---- A..........
Taxus .GC..G ---GAG-C---TC-GGCCG---- A..........
HETER .....C TTT--GGT-ATT----CCGA--- A..-....C..
VOLVA ....TC TTT--GGCCATT----CCGA--- A.A...T.C..
ARABI GCGCCC ???CAAGCCTTCT?GGCCG???? AGGGCACGTCT ??????????????
LYCOP GCGCCC ???GAAGCCATTT?GGCCG???? AGGGCACGTCT ??????????????
TAXUS GGCCCG ???GAG-C?-?TC?GGCCG???? AGGGCACGTCT ??????????????
HETER GCGCCC ??????????????????????? AGG-CACGCCT TTTGGT-ATTCCGA
VOLVA GCGCTC ??????????????????????? AGAGCATGCCT TTTGGCCATTCCGA
Substitution model
DNA substitution models
Jukes-Cantor - equal probability for every type of substitution, independently at all sites
Kimura - different rates for transitions versus transversions
transition (purine-purine or pyrimidine-pyrimidine: A-G, C-T)
transversion (purine-pyrimidine: A-C, A-T, G-C, G-T)
Maximum Likelihood - allows for variations in nucleotide context, and for different rates for transitions versus transversions.
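As a rough illustration of the first two models, the sketch below computes pairwise distances with the standard Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), and the Kimura two-parameter formula, d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where p is the fraction of differing sites and P and Q are the fractions of transitions and transversions. The function names and the example sequences are assumptions, not taken from the slides.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (sites compared, transitions, transversions), skipping non-ACGT sites."""
    n = ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gaps and ambiguity codes are ignored
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A-G or C-T
        else:
            tv += 1  # purine-pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: one rate for every kind of substitution."""
    n, ts, tv = count_differences(seq_a, seq_b)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq_a, seq_b):
    """Kimura two-parameter distance: separate transition/transversion rates."""
    n, ts, tv = count_differences(seq_a, seq_b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Hypothetical pair: one transition (A-G) and one transversion (C-A) in 10 sites.
a, b = "ACGTACGTAC", "ACGTACGTGA"
print(count_differences(a, b), jukes_cantor(a, b), kimura_2p(a, b))
```

Both corrected distances exceed the raw fraction of differing sites, because each model accounts for multiple substitutions at the same site.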
Substitution model
Amino acid substitution models
PAM - uses a PAM001 matrix to create a transition probability matrix between two sequences.
Kimura - approximates PAM distance as
D = - ln (1 - p - 0.2p^2)
p = fraction of amino acids that differ
Categories (PHYLIP)
1. categories of amino acids
2. selectable transition/transversion rates
3. selectable genetic codes
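The Kimura approximation above, D = - ln (1 - p - 0.2p^2), is simple enough to compute directly. The sketch below is illustrative only: the function name, the gap handling, and the two peptide sequences are assumptions.

```python
import math


def kimura_protein_distance(seq_a, seq_b, gap="-"):
    """Kimura's approximation D = -ln(1 - p - 0.2 p^2) for two aligned proteins.

    p is the fraction of compared (ungapped) positions that differ.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)
    inside = 1 - p - 0.2 * p * p
    if inside <= 0:
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(inside)


# Hypothetical peptides differing at 2 of 10 positions (p = 0.2).
d = kimura_protein_distance("MKTAYIAKQR", "MKSAYIAKQK")
print(round(d, 4))
```

The formula has no solution once 1 - p - 0.2p^2 reaches zero (p around 0.85), so the sketch raises an error for very divergent pairs rather than returning a meaningless value.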
Tree building
Three fundamental strategies:
1. maximum parsimony
2. distance-based
3. maximum likelihood
Maximum parsimony attempts to minimize the number of steps required to generate the observed variation in the sequences.
Distance-based methods use distance metrics to determine neighboring sequences.
Maximum likelihood searches for the evolutionary model (including the tree) that maximizes the likelihood of producing the observed data.
Tree building
MAXIMUM PARSIMONY
useful for sequences that are very similar, and for a small number of sequences.
evaluates all possible trees
only informative sites need to be analyzed
to be informative, a site must have at least two different characters, each shared by at least two taxa, so that it supports one tree over another...
Tree building
MAXIMUM PARSIMONY
Taxa   Sequence Positions
       1 2 3 4 5 6 7 8 9
1      A A G A G T G C A
2      A G C C G T G C G
3      A G A T A T C C A
4      A G A G A T C C G
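The informative sites in the four-taxon table can be found mechanically. The sketch below uses the usual criterion that a site is parsimony-informative when at least two different characters each occur in at least two taxa; the function name `informative_sites` is hypothetical.

```python
from collections import Counter

# The four sequences from the table, positions 1-9.
ALIGNMENT = {
    "1": "AAGAGTGCA",
    "2": "AGCCGTGCG",
    "3": "AGATATCCA",
    "4": "AGAGATCCG",
}


def informative_sites(alignment):
    """Return the 1-based positions of parsimony-informative sites."""
    rows = list(alignment.values())
    positions = []
    for index, column in enumerate(zip(*rows), start=1):
        counts = Counter(column)
        # Characters present in at least two taxa at this site.
        shared = [base for base, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            positions.append(index)
    return positions


print(informative_sites(ALIGNMENT))
```

For this table only positions 5, 7, and 9 qualify; the remaining sites are either constant or require the same number of changes on every tree.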
[Figure: the possible unrooted trees for taxa 1, 2, 3, and 4]
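To see which tree the four-taxon table favors, each of the three possible groupings of four taxa can be scored by counting the minimum number of changes it requires. The sketch below uses Fitch's small-parsimony counting on nested tuples; this is one common way to count steps, not necessarily the procedure used on the slides, and the function names are hypothetical.

```python
# The four sequences from the table, positions 1-9.
ALIGNMENT = {
    "1": "AAGAGTGCA",
    "2": "AGCCGTGCG",
    "3": "AGATATCCA",
    "4": "AGAGATCCG",
}

# The three ways of pairing four taxa, written as nested tuples.
TREES = {
    "((1,2),(3,4))": (("1", "2"), ("3", "4")),
    "((1,3),(2,4))": (("1", "3"), ("2", "4")),
    "((1,4),(2,3))": (("1", "4"), ("2", "3")),
}


def fitch_changes(states, tree):
    """Return (possible state set, minimum changes) for one site on a binary tree."""
    if isinstance(tree, str):  # leaf: its observed character, no changes
        return {states[tree]}, 0
    left_set, left_changes = fitch_changes(states, tree[0])
    right_set, right_changes = fitch_changes(states, tree[1])
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(alignment, tree):
    """Total number of changes over all sites for the given tree."""
    names = list(alignment)
    total = 0
    for column in zip(*(alignment[name] for name in names)):
        states = dict(zip(names, column))
        total += fitch_changes(states, tree)[1]
    return total


lengths = {label: tree_length(ALIGNMENT, tree) for label, tree in TREES.items()}
print(lengths)
```

The grouping of taxa 1 with 2 and 3 with 4 needs the fewest changes, so it is the most parsimonious tree for this data set. The differences between the three totals come entirely from positions 5, 7, and 9.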
Tree building
DISTANCE-BASED
Fitch-Margoliash
Neighbor Joining
UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
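Tree building
     B    C    D    E
A   22   39   39   41
B        41   41   43
C             18   20
D                  10

     B    C    DE
A   22   39   40
B        41   42
C             19

     B    CDE
A   22   39.67
B        41.67

D and E branch lengths = 10/2 = 5
[Figure: D and E joined, each with branch length 5]
distance from C to D and E = 19/2 = 9.5
[Figure: C joined to the DE group; branch lengths 9.5 for C, 4.5 for the DE group, 5 for D and 5 for E]
distance from A to B = 22/2 = 11
[Figure: A and B joined, each with branch length 11]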
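Tree building
[Figure: the two groups built so far, A and B with branch lengths 11, and C with D and E with branch lengths 9.5, 4.5, 5, and 5]
So now we have two composite groups...
To unify these, calculate the average distance between the groups
= (dAC + dAD + dAE + dBC + dBD + dBE)/6
= (39+39+41+41+41+43)/6 = 40.7
Distance to the root of the tree is
= 40.7/2 = 20.35
[Figure: final tree; the root joins the AB group by a branch of length 9.35 and the CDE group by a branch of length 10.85; A and B have branch lengths 11, C has 9.5, the DE group has 4.5, and D and E have 5]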
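The worked example follows the UPGMA procedure: repeatedly join the two closest groups, place the new node at half their distance, and define the distance between groups as the arithmetic mean of all leaf-to-leaf distances. A minimal sketch under those assumptions is given below; the function names are hypothetical, and the input is the five-taxon distance matrix from the example.

```python
from itertools import combinations

# Pairwise distances from the worked example.
LEAF_DISTANCES = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20,
    ("D", "E"): 10,
}


def leaf_distance(a, b):
    """Look up a distance regardless of the order of the two leaves."""
    return LEAF_DISTANCES[(a, b)] if (a, b) in LEAF_DISTANCES else LEAF_DISTANCES[(b, a)]


def cluster_distance(c1, c2):
    """Arithmetic mean of all leaf-to-leaf distances between two clusters."""
    total = sum(leaf_distance(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma(leaves):
    """Return the merges as (cluster, cluster, node height), in the order made."""
    clusters = [(leaf,) for leaf in leaves]
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        height = cluster_distance(c1, c2) / 2  # new node sits halfway
        merges.append((c1, c2, height))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = upgma(["A", "B", "C", "D", "E"])
for c1, c2, height in merges:
    print("join", "".join(c1), "+", "".join(c2), "at height", round(height, 2))
```

The merge order matches the example: D with E at height 5, C with DE at 9.5, A with B at 11, and finally the two groups at the root. Without intermediate rounding the root height is 244/12, about 20.33; the example rounds the average to 40.7 first and so reports 20.35.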
Tree building
MAXIMUM LIKELIHOOD
The likelihood for each individual site within the alignment is calculated, given a particular tree and the overall observed base frequencies.
The likelihood of the tree is then the product of the likelihoods at every site.
The run time is MUCH longer for maximum likelihood analyses.
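The product over sites can be illustrated on the smallest possible tree: two sequences joined by a single branch. The sketch below makes several simplifying assumptions that are not in the slides: it uses the Jukes-Cantor model, fixes all base frequencies at 1/4 instead of using observed frequencies, and finds the best branch length by a plain grid search. The function names are hypothetical.

```python
import math


def jc_log_likelihood(n_sites, n_diff, d):
    """Log-likelihood of two aligned sequences separated by d substitutions per site.

    Under Jukes-Cantor, a site keeps its base with probability p_same and
    changes to one specific other base with probability p_diff.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    # Each site contributes (base frequency 1/4) * (transition probability).
    # The product over sites is computed as a sum of logarithms.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def ml_distance(n_sites, n_diff, step=0.0001, max_d=3.0):
    """Grid search for the branch length with the highest likelihood."""
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(n_sites, n_diff, d))


# Hypothetical data: 100 aligned sites, 20 of which differ.
best = ml_distance(100, 20)
print(round(best, 4))
```

Even this two-sequence case evaluates the likelihood thousands of times. A real analysis repeats a more expensive per-site calculation for every candidate tree and set of branch lengths, which is why the run time grows so quickly.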
Tree evaluation
There are two basic strategies for evaluating phylogenetic trees:
1. Bootstrap - the original data set is replicated many times; the replicates are created by sampling the original sites randomly (with replacement).
2. Jackknife - replicates are created by dropping one or more sites within each replicate.
A third alternative is to verify that the tree structure you obtain is consistent among the various construction methods.
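Both resampling strategies operate on the columns (sites) of the alignment. The sketch below shows one way to generate a single replicate of each kind; the function names, the toy alignment, and the fixed random seed are assumptions made for illustration.

```python
import random

# Hypothetical toy alignment: four taxa, nine sites.
ALN = {
    "1": "AAGAGTGCA",
    "2": "AGCCGTGCG",
    "3": "AGATATCCA",
    "4": "AGAGATCCG",
}


def columns(alignment):
    """Return the alignment as a list of columns (one tuple per site)."""
    return list(zip(*alignment.values()))


def rebuild(alignment, cols):
    """Turn a list of columns back into a name -> sequence mapping."""
    return {name: "".join(col[i] for col in cols)
            for i, name in enumerate(alignment)}


def bootstrap_replicate(alignment, rng):
    """Sample sites with replacement until the original length is reached."""
    cols = columns(alignment)
    sampled = [rng.choice(cols) for _ in cols]
    return rebuild(alignment, sampled)


def jackknife_replicate(alignment, rng, n_drop=1):
    """Drop n_drop randomly chosen sites and keep the rest in order."""
    cols = columns(alignment)
    dropped = set(rng.sample(range(len(cols)), n_drop))
    kept = [col for i, col in enumerate(cols) if i not in dropped]
    return rebuild(alignment, kept)


rng = random.Random(0)  # fixed seed so the example is repeatable
boot = bootstrap_replicate(ALN, rng)
jack = jackknife_replicate(ALN, rng, n_drop=1)
print(boot)
print(jack)
```

In practice a tree is built from each of many replicates, and the fraction of replicate trees that contain a given grouping is taken as the support for that grouping.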
PHYLIP
Basic Process
1. seqboot
2. tree building program: dnadist followed by neighbor or fitch, or dnapars, or dnaml
3. consense
4. drawtree or drawgram
END