Phylogeny presentation (prezentare filogenie)


  • 8/6/2019 prezentare filogenie

    1/27

    Phylogenetics

    Todd Scheetz

    March 23, 2004


    Introduction

    Common Terms

    General Processes

    Types of phylogenetic analyses

    PHYLIP and PAUP


    Common Terms

    Phylogenetics - assessment of the evolutionary relationships between species, typically using the sequence of a common molecule.

    Dendrogram - tree-based diagram of phylogenetic structure.

    Clade - a group of organisms whose members share homologous features derived from a common ancestor.

    Taxon (pl. taxa) - a category or group, such as a phylum, order, or species.


    Introduction

    Phylogenetics is an attempt to determine how the sequences might have been derived during evolution.

    It can be done with either nucleotide or amino acid sequences.

    Branching within the tree indicates the apparent relationship between two sequences. Very similar sequences should be next to each other in the tree.

    [Figure: example tree with leaves human, mouse, and fly]


    Trees

    Rooted vs. unrooted trees

    Example

    [Figure: two sequences, GAATC and GAGTT, joined at their inferred common ancestor GA(A/G)T(C/T)]
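
    The ancestor in this example can be read as a column-by-column comparison: where the two descendants agree, the ancestor keeps that base, and where they differ, both candidates are kept. A minimal Python sketch, assuming the two sequences in the figure are GAATC and GAGTT (the helper name `ambiguous_ancestor` is hypothetical, not part of any package):

```python
def ambiguous_ancestor(seq_a, seq_b):
    """Combine two aligned sequences into one ancestor with ambiguity marks."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    parts = []
    for a, b in zip(seq_a, seq_b):
        # Identical bases are kept as-is; differing bases are written as (x/y).
        parts.append(a if a == b else f"({a}/{b})")
    return "".join(parts)


print(ambiguous_ancestor("GAATC", "GAGTT"))  # GA(A/G)T(C/T)
```

    This only records which bases are possible at each position. It does not choose between them, which is the job of the tree-building methods covered later.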


    General Process

    The basic process of phylogenetic analysis is:

    1. Alignment

    2. Determining the substitution model

    3. Tree building

    4. Tree evaluation


    What sequence to use?

    Before performing the analysis, there is a more fundamental issue to be addressed: what sequences to use?

    Guidelines

    1. Universally present in all organisms to be studied, with good conservation of sequence amongst many of the species.

    2. Divergent enough to allow grouping the species into a taxonomic classification.


    Types of Sequences

    Homologs - sequences that have a common origin.

    Orthologs - homologs derived from speciation.

    Paralogs - homologs derived from a common ancestral gene that underwent duplication and subsequent divergence.

    Xenologs - homologs resulting from horizontal transfer across species.


    Alignment

    The first step in performing a phylogenetic analysis is to align the sequences. Each column within a multiple sequence alignment is referred to as a site.

    Because the sites themselves are effectively assumed to be homologous (share a common ancestor), they represent a priori phylogenetic conclusions.

    Two major steps:

    1. selection of alignment procedure

    2. extracting the phylogenetic data set from the alignment

    ARABI GCGCCC ---CAAGCCTTCT-GGCCG---- AGGGCACGTCT

    LYCOP GCGCCC ---GAAGCCATTT-GGCCG---- AGGGC......

    Taxus GGCCCG ---GAG-C---TC-GGCCG---- AGGGC......

    HETER GGCCCC TTT--GGT-ATT----CCGA--- AGG-C...C..

    VOLVA GGCCTC TTT--GGCCATT----CCGA--- AGAGC.T.C..


    Alignment Procedure

    Computer dependence - it is unrealistic to do the alignment by hand.

    Phylogenetic criteria - does the alignment proceed based upon a tree? Ex.: clustalw utilizes neighbor joining during sequence alignment.

    Alignment parameter estimation - parameters should vary dynamically depending on evolutionary distance.

    Aligned features - secondary structure; requires manual intervention.

    Mathematical optimization - some programs optimize according to a statistical model, but this may have unknown effects on further phylogenetic analysis.


    Alignment -- Extracting Phylogenetic Info

    The difficulty here, as we will see on the following page, is that of length-variable sequences (or alignments).

    alignment ambiguities

    indels

    1. can remove sites with indels (but this loses phylogenetic signal)
    2. can assign a penalty of 0 to indels (but this leads to incorrect interpretation)
    3. can treat a gap as an additional character
    4. can treat a gap as a new character (but only count the first indel in a series)

    It is often necessary to use alignment surgery.
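
    As an illustration of the first option (removing sites with indels), here is a minimal Python sketch. The toy alignment and the function name `drop_gapped_sites` are hypothetical and only assume that gaps are written as `-`:

```python
def drop_gapped_sites(alignment, gap="-"):
    """Keep only the columns (sites) in which no sequence has a gap."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if gap not in col]
    # Transpose the kept columns back into one string per sequence.
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Hypothetical toy alignment, not taken from the slides.
toy = {
    "seq1": "GA-TCA",
    "seq2": "GAGT-A",
    "seq3": "GCGTCA",
}
print(drop_gapped_sites(toy))
```

    Two of the six sites are discarded here, which shows the trade-off noted in the list: the remaining columns are unambiguous, but any signal carried by the gapped columns is lost.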


    Alignment Procedure

    ARABI GCGCCC ---CAAGCCTTCT-GGCCG---- AGGGCACGTCT

    LYCOP .....C ---GAAGCCATTT-GGCCG---- A..........

    Taxus .GC..G ---GAG-C---TC-GGCCG---- A..........

    HETER .....C TTT--GGT-ATT----CCGA--- A..-....C..

    VOLVA ....TC TTT--GGCCATT----CCGA--- A.A...T.C..

    ARABI GCGCCC ???CAAGCCTTCT?GGCCG???? AGGGCACGTCT ??????????????

    LYCOP GCGCCC ???GAAGCCATTT?GGCCG???? AGGGCACGTCT ??????????????

    TAXUS GGCCCG ???GAG-C?-?TC?GGCCG???? AGGGCACGTCT ??????????????

    HETER GCGCCC ??????????????????????? AGG-CACGCCT TTTGGT-ATTCCGA

    VOLVA GCGCTC ??????????????????????? AGAGCATGCCT TTTGGCCATTCCGA


    Substitution model

    DNA substitution models

    Jukes-Cantor - all sites change independently, and every substitution is equally probable.

    Kimura - different rates for transitions versus transversions.
    transition (purine-purine or pyrimidine-pyrimidine: A-G, C-T)
    transversion (purine-pyrimidine: A-C, A-T, G-C, G-T)

    Maximum Likelihood - allows for variations in nucleotide context, and for different rates for transitions versus transversions.
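
    The slides do not give formulas for the first two models. The standard textbook forms are d = -(3/4) ln(1 - 4p/3) for Jukes-Cantor, where p is the fraction of differing sites, and d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)) for the Kimura two-parameter model, where P and Q are the fractions of transitions and transversions. A minimal Python sketch, assuming gap-free aligned sequences containing only A, C, G, and T (the function names are hypothetical):

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (transition fraction P, transversion fraction Q) over aligned sites."""
    transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        # A transition stays within purines (A, G) or within pyrimidines (C, T).
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: every kind of substitution is treated alike."""
    p_ts, q_tv = count_differences(seq1, seq2)
    p = p_ts + q_tv
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance: transitions and transversions weighted apart."""
    p_ts, q_tv = count_differences(seq1, seq2)
    return -0.5 * math.log((1 - 2 * p_ts - q_tv) * math.sqrt(1 - 2 * q_tv))


print(jukes_cantor("GAATC", "GAGTT"), kimura_2p("GAATC", "GAGTT"))
```

    Both differences between GAATC and GAGTT are transitions, so the two models give different distances for the same pair. Both formulas are undefined once the sequences are so divergent that the argument of the logarithm is no longer positive.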


    Substitution model

    Amino acid substitution models

    PAM - uses a PAM001 matrix to create a transition probability matrix between two sequences.

    Kimura - approximates PAM distance as
    D = -ln(1 - p - 0.2p^2)
    where p = fraction of amino acids that differ

    Categories (PHYLIP)
    1. categories of amino acids
    2. selectable transition/transversion rates
    3. selectable genetic codes
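
    The Kimura approximation above is simple enough to compute directly. A minimal Python sketch, assuming two gap-free aligned protein sequences (the function name and the example peptides are hypothetical):

```python
import math


def kimura_protein_distance(seq1, seq2):
    """Approximate PAM distance D = -ln(1 - p - 0.2 p^2) for two aligned proteins."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # p is the fraction of aligned positions at which the amino acids differ.
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -math.log(1 - p - 0.2 * p * p)


# Hypothetical ten-residue peptides differing at one position (p = 0.1).
print(kimura_protein_distance("MKTAYIAKQR", "MKTAYIAKQL"))
```

    For small p the distance is close to p itself. The logarithm's argument reaches zero near p = 0.85, so the approximation cannot be used for very divergent pairs.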


    Tree building

    Three fundamental strategies:

    1. maximum parsimony
    2. distance-based
    3. maximum likelihood

    Maximum parsimony attempts to minimize the number of steps required to generate the observed variations in the sequences.

    Distance-based methods utilize distance metrics to determine neighboring sequences.

    The maximum likelihood method searches for the evolutionary model (including the tree) that maximizes the likelihood of producing the observed data.


    Tree building

    MAXIMUM PARSIMONY

    Useful for sequences that are very similar, and for small numbers of sequences.

    Evaluates all possible trees.

    Only informative sites need to be analyzed.

    To be informative, a site must have at least two different characters, each shared by at least two taxa, so that it supports one tree over another.


    Tree building

    MAXIMUM PARSIMONY

    Taxa   Sequence Positions
           1 2 3 4 5 6 7 8 9
    1      A A G A G T G C A
    2      A G C C G T G C G
    3      A G A T A T C C A
    4      A G A G A T C C G

    [Figure: the three possible unrooted trees for taxa 1-4]
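
    The informative sites in the four-taxon table can be found mechanically. The Python sketch below uses the usual definition (at least two characters, each present in at least two taxa) and then counts which pairing of taxa each informative site supports. The function names are hypothetical, and the shortcut of letting each site "vote" for one pairing only works for four taxa:

```python
from collections import Counter

# Alignment from the table above: taxon -> characters at positions 1..9.
alignment = {
    1: "AAGAGTGCA",
    2: "AGCCGTGCG",
    3: "AGATATCCA",
    4: "AGAGATCCG",
}


def informative_sites(aln):
    """Return 1-based positions where >= 2 characters each occur in >= 2 taxa."""
    seqs = list(aln.values())
    sites = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i + 1)
    return sites


def supported_split(aln, pos):
    """For four taxa, return the pairing of taxa that an informative site supports."""
    groups = {}
    for taxon, seq in aln.items():
        groups.setdefault(seq[pos - 1], []).append(taxon)
    return tuple(sorted(tuple(sorted(g)) for g in groups.values()))


sites = informative_sites(alignment)
support = Counter(supported_split(alignment, pos) for pos in sites)
best_tree, votes = support.most_common(1)[0]
print(sites)             # [5, 7, 9]
print(best_tree, votes)  # ((1, 2), (3, 4)) 2
```

    Positions 5 and 7 group taxa 1 and 2 against 3 and 4, while position 9 groups 1 and 3 against 2 and 4, so the tree that pairs 1 with 2 needs the fewest changes.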


    Tree building

    DISTANCE-BASED

    Fitch-Margoliash

    Neighbor Joining

    UPGMA (Unweighted Pair Group Method with Arithmetic Mean)


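    Tree building

    UPGMA example: initial distance matrix

          B    C    D    E
    A    22   39   39   41
    B         41   41   43
    C              18   20
    D                   10

    D and E are the closest pair (distance 10), so they are joined first.
    D and E branch lengths = 10/2 = 5

    Distance matrix after merging D and E

          B    C   DE
    A    22   39   40
    B         41   42
    C              19

    C and DE are now the closest pair (distance 19).
    Distance from C to D and E = 19/2 = 9.5
    Branch from the DE node to the CDE node = 9.5 - 5 = 4.5

    Distance matrix after merging C and DE

          B   CDE
    A    22   39.67
    B         41.67

    A and B are now the closest pair (distance 22).
    A and B branch lengths = 22/2 = 11

    [Figure: partial trees. D and E joined with branch lengths 5 and 5; C joined to the DE node with branch lengths 9.5 (C) and 4.5 (DE node); A and B joined with branch lengths 11 and 11.]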


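    Tree building

    So now we have two composite groups, AB and CDE.

    To unify these, calculate the average distance between the groups
    = (dAC + dAD + dAE + dBC + dBD + dBE)/6
    = (39 + 39 + 41 + 41 + 41 + 43)/6 = 40.7

    Distance to the root of the tree
    = 40.7/2 = 20.35

    Branch from the AB node to the root = 20.35 - 11 = 9.35
    Branch from the CDE node to the root = 20.35 - 9.5 = 10.85

    [Figure: final rooted tree. D and E (branch lengths 5 each) join at the DE node; C (9.5) and the DE node (4.5) join at the CDE node; A and B (11 each) join at the AB node; the AB node (9.35) and the CDE node (10.85) join at the root.]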
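
    The worked example can be reproduced with a short UPGMA routine. The Python sketch below is one way to write it, not the implementation used by any particular package; the function name `upgma` and the data layout are assumptions. It repeatedly joins the closest pair of clusters and averages distances weighted by cluster size:

```python
from itertools import combinations

# Pairwise distances from the worked example (taxa A-E).
dist = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20,
    ("D", "E"): 10,
}


def upgma(pairwise):
    """Return a list of (cluster_a, cluster_b, node_height) merge steps."""
    # Each cluster is a tuple of leaf names; distances are keyed by cluster pairs.
    clusters = {(name,) for pair in pairwise for name in pair}
    d = {frozenset([(a,), (b,)]): float(v) for (a, b), v in pairwise.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset([a, b])] / 2  # node sits halfway between the two
        merged = tuple(sorted(a + b))
        merges.append((a, b, height))
        clusters -= {a, b}
        # Distance to the new cluster: average weighted by cluster size.
        for c in clusters:
            d[frozenset([merged, c])] = (
                len(a) * d[frozenset([a, c])] + len(b) * d[frozenset([b, c])]
            ) / (len(a) + len(b))
        clusters.add(merged)
    return merges


for a, b, h in upgma(dist):
    print("".join(a), "+", "".join(b), "at height", round(h, 2))
```

    The merge order matches the slides (DE, then CDE, then AB, then the root). The root height comes out as 20.33 rather than 20.35 because the slides round the average distance 40.67 up to 40.7 before halving it.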


    Tree building

    MAXIMUM LIKELIHOOD

    The likelihood for each individual site within the alignment is calculated, given a particular tree and the overall observed base frequencies.

    The likelihood of the tree is then the product of the likelihoods at every site.

    The run time is MUCH longer for maximum likelihood analyses.
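
    The product-over-sites idea can be shown on the smallest possible "tree": two sequences joined by a single branch. The Python sketch below is an illustration under stated assumptions, not how dnaml works internally. It assumes the Jukes-Cantor model with equal base frequencies (rather than observed frequencies) and finds the branch length by a simple grid search. The function names are hypothetical.

```python
import math


def jc_site_likelihood(x, y, d):
    """Likelihood of one aligned site (x, y) for two sequences d substitutions/site apart."""
    e = math.exp(-4.0 * d / 3.0)
    p_change = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    return 0.25 * p_change  # 0.25 = equal base frequency assumed by Jukes-Cantor


def log_likelihood(seq1, seq2, d):
    """Log of the product of the site likelihoods (summing logs avoids underflow)."""
    return sum(math.log(jc_site_likelihood(x, y, d)) for x, y in zip(seq1, seq2))


seq1, seq2 = "GAATC", "GAGTT"
# Try branch lengths 0.001, 0.002, ..., 2.999 and keep the most likely one.
candidates = [i / 1000 for i in range(1, 3000)]
best_d = max(candidates, key=lambda d: log_likelihood(seq1, seq2, d))
print(best_d)
```

    The best branch length lands close to the Jukes-Cantor distance for the same pair (about 0.57). A real analysis repeats this kind of optimization over every branch of every candidate tree, which is why the run time is so much longer.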


    Tree evaluation

    There are two basic strategies for evaluating phylogenetic trees:

    1. Bootstrap - the original data set is replicated many times; the replicates are created by sampling the original sites randomly (with replacement).

    2. Jackknife - replicates are created by dropping one or more sites within each replicate.

    A third alternative is to verify that the tree structure you obtain is consistent among the various construction methods.
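
    Both resampling schemes operate on alignment columns. A minimal Python sketch of how one replicate of each kind might be generated, reusing the four-taxon alignment from the maximum parsimony example (the function names are hypothetical, and a fixed seed is used only to make the sketch repeatable):

```python
import random


def bootstrap_replicate(alignment, rng):
    """Sample sites (columns) with replacement; the replicate keeps the original length."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def jackknife_replicate(alignment, rng, n_drop=1):
    """Drop n_drop randomly chosen sites; the remaining sites keep their order."""
    n_sites = len(next(iter(alignment.values())))
    dropped = set(rng.sample(range(n_sites), n_drop))
    return {
        name: "".join(c for i, c in enumerate(seq) if i not in dropped)
        for name, seq in alignment.items()
    }


# Four-taxon alignment from the maximum parsimony example.
alignment = {"1": "AAGAGTGCA", "2": "AGCCGTGCG", "3": "AGATATCCA", "4": "AGAGATCCG"}
rng = random.Random(0)
boot = bootstrap_replicate(alignment, rng)
jack = jackknife_replicate(alignment, rng, n_drop=2)
print(boot)
print(jack)
```

    In practice a tree is built from each of many replicates, and the fraction of replicate trees containing a given grouping is reported as the support for that grouping.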


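    PHYLIP

    Basic Process

    1. seqboot (generate bootstrap replicates)

    2. tree building program: dnadist followed by neighbor or fitch (distance-based), dnapars (maximum parsimony), or dnaml (maximum likelihood)

    3. consense (build the consensus tree)

    4. drawtree or drawgram (draw the tree)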


    END