Coalescence Heyer

  • Coalescent theory

  • Coalescence

    [Figure: gene genealogy running from the past to the present, showing coalescence events back to the common ancestor of all genes (MRCA).]

  • Coalescence

    The probability that two alleles coalesce in the previous generation is 1/N; the probability that they come from two different genes is 1 - 1/N.

    The probability that 3 alleles have 3 different ancestors in the previous generation is the probability that allele 1 and allele 2 have two different ancestors, multiplied by the probability that allele 3 has an ancestor different from the other two: (1 - 1/N)(1 - 2/N).

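    The probability that k alleles have k distinct ancestors in the previous generation is:

    $$P(k) = \prod_{i=1}^{k-1}\left(1 - \frac{i}{N}\right) \approx 1 - \binom{k}{2}\frac{1}{N}, \quad \text{with} \quad \binom{k}{2} = \frac{k!}{2!\,(k-2)!}$$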
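The product formula and its first-order approximation can be compared numerically. The sketch below is only an illustration: the function names and the value N = 1000 are assumptions, not taken from the slides.

```python
from math import comb


def prob_no_coalescence(k: int, n: int) -> float:
    """Exact probability that k alleles have k distinct ancestors one generation back."""
    p = 1.0
    for i in range(1, k):
        p *= 1.0 - i / n
    return p


def prob_no_coalescence_approx(k: int, n: int) -> float:
    """First-order approximation 1 - C(k, 2)/N, reasonable when k is much smaller than N."""
    return 1.0 - comb(k, 2) / n


# Illustrative population size (assumed value).
for k in (2, 3, 5, 10):
    print(k, prob_no_coalescence(k, 1000), prob_no_coalescence_approx(k, 1000))
```

The two columns should stay close as long as k remains small relative to N, which is the regime the approximation is meant for.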

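  • Coalescence

    Probability that two genes coalesce t+1 generations ago:

    $$\left(1 - \frac{1}{N}\right)^{t}\frac{1}{N} \approx \frac{1}{N}\,e^{-t/N}$$

    This is approximately an exponential distribution when 1/N is small (large N).

    The process follows a geometric distribution (mean 1/p, variance q/p^2) with p = 1/N. The mean is therefore N and the variance is

    $$\frac{1 - 1/N}{(1/N)^{2}} = N^{2}\left(1 - \frac{1}{N}\right) = N(N-1)$$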
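The mean N and variance N(N-1) of the pairwise coalescence time can be checked by summing the geometric probabilities directly. This is a minimal sketch; the function names, N = 50, and the truncation point are illustrative assumptions.

```python
from math import exp


def coalescence_pmf(g: int, n: int) -> float:
    """P(two genes coalesce exactly g generations ago), for g = 1, 2, ..."""
    return (1.0 - 1.0 / n) ** (g - 1) / n


def exponential_approx(g: int, n: int) -> float:
    """Exponential approximation (1/N) * exp(-t/N) with t = g - 1."""
    return exp(-(g - 1) / n) / n


def geometric_moments(n: int, max_generations: int) -> tuple[float, float]:
    """Mean and variance of the coalescence time, summed up to max_generations."""
    mean = 0.0
    second_moment = 0.0
    for g in range(1, max_generations + 1):
        p = coalescence_pmf(g, n)
        mean += g * p
        second_moment += g * g * p
    return mean, second_moment - mean * mean


# N = 50 is an assumed value; the tail beyond 5000 generations is negligible.
mean, variance = geometric_moments(50, 5000)
print(mean, variance)  # close to N = 50 and N(N-1) = 2450
```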

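  • Coalescence

    Returning to the sample of k alleles: the probability of no coalescence for t generations, followed by a coalescence event, is

    $$P(k)^{t}\,[1 - P(k)] \approx \left(1 - \binom{k}{2}\frac{1}{N}\right)^{t}\binom{k}{2}\frac{1}{N} \approx \binom{k}{2}\frac{1}{N}\,e^{-\binom{k}{2}\,t/N}$$

    The mean is then

    $$\frac{N}{\binom{k}{2}} = \frac{2N}{k(k-1)}$$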

  • T2=N

    T3=N/3

    T4=N/6

    T5=N/10
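The expected waiting times T2 = N, T3 = N/3, T4 = N/6 can be checked by simulation. The sketch below assumes a simple model in which each of the k lineages picks its parent uniformly at random among N gene copies each generation; the function names, replicate counts, and seed are illustrative choices.

```python
import random


def generations_until_first_coalescence(k: int, n: int, rng: random.Random) -> int:
    """Go back one generation at a time until at least two of the k lineages share a parent."""
    generations = 0
    while True:
        generations += 1
        parents = [rng.randrange(n) for _ in range(k)]
        if len(set(parents)) < k:
            return generations


def mean_waiting_time(k: int, n: int, replicates: int, seed: int = 1) -> float:
    """Average waiting time to the first coalescence over independent replicates."""
    rng = random.Random(seed)
    total = sum(generations_until_first_coalescence(k, n, rng) for _ in range(replicates))
    return total / replicates


# With N = 100 (assumed), theory gives about N / C(k, 2): 100 for k = 2, about 17 for k = 4.
print(mean_waiting_time(2, 100, 2000))
print(mean_waiting_time(4, 100, 5000))
```

For small N the simulated mean for k = 4 sits slightly above N/6, because N/6 comes from the first-order approximation of P(k).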

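  • Coalescence

    Time to the MRCA:

    $$T_{MRCA} = \sum_{i=2}^{k} T_i = \sum_{i=2}^{k}\frac{2N}{i(i-1)} = 2N\sum_{i=2}^{k}\left(\frac{1}{i-1} - \frac{1}{i}\right) = 2N\left(1 - \frac{1}{k}\right)$$

    Length of the tree:

    $$\sum_{i=2}^{k} i\,T_i = \sum_{i=2}^{k}\frac{2N}{i-1} = 2N\sum_{i=1}^{k-1}\frac{1}{i}$$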
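Both sums can be evaluated exactly with rational arithmetic and compared with their closed forms. This is a small sketch; the function names and the values k = 5, N = 1000 are assumptions made for the example.

```python
from fractions import Fraction


def expected_interval(i: int, n: int) -> Fraction:
    """Expected time during which the genealogy has i lineages: 2N / (i (i - 1))."""
    return Fraction(2 * n, i * (i - 1))


def expected_tmrca(k: int, n: int) -> Fraction:
    """Expected time to the MRCA of k genes: sum of the T_i for i = 2..k."""
    return sum((expected_interval(i, n) for i in range(2, k + 1)), Fraction(0))


def expected_tree_length(k: int, n: int) -> Fraction:
    """Expected total branch length: each interval T_i is carried by i lineages."""
    return sum((i * expected_interval(i, n) for i in range(2, k + 1)), Fraction(0))


k, n = 5, 1000  # assumed example values
print(expected_tmrca(k, n), 2 * n * (1 - Fraction(1, k)))
print(expected_tree_length(k, n), 2 * n * sum(Fraction(1, i) for i in range(1, k)))
```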

  • Stationary population

  • Growing population

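  • The divergence between sequences reflects their coalescence time

    [Figure: genealogy of the five sampled sequences (ATACCTATC, ATACCTATC, AAACCTAAC, ATTCGTATG, ATTAGTATG) descending from the sequence ATACGTATC, with mutations marked on the branches: T2A2, A3T3, C4A4, G5C5, T8A8, C9G9.]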

  • A practical measure: the distribution of the number of differences between pairs of genes ("mismatch distribution")

    Sample of sequences

    ATACCTATC

    ATACCTATC

    AAACCTAAC

    ATTCGTATG

    ATTAGTATG

    For each pair of sequences, we count the number of differences between individuals.

    We then count the number of pairs separated by one difference, two differences, and so on.

    [Figure: histogram of the mismatch distribution. X-axis: number of differences (0 to 6). Y-axis: frequency (0% to 30%).]
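The counting procedure for the mismatch distribution can be sketched in a few lines, using the five sequences of the slide. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations

SEQUENCES = ["ATACCTATC", "ATACCTATC", "AAACCTAAC", "ATTCGTATG", "ATTAGTATG"]


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(sequences: list[str]) -> dict[int, float]:
    """Fraction of sequence pairs separated by 0, 1, 2, ... differences."""
    counts = Counter(count_differences(a, b) for a, b in combinations(sequences, 2))
    n_pairs = sum(counts.values())
    return {d: counts.get(d, 0) / n_pairs for d in range(max(counts) + 1)}


print(mismatch_distribution(SEQUENCES))
# {0: 0.1, 1: 0.1, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.1, 6: 0.1}
```

With five sequences there are ten pairs, so each pair contributes 10% to the histogram.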

  • ATACCTATC

    ATACCTATC

    AAACCTAAC

    ATTCGTATG

    ATTAGTATG

    Sample of sequences

    Ancestral: ATACGTATC

    Mutation spectrum

    For each mutation, we count the number of sequences that carry the mutation.

    S2 = 1, S3 = 2, S4 = 1, S5 = 3, S8 = 1, S9 = 2

    We then count the number of mutations that have a given frequency:

    3 mutations of frequency 1

    2 mutations of frequency 2

    1 mutation of frequency 3
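The mutation spectrum can be recomputed from the sample and the ancestral sequence. The sketch below assumes that every position differing from the ancestral sequence corresponds to a single mutation; the function names are illustrative. Note that position 8 is counted, since the third sequence carries A where the ancestral sequence has T.

```python
from collections import Counter

ANCESTRAL = "ATACGTATC"
SAMPLE = ["ATACCTATC", "ATACCTATC", "AAACCTAAC", "ATTCGTATG", "ATTAGTATG"]


def derived_counts(ancestral: str, sample: list[str]) -> dict[int, int]:
    """For each variable position (1-based), the number of sequences carrying a non-ancestral base."""
    counts = {}
    for position, ancestral_base in enumerate(ancestral, start=1):
        carriers = sum(seq[position - 1] != ancestral_base for seq in sample)
        if carriers:
            counts[position] = carriers
    return counts


def frequency_spectrum(counts: dict[int, int]) -> dict[int, int]:
    """Number of mutations carried by exactly 1, 2, 3, ... sequences."""
    return dict(sorted(Counter(counts.values()).items()))


per_site = derived_counts(ANCESTRAL, SAMPLE)
print(per_site)                      # {2: 1, 3: 2, 4: 1, 5: 3, 8: 1, 9: 2}
print(frequency_spectrum(per_site))  # {1: 3, 2: 2, 3: 1}
```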

  • Stationary population

    Mutation

  • Growing population

  • Impact of demographic history on

    population genetics

    Hunter-Gatherer populations

    (source : L. Excoffier)

    Post-Neolithic populations

  • The detection of rapid population growth, a case study: Birgus latro

    Old constant size populations Young expanding populations

    Lavery, S., Moritz, C. & Fielder, D. R. 1996. Mol. Biol. Evol. 13, 1106-1113.

  • Coalescence

    Estimating demographic parameters:

    Migration rate

    Growth rate

    Dates of splitting events

    Testing the intensity of selection

  • Origins and Genetic Diversity of Pygmy

    Hunter-Gatherers from Western Central Africa

  • Tracing back population history

    Genetic adaptation to different lifestyles: hunter-gatherers versus farmers

    Genetic factors involved in short stature

    With the teams of L. Quintana-Murci (Pasteur Institute), Y. LeBouc (Hôpital Armand-Trousseau), F. Austerlitz (Orsay)

  • 22 populations

    10 Pygmy populations, 12 Non-pygmy populations

    Average = 28 individuals/population

    28 nuclear microsatellite loci

    Population Set

  • PCA based on the pairwise FST values

    Pygmy populations clearly differentiated from non-pygmies.

    Pygmy populations more scattered on the graph (more differentiated)

  • Individual Structuring of the Central African Genetic Diversity

    Pygmies and Non-Pygmies cluster in two groups

    Among Pygmies :

    Asymmetric admixture signal from Non-pygmies

    Heterogeneous signal: variable admixture intensity with Non-pygmies

    Does it echo sociocultural rules on intermarriage between Pygmies and Non-pygmies?

  • Social Intermarriage Rules between Pygmies and Non-pygmies

    Improbable marriages

    Potential marriages

    Patrilocality: the married woman lives in her husband's village

    Frequent divorces: the pygmy woman goes back to her community of origin

    + Illegitimate children

    Asymmetrical admixture from Non-pygmies into Pygmies

    Heterogeneous signal among Pygmies = specific social relationships vs Non-pygmies and immediate pygmy neighbours

  • Principle of ABC methods: example of two

    diverging populations

    [Figure: two populations of sizes N1 and N2 diverging at time T from an ancestral population of size Na.]

    Parameters to estimate : N1, N2, Na, T, .

    Observed statistics : He1, He2, FST

    The program draws the values of the parameters from uninformative prior distributions, performs simulations with these values, and computes the same statistics on the simulated data.

    Only the simulations in which the simulated statistics are close enough to the observed statistics are kept, which allows an a posteriori estimation of the parameters.
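The draw, simulate, compare, keep loop described above can be sketched as a rejection algorithm. The example below is a deliberately reduced stand-in, not the two-population model of the slide and not the software used in the study: it estimates a single population size N from one summary statistic (the mean number of differences between two genes over several loci). The mutation rate, number of loci, prior bounds, observed value, and tolerance are all assumed for illustration.

```python
import random
import statistics

MU = 1e-3                 # assumed mutation rate per locus per generation (hypothetical)
N_LOCI = 50               # assumed number of independent loci (hypothetical)
PRIOR_N = (100, 10_000)   # assumed bounds of the uniform prior on N (hypothetical)


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) variate by counting unit-rate arrivals in [0, lam]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_mean_differences(n: float, rng: random.Random) -> float:
    """Summary statistic: mean number of differences between two genes over N_LOCI loci."""
    total = 0
    for _ in range(N_LOCI):
        t = rng.expovariate(1.0 / n)        # pairwise coalescence time, mean n generations
        total += poisson(2 * MU * t, rng)   # mutations accumulated on both branches
    return total / N_LOCI


def abc_rejection(observed: float, tolerance: float, n_sims: int, seed: int = 0) -> list[float]:
    """Keep the prior draws whose simulated statistic falls within tolerance of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        n = rng.uniform(*PRIOR_N)
        if abs(simulate_mean_differences(n, rng) - observed) <= tolerance:
            accepted.append(n)
    return accepted


# Hypothetical observed statistic and tolerance.
posterior = abc_rejection(observed=4.0, tolerance=0.5, n_sims=2000)
print(len(posterior), statistics.median(posterior))
```

The accepted values approximate the posterior distribution of N. A smaller tolerance gives a better approximation but keeps fewer simulations.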

  • Prior versus posterior

    distribution

    From the posterior distribution, we can obtain a point estimate using the mode or the median value.

    We can also obtain 95% confidence intervals.
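Given a list of accepted parameter values, the median and a 95% interval can be read off the empirical quantiles. This is a minimal sketch: the function name is illustrative and the sample is a placeholder, not data from the study. Estimating the mode would additionally require a density estimate, which is left out here.

```python
import statistics


def summarize_posterior(samples: list[float]) -> tuple[float, float, float]:
    """Return (median, 2.5% quantile, 97.5% quantile) of the accepted values."""
    # n=40 gives cut points every 2.5%, so the first and last are the 95% bounds.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return statistics.median(samples), cuts[0], cuts[-1]


# Placeholder sample standing in for accepted simulations.
accepted_values = list(range(1, 1002))
print(summarize_posterior(accepted_values))  # (501, 26.0, 976.0)
```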

  • Linear regression method

    Beaumont et al (2002) Genetics 162, 2025-2035.

  • ABC study

    Comparing two scenarios

    Estimating the parameters for the best scenario

    Performed with the software DIY ABC (Cornuet et al. (2008) Bioinformatics, advance online publication).

  • 35 summary statistics.

    the mean number of alleles per population

    the mean expected heterozygosity He

    the mean allele size variance expressed in base pairs

    all pairwise FSTs

    All pairwise genetic distances ().
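Three of the per-population statistics listed above can be computed from microsatellite allele sizes as sketched below. The allele sizes are made up for the example, the function names are illustrative, and the sample-size-corrected heterozygosity is one common estimator; whether the study used exactly this form is an assumption.

```python
import statistics
from collections import Counter


def number_of_alleles(allele_sizes: list[int]) -> int:
    """Number of distinct alleles observed at one locus."""
    return len(set(allele_sizes))


def expected_heterozygosity(allele_sizes: list[int]) -> float:
    """He = n/(n-1) * (1 - sum of squared allele frequencies), for n sampled gene copies."""
    n = len(allele_sizes)
    counts = Counter(allele_sizes)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)


def allele_size_variance(allele_sizes: list[int]) -> float:
    """Sample variance of allele sizes, in squared base pairs."""
    return statistics.variance(allele_sizes)


# Hypothetical allele sizes (base pairs) for 8 gene copies at one locus.
locus = [120, 120, 122, 122, 122, 124, 126, 126]
print(number_of_alleles(locus), expected_heterozygosity(locus), allele_size_variance(locus))
```

Averaging each quantity over loci gives the per-population means named in the list.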

  • The common origin scenario wins

    Ancient separation time between pygmies and non-pygmies: 89,675 YBP (95% CI: 23,025-123,275).

    Recent divergence of the pygmy groups: 2,900 YBP (95% CI: 850-30,050).

    Similar level of admixture as with STRUCTURE, except for the Baka.

  • Estimated population sizes

    N1 (Baka): 8,137 (1,347-9,824)
    N2 (Bezan): 2,795 (790-9,677)
    N3 (Kola): 3,302 (603-9,599)
    N4 (Koya): 3,197 (1,134-9,771)
    Nnp (Non-pygmies): 77,157 (27,926-97,828)
    Nap (ancestral pygmy population): 8,007 (960-9,825)
    NA (ancestral population): 1,071 (202-8,404)

  • Most likely scenario

    Tested by ABC approach (Verdu et al, 2009 Current Biology)

  • Origins and Genetic Diversity of Pygmy Hunter-Gatherers from Western Central Africa

  • Inferring the Demographic History of African Farmers and Pygmy Hunter-Gatherers Using a Multilocus Resequencing Data Set. Patin et al, PLOS Genetics 2009

    20 neutral sequences

  • STRUCTURE analysis. Best likelihood: K=4

  • Summary statistics

  • Frequency spectrum

  • Distance between simulated and observed data, using the statistics S, pi, D, D*

    Demographic scenarios: Tbot and Sbot are the time and intensity of the bottleneck; Trec and Srec are the time and intensity of the recovery

    WPyg: bottleneck 2500-25000 yrs (80% decrease), recovery 125 yrs later (100-400%)

    EPyg: bottleneck 250-2500 yrs (90-95% decrease), no recovery

  • Models of population history

    A-WE is the best

  • Values:

    Na: 11402 (75000-15000)
    Tsep: 56000 (25000-130000)
    Tsep Pyg: 22000 (14000-66000)
    Gene flow, WPYG-EPYG / WPYG-AGR / EPYG-AGR: 4.4 x 10^-4 / 1.8 x 10^-4 / 2.4 x 10^-5