Lecture 21: Tests for Departures from Neutrality
November 9, 2012
Last Time
Introduction to neutral theory
Molecular clock
Expectations for allele frequency distributions under neutral theory
Today
Sequence data and quantification of variation
Infinite sites model
Nucleotide diversity (π)
Sequence-based tests of neutrality
Ewens-Watterson Test
Tajima’s D
Hudson-Kreitman-Aguade
Synonymous versus Nonsynonymous substitutions
McDonald-Kreitman
Expected Heterozygosity with Mutation-Drift Equilibrium under IAM
At equilibrium:
$f_e = \frac{1}{4N_e\mu + 1}$
Remembering that H = 1-f, and setting 4Neμ = θ:
$H_e = 1 - f_e = \frac{4N_e\mu}{4N_e\mu + 1} = \frac{\theta}{\theta + 1}$
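The equilibrium relationship above can be checked numerically. This is a minimal Python sketch; the function names and the values of Ne and μ are hypothetical, chosen only for illustration.

```python
def expected_homozygosity(ne, mu):
    """Equilibrium homozygosity under the infinite alleles model: 1 / (4*Ne*mu + 1)."""
    return 1.0 / (4.0 * ne * mu + 1.0)


def expected_heterozygosity(ne, mu):
    """Equilibrium heterozygosity: theta / (theta + 1), with theta = 4*Ne*mu."""
    theta = 4.0 * ne * mu
    return theta / (theta + 1.0)


# Hypothetical values: Ne = 10,000 and mu = 1e-5 give theta = 0.4.
ne, mu = 10_000, 1e-5
print(expected_homozygosity(ne, mu))
print(expected_heterozygosity(ne, mu))
```

The two values sum to 1, as expected from H = 1-f.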
Allele Frequency Distributions
Neutral theory allows a prediction of frequency distribution of alleles through process of birth and demise of alleles through time
Comparison of observed to expected distribution provides evidence of departure from Infinite Alleles model
Depends on f, effective population size, and mutation rate
[Figure: allele frequency distributions (Hartl and Clark 2007). Black: Predicted from Neutral Theory. White: Observed (hypothetical)]
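Ewens Sampling Formula
Population mutation rate: index of variability of population: $\theta = 4N_e\mu$
Probability that the next sampled allele is new, given i alleles already sampled: $\frac{\theta}{\theta + i}$
Probability of sampling a new allele on the first sample: $\frac{\theta}{\theta + 0} = 1$
Probability of observing a new allele after sampling one allele: $\frac{\theta}{\theta + 1} = H_e$
Probability of sampling a new allele on the third and fourth samples: $\frac{\theta}{\theta + 2}$ and $\frac{\theta}{\theta + 3}$
Expected number of different alleles (k) in a sample of 2N alleles is:
$E(k) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$
Example: Expected number of alleles in a sample of 4:
$E(k) = \sum_{i=0}^{3} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \frac{\theta}{\theta + 3}$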
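The expected number of alleles is a short sum, sketched below in Python. The function name and the value θ = 1 are illustrative assumptions, not part of the lecture.

```python
def expected_alleles(theta, sample_size):
    """Expected number of different alleles in a sample of `sample_size` alleles.

    Sums theta / (theta + i) for i = 0 .. sample_size - 1 (requires theta > 0).
    """
    return sum(theta / (theta + i) for i in range(sample_size))


# Sample of 4 alleles with a hypothetical theta of 1:
# 1 + 1/2 + 1/3 + 1/4
print(expected_alleles(1.0, 4))
```

With very small θ the sum stays close to 1, and with very large θ it approaches the sample size.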
Ewens Sampling Formula
Predicts number of different alleles that should be observed in a given sample size if neutrality prevails under Infinite Alleles Model
Small θ, E(n) approaches 1
Large θ, E(n) approaches 2N
θ can be predicted from number of observed alleles for given sample size
Can also predict expected homozygosity (fe) under this model
$E(n) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$
where E(n) is the expected number of different alleles in a sample of N diploid individuals, and $\theta = 4N_e\mu$.
$f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1}$
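The slide notes that θ can be predicted from the number of observed alleles. One way to do this, sketched here as an assumption rather than the method used in the lecture, is to solve E(n) = observed count numerically. Because E(n) increases with θ, simple bisection works. All function names are hypothetical.

```python
def expected_alleles(theta, sample_size):
    """Expected number of different alleles in a sample of `sample_size` alleles."""
    return sum(theta / (theta + i) for i in range(sample_size))


def estimate_theta(observed_alleles, sample_size, tol=1e-10):
    """Find theta such that expected_alleles(theta, sample_size) == observed_alleles."""
    if not 1 < observed_alleles < sample_size:
        raise ValueError("observed_alleles must lie strictly between 1 and sample_size")
    lo, hi = 1e-12, 1.0
    # Expand the upper bound until it brackets the observed count.
    while expected_alleles(hi, sample_size) < observed_alleles:
        hi *= 2.0
    # Bisection: E(n) is monotonically increasing in theta.
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if expected_alleles(mid, sample_size) < observed_alleles:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Counts from the Ewens-Watterson example: 15 alleles in 89 chromosomes.
theta_hat = estimate_theta(15, 89)
print(theta_hat, 1.0 / (theta_hat + 1.0))  # estimate of theta and of 1 / (theta + 1)
```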
Ewens-Watterson Test
Compares expected homozygosity under the neutral model to expected homozygosity under Hardy-Weinberg equilibrium using observed allele frequencies
Comparison of allele frequency distributions
fe comes from infinite allele model simulations and can be found in tables for given sample sizes and observed allele numbers
$f_{HW} = \sum_i p_i^2$
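The Hardy-Weinberg homozygosity is the sum of squared allele frequencies. A minimal Python sketch follows, with hypothetical allele counts; the simulated fe values used for the comparison are not reproduced here.

```python
def hw_homozygosity(allele_counts):
    """Expected homozygosity under Hardy-Weinberg: sum of squared allele frequencies."""
    total = sum(allele_counts)
    return sum((count / total) ** 2 for count in allele_counts)


# Hypothetical sample: three alleles observed 2, 1, and 1 times.
print(hw_homozygosity([2, 1, 1]))  # 0.375
```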
Ewens-Watterson Test Example
Drosophila pseudoobscura collected from winery
Xanthine dehydrogenase alleles
15 alleles observed in 89 chromosomes
fHW = 0.366
Generated fe by simulation: mean 0.168
[Figure: simulated fe values; Hartl and Clark 2007]
How would you interpret this result?
Most Loci Look Neutral According to Ewens-Watterson Test
[Figure: Expected Homozygosity (fe); Hartl and Clark 2007]
DNA Sequence Polymorphisms
DNA sequence is ultimate view of standing genetic variation: no hidden alleles
Is this really true?
What about back mutation?
Signatures of past evolution are contained in DNA sequence
Neutral theory presents null model
Departures due to:
Selection
Demographic events
- Bottlenecks, founder effects
- Population admixture
Sequence Alignment
Necessary first step for comparing sequences within and between species
Many different algorithms
Tradeoff of speed and accuracy
Quantifying Divergence of Sequences
Nucleotide diversity (π) is average number of pairwise differences between sequences
$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij}$
where
N is number of sequences in sample,
$p_i$ and $p_j$ are frequency of sequences i and j in the sample, and
$\pi_{ij}$ is the proportion of sites that differ between sequences i and j
Sample Calculation of π
[Figure: aligned sequences A, B, and C, with site positions 5 to 35 marked]
A->B, 1 difference
A->C, 1 difference
B->C, 2 differences
$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij} = \frac{3}{2}\left[(0.33)(0.33)(1/35) + (0.33)(0.33)(1/35) + (0.33)(0.33)(2/35)\right] = 0.01867$
On average, there are 18.67 polymorphisms per kb between pairs of haplotypes in the population
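A minimal Python sketch of the calculation above. The three sequences are hypothetical stand-ins for the alignment in the figure, built only to have 35 sites and the stated 1, 1, and 2 differences. As in the sample calculation, the sum counts each pair of sequences once and uses p = 0.33. The `ordered` flag is an added assumption: it counts both (i, j) and (j, i), which doubles the value and, with equal frequencies, gives the mean proportion of differing sites per pair.

```python
from itertools import combinations


def proportion_different(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def nucleotide_diversity(seqs, freqs=None, ordered=False):
    """pi = N/(N-1) * sum of p_i * p_j * pi_ij over pairs of sequences."""
    n = len(seqs)
    if freqs is None:
        freqs = [1.0 / n] * n
    total = sum(
        freqs[i] * freqs[j] * proportion_different(seqs[i], seqs[j])
        for i, j in combinations(range(n), 2)
    )
    if ordered:
        total *= 2.0  # count (i, j) and (j, i) separately
    return n / (n - 1) * total


# Hypothetical 35-site alignment: B and C each differ from A at one site.
seq_a = "ACGTA" * 7
seq_b = seq_a[:4] + "C" + seq_a[5:]
seq_c = seq_a[:19] + "G" + seq_a[20:]
seqs = [seq_a, seq_b, seq_c]

print(round(nucleotide_diversity(seqs, freqs=[0.33, 0.33, 0.33]), 5))  # 0.01867
print(nucleotide_diversity(seqs, ordered=True))
```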
Tajima’s D Statistic
Infinite Sites Model: each new mutation affects a new site in a sequence
$E(\pi) = \frac{\theta}{m}$, so $\theta_\pi = m\pi$
where m is length of sequence, and $\theta = 4N_e\mu$
Expected number of polymorphic sites in all sequences:
$E(S) = a_1\theta$, so $\theta_S = \frac{S}{a_1}$
$a_1 = \sum_{i=1}^{n-1} \frac{1}{i}$
where n is number of different sequences compared
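Sample Calculation of θS
[Figure: aligned sequences A, B, and C, with site positions 5 to 35 marked]
Two polymorphic sites
S=2
$a_1 = \sum_{i=1}^{n-1} \frac{1}{i} = 1 + \frac{1}{2} = 1.5$
$\theta_S = \frac{S}{a_1} = \frac{2}{1.5} = 1.33$
$\pi = 0.01867$
$\theta_\pi = m\pi = (35)(0.01867) = 0.65$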
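The θS calculation can be sketched in Python as follows. The sequences are hypothetical stand-ins for the alignment in the figure (35 sites, two polymorphic sites), and the function names are illustrative.

```python
def harmonic_a1(n):
    """a1 = sum of 1/i for i = 1 .. n-1, where n is the number of sequences."""
    return sum(1.0 / i for i in range(1, n))


def count_polymorphic_sites(seqs):
    """S: number of alignment columns with more than one distinct character."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def theta_s(seqs):
    """theta_S = S / a1."""
    return count_polymorphic_sites(seqs) / harmonic_a1(len(seqs))


# Hypothetical 35-site alignment with two polymorphic sites.
seq_a = "ACGTA" * 7
seq_b = seq_a[:4] + "C" + seq_a[5:]
seq_c = seq_a[:19] + "G" + seq_a[20:]

print(harmonic_a1(3))                     # 1.5
print(theta_s([seq_a, seq_b, seq_c]))     # 2 / 1.5, about 1.33
```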
Tajima’s D Statistic
Two different ways of estimating same parameter:
$\theta_\pi = m\pi$
$\theta_S = \frac{S}{a_1}$
Deviation of these two indicates deviation from neutral expectations
$d = \theta_\pi - \theta_S$
$D = \frac{d}{\sqrt{V(d)}}$ where V(d) is variance of d
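The slide does not spell out V(d). The Python sketch below assumes the variance estimate from Tajima (1989), and it assumes θπ is the mean number of differences over all pairs of sequences. The example sequences are hypothetical. With these constants V(d) is zero for 2 or 3 sequences, so the function requires at least 4.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences: V(d) is zero for n = 2 or 3")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of polymorphic sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("no polymorphic sites, so D is undefined")

    # theta_pi: mean number of differences over all pairs of sequences.
    pair_diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    theta_pi = sum(pair_diffs) / len(pair_diffs)

    # Constants from Tajima (1989); assumed here, not given on the slide.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = theta_pi - s / a1                  # d = theta_pi - theta_S
    variance = e1 * s + e2 * s * (s - 1)   # estimate of V(d)
    return d / sqrt(variance)


# Hypothetical sample: three singleton mutations, so rare alleles are in excess.
sample = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA", "AAAAAAATAA"]
print(tajimas_d(sample))  # negative
```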
Tajima’s D Expectations
D=0: Neutrality
D>0
Balancing Selection: Divergence of alleles (π) increases
OR
Bottleneck: S decreases
D<0
Purifying or Positive Selection: Divergence of alleles decreases
OR
Population expansion: Many low frequency alleles cause low average divergence
$d = \theta_\pi - \theta_S$
Balancing Selection
[Figure: balancing selection; legend: ‘balanced’ mutation, Neutral mutation. Slide adapted from Yoav Gilad]
Should increase nucleotide diversity (π)
Decreases polymorphic sites (S) initially.
$d = \theta_\pi - \theta_S$
D>0
Recent Bottleneck
Rare alleles are lost
Polymorphic sites (S) more severely affected than nucleotide diversity (π)
$d = \theta_\pi - \theta_S$
D>0
[Figure: comparison with standard neutral model]
Positive Selection and Purifying Selection
[Figure: sweep and recovery of π and S over time, compared with standard neutral model; legend: Advantageous mutation, Neutral mutation. Slide adapted from Yoav Gilad]
Should decrease both nucleotide diversity (π) and polymorphic sites (S) initially.
S recovers due to mutation
π recovers slowly: insensitive to rare alleles
$d = \theta_\pi - \theta_S$
D<0
Often two main haplotypes, some rare alleles
Rapid Population Growth will also result in an excess of rare alleles even for neutral loci
[Figure: rapid population size increase over time. Slide adapted from Yoav Gilad]
Most alleles are rare
$\theta = 4N_e\mu$
Nucleotide diversity (π) depressed
Polymorphic sites (S) unchanged or even enhanced: 4Neμ is large
$d = \theta_\pi - \theta_S$
D<0
How do we distinguish these two forms of divergence (selection vs demography)?
Hudson-Kreitman-Aguade Test
Divergence between species should be of same magnitude as variation within species
Provides a correction factor for mutation rates at different sites
Complex goodness of fit test
Perform test for loci under selection and supposedly neutral loci
Hudson-Kreitman-Aguade (HKA) test
Polymorphism: Variation within species
Divergence: Variation between species
Polymorphism/Divergence, Neutral Locus vs Test Locus A: 8/20 ≈ 3/8
Slide adapted from Yoav Gilad
Hudson-Kreitman-Aguade (HKA) test
                Neutral Locus    Test Locus B
Polymorphism          8                3
Divergence           20               19
8/20 >> 3/19
Conclusion: polymorphism lower than expected in Test Locus B: Selective sweep?
Slide adapted from Yoav Gilad
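The ratio comparison on the slide can be written out as below. This Python sketch only reproduces the polymorphism-to-divergence ratios; it is not the HKA goodness-of-fit test itself, and the function name is hypothetical.

```python
def polymorphism_to_divergence(polymorphism, divergence):
    """Ratio of within-species polymorphism to between-species divergence."""
    return polymorphism / divergence


neutral = polymorphism_to_divergence(8, 20)        # 0.4
test_locus_b = polymorphism_to_divergence(3, 19)   # about 0.16

# A test-locus ratio well below the neutral-locus ratio suggests reduced polymorphism.
print(neutral, test_locus_b, test_locus_b < neutral)
```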
HKA Example: Teosinte Branched
[Figure: Teosinte, Maize, and Maize w/TBR mutation]
http://www.nsf.gov/news/mmg/media/images/corn-and-teosinte_h1.jpg
Mauricio 2001; Nature Reviews Genetics 2, 376
Lab exercise: test Teosinte-Branched Gene for signature of purifying selection in maize compared to Teosinte relative
Compare to patterns of polymorphism and diversity in Alcohol Dehydrogenase gene