Population Genetics I (Introduction + Neutral Theory)
Gurinder Singh “Mickey” Atwal Center for Quantitative Biology
23rd Oct 2015
Summary and definitions
PART 1
• Basic definitions/concepts
• Neutral theory of single loci
PART 2
• Natural Selection
• Haplotype analyses
DNA Sequence Variation : Single Nucleotide Polymorphisms
CAGCCAGACTGCCTTCCGGGTCACTGCCATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAG
CCCCCTCTGAGTCAGGAAACATTTTCAGACCTATGGAAACTGTGAGTGGATCCATTGGAAGG
GCAGGCCACCACCCCGACCCCAACCCCAGCCCCCTAGCAGAGACCTGTGGGAAGCGAAAA
TTCATGGGACTGACTTTCTGCTCTTGTCTTTCAGACTTCCTGAAAACAACGTTCTGGTAAGGA
CAAGGGTTGGGCTGGGACCTGGAGGGCTGGGGGGGCTGGGGGGCTGGGACCTGGTCCTC
TGACTGCTCTTTTCACCCATCTACAGTCCCCCTTGCCGTCCCAAGCAATGGATGATTTGATGC
TGTCCCCGGACGATATTGAACAATGGTTCACTGAAGACCCAGGTCCAGATGAAGCTCCCAGA
ATGCCAGAGGCTGCTCCCCGCGTGGCCCCTGCACCAGCAGCTCCTACACCGGCGGCCCCT
GCACCAGCCCCCTCCTGGCCCCTGTCATCTTCTGTCCCTTCCCAGAAAACCTACCAGGGCA
GCTACGGTTTCCGTCTGGGCTTCTTGCATTCTGGGACAGCCAAGTCTGTGACTTGCACG
Part of human p53 gene (exons 2-4) • Chromosome 17
[Figure: the sequence above annotated with single nucleotide polymorphisms (e.g. C/T, G/C, C/A, G/A) and with exon/intron regions marked]
Correlations in Genomic Studies
GENOTYPE e.g. GCTCCCCGCGTGGCCCCTGCACC
PHENOTYPE e.g. onset of cancer, apoptosis rates
1. Correlations amongst alleles
• many possible correlation statistics ($D$, $D'$, $r^2$, $\delta$, $Q$)
2. Genotype-phenotype correlations
• many possible tests of association ($\chi^2$, Fisher's exact, Cochran-Armitage)
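As an illustration of the first kind of correlation, the sketch below computes three of the listed statistics ($D$, $D'$, $r^2$) for two biallelic loci using their standard textbook definitions. The function name `ld_stats` and the example frequencies are hypothetical, and both loci are assumed to be polymorphic.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (assumed strictly
    between 0 and 1, i.e. both loci are polymorphic)
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # D scaled by its maximum
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical example: both alleles at frequency 0.5, haplotype AB at 0.4
d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.5)
print(d, d_prime, r2)
```

Here the alleles are positively associated ($D > 0$) but not perfectly so ($D' < 1$).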
Goal of population genetics
• Understand forces that produce and maintain inherited genetic variation
• Forces
  – Mutation
  – Recombination
  – Natural Selection
  – Population Structure
  – Random birth/death (drift)
Hardy-Weinberg Law
• Consider 2 alleles (A, a) with frequencies p and q
• Allele frequency of A = p
• Allele frequency of a = q = 1 - p
• Randomly mating, large diploid population with no mutation, migration, selection or drift

Genotype                    AA       Aa       aa
Hardy-Weinberg frequency    $p^2$    $2pq$    $q^2$
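The table can be checked with a short sketch. It assumes the simplest case, a hermaphrodite population in which gametes pair at random; the function names and the starting genotype frequencies are illustrative only.

```python
def hardy_weinberg(p):
    """Genotype frequencies (AA, Aa, aa) for allele frequency p of A."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def random_mating(f_AA, f_Aa, f_aa):
    """One round of random mating: gametes are drawn from the allele pool."""
    p = f_AA + 0.5 * f_Aa          # frequency of A among gametes
    return hardy_weinberg(p)


# Hypothetical starting population with no heterozygotes (p = 0.5)
start = (0.5, 0.0, 0.5)
after_one = random_mating(*start)
after_two = random_mating(*after_one)
print(after_one, after_two)
```

Under these assumptions the genotype frequencies stop changing after the first round of mating, while the allele frequency p is the same before and after.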
Hardy-Weinberg Law
• Only a few rounds of random mating are needed to reach HW equilibrium. (How many exactly for hermaphrodite and dioecious populations?)
• Fast time scale
• Deviation from HW equilibrium mainly due to
  – Strong Selection
  – Inbreeding
  – Population Subdivision
  – Genotyping Errors
Population Subdivision

Genotype     AA                                Aa                      aa
Frequency    $p^2(1-F_{ST}) + pF_{ST}$         $2pq(1-F_{ST})$         $q^2(1-F_{ST}) + qF_{ST}$

• Wahlund effect
• Effect gets bigger the more different the subpopulations
• $0 < F_{ST} < 1$, degree of subdivision
• Heterozygosity less than expected
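The heterozygote deficit can be reproduced numerically. The sketch below assumes equal-sized subpopulations that are each in Hardy-Weinberg equilibrium internally, and uses the standard definition $F_{ST} = \mathrm{Var}(p) / (\bar{p}\,\bar{q})$; the function name `wahlund` and the two example allele frequencies are hypothetical.

```python
def wahlund(sub_freqs):
    """Pooled heterozygosity for equal-sized subpopulations.

    sub_freqs: allele frequency of A in each subpopulation, each of which
    is assumed to be in Hardy-Weinberg equilibrium on its own.
    Returns (F_ST, observed heterozygosity, HW expectation for the pool).
    """
    n = len(sub_freqs)
    p_bar = sum(sub_freqs) / n
    q_bar = 1.0 - p_bar
    var_p = sum((p - p_bar) ** 2 for p in sub_freqs) / n
    f_st = var_p / (p_bar * q_bar)
    het_observed = sum(2 * p * (1 - p) for p in sub_freqs) / n
    het_expected = 2 * p_bar * q_bar
    return f_st, het_observed, het_expected


f_st, h_obs, h_exp = wahlund([0.2, 0.8])
print(f_st, h_obs, h_exp)   # h_obs equals h_exp * (1 - f_st)
```

The pooled heterozygosity matches $2pq(1-F_{ST})$ from the table, and the deficit grows as the subpopulation frequencies move further apart.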
Population Inbreeding

Genotype     AA                          Aa                   aa
Frequency    $p^2(1-F_I) + pF_I$         $2pq(1-F_I)$         $q^2(1-F_I) + qF_I$

• Effect gets bigger the more related the population
• $0 < F_I < 1$, inbreeding coefficient
• $F_I$ = probability that 2 alleles in an individual are identical by descent
• Heterozygosity less than expected
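A quick consistency check on these genotype frequencies, written out as a worked equation: they sum to 1, and the frequency of allele A is still p, so inbreeding redistributes alleles among genotypes without changing allele frequencies. The same algebra applies with $F_{ST}$ in place of $F_I$.

```latex
\begin{align*}
\bigl[p^2(1-F_I) + pF_I\bigr] + 2pq(1-F_I) + \bigl[q^2(1-F_I) + qF_I\bigr]
  &= (p+q)^2(1-F_I) + (p+q)F_I = 1, \\
f(A) = \bigl[p^2(1-F_I) + pF_I\bigr] + \tfrac{1}{2}\cdot 2pq(1-F_I)
  &= p(p+q)(1-F_I) + pF_I = p .
\end{align*}
```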
Neutral Drift
• What happens when we consider a finite population size?
• Allele frequencies can change even if there is no natural selection.
Evolution of a neutral mutant allele
Wright-Fisher Process
[Figure: N individuals (2N alleles) per generation, plotted against generation; a mutation creates a derived allele among the ancestral alleles, and the derived allele later goes extinct]
Stochastic birth/death process (Moran model)
• Overlapping generations
• Distribution of time to replication
[Figure: birth and death events along a time axis]
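A minimal sketch of a Moran-type process is given below. It is a discrete-step simplification (one birth and one death per step, with parent and dying allele chosen independently and uniformly), not the continuous-time version with random replication times that the slide alludes to; the function name `moran` and its parameters are illustrative.

```python
import random


def moran(n_alleles, rng, start=1):
    """Discrete-step Moran process for a neutral derived allele.

    Returns the number of derived copies after each birth-death event,
    stopping when the allele is lost (0) or fixed (n_alleles).
    """
    count = start
    history = [count]
    while 0 < count < n_alleles:
        parent_is_derived = rng.random() < count / n_alleles
        dead_is_derived = rng.random() < count / n_alleles
        # One birth and one death per step, so the count changes by at most 1
        count += int(parent_is_derived) - int(dead_is_derived)
        history.append(count)
    return history


rng = random.Random(0)
history = moran(10, rng)
print(len(history) - 1, "events; final count:", history[-1])
```

Because only one allele is replaced per event, generations overlap and the allele count performs a random walk with steps of size one.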
Evolution of a neutral mutant allele
[Figure: allele frequency (0 to 1) versus time/generations for a population of N individuals; after the mutation arises, the derived allele drifts to fixation]
Diffusion: Kimura diffusion theory
• Natural selection is more effective in larger populations
• Genetic drift dominates in smaller populations
[Figure: axis of population size N, with regimes labelled "Darwinian evolution" and "Genetic Drift"]
Neutral drift
[Figure: allele frequency versus generations/time, with a time scale of ~4N generations marked]
• Most new mutations are eventually lost
• Only a small fraction (1/2N) eventually fixate in the population
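The fate of new neutral mutations can be explored with a small Wright-Fisher simulation. This is a sketch under simple assumptions (constant diploid population of N individuals, non-overlapping generations, each new allele copying a parent chosen at random); the function name, the seed and the run counts are arbitrary choices.

```python
import random


def wright_fisher(n_individuals, rng, max_generations=100_000):
    """Follow a single new neutral mutant until loss or fixation.

    Returns the number of derived copies in each generation.
    """
    n_alleles = 2 * n_individuals
    count = 1                       # one new mutant copy among 2N alleles
    trajectory = [count]
    for _ in range(max_generations):
        if count == 0 or count == n_alleles:
            break
        freq = count / n_alleles
        # Each of the 2N alleles in the next generation picks a random parent
        count = sum(rng.random() < freq for _ in range(n_alleles))
        trajectory.append(count)
    return trajectory


rng = random.Random(1)
runs = [wright_fisher(20, rng) for _ in range(2000)]
fixed = sum(run[-1] == 40 for run in runs)
print(f"fixed in {fixed} of {len(runs)} runs; "
      f"1/2N predicts about {len(runs) / 40:.0f}")
```

Most trajectories end at zero within a few generations, and the fraction that reach fixation should be close to 1/2N.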
Neutral Molecular Evolution
r = u (substitution rate = mutation rate)
• Rate of new fixations equals the mutation rate and does not depend on N
• Implies substitution rate is constant
• Gives a molecular clock for neutral molecular evolution
• Molecular divergence between 2 species should be proportional to the number of generations since their most recent common ancestor
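The cancellation of N can be written out in one line, assuming u is the neutral mutation rate per allele copy per generation in a diploid population of N individuals: each generation produces $2Nu$ new mutations, and each one fixes with probability $1/2N$.

```latex
r \;=\; \underbrace{2Nu}_{\text{new mutations per generation}}
  \times \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
  \;=\; u
```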
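Effective Population Size, $N_{\text{eff}}$

$$\frac{1}{N_{\text{eff}}} = \frac{1}{T}\left(\frac{1}{N_1} + \frac{1}{N_2} + \dots + \frac{1}{N_T}\right)$$

• Discrete time steps
• T = total number of time steps
• $N_i$ = population size at time step i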
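Because this is a harmonic mean, a single small population size dominates the result. The sketch below evaluates the formula for a made-up bottleneck history; the function name and the population sizes are hypothetical.

```python
def effective_size(sizes):
    """Harmonic mean of population sizes over T discrete time steps."""
    return len(sizes) / sum(1.0 / n for n in sizes)


sizes = [1000, 1000, 10, 1000, 1000]      # hypothetical bottleneck history
n_eff = effective_size(sizes)
arithmetic_mean = sum(sizes) / len(sizes)
print(round(n_eff, 1), arithmetic_mean)
```

In this example one time step at N = 10 pulls the effective size far below the arithmetic mean of the census sizes.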
Human Population Expansion
• $N_{\text{eff}} \sim 10{,}000$ (European HapMap)
• Nonadiabatic expansion
Heterozygosity, H
• Probability that 2 alleles drawn at random are different
• Homozygosity, $G = 1 - H$
• E.g. if biallelic then $H = 2p(1-p)$ and $G = p^2 + (1-p)^2$
Heterozygosity decay
• Wright-Fisher: $H_t = H_0 \exp\left(-\frac{t}{N}\right)$
• Moran: $H_t = H_0 \exp\left(-\frac{2t}{N^2}\right)$
Different microscopic models are equivalent up to a rescaling of time
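The time rescaling can be checked numerically. The sketch below assumes that N counts gene copies in both models, that t is measured in generations for Wright-Fisher and in single birth-death events for Moran, and that the per-step decay factors are $(1 - 1/N)$ and $(1 - 2/N^2)$ respectively; the function names and parameter values are illustrative.

```python
import math


def wf_heterozygosity(h0, n, generations):
    # Wright-Fisher: two distinct copies share a parent with probability 1/n
    return h0 * (1 - 1 / n) ** generations


def moran_heterozygosity(h0, n, events):
    # Moran: each birth-death event reduces H by the factor (1 - 2/n^2)
    return h0 * (1 - 2 / n**2) ** events


n, h0, t = 1000, 0.5, 500
wf = wf_heterozygosity(h0, n, t)
mo = moran_heterozygosity(h0, n, t * n // 2)   # n/2 Moran events per generation
print(wf, mo, h0 * math.exp(-t / n))
```

Under these assumptions N/2 Moran events correspond to one Wright-Fisher generation, and all three printed values agree closely.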
Mutation-Drift Balance
• Drift decreases H
• Mutation increases H
• Two forces cancel out to give equilibrium variation in population
• Homozygosity: $G = \frac{1}{1 + 4Nu}$
• Heterozygosity: $H = \frac{4Nu}{1 + 4Nu}$
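The equilibrium can be recovered by iterating the usual one-generation recursion for homozygosity. The sketch assumes a diploid Wright-Fisher population of N individuals and an infinite-alleles mutation model (every mutation creates a new allele), neither of which is stated on the slide; the function name and parameter values are illustrative.

```python
def equilibrium_homozygosity(n, u, generations=20_000):
    """Iterate the drift + mutation recursion for homozygosity G."""
    g = 1.0                          # start with no variation at all
    for _ in range(generations):
        # Two alleles are identical if neither mutated and they either
        # came from the same parental copy or were already identical
        g = (1 - u) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * g)
    return g


n, u = 1000, 1e-4
g = equilibrium_homozygosity(n, u)
print(g, 1 / (1 + 4 * n * u))            # both close to 0.714
print(1 - g, 4 * n * u / (1 + 4 * n * u))
```

For small u the iterated value settles near $1/(1+4Nu)$, and $H = 1 - G$ follows.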
Mutation-Drift Balance
• Time scale of mutations ~ 1/u
• Time scale of drift ~ 4N
• Remember, drift eliminates variation and mutations create variation
• If $4N \ll 1/u$ (i.e. $4Nu \ll 1$), population mostly devoid of variation
• If $4N \gg 1/u$ (i.e. $4Nu \gg 1$), population with much variation
Human SNP frequency distribution
Distribution of allele frequencies in Chromosome 1
• Empirical data: non-coding (intergenic) SNPs
• 180 Northern European samples (HapMap consortium)
[Figure: distribution f versus allele frequency]
Coalescent
[Figure: genealogy traced back in time from the present; the 22 sampled individuals have 18, 16, 14, 12, 9, 8, 8, 7, 7, 5, 5, 3, 3, 3, 2, 2 and finally 1 ancestor in successive earlier generations]
$P(k \text{ coalesce to } k-1) = \frac{k(k-1)}{4N}$
$P(\text{pair coalesce}) = \frac{1}{2N}$
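The two probabilities are linked by counting pairs. As a worked step, assuming probabilities are per generation and that $k \ll N$ so that more than one coalescence in the same generation can be neglected:

```latex
P(k \to k-1) \;\approx\; \binom{k}{2}\,\frac{1}{2N}
  \;=\; \frac{k(k-1)}{2}\cdot\frac{1}{2N}
  \;=\; \frac{k(k-1)}{4N},
\qquad
E[\text{time with } k \text{ lineages}] \;\approx\; \frac{4N}{k(k-1)} .
```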
Bifurcating Tree
• After t generations?
• Many different trees can produce the present population!
[Figure: bifurcating tree traced back in time from the present to the most recent common ancestor (MRCA)]
Properties of coalescent
• Random tree with random coalescent interval times ~ Wright-Fisher model
• Time to coalescence gets longer the further we go back in time
• The larger the population size the slower the rate of coalescence
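These properties follow from the expected waiting time $4N / (j(j-1))$ generations while j lineages remain, which is the reciprocal of the per-generation coalescence probability $j(j-1)/4N$. The sketch below tabulates those expectations; the function name is hypothetical, and the sample size and population size are borrowed from elsewhere in the slides purely as example values.

```python
def expected_intervals(k, n):
    """Expected generations spent with j lineages, for j = k down to 2."""
    return {j: 4 * n / (j * (j - 1)) for j in range(k, 1, -1)}


intervals = expected_intervals(k=22, n=10_000)
t_mrca = sum(intervals.values())     # equals 4N(1 - 1/k)
print(intervals[22], intervals[2], t_mrca)
```

The last interval (two lineages) is by far the longest, and every interval scales linearly with N.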
Mutation?
[Figure sequence: the genealogy from the most recent common ancestor (MRCA) to the present, with mutations added to its branches one at a time, ending with the sampled sequences below; asterisks mark the variable sites]
TCGAGGTATTAAC
TCTAGGTATTAAC
TCGAGGCATTAAC
TCTAGGTGTTAAC
TCGAGGTATTAGC
TCTAGGTATCAAC
  *   ** * *
Efficient computer simulations of neutral mutation
1. Generate random genealogy of individuals back in time
2. Superimpose mutation
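The two steps can be sketched as follows. This is a deliberately reduced version: it uses the continuous-time approximation (exponential waiting times with rate $k(k-1)/4N$), tracks only branch lengths rather than the tree topology, and returns just the number of mutations on the genealogy, which under an infinite-sites assumption is the number of variable sites. The function name and all parameter values are hypothetical.

```python
import random


def simulate_mutations(k, n, u, rng):
    """Neutral genealogy for k sampled alleles, with mutations added.

    n: diploid population size; u: mutation rate per sequence per generation.
    Returns (total branch length in generations, number of mutations).
    """
    total_length = 0.0
    lineages = k
    # Step 1: random genealogy, going back in time until the MRCA
    while lineages > 1:
        rate = lineages * (lineages - 1) / (4 * n)   # coalescences per generation
        wait = rng.expovariate(rate)
        total_length += lineages * wait              # every lineage grows by `wait`
        lineages -= 1
    # Step 2: superimpose mutations, Poisson with mean u * total_length
    # (counted as arrivals of a unit-rate process before u * total_length)
    mutations = 0
    arrival = rng.expovariate(1.0)
    while arrival < u * total_length:
        mutations += 1
        arrival += rng.expovariate(1.0)
    return total_length, mutations


rng = random.Random(42)
counts = [simulate_mutations(10, 10_000, 2.5e-5, rng)[1] for _ in range(2000)]
mean_count = sum(counts) / len(counts)
expected = 4 * 10_000 * 2.5e-5 * sum(1 / i for i in range(1, 10))
print(mean_count, expected)
```

Only the k sampled lineages are simulated, never the whole population, which is why this approach is efficient compared with forward simulation of every individual.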