FST & Some Selection Index
Evolution, Population Genetics, and Health 2014

김진섭

GSPH, SNU

October 29, 2014

김진섭 (GSPH, SNU) FST & Some Selection Index October 29, 2014 1 / 65

Fst

Contents

1 Fst
    Wright's F-statistics
    Cockerham's θ-statistics

2 Selection Index
    EHH
    iHS
    xp-EHH

3 Practice


Fst Wright’s F -statistics

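3 types of heterozygosity [4]

Individual, Subpopulation, Total Population

1 $H_I = \frac{1}{n}\sum_{i=1}^{n} H_i$

2 $H_S = \frac{1}{n}\sum_{i=1}^{n} 2p_iq_i$

3 $H_T = 2\bar{p}\bar{q}$

($n$: number of subpopulations; $H_i$: observed heterozygosity in the $i$th subpopulation; $2p_iq_i$: expected heterozygosity in the $i$th subpopulation; $2\bar{p}\bar{q}$: expected heterozygosity of the total population, from the mean allele frequencies $\bar{p}$ and $\bar{q}$.)

Each value is computed per locus.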

Wright's F-statistics [4]

1 $F_{IS} = \frac{H_S - H_I}{H_S}$

2 $F_{ST} = \frac{H_T - H_S}{H_T}$

3 $F_{IT} = \frac{H_T - H_I}{H_T}$

Example

$F_{ST} = 0$ → Subpopulation structure has no effect: the subpopulations do not differ.

$F_{ST} = 1$ → The subpopulations differ strongly.

Simple relation

$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$
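The three heterozygosities and the F-statistics defined above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: one biallelic locus, equal weight for every subpopulation, and made-up input values; the function name `f_statistics` is hypothetical.

```python
def f_statistics(subpops):
    """subpops: list of (p_i, H_i) pairs, one per subpopulation, where p_i is
    the allele frequency and H_i the observed heterozygosity (hypothetical inputs)."""
    n = len(subpops)
    h_i = sum(h for _, h in subpops) / n                  # H_I: mean observed heterozygosity
    h_s = sum(2 * p * (1 - p) for p, _ in subpops) / n    # H_S: mean of 2 p_i q_i
    p_bar = sum(p for p, _ in subpops) / n                # unweighted mean allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                         # H_T: 2 p-bar q-bar
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Two made-up subpopulations with very different allele frequencies
f_is, f_st, f_it = f_statistics([(0.2, 0.30), (0.8, 0.28)])

# The simple relation 1 - F_IT = (1 - F_IS)(1 - F_ST) holds up to rounding error
assert abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12
```

The relation holds by construction, because $(H_I/H_S)(H_S/H_T) = H_I/H_T$.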

(Figure. Source: http://academic.reed.edu/biology/professors/srenn/pages/research/2011_students/sean/SM_thesis.html)

(Figure. Source: http://www.johnderbyshire.com/Miscellaneous/Other/Fst.jpg)

FST inference [5]

A convenient measure of genetic differentiation.

One of the most widely used descriptive statistics in population and evolutionary genetics.

Can point to natural selection in a particular subpopulation.

Problem in estimation

$H_T = 2\bar{p}\bar{q}$

1 What if the sample size differs between subpopulations?

2 Ex: SASIA 1,000 people, Oceania 100 people.

3 Then $\bar{p}$ is not estimated properly.
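A small numerical sketch of the problem, assuming made-up allele frequencies (only the sample sizes 1,000 and 100 come from the slide). Pooling all samples lets the large subpopulation dominate the mean frequency, while an unweighted mean counts each subpopulation once; the two give noticeably different values of $H_T$. The helper name `mean_allele_frequency` is hypothetical.

```python
def mean_allele_frequency(samples, weighted):
    """samples: list of (p_i, n_i) = (allele frequency, sample size) per subpopulation."""
    if weighted:
        total = sum(n for _, n in samples)
        return sum(p * n for p, n in samples) / total
    return sum(p for p, _ in samples) / len(samples)


# Hypothetical frequencies; the sample sizes follow the SASIA / Oceania example
samples = [(0.10, 1000), (0.60, 100)]

p_pooled = mean_allele_frequency(samples, weighted=True)    # dominated by the large sample
p_equal = mean_allele_frequency(samples, weighted=False)    # each subpopulation counts once

h_t_pooled = 2 * p_pooled * (1 - p_pooled)
h_t_equal = 2 * p_equal * (1 - p_equal)
print(round(p_pooled, 3), round(p_equal, 3))
print(round(h_t_pooled, 3), round(h_t_equal, 3))
```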


Fst Cockerham’s θ-statistics

ANOVA approach [1, 5]

$\theta = \frac{\sigma_P}{\sigma_T}$

($\sigma_P$: variance due to population, $\sigma_T$: total variance)

Wright's FST = Cockerham's θ

In practice, most calculations use θ.

θ inference

Population > 2

Some population differs from the rest!!

But θ does not tell you which population it is.

Pairwise FST

Computed from only 2 populations.

Gives a relative comparison.
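The pairwise idea can be sketched as follows. This is a minimal illustration that applies Wright's definition $F_{ST} = (H_T - H_S)/H_T$ to two populations at a time with equal weights; it is not the θ estimator that packages normally compute, and the allele frequencies and population labels are invented.

```python
from itertools import combinations


def pairwise_fst(p1, p2):
    """Wright's F_ST for two subpopulations with allele frequencies p1 and p2."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    # If the locus is fixed for the same allele in both populations, H_T is 0
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele frequencies at one locus in three populations
freqs = {"A": 0.20, "B": 0.25, "C": 0.80}

for a, b in combinations(freqs, 2):
    print(a, b, round(pairwise_fst(freqs[a], freqs[b]), 3))
```

In this toy input, the pairs that involve population C give much larger values than the A-B pair, which is how a pairwise table identifies the population that differs from the rest.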

Figure: FST calculated for each SNP between Tibetan and Han populations [6]

Figure: Inter-population pairwise comparisons of FST statistics
http://academic.reed.edu/biology/professors/srenn/pages/research/2011_students/sean/SM_thesis.html

Selection Index


Is a particular haplotype especially common in a particular population?
Example: Erik Corona's slides (the following slides).

Population Genetics

(Figure labels: Lactase + H2O, Glucose)

Haplotype frequencies over successive frames; the change from the initial frequency is given in parentheses.

HAPLOTYPES        Frame 1   Frame 2   Frame 3     Frame 4     Frame 5
GATTACAGATTACA    22%       22%       21% (-1%)   21% (-1%)   20% (-2%)
AATTACAGATTAAA    3%        3%        3%          3%          3%
GACTACAGATTACC    19%       19%       19%         19%         19%
GATTACCTATTAAC    24%       24%       24%         23% (-1%)   23% (-1%)
AACTACAGATTACC    16%       16%       16%         15% (-1%)   15% (-1%)
GATTACAGACTACA    7%        7%        7%          7%          6% (-1%)
AATTACAGATTACA    9%        9%        8% (-1%)    7% (-2%)    5% (-4%)
AATTGCAGATTACA    (absent)  <1%       2% (+2%)    5% (+5%)    9% (+9%)

Selection Index EHH

EHH: Sabeti, Reich et al. (2002) [7]

Extended Haplotype Homozygosity

What is the probability that two randomly chosen haplotypes are identical?

0 → All haplotypes differ.

1 → All haplotypes are identical.

The haplotype of interest is called the core.

$EHH_t = \frac{\sum_{i=1}^{s} \binom{e_{ti}}{2}}{\binom{c_t}{2}}$

($t$: core haplotype; $c_t$: the number of samples of a particular core haplotype; $e_{ti}$: the number of samples of a particular extended haplotype; $s$: the number of unique extended haplotypes)
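The EHH formula translates directly into code. The sketch below assumes the input is simply the list of sample counts $e_{ti}$ for one core haplotype, so that their sum is $c_t$; the function name `ehh` is chosen here for illustration.

```python
from math import comb


def ehh(extended_counts):
    """EHH of one core haplotype.

    extended_counts: number of samples carrying each distinct extended
    haplotype (the e_ti); their sum is c_t.
    """
    c = sum(extended_counts)
    if c < 2:
        raise ValueError("need at least two samples of the core haplotype")
    # Pairs that match on the extended haplotype, divided by all pairs
    return sum(comb(e, 2) for e in extended_counts) / comb(c, 2)


print(ehh([50]))       # 1.0: a single extended haplotype, every pair matches
print(ehh([1] * 50))   # 0.0: all extended haplotypes unique, no pair matches
print(ehh([3, 2]))     # 0.4: (3 + 1) matching pairs out of 10
```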

How can we detect Pos. Sel.?

AATTACAGATTACA    50 people have this
GATTACAGATTACA    50 people have this
---- 50 KB ----

How can we detect Pos. Sel.?

50 KB + 20 KB = 70 KB

AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7

GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26

Extended Haplotype Homozygosity (EHH)

For the core haplotype AATTACAGATTACA (extended-haplotype counts 10, 8, 7, 5, 3, 6, 4, 7; 50 samples in total):

$EHH = \frac{\binom{10}{2}+\binom{8}{2}+\binom{7}{2}+\binom{5}{2}+\binom{3}{2}+\binom{6}{2}+\binom{4}{2}+\binom{7}{2}}{\binom{50}{2}} \approx 0.121$

EHH Drops Over Genetic Distance

EHH drops off quickly over genetic distance
    Starts at 1
    Ends at 0

Every hap block will eventually be unique

EHH: What It Is & What It Isn't

Detects over-representation of a haplotype
    This will raise p(two haps are homozygous)
Does NOT detect whether a haplotype spread quickly
    Low recombination != spread quickly

AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7

GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26

AATTACAGATTACA AACACGC 22
AATTACAGATTACA ATGATAG 28

GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26

Compare EHH Scores

Core AATTACAGATTACA (extended-haplotype counts 10, 8, 7, 5, 3, 6, 4, 7): EHH ≈ 0.121

Core GATTACAGATTACA (extended-haplotype counts 24, 26):

$EHH = \frac{\binom{24}{2}+\binom{26}{2}}{\binom{50}{2}} \approx 0.490$

0.490: Low Recombination, Over Represented


Can EHH Detect Pos. Sel.?

Relative EHH

Detects over-representation of a haplotype
    Low recombination
    This will raise p(two haps are homozygous)

Does detect whether a haplotype spread quickly
    Other haplotype blocks are controls!

Recombination cold-spot / hot-spot agnostic
    Low score if both alleles are assoc. w/ high or low recombination

AATTACAGATTACA AACACGC 22
AATTACAGATTACA ATGATAG 28

GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26

Extended Haplotype Homozygosity (EHH)

Core AATTACAGATTACA: EHH = 0.121
Core GATTACAGATTACA: EHH = 0.490

$REHH = \frac{0.490}{0.121} = 4.05$

REHH: Problem #1

We get a different REHH value at different genetic distance cutoffs

AATTACAGATTACA 50
GATTACAGATTACA 50
---- 50 KB ----
REHH = 1.0

AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7

GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26
---------- 70 KB ---------
REHH = 4.05
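The dependence on the cutoff can be reproduced from the counts on the slides. The sketch below recomputes EHH for both core haplotypes at the 50 KB and 70 KB cutoffs and takes their ratio; the function names `ehh` and `rehh` are illustrative, and REHH is taken here as the EHH of the G core divided by the EHH of the A core, as in the 0.490 / 0.121 example.

```python
from math import comb


def ehh(counts):
    """EHH from the sample counts of each distinct extended haplotype."""
    return sum(comb(e, 2) for e in counts) / comb(sum(counts), 2)


def rehh(core_counts, other_counts):
    """Ratio of the EHH of one core haplotype to the EHH of the other."""
    return ehh(core_counts) / ehh(other_counts)


# Counts from the slides: at 50 KB each core haplotype is still a single block;
# at 70 KB the A core has split into 8 extended haplotypes, the G core into 2.
a_50kb, g_50kb = [50], [50]
a_70kb = [10, 8, 7, 5, 3, 6, 4, 7]
g_70kb = [24, 26]

print(rehh(g_50kb, a_50kb))             # 1.0
print(round(ehh(a_70kb), 3))            # 0.122
print(round(ehh(g_70kb), 3))            # 0.491
print(round(rehh(g_70kb, a_70kb), 2))   # 4.03
```

The slides truncate the two EHH values to 0.121 and 0.490, which gives 0.490 / 0.121 = 4.05; without intermediate truncation the ratio is about 4.03.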

Which REHH value to use?

Extend to the right

AGTTACAGATTACAAACACGC
AAATACAGATTACAATGATAG
AATTACAGATTACAAACCCAG
AATTTCAGATTACACTGACAG
AATTAAAGATTACACAGACAG
AATTACCGATTACAAACACAG
AATTACAAATTACACACACAG
AATTACAGGTTACACACCCAG

GATTACAGATTACACACATAG
GATTACAGATTACACACACAG

---------- 70 KB ---------
REHH = 4.05

Which REHH value to use?

Extend to the right, then extend to the left

…ACAGATTACAGTTACAGATTACAAACACGC…
…ACAGATTACAAATACAGATTACAATGATAG…
…ACAGATTACAATTACAGATTACAAACCCAG…
…ACAGATTACAATTTCAGATTACACTGACAG…
…ACAGATTACAATTAAAGATTACACAGACAG…
…ACAGATTACAATTACCGATTACAAACACAG…
…ACAGATTACAATTACAAATTACACACACAG…
…ACAGATTACAATTACAGTTACACACCCAG…

…TACAGATTAGATTACAGATTACACACATAG
…TACAGATTAGATTACAGATTACACACACAG

---------- 70 KB ---------
REHH = 4.05

REHH: Problem #2

REHH score is heavily biased by allele frequencies
    Must normalize

P(REHH | Allele Freq.)

REHH: Problem #3

Not possible to detect selection in high-frequency alleles
    Solution requires a cross-population (X-population) approach (discussed later)

REHH Overview

Leaves a lot to be desired
    Picking the maximum is arbitrary
    Why not the mean REHH score?
Biased by allele frequency
    ln(REHH | allele freq) ~ norm dist.
Still widely used and published with

Site-specific EHH [9]

Roughly the average of the EHH values of the two alleles (weights: squared allele frequencies)

Gives the approximate size of EHH at the focal SNP


Selection Index iHS

iHS: Sabeti et al. (2007) [8]

Integrate EHH over all positions, then compare!

Integrated Haplotype Score (iHS)

Unstandardized iHS = ln of the ratio between the integrated EHH of the two alleles, where the integrated EHH of an allele is $\int_{y}^{x} EHH$ (the area under its EHH curve).

$y$ = bwd distance
$x$ = fwd distance
$EHH_D$ = EHH of the derived allele
$EHH_A$ = EHH of the ancestral allele

Integrated Haplotype Score (iHS)

…ACAGATTACAGTTACAGATTACAAACACGC…
…ACAGATTACAAATACAGATTACAATGATAG…
…ACAGATTACAATTACAGATTACAAACCCAG…
…ACAGATTACAATTTCAGATTACACTGACAG…
…ACAGATTACAATTAAAGATTACACAGACAG…
…ACAGATTACAATTACCGATTACAAACACAG…
…ACAGATTACAATTACAAATTACACACACAG…
…ACAGATTACAATTACAGTTACAACACCCAG…

…TACAGATTAGATTACAGATTACACACATAG
…TACAGATTAGATTACAGATTACACACACAG

0.7 + 0.5 = 1.2
4.0 + 4.4 = 8.4

Unstandardized iHS: ln(8.4/3.2) = 0.965 (0.419 with log base 10)
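The integration step can be sketched with the trapezoidal rule. The EHH curves below are invented for illustration, and the helper names `integrate_ehh` and `unstandardized_ihs` are hypothetical; which allele goes in the numerator is left to the caller.

```python
from math import log


def integrate_ehh(positions, ehh_values):
    """Area under an EHH curve (integrated EHH) by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2
    return area


def unstandardized_ihs(ihh_numerator, ihh_denominator):
    """Natural log of the ratio of the two alleles' integrated EHH."""
    return log(ihh_numerator / ihh_denominator)


# Hypothetical EHH curves sampled at the same positions (core SNP at position 0)
positions = [-20, -10, 0, 10, 20]
ehh_slow_decay = [0.2, 0.6, 1.0, 0.6, 0.2]   # long haplotypes: EHH decays slowly
ehh_fast_decay = [0.0, 0.1, 1.0, 0.1, 0.0]   # short haplotypes: EHH decays quickly

ihh_slow = integrate_ehh(positions, ehh_slow_decay)   # area 24
ihh_fast = integrate_ehh(positions, ehh_fast_decay)   # area 12
ihs = unstandardized_ihs(ihh_slow, ihh_fast)          # ln(2), positive
```

With the slowly decaying allele in the numerator the score is positive, and swapping the two arguments flips the sign.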

iHS Characteristics

When both alleles have the same AUC, iHS is zero
Large negative values indicate selection of the allele in the denominator
Large positive values indicate selection of the allele in the numerator
Still heavily biased by allele frequency!
    Z-score normalization

Integrated Haplotype Score (iHS)

$iHS = \frac{\text{Unstandardized iHS} - E(\text{iHS} \mid \text{Allele Frequency})}{SD(\text{iHS} \mid \text{Allele Frequency})}$

E(iHS | Allele Freq.): Estimated from empirical distribution
SD(iHS | Allele Freq.): Estimated from empirical distribution
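One common way to obtain the empirical mean and SD is to group SNPs into allele-frequency bins and standardize within each bin. The sketch below assumes that approach; the number of bins (20, i.e. a width of 0.05), the use of the population standard deviation, and the function name `standardize_by_frequency` are all choices made here for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_frequency(scores, freqs, n_bins=20):
    """Z-score each unstandardized score against the SNPs in the same
    allele-frequency bin (n_bins equal-width bins on [0, 1])."""
    def bin_of(f):
        return min(int(f * n_bins), n_bins - 1)

    bins = defaultdict(list)
    for s, f in zip(scores, freqs):
        bins[bin_of(f)].append(s)

    # Empirical mean and SD of the scores within each frequency bin
    stats = {b: (mean(v), pstdev(v)) for b, v in bins.items()}

    standardized = []
    for s, f in zip(scores, freqs):
        mu, sd = stats[bin_of(f)]
        standardized.append((s - mu) / sd if sd > 0 else 0.0)
    return standardized


# Hypothetical scores: two SNPs near frequency 0.1 and two near 0.5
z = standardize_by_frequency([1.0, 3.0, 10.0, 14.0], [0.11, 0.12, 0.51, 0.52])
```

After standardization the two frequency classes are on the same scale even though their raw scores differ by an order of magnitude.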

iHS Overview

iHS and REHH are EHH-based methods to detect positive selection
iHS outperforms REHH at specific allele frequencies
    Neither completely outperforms the other

iHS: Problem #1

Still can't detect selection in high-frequency (old) alleles
    Relatively high EHH values are not present in high-frequency (old) alleles!
Use a reference population
    If pos. sel. didn't take place in ref. population, EHH is high


Selection Index xp-EHH

xp-EHH: Sabeti et al. (2007) [8]

Compare the integrated EHH of the same allele between populations!

Cross Population EHH (XP-EHH)

AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7

Same allele but diff population
AATTACAGATTACA CACATAG 20
AATTACAGATTACA CACACAG 30

Integrated EHH: 0.5 and 3.3
XP-EHH = ln(3.3/0.5) = 1.89 → Z-score norm.

Integrate EHH over distance from allele
Calculated for fwd/rev sides independently
Integrate until EHH = 0.04 in each population
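A rough sketch of the calculation, following the three rules on the slide: integrate EHH away from the core allele, handle each side separately, and stop once EHH falls below 0.04. The EHH curve is invented and the helper names `integrate_until` and `xp_ehh` are hypothetical; only the final ratio 3.3 / 0.5 comes from the slide.

```python
from math import log


def integrate_until(positions, ehh_values, cutoff=0.04):
    """Trapezoidal area under one side of an EHH curve, moving away from the
    core SNP (positions[0]) and stopping at the first point with EHH < cutoff."""
    area = 0.0
    for i in range(1, len(positions)):
        if ehh_values[i] < cutoff:
            break
        width = abs(positions[i] - positions[i - 1])
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2
    return area


def xp_ehh(area_pop_a, area_pop_b):
    """Unstandardized XP-EHH: ln of the ratio of integrated EHH in two populations."""
    return log(area_pop_a / area_pop_b)


# One side of a hypothetical EHH curve; the total for a population would be
# the forward-side area plus the reverse-side area.
fwd_area = integrate_until([0, 10, 20, 30], [1.0, 0.5, 0.02, 0.01])   # 7.5

# Integrated EHH values from the slide
print(round(xp_ehh(3.3, 0.5), 2))   # 1.89
```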

Overview

REHH and iHS are more or less complementary
    Each is better at detecting pos. sel. at diff. freqs.

XP-EHH
    Can detect pos. sel. in high-freq. alleles
    Susceptible to population variation in recombination rate

Final Verdict: REHH vs iHS vs XP-EHH

REHH
iHS test
XP-EHH

Rsb [9]

Another index for comparing populations.

The comparison is made only between populations.

iES: for each locus, the average of the integrated EHH of the two alleles.

Compares the approximate degree of selection at a locus between populations.


Practice


FST

hierfstat [3]

PER3 gene in HGDP (Human Genome Diversity Panel): 289 SNPs & 7 populations

EHH, iHS

rehh [2]

Examples bundled with the package


References

[1] Cockerham, C. C. (1969). Variance of gene frequencies. Evolution, pages 72–84.

[2] Gautier, M. and Vitalis, R. (2012). rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics, 28(8):1176–1177.

[3] Goudet, J. (2005). hierfstat, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes, 5(1):184–186.

[4] Hamilton, M. (2011). Population Genetics. John Wiley & Sons.

[5] Holsinger, K. E. and Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9):639–650.

[6] Huerta-Sanchez, E., Jin, X., Bianba, Z., Peter, B. M., Vinckenbosch, N., Liang, Y., Yi, X., He, M., Somel, M., Ni, P., et al. (2014). Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature, 512(7513):194–197.

[7] Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z., Richter, D. J., Schaffner, S. F., Gabriel, S. B., Platko, J. V., Patterson, N. J., McDonald, G. J., et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419(6909):832–837.

[8] Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., Xie, X., Byrne, E. H., McCarroll, S. A., Gaudet, R., et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164):913–918.

[9] Tang, K., Thornton, K. R., and Stoneking, M. (2007). A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biology, 5(7):e171.

END

Email : [email protected]
(02)880-2743
H.P: 010-9192-5385