21
Haplotype 에 에에에 genetic linkage/ association 에 에에에에 에에에에 에에 1) 에에에에에 에에에에 에에에에에에 2) 에에에에에 에에에에에에 3) 에에에에에 에에에 에에에에 에에에 1) , 에에에 2) , 에에에 1) , 에에에 3) , 에에에 3) , 에에에 3) , 에에에 3) , 에에 1)

Haplotype 에 기초한 genetic linkage/ association 의 통계학적 알고리즘 연구

  • Upload
    diem

  • View
    47

  • Download
    1

Embed Size (px)

DESCRIPTION

Haplotype 에 기초한 genetic linkage/ association 의 통계학적 알고리즘 연구. 남정모 1) , 김진흠 2) , 강대룡 1) , 신선미 3) , 이윤경 3) , 박정용 3) , 허남욱 3) , 서일 1). 1) 연세대학교 의과대학 예방의학교실 2) 수원대학교 통계정보학과 3) 연세대학교 대학원 보건학과. Background. Linkage Association - PowerPoint PPT Presentation

Citation preview

Page 1: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Haplotype 에 기초한 genetic linkage/association 의 통계학적 알고리즘 연구

1) 연세대학교 의과대학 예방의학교실2) 수원대학교 통계정보학과3) 연세대학교 대학원 보건학과

남정모 1), 김진흠 2), 강대룡 1), 신선미 3), 이윤경 3), 박정용 3), 허남욱 3), 서일 1)

Page 2: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Background

Linkage Association Recombinant fraction Allelic disequilibrium

Family Population

- Trio (TDT) - Case-Control

- (Affected) Sib-Pair

Single locus

Tightly linked multi-locus (Haplotype)

Page 3: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Background

Why haplotype-based? But …

• Haplotype set of closely linked genetic markers present on one chromosome which tends to be inherited together

• Many markers are genotyped within a very short physical distance

• More informative

• Haplotype information is not usually available from genotype information (Haplotype reconstruction)

eg. When # of heterozygous loci=c, # of possible haplotype pairs=2c-1

Page 4: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Background (Haplotype ambiguity)

• Hypothetical family trio

• Probable haplotype pairs

Sample Father (F) Mother (M) Child (c)

1 12 / 12 / 11 22 / 12 / 22 12 / 12 / 12

2 12 / 22 / 11 22 / 12 / 22 12 / 22 / 12

Sample Father (F) Mother (M) Child (c)

1 {111, 221} {222, 212} {111, 222}

or {121, 211} {212, 222} {121, 212 }

2 {121, 221} {212, 222} {121, 212}

• In Sample 1, F’s haplotype uncertainty exists !

• In Sample 2, haplotypes of parents are deducible !

Page 5: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Background

Previous researches

• Wilson (1997, AHG), Clayton & Jones (1999, AJHG) :

Discard families with ambiguous haplotypes

• Clayton (1999, AJHG) :

likelihood-based but not robust to population admixture

• Zhao et al. (2000, AJHG) :

How to resolve haplotype ambiguity ?

Allocate a conditional probability to each of haplotype group corresponding to a set of genotypes

Page 6: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Haplotype-based TDT (Zhao et al., 2000)

• For each g, estimate the number of families :

F with {Hi , Hj} transmits Hi and M with {Hk , Hl} transmits Hk

, ˆ i j k lik jlg g s s s s

i j k lg

h h h ht n

h h h h

• Transmission / Non-transmission table

, ,

g

ˆ ˆ ˆ k l i jg g

k l i j

t t t

ˆ ˆ( )T t

Page 7: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

• Based on Spielman & Ewens(1996)’s multi-allelic TDT

2

&1

ˆ ˆ( )-1ˆ ˆ ˆ2

h

s e

t thT

h t t t

• Remark

Ts&e follows the chi-square distribution with df = h-1 asy

mptotically ? No except h=2

Why ? Sham(1997,AJHG) & Lazzeroni and Lange(1998, Hum Hered) + dependency between cell counts

⇒ obtain empirical p-value by randomization process

Haplotype-based TDT (Zhao et al., 2000)

Page 8: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Purpose

• Propose new haplotype-based linkage / association test

• Investigate empirical levels and powers of proposed test statistics by simulation

• Test linkage / association between 4 SNPs (A-240T, T-93C, I/D, G2350A ) in ACE (Angiotensin-I Converting Enzyme) gene and hypertension

Page 9: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Proposed haplotype-based Score and LR test

• Stuart (1955)

1

1 1 1 1

,

ˆ ˆ ˆ ˆ ( , , )

ˆ ˆ ˆ2 , ( )

ˆ ˆ( ),

h

i

s

h

i iiij

ij ji

T

t t t t

t t t i j

t t i j

• Bradley & Terry (1952)

& 1 02(log log )

through a logistic model such as log( / )b t

ij ji i j

T L L

p p b b

Page 10: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Fig.1 Genealogy of four mutations at four loci used in simulation study

H1 H2 H3 H4 H5 H6 H7 H8

D → d

A3 → a3

A3 → a3

A3 → a3

A3 → a3

A2 → a2

A2 → a2

A1 → a1

(A1, A2, D, A3)

Simulation studies

# of loci = 3

Page 11: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Simulation studies• Types of haplotype frequencies

Type Pop. Frequencies of (H1 , H2 , H3 , H4 , H5 , H6 , H7 , H8)

1 1 (0.343, 0.147, 0.147, 0.063, 0.147, 0.063, 0.063, 0.027)

2 (0.490, 0.000, 0.210, 0.000, 0.210, 0.000, 0.090, 0.000)

2 1 (0.343, 0.147, 0.147, 0.063, 0.147, 0.063, 0.063, 0.027)

2 (0.343, 0.147, 0.147, 0.063, 0.147, 0.063, 0.063, 0.027)

3 1 (0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125)

2 (0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125)

• Haplotype with disease susceptible mutant allele = H7 , H8

• Background risk of disease occurrence = 0.1, 0.2

• RR=1(level); 1.2, 1.6, 2.0, 4.0, 6.0(power)

• # of subjects in each population = 200

• Generate genotype data for Case-Control(1 : 2) and Trio

• # of replication = 200 ; # of resampling = 100

Page 12: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Configuration of genetic data generating

ConfigurationHaplotype frequ

ency

Background risk

Pop1 Pop2

I Pop1 ≠ Pop2 0.1 0.1

II Pop1 ≠ Pop2 0.2 0.1

III Pop1 = Pop2 0.1 0.1

IV Pop1 = Pop2 0.2 0.1

V Pop1 = Pop2 0.1 0.1

VI Pop1 = Pop2 0.2 0.1

Page 13: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Empirical levels (5%)

ApproachSingle locus Haplotype

Conf. Locus1 Locus2 Locus3 LR § (Ts & e) Ts

Population-based

I 0.040 0.045 0.030 0.070

II 0.030 0.050 0.270 0.170

III 0.065 0.040 0.025 0.045 NA

IV 0.055 0.045 0.060 0.080

V 0.055 0.055 0.070 0.115

VI 0.055 0.070 0.050 0.080

Family-based

I 0.030 0.040 0.035 0.055 0.060

II 0.015 0.035 0.045 0.045 0.050

III 0.015 0.025 0.015 0.050 0.050

IV 0.055 0.010 0.045 0.075 0.060

V 0.035 0.055 0.070 0.080 0.075

VI 0.045 0.050 0.040 0.035 0.040§ Zhao, Curtis and Sham(2000, Hum Hered)’s x2 test

Page 14: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Empirical powers for Conf. I

ApproachSingle locus Haplotype

RR Locus1 Locus2 Locus3 LR § (Ts & e) Ts

Population-based

1.2 0.060 0.070 0.055 0.085

1.6 0.080 0.075 0.045 0.140

2.0 0.155 0.125 0.035 0.220 NA

3.0 0.370 0.340 0.065 0.555

4.0 0.665 0.640 0.050 0.820

6.0 0.975 0.975 0.035 0.995

Family-based

1.2 0.030 0.050 0.020 0.055 0.065

1.6 0.050 0.040 0.010 0.100 0.100

2.0 0.090 0.115 0.025 0.115 0.120

3.0 0.230 0.195 0.040 0.315 0.300

4.0 0.415 0.405 0.040 0.615 0.595

6.0 0.760 0.705 0.025 0.960 0.940

Page 15: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Empirical powers for Conf. V

ApproachSingle locus Haplotype

RR Locus1 Locus2 Locus3 LR § (Ts & e) Ts

Population-based

1.2 0.040 0.065 0.030 0.090

1.6 0.115 0.105 0.035 0.185

2.0 0.225 0.245 0.090 0.300 NA

3.0 0.645 0.675 0.055 0.760

4.0 0.940 0.940 0.030 0.990

6.0 1.000 1.000 0.030 1.000

Family-based

1.2 0.035 0.060 0.030 0.050 0.060

1.6 0.070 0.065 0.025 0.110 0.095

2.0 0.110 0.130 0.045 0.170 0.155

3.0 0.340 0.365 0.040 0.485 0.495

4.0 0.630 0.595 0.045 0.810 0.795

6.0 0.920 0.930 0.050 0.985 0.980

Page 16: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Kangwha study

• 783 students who were aged of 15 at Kangwha in 1995 were monitored up to 1997 every year

• Phenotype : High BP

• Case : students experienced at least once SBP>130mmHg or DBP>85mmHg

Control : selected from the student having the lowest BP sequentially

Case : Control = 101 : 176

• Trio : students whose parents’ genotypes are available among students in Case group, 40 trios

• 4 SNPs : A-240T, T-93C, I/D, G2350A of ACE in region 17q23

Page 17: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

,

Schematic diagram of the human ACE gene illustrating the location of 10 biallelic polymorphisms.

Ploymorphisms are numbered in base pairs relative to the start of transcription of the ACE gene.

Exons 1-26 are indicated with vertical bars and are numbered intermittently for clarity.

Keavney et al., Human Molecular Genetics 1998

Page 18: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Estimated haplotype frequencies

† Denote haplotypes corresponding to pair of (I/D, A-240T, T-93C, G2350A)

‡ EH, SAS, Proposed algorithm

Control Case

Bayesian EM ‡ Bayesian EM

IATG † 0.003 0.003 - -

IATA - - - -

IACG 0.017 0.016 0.010 0.010

IACA 0.594 0.595 0.604 0.604

ITTG - 0.002 - -

ITTA 0.034 0.033 0.015 0.015

ITCG - - 0.005 0.005

ITCA - - 0.010 0.010

DATG - - 0.005 0.005

DATA - - - -

DACG 0.011 0.012 - -

DACA 0.006 0.006 - -

DTTG 0.298 0.297 0.337 0.336

DTTA 0.026 0.026 0.010 0.010

DTCG 0.011 0.011 0.005 0.005

DTCA - - - -

Page 19: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

P-value of association and linkage test

† Normal approximation and Yate’s continuity correction

‡ Zhao et al. (2000, AJHG)’s and Score tests

§ Zhao, Curtis and Sham (2000, Hum Hered)’s χ2 test

Method Case-Control Trio

Single locus

I/D 0.988 0.217 / 0.280†

A-240T 0.899 0.132 / 0.175

T-93C 0.852 0.140 / 0.185

G2350A 0.828 0.170 / 0.223

Haplotype 0.184 § 0.152 / 0.095 ‡

Page 20: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

• Investigate single locus-based and haplotype-based association/ linkage test

- single locus-based < haplotype-based

- population-based <? > family-based

• Hypertension is NOT linked with the markers on ACE gene , but…

• In the future …

- how to reduce the haplotype uncertainty

- how to include observations with only one parent or with only sibship

- how to combine all types of observations into one test statistic

Summary

Page 21: Haplotype 에 기초한  genetic linkage/ association 의 통계학적 알고리즘 연구

Thank you