PEDTOOL: Gene hunting based on high-throughput computing

Dan Geiger
Computer Science Department, Technion

Searching for genes that predispose to or cause diseases

Why search?
1. Prenatal testing for high-risk populations.
2. Risk assessment and adapting lifestyle to risk factors.
3. Finding the mutated proteins and developing drugs.
4. Understanding basic biological processes.

How can we search?
1. Find families in which a disease is passed from generation to generation.
2. Take a simple blood sample from several affected and healthy individuals.
3. Analyze the DNA in the laboratory across all chromosomes.
4. Analyze the results by algorithmic means. I will emphasize three computational problems.

Usage of our system in Israeli Hospitals

• Rabin Hospital, by Motti Shochat’s group
  – New locus for mental retardation (2003)
  – Infantile bilateral striatal necrosis (2004)
• Soroka Hospital, by Ohad Birk’s group
  – Lethal congenital contractural syndrome (2004)
  – Congenital cataract (2005)
• Rambam Hospital, by Eli Shprecher’s group
  – Congenital recessive ichthyosis (2005)
  – CEDNIK syndrome (2005)
• Galil Ma’aravi Hospital, by Tzipi Falik’s group
  – Familial Onychodysplasia and dysplasia of distal phalanges
  – Familial juvenile hypertrophy of the breast (2005)

Steps in Gene Hunting

1. Linkage analysis (10^6 to 10^7 bp)
2. Identify genes (10^4 to 10^5 bp)
3. Resequencing (100 bp)

Recombination During Meiosis

[Figure labels: Recombinant gametes; Male or female]


Family Pedigree

Familial Onychodysplasia and dysplasia of distal phalanges (ODP)

[Figure: labeled individuals III-15, IV-10, IV-7]

Familial juvenile hypertrophy of the breast (JHB)

[Figure: labeled individual IV-3]

Marker Information Added (genetic markers)

Id      dad     mom     sex  aff  Marker 1  Marker 2
III-21  II-10   II-11   f    h    0   0     0 0
II-5    I-3     I-4     f    h    155 157   1 3
III-7   II-4    II-5    f    a    155 157   1 1
III-13  II-4    II-5    m    a    151 155   1 1
III-14  II-1    II-2    f    h    151 155   2 3
III-15  II-4    II-5    m    a    151 155   1 1
III-16  II-10   II-11   f    h    151 159   1 4
III-5   II-4    II-5    f    h    151 155   1 1
IV-1    III-13  III-14  f    h    151 155   1 3
IV-2    III-13  III-14  f    a    151 155   1 3
IV-3    III-13  III-14  f    a    155 155   1 3
...

[Figure: chromosome pair with markers M1 and M2]
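The record layout on this slide (id, dad, mom, sex, affection status, then two alleles per marker) can be read with a few lines of Python. This is a minimal sketch: `Person`, `parse_record`, and `is_typed` are hypothetical names, not part of Pedtool, and reading `a`/`h` as affected/healthy and allele `0` as untyped are assumptions about the file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Person:
    id: str
    dad: str
    mom: str
    sex: str
    affected: bool                    # assumption: "a" = affected, "h" = healthy
    genotypes: List[Tuple[int, int]]  # one unordered allele pair per marker


def parse_record(line: str) -> Person:
    """Parse one whitespace-separated pedigree record."""
    fields = line.split()
    pid, dad, mom, sex, aff = fields[:5]
    alleles = [int(a) for a in fields[5:]]
    if len(alleles) % 2 != 0:
        raise ValueError("each marker needs exactly two alleles")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return Person(pid, dad, mom, sex, aff == "a", genotypes)


def is_typed(genotype: Tuple[int, int]) -> bool:
    """Assumption: allele 0 means the marker was not typed."""
    return 0 not in genotype


person = parse_record("IV-3 III-13 III-14 f a 155 155 1 3")
```

Here `person.genotypes` holds one pair for Marker 1 and one for Marker 2, in the order of the table columns.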

Maximum Likelihood Evaluation - Two-Point Analysis (Task 1)

[Figure: individuals III-15 and III-16 (statuses a, h) with genotypes 151,159 / 151,155; 202,209 / 202,202; 139,141 / 139,146; 1,2 / 3,3 at markers M1 to M4, and candidate disease-locus positions D1 and D2 at recombination fraction θ]

The first computational problem: find a value of θ that maximizes Pr(data | θ, mode of inheritance).

Here "data" means the data of one marker at a time.

LOD score (to quantify how confident we are):

Z(θ) = log10[Pr(data | θ) / Pr(data | θ = ½)].
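For intuition about Z(θ), consider the simplest textbook case. It is an assumption made for illustration, not the pedigree likelihood that Pedtool computes: n phase-known informative meioses of which r are recombinant, so that Pr(data | θ) = θ^r (1 − θ)^(n − r). The function names below are hypothetical.

```python
import math


def log10_likelihood(theta: float, n: int, r: int) -> float:
    """log10 of theta**r * (1 - theta)**(n - r), for 0 <= theta <= 0.5."""
    if theta == 0.0:
        # No recombination allowed: impossible if any recombinant was seen.
        return 0.0 if r == 0 else float("-inf")
    return r * math.log10(theta) + (n - r) * math.log10(1.0 - theta)


def lod(theta: float, n: int, r: int) -> float:
    """Z(theta) = log10[ Pr(data | theta) / Pr(data | theta = 1/2) ]."""
    return log10_likelihood(theta, n, r) - log10_likelihood(0.5, n, r)


def best_theta(n: int, r: int, steps: int = 500) -> float:
    """Grid search for the theta in [0, 0.5] that maximizes the LOD score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod(t, n, r))
```

In this toy model, ten meioses with no recombinants give Z(0) = 10·log10(2), about 3.01, and any value of θ scores exactly 0 against itself at θ = ½.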


Results of Two-Point Analysis (LOD scores)

Marker information   Recombination fraction
Id   Name            0.00     0.01    0.05    0.10    0.20
4    m4            -14.82    -1.57   -0.13    0.42    0.72
5    m5              3.67     3.60    3.35    3.02    2.31
6    m6              2.27     2.23    2.08    1.86    1.38
9    m9              3.96     3.90    3.62    3.27    2.51
10   m10             1.96     2.20    2.42    2.35    1.92
11   m11             1.09     1.08    1.04    0.98    0.80
12   m12            -0.84    -0.56   -0.14    0.03    0.14
13   m13            -2.91    -2.31   -1.37   -0.86   -0.37
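A common convention in linkage analysis, not stated on the slide, is to treat a LOD score of 3 or more as evidence of linkage. The sketch below applies that threshold to the table above; the data layout and the names `LOD_TABLE`, `peak`, and `linked_markers` are hypothetical.

```python
THETAS = (0.00, 0.01, 0.05, 0.10, 0.20)

# Two-point LOD scores per marker, copied from the results table.
LOD_TABLE = {
    "m4": (-14.82, -1.57, -0.13, 0.42, 0.72),
    "m5": (3.67, 3.60, 3.35, 3.02, 2.31),
    "m6": (2.27, 2.23, 2.08, 1.86, 1.38),
    "m9": (3.96, 3.90, 3.62, 3.27, 2.51),
    "m10": (1.96, 2.20, 2.42, 2.35, 1.92),
    "m11": (1.09, 1.08, 1.04, 0.98, 0.80),
    "m12": (-0.84, -0.56, -0.14, 0.03, 0.14),
    "m13": (-2.91, -2.31, -1.37, -0.86, -0.37),
}


def peak(scores):
    """Return (maximum LOD, recombination fraction where it occurs)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return scores[best], THETAS[best]


def linked_markers(table, threshold=3.0):
    """Markers whose maximum LOD reaches the (assumed) threshold."""
    return [name for name, scores in table.items() if max(scores) >= threshold]


candidates = linked_markers(LOD_TABLE)
```

With these numbers only m5 and m9 reach the threshold, both at θ = 0.00, while m10 peaks at θ = 0.05.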

Maximum Likelihood Evaluation Approach (Task 2)

Most probable haplotype configuration of some or all persons: which alleles came from the mother and which from the father?

The second computational problem:

argmax Pr(h_1, h_2, …, h_{2n-1}, h_{2n} | data, θ, MOI)

where MOI is the mode of inheritance.

For each person, there are 2^k possible haplotypes, where k is the number of markers considered.
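To see where the count of 2^k comes from, the sketch below enumerates every way of assigning the two alleles at each of k markers to a maternal and a paternal haplotype. `phasings` is a hypothetical helper, and the example genotypes are taken from the marker table earlier in the talk (III-16: 151/159 and 1/4). When a marker is homozygous some assignments coincide, so 2^k is an upper bound on the number of distinct configurations.

```python
from itertools import product


def phasings(genotypes):
    """Yield (maternal, paternal) haplotype pairs for unordered genotypes.

    genotypes: list of (allele_a, allele_b) pairs, one per marker.
    Each marker can be flipped independently, giving 2**k assignments.
    """
    for flips in product((False, True), repeat=len(genotypes)):
        maternal = tuple(b if f else a for (a, b), f in zip(genotypes, flips))
        paternal = tuple(a if f else b for (a, b), f in zip(genotypes, flips))
        yield maternal, paternal


iii_16 = [(151, 159), (1, 4)]     # two heterozygous markers
configs = list(phasings(iii_16))  # four distinct (maternal, paternal) pairs
```

Picking the most probable configuration (the argmax of Task 2) additionally needs the pedigree likelihood, which this enumeration does not compute.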

Results of Haplotyping Analysis (Affected persons)

ID     M3  M4  M5  M6  M9  M10  M11  M12  M13
III-7  1   2   1   2   1x  2    2    2    2
       3   3x  3   1   3   3    1    2    2
IV-3   1   5   2   2   5   2    1    2    3
       4   4   3   1   3   3    1    2    2
IV-4   3   3   1x  2   5   2    1    2    3
       4   4   3   1   3   3    1    2    2
IV-7   1   3   1   5   6x  2    4    1    3
       4   4   3   1   3   3    1    2    2
IV-10  1   3   1   4   3   2    4    1    3
       4   4   3   1   3   3    1    2    2

Each person has two rows, one per haplotype. An x after an allele is reproduced from the slide.

Results of Haplotyping Analysis (Healthy persons)

ID      M3  M4  M5  M6  M9  M10  M11  M12  M13
II-5    2   1   2   1   2   2    2    2    2
        1   2   1   2   1   1    1    1    1
III-14  1   5   2   2   5   2    1    2    3
        3   3   1   3   4   1    1    3    2
III-16  1   3   1   4   3   2    4    1    3
        1   3   1   5   6   4    3    1    4
IV-6    1   3   1   5   6   4    3    1    4
        4   4x  1   2   1   1    1    1x   2
IV-8    1   3   1   4   3   2    4    1    3
        1   2   1   2   1x  3    1    2    2

Each person has two rows, one per haplotype. An x after an allele is reproduced from the slide.

Maximum Likelihood Evaluation - Multipoint Analysis (Task 3)

[Figure: individuals III-15 and III-16 (statuses a, h) with genotypes 151,159 / 151,155; 202,209 / 202,202; 139,141 / 139,146; 1,2 / 3,3 at markers M1 to M4, and a candidate disease-locus position D1 at recombination fraction θ]

The third computational problem: find a value of θ that maximizes Pr(data | θ, MOI).

Here "data" means several markers considered at once.

Results of Multipoint Analysis

Position in centi-Morgans   Ln(Likelihood)   LOD
 0.0000 (Marker 3)          -216.0217        -14.74
 0.5500                     -192.2385         -4.41
 1.1000 (Marker 4)          -216.0210        -14.74
 3.6000                     -176.3810          2.47
 6.1000 (Marker 5)          -174.3392          3.35
 8.6500                     -173.9743          3.51
11.2000 (Marker 6)          -173.7030          3.63
16.5500                     -173.3106          3.80
21.9000 (Marker 9)          -172.9497          3.96
25.2500                     -173.6540          3.65
28.6000 (Marker 10)         -177.5622          1.95
40.3001                     -178.9946          1.33
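The two result columns are tied together by the LOD definition: LOD = [ln L(position) − ln L(unlinked)] / ln 10. The slides do not state ln L for the unlinked case, so the sketch below back-solves it from each row as a consistency check. `ROWS` and `implied_baseline` are hypothetical names, and the recovered value is only as precise as the two-decimal rounding of the LOD column allows.

```python
import math

# (position in cM, ln likelihood, LOD), copied from the multipoint table.
ROWS = [
    (0.0000, -216.0217, -14.74),
    (0.5500, -192.2385, -4.41),
    (1.1000, -216.0210, -14.74),
    (3.6000, -176.3810, 2.47),
    (6.1000, -174.3392, 3.35),
    (8.6500, -173.9743, 3.51),
    (11.2000, -173.7030, 3.63),
    (16.5500, -173.3106, 3.80),
    (21.9000, -172.9497, 3.96),
    (25.2500, -173.6540, 3.65),
    (28.6000, -177.5622, 1.95),
    (40.3001, -178.9946, 1.33),
]


def implied_baseline(ln_lik: float, lod: float) -> float:
    """ln L under no linkage implied by one row: ln L - LOD * ln(10)."""
    return ln_lik - lod * math.log(10)


baselines = [implied_baseline(ln_lik, lod) for _, ln_lik, lod in ROWS]
spread = max(baselines) - min(baselines)

# Position with the highest LOD score in the table.
best_position = max(ROWS, key=lambda row: row[2])[0]
```

Every row implies nearly the same baseline, close to −182.07, which suggests that the LOD column is the log-likelihood column rescaled to base 10 and shifted by a constant.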

The Computational Task

Computing P(data | θ) for a specific value of θ:

$$P(\text{data} \mid \theta) = \sum_{x_k} \cdots \sum_{x_3} \sum_{x_1} \prod_{i=1}^{n} P(x_i \mid pa_i)$$

This problem is equivalent to finding the best order for sum-product operations for high-dimensional matrices:

$$Y_{ij} = \sum_{m} \sum_{n} \sum_{l} \sum_{k} A_{ikl} B_{kjm} C_{lmn}$$
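The effect of the summation order can be seen on the three-table sum over A, B, and C from this slide. The sketch below, in plain Python with small random tables, evaluates the same sum in two orders and reports the largest intermediate table each one builds. The dictionary-based tensors and all function names are hypothetical, and neither order is claimed to be the one Pedtool selects.

```python
import itertools
import random


def make_tensor(rank, d, rng):
    """Dense tensor stored as a dict from index tuples to floats."""
    return {idx: rng.random() for idx in itertools.product(range(d), repeat=rank)}


def order_ab_first(A, B, C, d):
    """Contract A and B over k first, then sum out l, m, n against C."""
    R = range(d)
    # T[i,l,j,m] = sum_k A[i,k,l] * B[k,j,m]   (d**4 entries)
    T = {(i, l, j, m): sum(A[i, k, l] * B[k, j, m] for k in R)
         for i, l, j, m in itertools.product(R, repeat=4)}
    Y = {(i, j): sum(T[i, l, j, m] * C[l, m, n]
                     for l, m, n in itertools.product(R, repeat=3))
         for i, j in itertools.product(R, repeat=2)}
    return Y, len(T)


def order_c_first(A, B, C, d):
    """Sum n out of C first, then eliminate m, then k and l."""
    R = range(d)
    # S[l,m] = sum_n C[l,m,n]                  (d**2 entries)
    S = {(l, m): sum(C[l, m, n] for n in R)
         for l, m in itertools.product(R, repeat=2)}
    # U[k,j,l] = sum_m B[k,j,m] * S[l,m]       (d**3 entries)
    U = {(k, j, l): sum(B[k, j, m] * S[l, m] for m in R)
         for k, j, l in itertools.product(R, repeat=3)}
    Y = {(i, j): sum(A[i, k, l] * U[k, j, l]
                     for k, l in itertools.product(R, repeat=2))
         for i, j in itertools.product(R, repeat=2)}
    return Y, max(len(S), len(U))


rng = random.Random(0)
d = 3
A, B, C = (make_tensor(3, d, rng) for _ in range(3))
Y1, largest1 = order_ab_first(A, B, C, d)
Y2, largest2 = order_c_first(A, B, C, d)
```

Both orders return the same Y, but the first needs a table with d^4 entries and the second never exceeds d^3. With realistic domain sizes and many more indices, that gap decides whether the computation fits in memory.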

Stochastic Greedy Ordering Algorithm(s)

• Iteration i:
  – Three indices yielding minimal table size are found.
  – A coin (biased according to the resulting table size) is flipped to choose between them.
• The algorithm is repeated many times unless a low-cost elimination sequence is found.

Repeat these steps with several cost functions.
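The steps above can be sketched as follows. The slide does not give the exact bias of the coin or the cost function, so this sketch assumes a selection probability inversely proportional to table size and a cost equal to the sum of all table sizes created. Every name here is hypothetical, and this is an illustration of the idea rather than Pedtool's implementation.

```python
import random


def graph_from_tables(tables):
    """Interaction graph: indices that share a table are neighbours."""
    graph = {}
    for indices in tables:
        for v in indices:
            graph.setdefault(v, set()).update(u for u in indices if u != v)
    return graph


def table_size(v, graph, dom):
    """Size of the table created when index v is summed out."""
    size = dom[v]
    for u in graph[v]:
        size *= dom[u]
    return size


def one_run(graph, dom, rng, n_candidates=3):
    """One randomized greedy pass; returns (elimination order, total cost)."""
    g = {v: set(nb) for v, nb in graph.items()}
    order, cost = [], 0
    while g:
        # The candidates are the indices with the smallest resulting tables.
        ranked = sorted(g, key=lambda v: (table_size(v, g, dom), v))
        cands = ranked[:n_candidates]
        # Assumed bias: smaller tables are proportionally more likely.
        weights = [1.0 / table_size(v, g, dom) for v in cands]
        v = rng.choices(cands, weights=weights)[0]
        cost += table_size(v, g, dom)
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
            g[u] |= nbrs - {u}   # neighbours of v become mutually connected
        order.append(v)
    return order, cost


def best_of(graph, dom, runs=100, seed=0):
    """Repeat the randomized pass and keep the cheapest ordering found."""
    rng = random.Random(seed)
    return min((one_run(graph, dom, rng) for _ in range(runs)),
               key=lambda result: result[1])
```

For the three-table example, `graph_from_tables(["ikl", "kjm", "lmn"])` builds the graph, and `best_of(graph, dom)` with a domain-size dictionary `dom` returns the cheapest ordering seen across the runs. Swapping in a different cost (for instance the largest single table) corresponds to the slide's "several cost functions".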

When intermediate tables become too large for a given RAM, computation virtually halts:

$$Y_{ij} = \sum_{m} \sum_{n} \sum_{l} \sum_{k} A_{ikl} B_{kjm} C_{lmn}, \qquad Y_{iljm} = \sum_{k} A_{ikl} B_{kjm}$$

But we can fix the value of the index m, namely, condition on m’s value, and do each part as a separate job:

$$Y^{m}_{ilj} = \sum_{k} A_{ikl} B^{m}_{kj}$$
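The conditioning step can be illustrated on the three-table sum over A, B, and C: each value of m defines an independent job whose intermediate table has one dimension fewer, and the job results are added at the end. This is a toy sketch in plain Python with random tables. `job_for_m`, `combine`, and the dictionary-based tensors are hypothetical, and in a real run each job would go to a separate machine rather than through a loop.

```python
import itertools
import random


def make_tensor(rank, d, rng):
    """Dense tensor stored as a dict from index tuples to floats."""
    return {idx: rng.random() for idx in itertools.product(range(d), repeat=rank)}


def full_sum(A, B, C, d):
    """Reference: Y[i,j] = sum over k, l, m, n of A[i,k,l] B[k,j,m] C[l,m,n]."""
    R = range(d)
    return {(i, j): sum(A[i, k, l] * B[k, j, m] * C[l, m, n]
                        for k, l, m, n in itertools.product(R, repeat=4))
            for i, j in itertools.product(R, repeat=2)}


def job_for_m(A, B, C, d, m):
    """One job: the contribution to Y with the index m held fixed."""
    R = range(d)
    # T[i,l,j] = sum_k A[i,k,l] * B[k,j,m]   (d**3 entries, not d**4)
    T = {(i, l, j): sum(A[i, k, l] * B[k, j, m] for k in R)
         for i, l, j in itertools.product(R, repeat=3)}
    # C restricted to this m, with n summed out.
    Cm = {l: sum(C[l, m, n] for n in R) for l in R}
    return {(i, j): sum(T[i, l, j] * Cm[l] for l in R)
            for i, j in itertools.product(R, repeat=2)}


def combine(parts):
    """Add up the per-job results."""
    total = {}
    for part in parts:
        for key, value in part.items():
            total[key] = total.get(key, 0.0) + value
    return total


rng = random.Random(1)
d = 3
A, B, C = (make_tensor(3, d, rng) for _ in range(3))
Y_jobs = combine(job_for_m(A, B, C, d, m) for m in range(d))
Y_ref = full_sum(A, B, C, d)
```

The combined result matches the direct sum up to floating-point rounding. The trade-off is that the number of jobs grows with the domain size of every index that is conditioned on.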

The Pedtool System

• Divides the computation of a single likelihood among hundreds of computers.
• Uses Condor at the UW-Madison research pool.
• Simple user interface, used by novices.
• Able to compute a highly inbred pedigree with 250 individuals sent by NIH.
• Faster by 1-5 orders of magnitude than other linkage programs.

$$Y_{ij} = \sum_{m} \sum_{n} \sum_{l} \sum_{k} A_{ikl} B_{kjm} C_{lmn}$$

Running time improvements

Files  No. of Loci  Run Time V1.0  Run Time V1.1  Run Time V1.4  Run Time Online
A6     12           2.72           1.26           2.36           local
A7     14           1.84           1.36           1.48           local
A8     18           4.32           3.14           0.51           local
A9     37           8231.04        265.32         28.56          local
A10    38           9871.33        3543.46        36.12          local

Files      No. of Loci  Run times (as listed on the slide)
A11        40           57.85     local
Mira-46    14           >6000m*   100m
Ginat-115  1            >1500m    70m
Eric-105   1            40m       3m
Eric-105   2            >1000m    15m

bioinfo.cs.technion.ac.il/pedtool

The Main Goals of Future Research

• Efficiency
• Simplicity
• Availability online to all Israeli researchers
• More functionalities

bioinfo.cs.technion.ac.il/pedtool

Acknowledgements

Students:
• Ma’ayan Fishelson, Ph.D (Graduated 2004)
• Dmitry Rusakov, Ph.D (Graduated 2004)
• Anna Tzemach, M.Sc
• Nickolay Dovgolevsky, B.Sc (Graduated 2004)
• Mark Silberstein, M.Sc
• Julia Stolin
• Edward Vitkin

Collaborators from medical genetics:
• Motti Shochat and Tami Shochat (Rabin)
• Ohad Birk and Rivka Ophir (Soroka)
• Tzipi Falik and Morad Khayat (Galil Ma’aravi)

Collaborators from distributed systems:
• Assaf Schuster

Pedtool is to be hosted by DSL at the CS/Technion and supported by IBM, ISF, and the Israeli Science Ministry.