PEDTOOL: Gene hunting based on high-throughput computing. Dan Geiger, Computer Science Department, Technion. Searching for genes that predispose to or cause diseases. Why search? 1. Prenatal testing for high-risk populations. 2. Risk assessment and adapting lifestyle to risk factors. 3. Finding the mutated proteins and developing drugs.
1
PEDTOOL: Gene hunting based on high-throughput computing
Dan Geiger, Computer Science Department, Technion
2
Searching for genes that predispose to or cause diseases
Why search?
1. Prenatal testing for high-risk populations
2. Risk assessment and adapting lifestyle to risk factors
3. Finding the mutated proteins and developing drugs
4. Understanding basic biological processes
How can one search?
1. Finding families in which a disease is passed from generation to generation
2. Taking a simple blood sample from several affected and healthy individuals
3. Laboratory analysis of the DNA across all chromosomes
4. Analysis by algorithmic means. I will emphasize three computational problems.
4
Usage of our system in Israeli Hospitals
• Rabin Hospital, by Motti Shochat’s group
  – New locus for mental retardation (2003)
  – Infantile bilateral striatal necrosis (2004)
• Soroka Hospital, by Ohad Birk’s group
  – Lethal congenital contractural syndrome (2004)
  – Congenital cataract (2005)
• Rambam Hospital, by Eli Shprecher’s group
  – Congenital recessive ichthyosis (2005)
  – CEDNIK syndrome (2005)
• Galil Ma’aravi Hospital, by Tzipi Falik’s group
  – Familial Onychodysplasia and dysplasia of distal phalanges (ODP)
  – Familial juvenile hypertrophy of the breast (JHB) (2005)
5
Steps in Gene Hunting
1. Linkage analysis (10^6~10^7 bp)
2. Identify genes (10^4~10^5 bp)
3. Resequencing (10^0 bp)
6
Recombination During Meiosis
[Figure: meiosis in a male or female parent, producing recombinant gametes]
7
Family Pedigree
8
Familial Onychodysplasia and dysplasia of distal phalanges (ODP)
[Figure: family pedigree; labeled individuals III-15, IV-10, IV-7]
9
Familial juvenile hypertrophy of the breast (JHB)
[Figure: family pedigree; labeled individual IV-3]
10
Marker Information Added (genetic markers)
Id      dad     mom     sex  aff  Marker 1  Marker 2
III-21  II-10   II-11   f    h    0 0       0 0
II-5    I-3     I-4     f    h    155 157   1 3
III-7   II-4    II-5    f    a    155 157   1 1
III-13  II-4    II-5    m    a    151 155   1 1
III-14  II-1    II-2    f    h    151 155   2 3
III-15  II-4    II-5    m    a    151 155   1 1
III-16  II-10   II-11   f    h    151 159   1 4
III-5   II-4    II-5    f    h    151 155   1 1
IV-1    III-13  III-14  f    h    151 155   1 3
IV-2    III-13  III-14  f    a    151 155   1 3
IV-3    III-13  III-14  f    a    155 155   1 3
[Figure: a chromosome pair with marker loci M1 and M2]
11
Maximum Likelihood Evaluation: Two-Point Analysis (Task 1)
[Figure: individuals III-15 (a) and III-16 (h) with their genotypes along a chromosome carrying markers M1, M2, M3, M4; candidate disease-locus positions D1 and D2; recombination fraction θ.
 III-15: 151,159; 202,209; 139,141; 1,2
 III-16: 151,155; 202,202; 139,146; 3,3]
The first computational problem: find a value of θ that maximizes Pr(data|θ,Mode-Of-Inheritance).
Here, "data" means the data of one marker at a time.
LOD score (to quantify how confident we are):
Z(θ)=log10[Pr(data|θ) / Pr(data|θ=½)].
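The LOD definition above can be illustrated with a deliberately simplified sketch. It assumes phase-known meioses that can each be scored as recombinant or non-recombinant, so that Pr(data|θ) is proportional to θ^r (1-θ)^(n-r). This is a textbook simplification, not the pedigree likelihood that Pedtool computes; the function name and the counts are hypothetical.

```python
import math

def two_point_lod(recombinants, nonrecombinants, theta):
    """LOD score Z(theta) = log10[L(theta) / L(1/2)] for phase-known meioses.

    Assumes L(theta) = theta**r * (1 - theta)**(n - r); this is an
    illustrative simplification, not a full pedigree likelihood.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # an observed recombinant rules out theta = 0

    def log10_likelihood(t):
        total = 0.0
        if recombinants:
            total += recombinants * math.log10(t)
        if nonrecombinants:
            total += nonrecombinants * math.log10(1.0 - t)
        return total

    return log10_likelihood(theta) - log10_likelihood(0.5)

# Hypothetical counts: 2 recombinants among 10 informative meioses.
grid = [i / 100 for i in range(0, 51)]
best_theta = max(grid, key=lambda t: two_point_lod(2, 8, t))
best_lod = two_point_lod(2, 8, best_theta)
```

Under this simplification the maximizing θ is the observed recombinant fraction r/n, and the LOD at θ=½ is zero by construction.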
12
Results of Two-Point Analysis
Marker information   Recombination fraction
Id   Name    0.00     0.01    0.05    0.10    0.20
9    m9      3.96     3.90    3.62    3.27    2.51
13
Results of Two-Point Analysis
Marker information   Recombination fraction
Id   Name    0.00     0.01    0.05    0.10    0.20
4    m4      -14.82   -1.57   -0.13   0.42    0.72
9    m9      3.96     3.90    3.62    3.27    2.51
13   m13     -2.91    -2.31   -1.37   -0.86   -0.37
14
Results of Two-Point Analysis
Marker information   Recombination fraction
Id   Name    0.00     0.01    0.05    0.10    0.20
4    m4      -14.82   -1.57   -0.13   0.42    0.72
5    m5      3.67     3.60    3.35    3.02    2.31
6    m6      2.27     2.23    2.08    1.86    1.38
9    m9      3.96     3.90    3.62    3.27    2.51
10   m10     1.96     2.20    2.42    2.35    1.92
11   m11     1.09     1.08    1.04    0.98    0.80
12   m12     -0.84    -0.56   -0.14   0.03    0.14
13   m13     -2.91    -2.31   -1.37   -0.86   -0.37
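One way to read the table is to take, for each marker, the recombination fraction with the highest LOD score. The sketch below does this for the values on the slide. The cutoff of 3.0 is the conventional linkage threshold and is an assumption here; the slides do not state which threshold was used.

```python
# LOD scores copied from the two-point results table above.
THETAS = [0.00, 0.01, 0.05, 0.10, 0.20]
LOD_TABLE = {
    "m4":  [-14.82, -1.57, -0.13, 0.42, 0.72],
    "m5":  [3.67, 3.60, 3.35, 3.02, 2.31],
    "m6":  [2.27, 2.23, 2.08, 1.86, 1.38],
    "m9":  [3.96, 3.90, 3.62, 3.27, 2.51],
    "m10": [1.96, 2.20, 2.42, 2.35, 1.92],
    "m11": [1.09, 1.08, 1.04, 0.98, 0.80],
    "m12": [-0.84, -0.56, -0.14, 0.03, 0.14],
    "m13": [-2.91, -2.31, -1.37, -0.86, -0.37],
}
THRESHOLD = 3.0  # conventional cutoff; an assumption, not taken from the slides

def peak(scores):
    """Return (theta, lod) at the grid point with the highest LOD score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return THETAS[best], scores[best]

peaks = {name: peak(scores) for name, scores in LOD_TABLE.items()}
linked = sorted(name for name, (_, lod) in peaks.items() if lod >= THRESHOLD)
best_marker = max(peaks, key=lambda name: peaks[name][1])
```

With these values, only m5 and m9 reach the assumed threshold, and m9 has the highest peak.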
15
Maximum Likelihood Evaluation Approach (Task 2)
Most probable haplotype configuration of some or all persons:
Which alleles came from the mother and which from the father?
The second computational problem: argmax Pr(h_1, h_2, …, h_{2n-1}, h_{2n} | data, θ, MOI)
For each person, there are 2^k possible haplotypes, where k is the number of markers considered.
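The 2^k count can be made concrete by enumerating every way of assigning the two alleles at each marker to the two parental haplotypes. The sketch below is only an enumeration, not the maximization that Task 2 requires. The genotype pairs are the ones shown for III-15 on the two-point analysis slide, and treating them as markers M1 to M4 in order is an assumption; the function name is hypothetical.

```python
import itertools

def phase_configurations(genotypes):
    """Yield every (haplotype_a, haplotype_b) pair consistent with the genotypes.

    genotypes is a list of unordered allele pairs, one per marker.
    Each marker contributes one binary choice, giving 2**k configurations.
    """
    for flips in itertools.product((False, True), repeat=len(genotypes)):
        hap_a = tuple(g[1] if f else g[0] for g, f in zip(genotypes, flips))
        hap_b = tuple(g[0] if f else g[1] for g, f in zip(genotypes, flips))
        yield hap_a, hap_b

# Genotypes of III-15 as they appear on the slide (marker order assumed).
genotypes = [(151, 159), (202, 209), (139, 141), (1, 2)]
configs = list(phase_configurations(genotypes))
```

Homozygous markers make some configurations coincide, so 2^k is an upper bound on the number of distinct haplotype pairs per person.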
16
Results of Haplotyping Analysis (Affected persons)
ID      M3   M4   M5   M6   M9   M10  M11  M12  M13
III-7   1    2    1    2    1 x  2    2    2    2
        3    3 x  3    1    3    3    1    2    2
IV-3    1    5    2    2    5    2    1    2    3
        4    4    3    1    3    3    1    2    2
IV-4    3    3    1 x  2    5    2    1    2    3
        4    4    3    1    3    3    1    2    2
IV-7    1    3    1    5    6 x  2    4    1    3
        4    4    3    1    3    3    1    2    2
IV-10   1    3    1    4    3    2    4    1    3
        4    4    3    1    3    3    1    2    2
17
Results of Haplotyping Analysis (Healthy persons)
ID      M3   M4   M5   M6   M9   M10  M11  M12  M13
II-5    2    1    2    1    2    2    2    2    2
        1    2    1    2    1    1    1    1    1
III-14  1    5    2    2    5    2    1    2    3
        3    3    1    3    4    1    1    3    2
III-16  1    3    1    4    3    2    4    1    3
        1    3    1    5    6    4    3    1    4
IV-6    1    3    1    5    6    4    3    1    4
        4    4 x  1    2    1    1    1    1 x  2
IV-8    1    3    1    4    3    2    4    1    3
        1    2    1    2    1 x  3    1    2    2
18
Maximum Likelihood Evaluation: Multipoint Analysis (Task 3)
[Figure: individuals III-15 (a) and III-16 (h) with their genotypes along a chromosome carrying markers M1, M2, M3, M4; candidate disease-locus position D1; recombination fraction θ.
 III-15: 151,159; 202,209; 139,141; 1,2
 III-16: 151,155; 202,202; 139,146; 3,3]
The third computational problem: find a value of θ that maximizes Pr(data|θ,MOI).
Data now means considering several markers at once.
19
Results of Multipoint Analysis
Position in centi-Morgans   Ln(Likelihood)   LOD
0.0000 (Marker 3)           -216.0217        -14.74
0.5500                      -192.2385        -4.41
1.1000 (Marker 4)           -216.0210        -14.74
3.6000                      -176.3810        2.47
6.1000 (Marker 5)           -174.3392        3.35
8.6500                      -173.9743        3.51
11.2000 (Marker 6)          -173.7030        3.63
16.5500                     -173.3106        3.80
21.9000 (Marker 9)          -172.9497        3.96
25.2500                     -173.6540        3.65
28.6000 (Marker 10)         -177.5622        1.95
40.3001                     -178.9946        1.33
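The two numeric columns are related through the LOD definition given for two-point analysis. Assuming the LOD column is log10 of the likelihood ratio against the unlinked hypothesis, each row implies the same baseline log-likelihood, ln L_unlinked = ln L - LOD · ln 10. The sketch below checks this on the values from the slide; the baseline itself is inferred here and does not appear in the source.

```python
import math

# (position in cM, ln-likelihood, LOD) copied from the multipoint results.
ROWS = [
    (0.0000, -216.0217, -14.74), (0.5500, -192.2385, -4.41),
    (1.1000, -216.0210, -14.74), (3.6000, -176.3810, 2.47),
    (6.1000, -174.3392, 3.35), (8.6500, -173.9743, 3.51),
    (11.2000, -173.7030, 3.63), (16.5500, -173.3106, 3.80),
    (21.9000, -172.9497, 3.96), (25.2500, -173.6540, 3.65),
    (28.6000, -177.5622, 1.95), (40.3001, -178.9946, 1.33),
]

def implied_baseline(ln_likelihood, lod):
    """ln-likelihood of the unlinked hypothesis implied by one table row."""
    return ln_likelihood - lod * math.log(10)

baselines = [implied_baseline(ln_l, lod) for _, ln_l, lod in ROWS]
spread = max(baselines) - min(baselines)

# Position with the highest LOD score in the table.
peak_position, _, peak_lod = max(ROWS, key=lambda row: row[2])
```

The implied baselines agree to within the rounding of the LOD column, and the peak falls at 21.9 cM (Marker 9), matching the highest two-point score.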
20
The Computational Task
Computing P(data|θ) for a specific value of θ:
\[ P(\mathrm{data} \mid \theta) = \sum_{x_k} \cdots \sum_{x_3} \sum_{x_1} \prod_{i=1}^{n} P(x_i \mid pa_i) \]
This problem is equivalent to finding the best order for sum-product operations for high dimensional matrices:
\[ Y_{ij} = \sum_{m} \sum_{n} \sum_{l} \sum_{k} A_{ikl} B_{kjm} C_{lmn} \]
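To see why the order of sum-product operations matters, the sketch below evaluates the three-table sum over k, l, m, n in two ways: naively, by summing the full product for every (i, j), and by eliminating one index at a time (n, then m, then k and l). The dimension sizes and the random table contents are hypothetical, and the chosen elimination order is just one reasonable order, not necessarily the best.

```python
import itertools
import random

# Hypothetical small dimensions for the indices i, j, k, l, m, n.
DIMS = {"i": 2, "j": 2, "k": 3, "l": 3, "m": 4, "n": 5}

def ranges(names):
    return [range(DIMS[x]) for x in names]

def random_table(names, rng):
    """Map every index tuple over the named indices to a random value."""
    return {idx: rng.random() for idx in itertools.product(*ranges(names))}

rng = random.Random(0)
A = random_table("ikl", rng)
B = random_table("kjm", rng)
C = random_table("lmn", rng)

def naive():
    """Sum the full product over k, l, m, n for every (i, j)."""
    Y, mults = {}, 0
    for i, j in itertools.product(*ranges("ij")):
        total = 0.0
        for k, l, m, n in itertools.product(*ranges("klmn")):
            total += A[i, k, l] * B[k, j, m] * C[l, m, n]
            mults += 2
        Y[i, j] = total
    return Y, mults

def ordered():
    """Eliminate n, then m, then k and l, keeping only small tables."""
    mults = 0
    # Sum out n (no multiplications): C1[l, m] = sum_n C[l, m, n]
    C1 = {(l, m): sum(C[l, m, n] for n in range(DIMS["n"]))
          for l, m in itertools.product(*ranges("lm"))}
    # Sum out m: D[k, j, l] = sum_m B[k, j, m] * C1[l, m]
    D = {}
    for k, j, l in itertools.product(*ranges("kjl")):
        D[k, j, l] = sum(B[k, j, m] * C1[l, m] for m in range(DIMS["m"]))
        mults += DIMS["m"]
    # Sum out k and l: Y[i, j] = sum_{k,l} A[i, k, l] * D[k, j, l]
    Y = {}
    for i, j in itertools.product(*ranges("ij")):
        Y[i, j] = sum(A[i, k, l] * D[k, j, l]
                      for k, l in itertools.product(*ranges("kl")))
        mults += DIMS["k"] * DIMS["l"]
    return Y, mults

Y_naive, cost_naive = naive()
Y_ordered, cost_ordered = ordered()
```

Both routes give the same Y, but the ordered version performs far fewer multiplications, and the gap widens quickly as the index ranges grow.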
21
Stochastic Greedy Ordering Algorithm(s)
• Iteration i:
  – The three indices yielding minimal table size are found.
  – A coin (biased according to the resulting table size) is flipped to choose between them.
• The algorithm is repeated many times, stopping early if a low-cost elimination sequence is found.
Repeat these steps with several cost functions.
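The steps above can be sketched as follows. Several details are assumptions made for illustration only: the bias gives each candidate a weight inversely proportional to its resulting table size, the cost of a sequence is the sum of the table sizes it creates, and the function names, dimension sizes, and the `target` parameter are hypothetical. The slide does not specify the actual bias or cost functions.

```python
import math
import random

def table_size(indices, dims):
    return math.prod(dims[x] for x in indices)

def merged_scope(tables, v):
    """Indices of the table formed by multiplying all tables that contain v."""
    scope = set()
    for t in tables:
        if v in t:
            scope |= t
    return frozenset(scope)

def stochastic_greedy_order(tables, dims, to_sum, rng):
    """One randomized pass: returns (elimination order, total cost)."""
    tables = [frozenset(t) for t in tables]
    remaining = set(to_sum)
    order, cost = [], 0
    while remaining:
        # The (up to) three indices whose elimination creates the smallest table.
        scored = sorted((table_size(merged_scope(tables, v), dims), v)
                        for v in remaining)
        candidates = scored[:3]
        # Biased coin: smaller tables are more likely (assumed weighting).
        weights = [1.0 / size for size, _ in candidates]
        size, v = rng.choices(candidates, weights=weights, k=1)[0]
        scope = merged_scope(tables, v)
        tables = [t for t in tables if v not in t] + [scope - {v}]
        remaining.discard(v)
        order.append(v)
        cost += size
    return order, cost

def best_order(tables, dims, to_sum, tries=200, target=None, seed=0):
    """Repeat the randomized pass; stop early once the cost reaches target."""
    rng = random.Random(seed)
    best = None
    for _ in range(tries):
        order, cost = stochastic_greedy_order(tables, dims, to_sum, rng)
        if best is None or cost < best[1]:
            best = (order, cost)
        if target is not None and best[1] <= target:
            break
    return best

# Index sets of the tables A_ikl, B_kjm, C_lmn, with hypothetical sizes.
TABLES = [{"i", "k", "l"}, {"k", "j", "m"}, {"l", "m", "n"}]
DIMS = {"i": 2, "j": 2, "k": 3, "l": 3, "m": 4, "n": 5}
order, cost = best_order(TABLES, DIMS, "klmn")
```

Fixing the seed makes repeated runs reproducible, which is convenient when comparing several cost functions on the same input.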
22
When intermediate tables become too large for a given RAM, computation virtually halts:
\[ Y_{ij} = \sum_{m} \sum_{n} \sum_{l} \sum_{k} A_{ikl} B_{kjm} C_{lmn} \]
\[ Y_{iljm} = \sum_{k} A_{ikl} B_{kjm} \]
But we can fix the value of the index m, namely, condition on m’s value, and do each part as a separate job:
\[ Y^{m}_{ilj} = \sum_{k} A_{ikl} B^{m}_{kj} \]
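Conditioning on m can be sketched as a set of independent jobs, one per value of m, whose results are added at the end. In the sketch below each job builds only tables that are not indexed by m, so its memory footprint is smaller than that of the full intermediate table. The dimension sizes, table contents, and function names are hypothetical, and the jobs run sequentially here; in a distributed setting each call to `job` could run on a different machine.

```python
import itertools
import random

# Hypothetical small dimensions for the indices i, j, k, l, m, n.
DIMS = {"i": 2, "j": 2, "k": 3, "l": 3, "m": 4, "n": 5}

def ranges(names):
    return [range(DIMS[x]) for x in names]

def random_table(names, rng):
    return {idx: rng.random() for idx in itertools.product(*ranges(names))}

rng = random.Random(1)
A, B, C = (random_table(names, rng) for names in ("ikl", "kjm", "lmn"))

def job(m):
    """One independent job: the contribution to Y[i, j] with m held fixed."""
    # Slices at the fixed m; no table indexed by m is ever built.
    Bm = {(k, j): B[k, j, m] for k, j in itertools.product(*ranges("kj"))}
    Cm = {l: sum(C[l, m, n] for n in range(DIMS["n"]))
          for l in range(DIMS["l"])}
    # Ym[i, l, j] = sum_k A[i, k, l] * Bm[k, j]
    Ym = {(i, l, j): sum(A[i, k, l] * Bm[k, j] for k in range(DIMS["k"]))
          for i, l, j in itertools.product(*ranges("ilj"))}
    # Sum out l to get this job's partial result.
    return {(i, j): sum(Ym[i, l, j] * Cm[l] for l in range(DIMS["l"]))
            for i, j in itertools.product(*ranges("ij"))}

def combine(partials):
    """Add the partial results of all jobs."""
    return {key: sum(p[key] for p in partials) for key in partials[0]}

def direct():
    """Reference value: the full sum over k, l, m, n without conditioning."""
    return {(i, j): sum(A[i, k, l] * B[k, j, m] * C[l, m, n]
                        for k, l, m, n in itertools.product(*ranges("klmn")))
            for i, j in itertools.product(*ranges("ij"))}

Y = combine([job(m) for m in range(DIMS["m"])])
```

The trade-off is that work shared across values of m is repeated in every job, so conditioning exchanges extra computation for lower memory use and easy parallelism.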
23
The Pedtool System
• Divides the computation of a single likelihood among hundreds of computers.
• Uses the Condor research pool at UW-Madison.
• Simple user interface, used by novices.
• Able to compute a highly inbred pedigree with 250 individuals sent by NIH.
• Faster by 1-5 orders of magnitude than other linkage programs.
\[ Y_{ij} = \sum_{m} \sum_{n} \sum_{l} \sum_{k} A_{ikl} B_{kjm} C_{lmn} \]
24
Running time improvements
Files       No. of Loci   Run Time V1.0   Run Time V1.1   Run Time V1.4   Run Time Online
A6          12            2.72            1.26            2.36            local
A7          14            1.84            1.36            1.48            local
A8          18            4.32            3.14            0.51            local
A9          37            8231.04         265.32          28.56           local
A10         38            9871.33         3543.46         36.12           local
A11         40                                            57.85           local
Mira-46     14                                            >6000m*         100m
Ginat-115   1                                             >1500m          70m
Eric-105    1                                             40m             3m
Eric-105    2                                             >1000m          15m
bioinfo.cs.technion.ac.il/pedtool
25
The Main Goals of Future Research
• Efficiency
• Simplicity
• Availability online to all Israeli researchers
• More functionalities
bioinfo.cs.technion.ac.il/pedtool
26
Acknowledgements
Students:
• Ma’ayan Fishelson, Ph.D (Graduated 2004)
• Dmitry Rusakov, Ph.D (Graduated 2004)
• Anna Tzemach, M.Sc
• Nickolay Dovgolevsky, B.Sc (Graduated 2004)
• Mark Silberstein, M.Sc
• Julia Stolin
• Edward Vitkin
Collaborators from medical genetics:
• Motti Shochat and Tami Shochat (Rabin)
• Ohad Birk and Rivka Ophir (Soroka)
• Tzipi Falik and Morad Khayat (Galil Ma’aravi)
Collaborators from distributed systems:
• Assaf Schuster
Pedtool is to be hosted by DSL at the CS/Technion and supported by IBM, ISF, and the Israeli Science Ministry.