Show & Tell
Datamining: Turning Biological Data into Gold
Limsoon Wong, KRDL
What is Datamining?
Jonathan’s blocks
Jessica’s blocks
Whose block is this?
Jonathan’s rules: Blue or Circle
Jessica’s rules: All the rest
What is Datamining?
Question: Can you explain how?
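The block example can be written down as a single rule. The sketch below is only an illustration: the `Block` class, the `owner` function, and any colour or shape other than "blue" and "circle" are hypothetical names, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A toy block described by two attributes (hypothetical representation)."""
    colour: str
    shape: str


def owner(block: Block) -> str:
    """Apply the rules from the slide: blue or circle goes to Jonathan, the rest to Jessica."""
    if block.colour == "blue" or block.shape == "circle":
        return "Jonathan"
    return "Jessica"


if __name__ == "__main__":
    for b in (Block("blue", "square"), Block("red", "circle"), Block("red", "square")):
        print(b, "->", owner(b))
```

Datamining is the reverse direction: given blocks already sorted into two piles, find a rule like this one that explains the piles.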
What are the Benefits?
To the patient: Better drug, better treatment
To the pharma: Save time, save cost, make more $
To the scientist: Better science
The Datamining Process
Epitope Prediction
TRAP-559AA
MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSEEVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLNLNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRSLLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVILTDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNRFLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEKTASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQCEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENIIDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQKPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDNQNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGNRHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHEKPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVPGAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
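A common first step in epitope prediction is to slide a fixed-length window along the protein and score every window as a candidate peptide. The sketch below shows only that enumeration step. The window length of 9 residues and the function name `candidate_peptides` are assumptions for illustration; the slides do not state the peptide length or how the candidates were scored.

```python
def candidate_peptides(sequence: str, length: int = 9):
    """Return (1-based start position, peptide) for every window of the given length.

    The default length of 9 is an assumption, not a value taken from the slides.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [
        (start + 1, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


if __name__ == "__main__":
    # First 13 residues of the sequence shown on the slide.
    for position, peptide in candidate_peptides("MNHLGNVKYLVIV"):
        print(position, peptide)
```

Each candidate would then be passed to a predictor, such as a neural network or a scoring matrix, which is outside the scope of this sketch.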
Epitope Prediction Results
Prediction by our ANN model for HLA-A11: 29 predictions, 22 epitopes, 76% specificity
Prediction by BIMAS matrix for HLA-A*1101:
Rank by BIMAS: 1, 66, 100
Number of experimental binders: 19 (52.8%), 5 (13.9%), 12 (33.3%)
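The percentages on this slide can be checked with a few lines of arithmetic: the 76% figure matches 22 out of 29, and the three binder percentages match shares of 19 + 5 + 12 = 36 binders. The helper name `percent` is introduced here for illustration only.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100.0 * part / whole, digits)


# ANN model: 22 epitopes among 29 predictions.
ann_rate = percent(22, 29, 0)

# BIMAS: experimental binders in the three rank groups shown on the slide.
binders = [19, 5, 12]
total_binders = sum(binders)
shares = [percent(n, total_binders) for n in binders]

if __name__ == "__main__":
    print(ann_rate, total_binders, shares)
```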
Gene Expression Analysis
Clustering gene expression profiles
Classifying gene expression profiles
Find stable differentially expressed genes
Gene Expression Analysis Results
The Discovery System
• Correlation test
• Voter selection
• Class prediction
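The three steps above can be sketched as a small two-class classifier. The slides give no formulas, so this sketch assumes a signal-to-noise score for the correlation test and a weighted vote for class prediction, a scheme widely used for gene expression data. The function names, the class labels "A" and "B", and the toy expression values are all hypothetical.

```python
from statistics import mean, pstdev


def train_voters(samples, labels, n_voters):
    """Correlation test and voter selection for exactly two classes (assumed).

    Each gene gets a signal-to-noise score (difference of class means divided
    by the sum of class standard deviations); the genes with the largest
    absolute scores become voters.
    """
    class_a, class_b = sorted(set(labels))
    voters = []
    for gene in range(len(samples[0])):
        a = [s[gene] for s, y in zip(samples, labels) if y == class_a]
        b = [s[gene] for s, y in zip(samples, labels) if y == class_b]
        spread = (pstdev(a) + pstdev(b)) or 1e-9  # avoid division by zero
        score = (mean(a) - mean(b)) / spread
        midpoint = (mean(a) + mean(b)) / 2
        voters.append((gene, score, midpoint))
    voters.sort(key=lambda v: abs(v[1]), reverse=True)
    return (class_a, class_b), voters[:n_voters]


def predict(sample, classes, voters):
    """Class prediction: each voter adds score * (expression - midpoint)."""
    total = sum(score * (sample[gene] - midpoint) for gene, score, midpoint in voters)
    return classes[0] if total > 0 else classes[1]


if __name__ == "__main__":
    # Hypothetical expression values: rows are samples, columns are genes.
    samples = [[5.0, 1.0, 3.0], [6.0, 1.2, 2.9], [1.0, 1.1, 3.1], [0.5, 0.9, 3.0]]
    labels = ["A", "A", "B", "B"]
    classes, voters = train_voters(samples, labels, n_voters=2)
    print(predict([5.5, 1.0, 3.0], classes, voters))
```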
Protein Interaction Extraction
“What are the protein-protein interaction pathways from the latest reported discoveries?”
Protein Interaction Extraction Results
Rule-based system for processing free texts in scientific abstracts
Specialized in
extracting protein names
extracting protein-protein interactions
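To give a flavour of rule-based extraction, the sketch below applies one toy rule of the form "protein, interaction verb, protein" to free text. It is not the system described on the slide: the name pattern, the verb list, and the function `extract_interactions` are all hypothetical and far simpler than a real extractor would need to be.

```python
import re

# Hypothetical rule: a protein name is an all-caps token or a capitalised
# token containing digits (e.g. "BRCA2", "Cdc25").
NAME = r"\b(?:[A-Z][A-Za-z]*\d+[A-Za-z0-9]*|[A-Z]{2,})\b"

# Hypothetical verb list; longer phrases come first so they match first.
VERBS = ("interacts with", "binds to", "binds", "phosphorylates", "activates", "inhibits")
VERB = "|".join(re.escape(v) for v in VERBS)

INTERACTION = re.compile(rf"({NAME})\s+({VERB})\s+({NAME})")


def extract_interactions(text: str):
    """Return (protein, verb, protein) triples matched by the toy rule."""
    return INTERACTION.findall(text)


if __name__ == "__main__":
    abstract = "We show that RAD51 interacts with BRCA2 in vivo. Cdc25 activates CDK1."
    for triple in extract_interactions(abstract):
        print(triple)
```

Chaining the extracted pairs across many abstracts is one way such output could be assembled into interaction pathways.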
Transcription Start Prediction
Transcription Start Prediction Results
Medical Record Analysis
Looking for patterns that are valid, novel, useful, understandable
age  sex  chol  ecg   heart  sick
49   M    266   Hyp   171    N
64   M    211   Norm  144    N
58   F    283   Hyp   162    N
58   M    284   Hyp   160    Y
58   M    224   Abn   173    Y
Medical Record Analysis Results
DeEPs, a novel “emerging pattern” method
Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
Works for gene expressions
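An emerging pattern is a combination of attribute values whose support grows sharply from one class to another. The sketch below computes support and growth rate on the five patient records shown earlier in the deck. It illustrates the definition only and is not the DeEPs algorithm; the function names `support` and `growth_rate` are chosen here for illustration.

```python
import math

# The five records from the Medical Record Analysis slide.
RECORDS = [
    {"age": 49, "sex": "M", "chol": 266, "ecg": "Hyp", "heart": 171, "sick": "N"},
    {"age": 64, "sex": "M", "chol": 211, "ecg": "Norm", "heart": 144, "sick": "N"},
    {"age": 58, "sex": "F", "chol": 283, "ecg": "Hyp", "heart": 162, "sick": "N"},
    {"age": 58, "sex": "M", "chol": 284, "ecg": "Hyp", "heart": 160, "sick": "Y"},
    {"age": 58, "sex": "M", "chol": 224, "ecg": "Abn", "heart": 173, "sick": "Y"},
]


def support(pattern, records):
    """Fraction of records that contain every attribute value in the pattern."""
    if not records:
        return 0.0
    hits = sum(all(r[k] == v for k, v in pattern.items()) for r in records)
    return hits / len(records)


def growth_rate(pattern, background, target):
    """Support in the target class divided by support in the background class."""
    s_background = support(pattern, background)
    s_target = support(pattern, target)
    if s_background == 0:
        return math.inf if s_target > 0 else 0.0
    return s_target / s_background


healthy = [r for r in RECORDS if r["sick"] == "N"]
sick = [r for r in RECORDS if r["sick"] == "Y"]

if __name__ == "__main__":
    print(growth_rate({"age": 58, "sex": "M"}, healthy, sick))
    print(growth_rate({"ecg": "Hyp"}, healthy, sick))
```

In this tiny sample the pattern {age 58, sex M} occurs in both sick records and in no healthy record, so its growth rate is infinite. With only five records this shows the mechanics, not a medical finding.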
Under the Hood
Artificial neural network
Neighbourhood analysis
Non-linear analysis
Template matching
Emerging pattern
Hidden Markov models
Bayesian inference
Decision tree induction
...
Behind the Scene
Epitope Prediction: Vladimir Brusic, Judice Koh, Seah Seng Hong, Zhang Guanglan, Yu Kun
Transcription Start Prediction: Vladimir Bajic, Seah Seng Hong
Gene Expression Analysis: Zhang Louxin, Zhang Zhuo, Zhu Song
Medical Records: Li Jinyan
Protein Interaction Extraction: Ng See Kiong, Zhang Zhuo