Applying AI to Human Genome Part 1 : Collecting data Prof. M. Embrechts Robert Bress Bram Heyns


Applying AI to Human Genome

Part 1 : Collecting data

Prof. M. Embrechts, Robert Bress, Bram Heyns


Overview

• Basics of DNA
• Collecting the data
• Collection: my application
• Perl
• Goal


Basics of DNA

• DNA = polymer of 4 molecules: bases or nucleotides
• A = Adenine, C = Cytosine, G = Guanine, T = Thymine
• Replication (copying) and translation (reading)
• Copying => double helix: A pairs with T, G pairs with C
• 3-letter combination = codon
• RNA: U = Uracil in place of T => transcribing
• Protein = polymer composed of 20 amino acids (reading) => more complex structure than DNA
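The pairing and codon rules above can be sketched in a few lines. This is only an illustration in Python (the deck itself uses Perl); the function names `complement`, `to_rna` and `codons` and the sample strand are made up for the example and do not come from the slides.

```python
from typing import List

# Base pairing from the slide: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return dna.upper().translate(COMPLEMENT)


def to_rna(dna: str) -> str:
    """Write a DNA sequence in RNA letters: U takes the place of T."""
    return dna.upper().replace("T", "U")


def codons(seq: str) -> List[str]:
    """Split a sequence into consecutive 3-letter codons.

    A trailing group of fewer than 3 letters is dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    strand = "ATGGCCTTA"             # made-up sample strand
    print(complement(strand))        # TACCGGAAT
    print(to_rna(strand))            # AUGGCCUUA
    print(codons(to_rna(strand)))    # ['AUG', 'GCC', 'UUA']
```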


Transition DNA → RNA → Protein


Intron – Exon – Splice junction

• Exon: about 200 characters; intron: thousands of characters

• 30,000 genes identified out of possible 100,000

• Identification of a gene → patent


Summary

• Human: 23 pairs of chromosomes
• Chromosome → thousands of genes
• Gene → information: exons; comments: introns
• Exons and introns → codons
• Codon → 3 bases


Data collection

• Human Genome Project
• NCBI website: http://www.ncbi.nlm.nih.gov
• Entrez-Nucleotide.htm
• NCBI Sequence Viewer.htm


Data collection: my application

BioBrowser

Download HTML → ExtractLinks() → Download HTML data → ExtractData() → TranslateData()
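A minimal sketch of the two extraction steps, written in Python for illustration rather than in the Perl the application uses. The slides give only the names ExtractLinks() and ExtractData(), so the signatures and behaviour below are assumptions: `extract_links` pulls double-quoted `href` targets out of a page, and `extract_data` assumes the downloaded record uses the GenBank flat-file layout, where the sequence sits between an `ORIGIN` line and a closing `//`. TranslateData() is left out because the slides do not say what it produces.

```python
import re

# Hypothetical helpers named after the boxes on the slide; the real
# signatures and behaviour are not given in the source.
HREF_RE = re.compile(r'href\s*=\s*"([^"]+)"', re.IGNORECASE)


def extract_links(html):
    """Return every double-quoted href target found in an HTML page."""
    return HREF_RE.findall(html)


def extract_data(record):
    """Collect the base letters between ORIGIN and // (assumed GenBank layout)."""
    bases = []
    in_sequence = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            in_sequence = False
        elif in_sequence:
            # Drop position numbers and spaces, keep only the letters.
            bases.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(bases).upper()


if __name__ == "__main__":
    # Made-up inputs; no network access is needed for the demo.
    index_page = '<a href="viewer?id=1">seq 1</a> <A HREF="viewer?id=2">seq 2</A>'
    print(extract_links(index_page))   # ['viewer?id=1', 'viewer?id=2']

    record = (
        "LOCUS       DEMO\n"
        "ORIGIN\n"
        "        1 gatcctccat atacaacggt\n"
        "       21 atctccacct\n"
        "//\n"
    )
    print(extract_data(record))        # GATCCTCCATATACAACGGTATCTCCACCT
```

A real page would first have to be downloaded and stripped of any surrounding HTML before `extract_data` could be applied; that step is not shown.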


Perl

• Practical Extraction and Report Language
• POD files -> web
• Portability
• Free; CPAN modules
• String manipulation
• Extremely powerful regex engine
• Glue language designed for short and simple tasks, which does not mean it lacks power or "serious" features
• Tutorial: http://www.netcat.co.uk/rob/perl/win32perltut.html


Regular Expression – Pattern Matching

• Practical Extraction and Report Language
• Scan through data and extract useful information
• m/PATTERN/
• s/PATTERN/REPLACEMENT/
• 1 line of Perl = 100 lines of C or Java
• Complex, but easy
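To make the two operators concrete, here is a rough Python counterpart of each (Python stands in for Perl here only for illustration). `re.search` plays the role of `m/PATTERN/` and `re.sub` the role of `s/PATTERN/REPLACEMENT/`; the sample line and the patterns are made up for the example.

```python
import re

line = "gene=DEMO1; exon=3; length=200"   # made-up sample line

# Rough counterpart of Perl's  m/exon=(\d+)/ : search anywhere in the string.
match = re.search(r"exon=(\d+)", line)
exon_number = match.group(1) if match else None   # Perl exposes this as $1

# Rough counterpart of Perl's  s/;\s*/\n/g : replace every occurrence.
one_per_line = re.sub(r";\s*", "\n", line)

# Without the g modifier Perl replaces only the first match; count=1 does the same.
first_only = re.sub(r";\s*", "\n", line, count=1)

if __name__ == "__main__":
    print(exon_number)         # 3
    print(one_per_line)
    print(repr(first_only))    # 'gene=DEMO1\nexon=3; length=200'
```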


Regex examples

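• /[KCZ]arl^sa/
• /<I>(.*?)<\/I>/i
• Captures: $1, $2, …
• Modifiers: i, g, c, …
• Metacharacters: . , * , + , ?
• /([0-9a-zA-Z])+/ or /([\w])+/ (\w also matches the underscore)
• s/us[^a-z]/them/g or, roughly, s/us\W/them/g
• /((acc|act)(ttt|ttc|att))/ (alternatives go in parentheses; square brackets would match single characters)
• TIMTOWTDI: There Is More Than One Way To Do It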
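Two of the examples hide easy mistakes: square brackets define a set of single characters rather than a choice between whole codons, and `\w` matches the underscore in addition to letters and digits. The sketch below checks both with Python's `re` module, whose syntax agrees with Perl's for these constructs; the helper `first_match` and the test strings are made up for the illustration.

```python
import re

# Square brackets build a character class: ONE character out of the set.
# [acc|act] therefore means "a, c, t or |", not "acc or act".
char_class = re.compile(r"([acc|act][ttt|ttc|att])")

# Parentheses with | choose between whole 3-letter codons.
alternation = re.compile(r"((?:acc|act)(?:ttt|ttc|att))")


def first_match(pattern, text):
    """Return the first captured group of the first match, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    dna = "ggactttcgg"                      # made-up test string
    print(first_match(char_class, dna))     # ac      (just two characters)
    print(first_match(alternation, dna))    # actttc  (two whole codons)

    # \w accepts the underscore, [0-9a-zA-Z] does not.
    print(bool(re.fullmatch(r"\w+", "exon_1")))            # True
    print(bool(re.fullmatch(r"[0-9a-zA-Z]+", "exon_1")))   # False

    # Case-insensitive, non-greedy capture between <I> tags.
    m = re.search(r"<I>(.*?)</I>", "<i>Homo sapiens</i> and <i>x</i>", re.IGNORECASE)
    print(m.group(1))                       # Homo sapiens
```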


Part 2 : Applying AI

• Our choice: evolutionary computing
• First part: identify the exon part
• Second part: identify splice junctions
• Third part: combine the previous parts
• Hope to reach over 90% accuracy


Questions

?