Estonian Academy of Sciences, 25 September 2007
NEW DIRECTIONS IN SCIENCE: “Computer Science and Computational Science”
Bioinformatics
Jaak Vilo
Institute of Computer Science, University of Tartu



What is bioinformatics?

■ Bioinformatics: helps us understand biology through computational methods
– Data
– Analysis
– Prediction and modelling
– Making sense of what is inside the “black box”

■ Bioinformatics advances as data, information, and knowledge accumulate, and as computers and methods improve.

Not exactly new, but an “explosion” over the last 10 years

■ Computational models of genetics (e.g. Mendel) were discovered from observational data

■ Bioinformatics interprets (observational) data and helps create new knowledge

■ Methodology development + practice

■ Developing very rapidly

Data types

■ Sequences (DNA, RNA, proteins, …)
■ Molecular structures (3D)
■ Trees (evolution, pedigrees, …)
■ Networks/graphs (pathways, interactions)
■ Phenotype descriptions (cell types, diseases, etc.)
■ New technologies => new raw data
– Images, videos, … (imaging)
■ …

Computational methods

■ Comparison, search, queries
■ Prediction of patterns and rules
■ Machine learning, data mining
■ Statistics
■ Coping with erroneous data
■ Computation speed, …
– Many problems are computationally hard but still need practical solutions…

How is the information encoded?

■ Where and how is the important information “written”?
■ How is it read and processed in cells?
■ What is the function of every part?
■ What are the relationships (networks)
■ and the dynamics between all the parts?
■ Modelling the system

Systems Biology (Bio-IT World)

■ The label -- systems biology -- is pretty awful, except, of course, for all the other even more awful labels that have been tried.

■ More important than what it's called is what systems biology seeks to do: transform biology and healthcare into a rigorous, predictive science whose fruits, it is hoped, will be a deeper, more-detailed understanding of biology and a vastly improved approach to drug development and the practice of medicine.

■ SB would build on the molecular biology revolution and elucidate the wiring diagrams (and their rules) buried in the data.

New data

■ Yeast genome: 1996 (12Mb)
■ Expression data: 1997 (10 years ago)
■ Microarrays (expression, ChIP, tiling, …)
■ Human sequence 2002 (3.2Gb)
■ Large-scale sequencing and resequencing
– Individuals
– Metagenomics: sequence everything
■ Cellular and medical imaging
■ …

Data growth, complex algorithms

■ Exceeds the growth of CPU power (Moore's law)
– Stated by Intel co-founder Gordon E. Moore in 1965

■ Calculations can grow superlinearly

■ Combining different data sources in more imaginative and complex ways

■ CPU, RAM, Disk, Net + programs!!!
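
A minimal sketch of what "superlinear" means in practice, assuming a hypothetical workload in which every sequence in a collection is compared against every other one (this workload is an illustration, not something the slides specify). The number of comparisons is n(n-1)/2, so doubling the data roughly quadruples the work:

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical collection sizes, chosen only to show the trend.
for n in (1_000, 2_000, 4_000):
    print(f"{n} sequences -> {pairwise_comparisons(n)} comparisons")
```

If the data volume itself grows faster than CPU speed, an all-against-all analysis of this kind falls behind even sooner, which is one reason the slide lists programs alongside hardware.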

Sequencing

■ http://www.bio-itworld.com/issues/2007/may/cover-story/

■ Archon Genomics X Prize. This $10 million bounty will be awarded to the first group that can sequence 100 human genomes over ten days for $10,000 apiece. (The prize money was put up by Canadian geologist Stewart Blusson, who discovered the world’s third-largest diamond mine.)

■ “What you’re looking at right now is something in the $0.5-1 million genome range,” says Harkins. “We have so much more biology to uncover before we get to that $1000 level.”

■ James Watson personal genome (June’07; using 454)
– 50 disease genes

■ Craig Venter personal genome – chromosomes from both mother and father sequenced (Sep’07)

Variations (e.g.):

■ SNP (e.g. A vs. C)
■ CNV
■ repeats

Individuals: small differences

Species in evolution: conserved biological function
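
As a rough illustration of the "small differences" between individuals, the sketch below lists single-base differences between two sequences. It assumes the sequences are already aligned and of equal length; the function name and the example sequences are made up for illustration. CNVs and repeats change sequence length and would need a real alignment step, which this sketch does not attempt.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (index, base_in_a, base_in_b) for every mismatching position.

    Assumes both sequences are pre-aligned and have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Toy example: one A -> C substitution at 0-based position 4.
print(snp_positions("GATTACA", "GATTCCA"))  # [(4, 'A', 'C')]
```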

What drives progress?

■ New ideas!
■ Robotics, lasers, photography, chemistry, …
■ Miniaturisation of technology
■ “Parallelisation” of experiments
■ Generation of massive data sets
■ => Computational methods

Computer science and bioinformatics (Jacques Cohen, CACM 2005)

■ In barely half a century computer science has grown from infancy to maturity.

■ Computer scientists should be encouraged to learn biology and biologists computer science to prepare themselves for an intellectually stimulating and financially rewarding future in bioinformatics.

What's So Super About Supercomputers, Anyway?

Chronicle of Higher Education (09/21/07) Vol. 54, No. 4, P. A24; Carnevale, Dan

NSF symposium on "Cyber-Enabled Discovery and Innovation": how supercomputers could be used to solve many of the world's problems, and why more computing resources have not been devoted to these problems.

Computers are still limited in the amount of information they can process

These shortcomings have led researchers to take shortcuts and focus on small samples of data instead of the big picture

Other speakers at the symposium said supercomputers are often limited because of computer scientists' ignorance of subjects other than computer science.

State University of New York at Buffalo professor of computer science and engineering Russ Miller says computer science departments need to recruit graduate students with undergraduate degrees in fields such as biology, physics, and humanities in order to broaden the scope of computer science.

■ Donald Knuth: biology has enough relevant research problems for at least another 500 years

– Computer Literacy Interview With Donald Knuth By Dan Doernberg December 7th, 1993

Intel co-founder Gordon E. Moore (2007)

■ Moore: computing has changed the world so much that not only has it changed the industry itself, but also other sciences that now closely interact with PCs.

■ He cited biology and the life sciences as one area that has changed since the modern PC moved into the laboratory.

■ If he could do his career over again, Moore said, he might have studied biology.

■ (JV: Watson, Crick, Brenner: study the brain)

Communications of the ACM Volume 50, Number 7 (2007), Pages 13-18

The profession of IT: Computing is a natural science Peter J. Denning

■ By the 1980s, computation had become utterly indispensable in many fields. It had advanced from a tool to exploit existing knowledge to a means of discovering new knowledge.

■ Nobel Physics Laureate Ken Wilson was among the first to say that computation had become a third leg of science, joining the traditions of theory and experiment.

■ He and others coined the term "computational science" to refer to the search for new discoveries using computation as the main method.

■ This idea was so powerful that, in 1989, the U.S. Congress passed into law the High Performance Computing and Communication Initiative to stimulate technological advances through high-performance computation.

Computing is universal

■ Not only artificial
■ Happens in nature!

■ Information processes and computation continue to be found abundantly in the deep structures of many fields. Computing is not---in fact, never was---a science only of the artificial.

BIIT members

PhD students
■ Meelis Kull
■ Jüri Reimand
■ Hedi Peterson
■ Marion Reuter
■ Jelena Zaitseva
■ Darja Krushevskaja
■ Liina Kamm
■ Reina Käärik
■ Priit Adler
■ Kristo Tammeoja

Postdoc
■ Maarika Traat

MSc students
■ Konstantin Tretjakov
■ Raivo Kolde
■ Jaanus Hansen
■ Jaanus Uri
■ Ilja Livenson
■ Marko Jõemets

BSc students
■ Laur Tooming
■ Hendrik Nigul
■ Jürgen Jänes
■ Gert Palok
■ Aleksandr Tkatchenko

What do we do?

■ Gene regulation
– Searching for signals (in sequences)
– Machine learning, data mining, clustering
– Visualisation, databases

■ Systems biology: predicting the structure of networks

■ Searching for disease genes and variants
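
Searching sequences for shared signals can be sketched very simply. The example below is only an illustration under stated assumptions, not the group's actual algorithms: it counts, for each substring of length k (a k-mer), how many sequences contain it at least once. The function name, the value of k, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_support(sequences: list[str], k: int) -> Counter:
    """Count, for each k-mer, how many sequences contain it at least once."""
    support: Counter = Counter()
    for seq in sequences:
        # A set, so that a k-mer repeated within one sequence counts once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support


# Made-up sequences that all share the 5-mer "ACGTG".
toy_sequences = ["ACGTGACC", "TTACGTGA", "GGACGTGT"]
print(kmer_support(toy_sequences, 5).most_common(1))  # [('ACGTG', 3)]
```

A k-mer present in most sequences of a group is only a candidate signal. Deciding whether it is meaningful would need a background model and statistics, which this sketch leaves out.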

Main “products”

■ Algorithms – HappieClust; Trie*Tools, DiDASE, PWM Optimization; G-MAT, …

■ Tools: g:Profiler; MEM; KEGGAnim; Sally; TreeViewer; Motif discovery & viz.

■ Databases – BiGeR; COBRED; …

■ Data Analysis: ATD, ENFIN, FunGenES, COBRED, …

Summary

■ Bioinformatics deals with many different kinds of biological data and with providing computational methods and solutions

■ Bioinformatics helps unlock the deeper nature of biology and advance healthcare

■ Bioinformatics gives rise to many new problems that are interesting for computer science

Human resource

■ More expensive than CPU

■ Automatization of human labor…

■ MDA

■ http://www.eweek.com/article2/0,1895,2178319,00.asp?kc=EWKNLBOE090807STR1

■ "When graduates join organizations [after college] they are often shocked to realize they are dealing with limited resources, deadlines, fuzzy requirements, requirements that change weekly, applications that scale, the use of frameworks and libraries, existing code (that may be bad code with bad design decisions), issues of interaction within and among teams, and having to develop code that is secure," Scherlis said.

■ Those are some of the challenges students are faced with that they may not have faced in school, he said. "And we are crafting responses into the curriculum," Scherlis said. "We have to introduce our students to the mission of real engineering and collaboration."

■ http://www.itviikko.fi/page.php?page_id=46&news_id=200720229&rss=18

■ Tekes distributed 70 million in IT funding

■ Photo: Outi Järvinen/Taloussanomat

■ 21 Aug, 15:32. In January-June, Tekes distributed 297 million euros to, among other things, R&D and innovation at universities and companies. Demand for funding grew from last year.

■ During the past half-year, companies in the telecommunications and electronics sector were granted 50.1 million euros, clearly the most of all sectors. Software and digital media companies received 20.4 million. In the latter sector, demand for funding grew clearly.

■ Companies received a total of 179 million euros across 911 funded projects.

■ Tekes noted that applications and funded projects involved clearly more of the international cooperation it has been calling for than before.