

Tae-Hyung Kim 1 (Kth2001@hyowon.pusan.ac.kr)
Gil-Mi Ryu 1,2 (gmryu@hyowon.pusan.ac.kr)
InSong Koh 2 (insong@nih.go.kr)
Jong Park 3 (jong@mrc-dunn.cam.ac.uk)

1 Department of Bioinformatics, Bioinformatics Cooperative Course, Pusan National University, Pusan, Korea

2 Section of Bioinformatics, Central Genome Center, National Institute of Health, Nokbun-Dong 5, Seoul, Korea

3 MRC-DUNN, Hills Road, Cambridge CB2 2XY, England, UK

Figure 3. Ontological classification based on methodology. The methodology for DNA sequence determination can be classified according to work procedure such as mapping, sequencing, assembly, and searching. RNA analysis is classified according to cDNA chip procedure resulting in expression analysis. Protein analysis methodology can be classified as comparative and predictive methods.

1 Introduction

One of the major obstacles in bioinformatics is the difficulty of computing with literature information. Unlike sequence and structure data, it is not possible to establish homology, similarity, interaction, or function criteria for literature information. To ease this problem, efforts to resolve ontological issues have themselves become bioinformatics projects. The idea of an ontology is to define terms and concepts as mechanical, computable units. The result is a clear classification and mapping of text elements for computers.

We have applied this ontological approach to classifying the elements of the bioinformatics field itself. The project's main merit is that it supports efficient understanding and dissemination of bioinformatics knowledge in this fast-growing field. An intuitive classification system of bioinformatics can also suggest valuable project ideas and future directions. Our ontology of the bioinformatics field has three main components: 1) classification based on methodology, 2) knowledge-based classification (database systems), and 3) classification based on biological data types. These components overlap; they are different aspects of the same or similar information. However, depending on the user's interest, one view can be more relevant than the others for designing and organizing a bioinformatics project.

Figure 4. Classification of databases. Biological databases can be classified according to their data features. Popular databases used in the biological community are included in this schematic map.

Figure 5. Classification according to biological data types. According to this classification map, biological data can be identified through prediction of sequence structure and function. As information acquired from data flows from right to left, it becomes more and more clear.


2 Method and Results

2.1 Classification based on methodology

We classified the bioinformatics field according to the analysis methods applied to biological data (DNA, RNA, protein). In this way, bioinformatics can be understood intuitively through a schematic map.
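A methodology classification of this kind can be sketched as a small tree. The example below is only an illustration: the category names are taken from the Figure 3 caption, while the nested-dictionary layout and the helper names `iter_paths` and `find_class` are assumptions made here, not the authors' implementation.

```python
# Methodology classification as a nested dict; leaf lists hold method names.
# Category names follow the Figure 3 caption; the data structure itself is
# an assumption made for illustration.
METHODOLOGY = {
    "DNA": ["mapping", "sequencing", "assembly", "searching"],
    "RNA": ["expression analysis"],  # reached via the cDNA chip procedure
    "Protein": ["comparative", "predictive"],
}


def iter_paths(tree, prefix=()):
    """Yield every root-to-leaf path in the tree as a tuple of labels."""
    if isinstance(tree, dict):
        for label, subtree in tree.items():
            yield from iter_paths(subtree, prefix + (label,))
    else:
        for leaf in tree:
            yield prefix + (leaf,)


def find_class(tree, method):
    """Return all paths whose final label equals the given method name."""
    return [path for path in iter_paths(tree) if path[-1] == method]


if __name__ == "__main__":
    # Print the schematic map as indented "A > B" lines.
    for path in iter_paths(METHODOLOGY):
        print(" > ".join(path))
```

Calling `find_class(METHODOLOGY, "assembly")` returns the path from the data type down to the method, which is the kind of lookup a schematic map supports visually.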

Figure 1. Main window.

Figure 2. Sub windows.

2.3 Classification based on biological data types

We categorized the component fields of bioinformatics according to the types of implementation that biologists use after data acquisition. We differentiate them by the common procedures and tools that biologists routinely apply to biological knowledge.

3 Discussion

In this classification of the components of bioinformatics, we introduced our ontology schema for classifying and mapping the bioinformatics field itself. The ontology was designed to represent methodology, database features, and data content. It therefore allows us to find projects and relate problem domains in bioinformatics in a much more systematic way. It can also be used to cluster biological sequence data by their bioinformatics ontology characteristics, and it supports computation on specific elements such as sequences and databases. In addition, schematic maps are drawn as visual trees so that one can get a global picture of the bioinformatics field and obtain more precise information intuitively and efficiently. The lower levels of each classification criterion are linked to web pages (http://nihcgc.re.kr/BioinfoMap and http://interaction.mrc-dunn.cam.ac.uk/BioinfoMap/). The classification system is still being developed and will be stored in an SQL-based database to allow more dynamic navigation between the component concepts of the bioinformatics field.
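One way such SQL-based storage could look is an adjacency-list table queried with a recursive common table expression. The sketch below uses Python's built-in `sqlite3` module; the table name `concept`, its columns, the sample rows, and the helpers `build_map`, `path_to_root`, and `children` are all hypothetical and are not the schema of the BioinfoMap system.

```python
import sqlite3


def build_map(conn):
    """Create a hypothetical adjacency-list table and load a few concepts."""
    conn.execute(
        "CREATE TABLE concept ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES concept(id))"
    )
    rows = [
        (1, "Bioinformatics", None),
        (2, "Methodology", 1),
        (3, "Database systems", 1),
        (4, "Biological data types", 1),
        (5, "DNA", 2),
        (6, "Sequencing", 5),
        (7, "Assembly", 5),
    ]
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?)", rows)


def path_to_root(conn, name):
    """Return concept names from the root down to the named concept."""
    query = """
        WITH RECURSIVE ancestors(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM concept WHERE name = ?
            UNION ALL
            SELECT c.id, c.name, c.parent_id, a.depth + 1
            FROM concept AS c JOIN ancestors AS a ON c.id = a.parent_id
        )
        SELECT name FROM ancestors ORDER BY depth DESC
    """
    return [row[0] for row in conn.execute(query, (name,))]


def children(conn, name):
    """Return the direct child concepts of the named concept."""
    query = """
        SELECT child.name FROM concept AS child
        JOIN concept AS parent ON child.parent_id = parent.id
        WHERE parent.name = ? ORDER BY child.id
    """
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_map(conn)
    print(" > ".join(path_to_root(conn, "Assembly")))
    print(children(conn, "Bioinformatics"))
```

An adjacency list is chosen here because it keeps inserts simple while the classification is still changing; moving between concepts then becomes a query rather than a fixed page link.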

Acknowledgement

We thank Mi-Ae Yoo and Heui-Soo Kim (Pusan National University) for support. This work was funded in part by the Bioinformatics Training Grant of the Ministry of Health & Welfare, Korea, and supported by Pusan National University, Korea, and MRC, UK.

2.2 Knowledge based classification (database systems)

Biological databases can be classified according to their data features into five groups: 1) sequence, 2) protein, 3) metabolic pathway, 4) organism, and 5) RNA.
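The five groups can be treated as a fixed vocabulary against which database names are checked. The following is a minimal sketch under stated assumptions: the group names come from the text above, but the example assignments (GenBank, Swiss-Prot, KEGG) are illustrative choices made here and are not taken from the authors' schematic map, and `group_databases` is a hypothetical helper.

```python
# The five data-feature groups named in the text.
DATABASE_GROUPS = ("sequence", "protein", "metabolic pathway", "organism", "RNA")

# Illustrative assignments only; these are assumptions for the example and
# are not taken from the authors' classification map.
EXAMPLE_ASSIGNMENTS = {
    "GenBank": "sequence",
    "Swiss-Prot": "protein",
    "KEGG": "metabolic pathway",
}


def group_databases(assignments):
    """Group database names by data feature, rejecting unknown groups."""
    grouped = {group: [] for group in DATABASE_GROUPS}
    for name, group in assignments.items():
        if group not in grouped:
            raise ValueError(f"unknown group: {group!r}")
        grouped[group].append(name)
    return grouped


if __name__ == "__main__":
    for group, names in group_databases(EXAMPLE_ASSIGNMENTS).items():
        print(f"{group}: {', '.join(names) or '(none listed)'}")
```

Rejecting unknown group names keeps the classification closed, so a new category has to be added to the vocabulary deliberately rather than appearing through a typo.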