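Information Integration and Text Mining for the Life Sciences
Junichi TSUJII, University of Tokyo, Japan; University of Manchester, UK; National Centre for Text Mining, UK
© 2010 Junichi Tsujii (University of Tokyo / University of Manchester), licensed under Creative Commons Attribution 2.1 Japan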

Deep parsing and Semantic search - biosciencedbc.jp


Page 2

• Background

• Information integration in the life sciences

• Text processing and information integration

• An example of our research

• Future challenges

Page 3

Background

Page 4

[Figure: Increase in Medline, plotting yearly increments and the accumulated total]

G‐protein coupled receptor papers: 9 before 1988; 256 in 1992; 14,000 in 2005

Articles added to MEDLINE alone: more than 0.5 million per year, i.e. more than 1.3 thousand per day

Medline access: 0.163 M accesses/month in 1997 vs. 82.027 M accesses/month in 2006, about 500 times more [D.L.Banville 2006]
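The per-day rate and the access growth factor follow directly from the numbers on the slide. The short Python check below uses only those figures. The 365-day year is our assumption.

```python
# Figures quoted on the slide.
ARTICLES_PER_YEAR = 500_000   # "more than 0.5 million per year"
ACCESSES_1997 = 0.163e6       # Medline accesses per month, 1997
ACCESSES_2006 = 82.027e6      # Medline accesses per month, 2006

# Assumption: a 365-day year.
articles_per_day = ARTICLES_PER_YEAR / 365
access_growth = ACCESSES_2006 / ACCESSES_1997

print(f"articles per day: {articles_per_day:,.0f}")          # articles per day: 1,370
print(f"access growth 1997 to 2006: {access_growth:.0f}x")    # access growth 1997 to 2006: 503x
```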

Page 5

NaCTeM (www.nactem.ac.uk)

• First such centre in the world
• Funding: JISC, BBSRC, EPSRC
• Consortium investment

• Chair in Text Mining (Prof. J. Tsujii, Univ. Tokyo)

• Location: Manchester Interdisciplinary Biocentre (MIB), www.mib.ac.uk, funded by the Wellcome Trust

• Initial focus: biomedical academic community
• Extend services to industry
• Extend focus to other domains (social sciences)

Page 6

Semantic Web

Tim Berners‐Lee

Page 7

The Semantic Web is an extension of the current web in which information is given well‐defined meaning, better enabling computers and people to work in cooperation.

‐‐ Tim Berners‐Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001 

Autonomous Processing of Meaning by Agents:

The Semantic Web will bring structure to the meaningful content of web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.

Page 8

Expressing Meanings explicitly:

Human language thrives when using the same term to mean somewhat different things, but automation does not.

Using a different URI – Universal Resource Identifier – for each specific concept solves that problem. An address that is a mailing address can be distinguished from one that is a street address, and both can be distinguished from an address that is a speech.

Concept ID

Page 9

Ontologies:

A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing.

The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.
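The quotation names three ingredients: a taxonomy, inference rules, and a way for a program to know that two database terms mean the same thing. The Python sketch below shows all three in a few lines. The class names, the `db_a:`/`db_b:` term identifiers, and the single transitivity rule are illustrative assumptions, not taken from any real ontology.

```python
# Hypothetical taxonomy: child class -> parent class ("is-a" links).
TAXONOMY = {
    "kinase": "enzyme",
    "enzyme": "protein",
    "transcription_factor": "protein",
    "protein": "molecule",
}

# Hypothetical mapping from database-specific terms to one shared concept.
EQUIVALENT = {
    "db_a:protein_kinase": "kinase",
    "db_b:kinase": "kinase",
}

def ancestors(concept):
    """Inference rule: is-a is transitive, so collect every superclass."""
    result = []
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        result.append(concept)
    return result

def is_a(term, cls):
    """Resolve a database term to its shared concept, then apply the rule."""
    concept = EQUIVALENT.get(term, term)
    return concept == cls or cls in ancestors(concept)

print(is_a("db_a:protein_kinase", "protein"))        # True
print(is_a("db_b:kinase", "transcription_factor"))   # False
```

Both database terms resolve to the same concept, so any inference drawn for one automatically holds for the other.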

Page 10

Information integration in the life sciences

Page 11

Information integration using entities, concepts, and ontologies

(examples from the life sciences)

Page 12

Normalization of entities: surface named entities are mapped to unique IDs in an ontology

Named Entity recognition + Disambiguation

MEDIE: a search system based on biological events
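Normalization involves two steps: recognizing a name in text, and choosing which ontology entry it denotes when the name is ambiguous. The toy Python sketch below makes several assumptions:

- It uses a hand-made synonym dictionary.
- It disambiguates with a simple context-word overlap heuristic.
- The IDs (`PROTEIN:...`, `CELLCOMP:...`) and cue words are hypothetical.

Real normalizers typically rely on larger resources and trained models.

```python
import re

# Hypothetical synonym dictionary: normalized surface form -> candidate IDs.
SYNONYMS = {
    "er": ["PROTEIN:estrogen_receptor", "CELLCOMP:endoplasmic_reticulum"],
    "estrogenreceptor": ["PROTEIN:estrogen_receptor"],
    "endoplasmicreticulum": ["CELLCOMP:endoplasmic_reticulum"],
}

# Hypothetical context cues used to choose between candidates.
CONTEXT_CUES = {
    "PROTEIN:estrogen_receptor": {"binding", "ligand", "estradiol", "transcription"},
    "CELLCOMP:endoplasmic_reticulum": {"lumen", "membrane", "stress", "golgi"},
}

def normalize_surface(name):
    """Lower-case and drop spaces/hyphens so spelling variants collide."""
    return re.sub(r"[\s\-]", "", name.lower())

def normalize(name, sentence):
    """Map a recognized name to one ID, using context words if it is ambiguous."""
    candidates = SYNONYMS.get(normalize_surface(name), [])
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # Disambiguation: pick the candidate whose cue words overlap most with the sentence.
    return max(candidates, key=lambda c: len(CONTEXT_CUES.get(c, set()) & words))

print(normalize("ER", "ER stress at the membrane"))   # CELLCOMP:endoplasmic_reticulum
print(normalize("ER", "estradiol binding to ER"))     # PROTEIN:estrogen_receptor
```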


Page 20

Text processing and information integration

Page 21

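Why text processing is needed

• Rapid growth and widespread use of electronic documents
– From PubMed to full papers in electronic journals
– Texts other than papers, such as institutional archives and electronic medical records
– Papers and supplementary data
– Raw data, metadata, and text

• Reducing the enormous manual effort of curation
• Fine-grained information integration

• Integration with structured data and experimental data other than text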

Page 22

Semantics‐based, Fine‐Grained Information Access

Document Retrieval, Information Retrieval
– Unit of retrieval: article, document
– Expression of user intention:
  • Controlled or non‐controlled keywords
– Indexes: character sequences, keywords

Question Answering

Semantics‐based, Fine‐Grained Information Access system
– Unit of retrieval: paragraphs, sentences, phrases
– Expression of user intention: simple but semantically enriched
– Indexes: semantics‐based structured metadata
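The difference in retrieval unit can be made concrete with a small Python sketch. A keyword index returns whole documents. A semantics-based index over sentence-level (subject, relation, object) annotations returns only the sentences that state the relation.

The corpus, the annotations, and the relation name are illustrative assumptions. The tuples are written by hand here. In a system like the one described, they would come from parsing and event extraction.

```python
from collections import defaultdict

# Illustrative corpus: document ID -> sentences.
DOCS = {
    "doc1": ["TAK1 phosphorylates IKK.", "IKK phosphorylates IkappaB."],
    "doc2": ["TAK1 and IKK were both detected in the nucleus."],
}

# Hand-written sentence-level annotations: (doc ID, sentence index) -> (subject, relation, object).
FACTS = {
    ("doc1", 0): ("TAK1", "phosphorylation", "IKK"),
    ("doc1", 1): ("IKK", "phosphorylation", "IkappaB"),
}

# Coarse-grained index: word -> documents containing it.
KEYWORD_INDEX = defaultdict(set)
for doc_id, sentences in DOCS.items():
    for sentence in sentences:
        for word in sentence.rstrip(".").split():
            KEYWORD_INDEX[word].add(doc_id)

def keyword_search(keywords):
    """Coarse-grained: return every document that contains all the keywords."""
    return set.intersection(*(KEYWORD_INDEX.get(k, set()) for k in keywords))

def semantic_search(subject, relation, obj):
    """Fine-grained: return only the sentences whose annotation matches the query."""
    return [DOCS[d][i] for (d, i), fact in FACTS.items() if fact == (subject, relation, obj)]

print(keyword_search(["TAK1", "IKK"]))                      # both documents
print(semantic_search("TAK1", "phosphorylation", "IKK"))    # ['TAK1 phosphorylates IKK.']
```

The keyword query also returns `doc2`, which mentions both proteins without relating them. The structured query pinpoints the one sentence that states the relation.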

Page 23

Coarse‐grained text retrieval

Page 24

Fine‐grained information access

Page 25

Example: PathText (NaCTeM (U‐Manchester), U‐Tokyo, SBI)

B. Kemper, T. Matsuzaki, Y. Matsuoka, Y. Tsuruoka, H. Kitano, S. Ananiadou, J. Tsujii: PathText: a text mining integrator for biological pathway visualizations, Bioinformatics, Vol. 26 (12), Oxford University Press, 2010

Page 26

Toll‐Like Receptor (TLR) pathway

Oda K, Matsuoka Y, Funahashi A, Kitano H: A comprehensive pathway map of epidermal growth factor receptor signaling. Mol Syst Biol 2005, 1:2005.0010.

Nodes: 652

Links: 444

600 papers were read to construct the pathway

Page 27

Knowledge Integration: Pathways and Literature

Pathways integrate pieces of biological knowledge into coherent interpretations

Pathways have been recognized as an important means of representing biological knowledge.

Medline contains over 18 million articles

More than 0.5 million articles are added every year, i.e. about 1.3 thousand articles per day

Pathways

Literature

Page 28

Pathways and Literature

• Pathway construction and literature
  – Pathway construction relies mostly on the literature
    • Most important discoveries are reported in published papers.
    • The full context of each discovery is described by the paper reporting it.
• Pathway maintenance and literature
  – New discoveries should lead to revisions of the relevant portions of pathways.
  – However, the rapidly growing amount of literature makes it extremely difficult to identify relevant new discoveries.

PathText: NaCTeM, U‐Tokyo, SBI

Page 29

Network for Simulation: Quantitative Model (SBML)

Pathway: Qualitative Model (CellDesigner)

Literature: Piecewise Knowledge

Interpretation, Abstraction

Enrichment, Grounding

University of Tokyo, NaCTeM/University of Manchester, Systems Biology Institute/OIST

Page 30

[Figure: SBML network drawn with CellDesigner (example nodes: IKK, IKK_p, TAK1), linked to text mining resources]

Text Mining Resources: KLEIO, MEDIE, Info‐PubMed, FACTA

GUI, Visualization

Kinetic parameters

Textual Semantics / User Semantics
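The slide connects text-mined facts to an SBML network whose example nodes are IKK, IKK_p, and TAK1. As a rough sketch of that direction, the Python below turns one extracted event into an SBML Level 2-style reaction, with TAK1 listed as a modifier. The event is read here as "TAK1 phosphorylates IKK, giving IKK_p".

Several parts of this sketch are our own assumptions:

- the reading of the event;
- the compartment name `cell`;
- the subset of SBML elements used.

The output has not been validated against the SBML schema and contains no kinetics.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"

def q(tag):
    """Qualify a tag name with the SBML namespace."""
    return f"{{{SBML_NS}}}{tag}"

def event_to_sbml(reactant, product, modifier, reaction_id="r1"):
    """Build a minimal, unvalidated SBML-style document for one modification event."""
    ET.register_namespace("", SBML_NS)
    sbml = ET.Element(q("sbml"), level="2", version="4")
    model = ET.SubElement(sbml, q("model"), id="text_mined_model")

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"), id="cell")  # assumed single compartment

    species = ET.SubElement(model, q("listOfSpecies"))
    for name in (reactant, product, modifier):
        ET.SubElement(species, q("species"), id=name, compartment="cell")

    reactions = ET.SubElement(model, q("listOfReactions"))
    reaction = ET.SubElement(reactions, q("reaction"), id=reaction_id)
    reactants = ET.SubElement(reaction, q("listOfReactants"))
    ET.SubElement(reactants, q("speciesReference"), species=reactant)
    products = ET.SubElement(reaction, q("listOfProducts"))
    ET.SubElement(products, q("speciesReference"), species=product)
    modifiers = ET.SubElement(reaction, q("listOfModifiers"))
    ET.SubElement(modifiers, q("modifierSpeciesReference"), species=modifier)
    return ET.tostring(sbml, encoding="unicode")

# Our reading of the slide's example: TAK1 phosphorylates IKK, giving IKK_p.
print(event_to_sbml("IKK", "IKK_p", "TAK1"))
```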


Page 36

An example of our research

Page 37

[Figure: GENIA event ontology, with counts shown for each event class]

GENIA event ontology

• GENIA event ontology
  – 30 GO terms under Biological Process
  – Regulation
    • Regulatory events
    • Causal relationships
  – Artificial process (experimental)
    • Artificially performed processes, e.g. transfection, treatment, …
  – Correlation (experimental)
    • Meaning ‘any’ relation between events

Events of the Shared Task (BioNLP'09)
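Shared-task events are typed and can nest: a regulation event can take another event as its Theme. The minimal Python sketch below models that structure and prints lines in the style of the shared task's standoff event annotations (`E1<TAB>Type:Trigger Theme:... Cause:...`).

The text-bound IDs (`T1` to `T4`) and the analysis of the example sentence are illustrative assumptions, not gold annotations.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Entity:
    tid: str    # text-bound ID, e.g. "T1"
    text: str

@dataclass
class Event:
    eid: str                                    # event ID, e.g. "E1"
    etype: str                                  # e.g. "Gene_expression", "Positive_regulation"
    trigger: Entity                             # the word(s) signalling the event
    theme: Union[Entity, "Event"]               # an entity or, for regulations, another event
    cause: Optional[Union[Entity, "Event"]] = None

def ref(arg):
    """Refer to an argument by its ID, whether it is an entity or an event."""
    return arg.tid if isinstance(arg, Entity) else arg.eid

def to_standoff(event):
    """Render one event as a standoff-style annotation line."""
    line = f"{event.eid}\t{event.etype}:{event.trigger.tid} Theme:{ref(event.theme)}"
    if event.cause is not None:
        line += f" Cause:{ref(event.cause)}"
    return line

# Illustrative analysis of "We found that Y activates the expression of X".
x, y = Entity("T1", "X"), Entity("T2", "Y")
expr = Event("E1", "Gene_expression", Entity("T3", "expression"), theme=x)
reg = Event("E2", "Positive_regulation", Entity("T4", "activates"), theme=expr, cause=y)

for ev in (expr, reg):
    print(to_standoff(ev))
# E1	Gene_expression:T3 Theme:T1
# E2	Positive_regulation:T4 Theme:E1 Cause:T2
```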

Page 38

Evaluation: BioNLP 2009 Shared Task data

• BioNLP ST 2009 evaluation server

| Event class | Top system at the 2009 evaluation campaign (F-score) | Our current system (F-score) |
|---|---|---|
| Simple | 70.21 | 72.91 |
| Binding | 44.41 | 51.63 |
| Regulation | 40.11 | 44.00 |
| ALL | 51.95 | 55.96 |

24 teams joined the campaign. All other systems scored below 45.00.
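The scores in the table are F-scores, the harmonic mean of precision and recall. The Python sketch below computes all three from match counts. The counts in the example are made up for illustration.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-score (harmonic mean) from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical counts: 60 correct events, 40 spurious, 50 missed.
p, r, f = precision_recall_f(tp=60, fp=40, fn=50)
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")   # P=60.00% R=54.55% F=57.14%
```

Because the F-score is a harmonic mean, a system cannot raise it much by improving only one of precision or recall.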

Page 39

S1 = We found that Y activates the expression of X
S2 = We examined the effect of Y on expression of X
S3 = These results suggest that Y has no effect on expression of X
S4 = Y is known to increase expression of X
S5 = Addition of Y slightly increased the expression of X
S6 = These results suggest that Y might affect the expression of X

The same events

Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Evaluating a meta‐knowledge annotation scheme for bio‐events. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 69-77

Page 40

Annotation Scheme

10 July 2010, Evaluating a Meta‐Knowledge Annotation Scheme for Bio‐Events

Class / Type (grounded to an event ontology)

Bio‐Event (centred on an event trigger)

Knowledge Type: Investigation, Observation, Analysis, General

Manner: High, Low, Neutral

Certainty Level: L3, L2, L1

Hyper‐Dimensions: 1) New Knowledge (Yes/No); 2) Hypothesis (Yes/No)

Polarity: Negative, Positive

Source: Other, Current

Participants: Theme(s), Actor(s)
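One way to hold this scheme in code is a record with one enumerated field per dimension. In the Python sketch below, the field names are our own. The two hyper-dimensions are kept as optional flags and left unset, rather than derived from the other dimensions. The annotation of S1 is our own reading, not an entry from the annotated corpus.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class KnowledgeType(Enum):
    INVESTIGATION = "Investigation"
    OBSERVATION = "Observation"
    ANALYSIS = "Analysis"
    GENERAL = "General"

class Manner(Enum):
    HIGH = "High"
    LOW = "Low"
    NEUTRAL = "Neutral"

class CertaintyLevel(Enum):
    L3 = "L3"
    L2 = "L2"
    L1 = "L1"

class Polarity(Enum):
    NEGATIVE = "Negative"
    POSITIVE = "Positive"

class Source(Enum):
    OTHER = "Other"
    CURRENT = "Current"

@dataclass
class BioEvent:
    event_class: str                  # type grounded to an event ontology
    trigger: str                      # the event trigger the annotation is centred on
    knowledge_type: KnowledgeType
    manner: Manner
    certainty: CertaintyLevel
    polarity: Polarity
    source: Source
    themes: List[str] = field(default_factory=list)
    actors: List[str] = field(default_factory=list)
    # Hyper-dimensions (Yes/No); left unset in this sketch.
    new_knowledge: Optional[bool] = None
    hypothesis: Optional[bool] = None

# Our own reading of S1: "We found that Y activates the expression of X".
s1 = BioEvent(
    event_class="Positive_regulation",
    trigger="activates",
    knowledge_type=KnowledgeType.OBSERVATION,
    manner=Manner.NEUTRAL,
    certainty=CertaintyLevel.L3,
    polarity=Polarity.POSITIVE,
    source=Source.CURRENT,
    themes=["expression of X"],
    actors=["Y"],
)
print(s1.knowledge_type.value, s1.polarity.value)   # Observation Positive
```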

Page 41

Annotation Scheme


Class / Type (grounded to an event ontology)

Bio‐Event (centred on an event trigger)

Knowledge Type:
• Investigation: examined, investigated, studied
• Observation: found, observed, report (past tense)
• Analysis: suggest, indicate, conclude
• General

Participants: Theme(s), Actor(s)

Page 42

Annotation Scheme


Class / Type (grounded to an event ontology)

Bio‐Event (centred on an event trigger)

Polarity:
• Negative: no, not; fail, lack, unable; independent, exception
• Positive

Participants: Theme(s), Actor(s)
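The cue words on the Knowledge Type and Polarity slides suggest a simple baseline for guessing those two dimensions automatically. The toy Python sketch below applies them to sentences S1 to S6 and rests on several assumptions of ours:

- cues are matched on word stems;
- the Knowledge Type falls back to "General" when no cue is found;
- a sentence with no negation cue is treated as Positive;
- negation scope is ignored.

S5 shows a limitation. Its past-tense verb is not a listed cue, so the baseline falls back to "General", whereas the slide's "(past tense)" note hints that Observation needs a tense check.

```python
import re

# Cue stems from the Knowledge Type slide; stem matching is our simplification.
KNOWLEDGE_TYPE_CUES = {
    "Investigation": ("examin", "investigat", "studi"),
    "Observation": ("found", "observ", "report"),
    "Analysis": ("suggest", "indicat", "conclud"),
}
# Negation cues from the Polarity slide, plus a few inflections.
NEGATION_CUES = {"no", "not", "fail", "fails", "failed", "lack", "lacks",
                 "unable", "independent", "exception"}

def classify(sentence):
    """Return (knowledge type, polarity) guessed from cue words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    knowledge_type = "General"   # assumed default when no cue word is present
    for token in tokens:
        match = next((k for k, stems in KNOWLEDGE_TYPE_CUES.items()
                      if token.startswith(stems)), None)
        if match:
            knowledge_type = match
            break
    polarity = "Negative" if NEGATION_CUES & set(tokens) else "Positive"
    return knowledge_type, polarity

SENTENCES = {
    "S1": "We found that Y activates the expression of X",
    "S2": "We examined the effect of Y on expression of X",
    "S3": "These results suggest that Y has no effect on expression of X",
    "S4": "Y is known to increase expression of X",
    "S5": "Addition of Y slightly increased the expression of X",
    "S6": "These results suggest that Y might affect the expression of X",
}
for name, text in SENTENCES.items():
    print(name, *classify(text))
```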

Page 43

BioNLP 2011 Shared Task

• Scientific Committee
Jun’ichi Tsujii (Univ. Tokyo, NaCTeM), Chair
Sophia Ananiadou (NaCTeM, Manchester)
Kevin Cohen (Colorado)
Claire Nedellec (INRA)
Andrey Rzhetsky (Univ. Chicago)
Bruno Sobral (Virginia Bioinformatics Inst.)
Tapio Salakoski (Univ. Turku)
Toshihisa Takagi (DBCLS)

• Organizing Committee
Jin-Dong Kim (DBCLS), Chair
Sampo Pyysalo (Univ. Tokyo), Chair
Tomoko Ohta (Univ. Tokyo)
Robert Bossy (INRA)
Chunhong Mao (Virginia Bioinformatics Inst.)
Dan Sullivan (Virginia Bioinformatics Inst.)
Rafal Rak (NaCTeM, Manchester)
Nguyen Luu Thuy Ngan (Univ. Tokyo)

Page 44

Main Task: Epigenetics and post‐translational modifications (U‐Tokyo)

• Basic task setting and data following the BioNLP'09 shared task format
• DNA modification and PTM events similar to the '09 Phosphorylation events
• Existing retrainable systems can be applied with little modification
• New event types: DNA methylation, six PTM types, reverse reactions (e.g. deacetylation) and catalysis: 15 event types in total
• New PTM‐specific participant roles (optional subtask)
  • Side chain attached to proteins in Glycosylation
  • Context gene affected by histone modifications
• Annotation for PubMed abstracts relevant to these events
  • No further subdomain restrictions; data selected to avoid bias
  • Representative of the general distribution of epigenetics and PTM‐related publications in the whole literature
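The total of 15 event types can be checked with simple arithmetic. There are seven base event types (DNA methylation plus six PTM types). Each is paired with its reverse reaction, and Catalysis is added, giving 7 × 2 + 1 = 15.

The Python sketch below spells this out. The slide does not name the six PTM types, so the names used here are our assumption.

```python
# DNA methylation plus six PTM types; the six PTM names are assumed, not from the slide.
BASE_TYPES = [
    "DNA_methylation",
    "Hydroxylation", "Phosphorylation", "Ubiquitination",
    "Glycosylation", "Acetylation", "Methylation",
]

def reverse(event_type):
    """Name the reverse reaction, e.g. Acetylation -> Deacetylation."""
    if event_type.startswith("DNA_"):
        return "DNA_de" + event_type[len("DNA_"):]
    return "De" + event_type[0].lower() + event_type[1:]

EVENT_TYPES = BASE_TYPES + [reverse(t) for t in BASE_TYPES] + ["Catalysis"]
print(len(EVENT_TYPES))   # 15
```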

Page 45

Main Task: Epigenetics and post‐translational modifications

• Epigenetic control of gene expression without changes in the DNA sequence: a major focus of recent study

• Key events: DNA methylation and histone post‐translational modifications (acetylation and methylation)
• Important roles in many biological processes; implicated in cancer

• Phosphorylation, a protein post‐translational modification (PTM), was the most reliably extracted event type at the BioNLP'09 shared task

• 76% F‐score for extraction of the phosphorylated protein and site

Page 46

Main Task: Infectious Diseases (NaCTeM, U‐Tokyo, Virginia Tech)

• Task setup and core events following the BioNLP'09 Shared Task
  • Expression, Catabolism, Localization, Binding, etc.
• New event type: Process
  • High‐level biological processes such as “virulence” are frequently discussed without stating specific participants (e.g. Theme)
• New entity types (given; NER not required)
  • Chemical, Organism, Two‐component system
• New subtask (optional)
  • Identification of environmental variables (Acidity and Temperature) specifying the conditions in which events are stated to occur

Page 47

Future challenges

Page 48

Normalization of events: STAT protein nuclear translocation (GO:0007262)

In the training set (800 abstracts), the phrase “STAT protein nuclear translocation” never occurs. However, 10 occurrences of this concept were found, expressed in different ways:

• nuclear translocation of STAT6
• nuclear translocation of the latent transcription factor, STAT6
• nuclear translocation of STAT6
• translocation into nucleus of signal transducers and activators of transcription (STAT)
• STAT5A and STAT5B containing complexes . . . these complexes rapidly translocated (within 1 min) into the nucleus
• STAT5B containing complexes . . . these complexes rapidly translocated (within 1 min) into the nucleus
• STAT1 nuclear import
• nuclear import of NF‐kappa B, AP‐1, NFAT, and STAT1
• STAT1 in Jurkat T lymphocytes is significantly inhibited by a cell‐permeable peptide carrying the NLS of the NF‐kappa B p50 subunit. NLS peptide‐mediated disruption of the nuclear import ...
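A crude way to see why these variants belong under GO:0007262 is a co-occurrence rule: a STAT mention, a movement trigger, and a nuclear location in the same span. The Python sketch below implements such a rule. The patterns and the function name are our own assumptions.

This kind of rule has clear limits:

- It matches the "these complexes" excerpts only because each whole excerpt is passed in as one span. Linking the anaphor back to STAT5 properly requires coreference resolution.
- It would not reject spans where the three words are unrelated.

```python
import re

GO_STAT_NUCLEAR_TRANSLOCATION = "GO:0007262"

STAT = re.compile(r"\bSTAT\w*")                                   # STAT, STAT1, STAT5B, ...
MOVEMENT = re.compile(r"\b(translocat\w*|import(ed|s)?)\b", re.IGNORECASE)
NUCLEUS = re.compile(r"\bnucle(us|ar)\b", re.IGNORECASE)

def normalize_event(span):
    """Return the GO ID if the span mentions STAT, a movement trigger and the nucleus."""
    if STAT.search(span) and MOVEMENT.search(span) and NUCLEUS.search(span):
        return GO_STAT_NUCLEAR_TRANSLOCATION
    return None

examples = [
    "nuclear translocation of STAT6",
    "translocation into nucleus of signal transducers and activators of transcription (STAT)",
    "STAT1 nuclear import",
    "STAT6 binds DNA",
]
for text in examples:
    print(normalize_event(text), "<-", text)
```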

Page 49

MEDLINE → MEDIE Workflow

• Input: all abstracts in PubMed (10,630,000 abstracts; 19,950,000 bibliographic units)
• Processing: POS tagging, NER, deep parsing, event recognition, indexing for MEDIE (GCL)
• Complex workflow: more than 10 modules
• Computing environment: GRID with 300 to 1000 processors

[Figure: MEDLINE → MEDIE workflow, showing processing modules, intermediate processing results, and index files]
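The workflow applies a fixed chain of modules to each abstract independently, which is what makes it a good fit for a grid. The toy Python sketch below shows that shape, under these assumptions:

- The stages are trivial stand-ins for the real modules, and deep parsing is omitted.
- The entity pattern, trigger words, and PMIDs are made up.
- A thread pool stands in for the grid of processors.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real modules (POS tagging, NER, deep parsing, ...).
def tokenize(doc):
    doc["tokens"] = re.findall(r"\w+", doc["text"])
    return doc

def tag_entities(doc):
    # Toy NER: capital letters followed by a digit (e.g. TAK1, STAT6) count as proteins.
    doc["entities"] = [t for t in doc["tokens"] if re.fullmatch(r"[A-Z]+[0-9][A-Za-z0-9]*", t)]
    return doc

def recognize_events(doc):
    # Toy event recognition: look for a few assumed trigger words.
    triggers = {"activates", "phosphorylates", "inhibits"}
    doc["events"] = [t.lower() for t in doc["tokens"] if t.lower() in triggers]
    return doc

PIPELINE = [tokenize, tag_entities, recognize_events]

def process(doc):
    """Run every module in order on one abstract; abstracts are independent."""
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

def build_index(docs):
    """Indexing stage: map each entity and trigger to the abstracts containing it."""
    index = defaultdict(set)
    for doc in docs:
        for key in doc["entities"] + doc["events"]:
            index[key].add(doc["pmid"])
    return index

def run(abstracts, workers=4):
    # Threads stand in for grid nodes; CPU-bound work would use processes or a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(process, abstracts))
    return build_index(processed)

index = run([
    {"pmid": "1", "text": "TAK1 phosphorylates IKK."},
    {"pmid": "2", "text": "STAT6 activates transcription."},
])
print(dict(index))
```

Keeping each abstract's processing self-contained means the per-abstract stages can be spread across any number of workers, and only the final indexing step needs to combine results.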

Page 50

Thank you!

[U‐Tokyo] Yusuke Miyao (NII), Takuya Matsuzaki, Tomoko Ohta, Jin‐Dong Kim (DBCLS), Rune Saetre, Yoshinobu Kano, Naoaki Okazaki, Makoto Miwa, Sampo Pyysalo, Tadayoshi Hara, Yue Wang

[NaCTeM] Sophia Ananiadou, John McNaught, William Black, Balakrishna Kolluru, Tingting Mu, Chikashi Nobata, Rafal Rak, Angel Restificar, C.J. Rupp, Paul Thompson, Xinglong Wang, Raheel Nawaz
