
Ontology and HowNet

Dong Zhendong, Dong Qiang

www.keenage.com

Research Centre of Computer & Language Engineering

Chinese Academy of Sciences

Harbin, 2003.08

Outline

Ontology

HowNet vs SUMO/WordNet/VerbNet

Ontology

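What Is Ontology?

Ontology and IT/NLP

What Is Ontology?

Ontology as a field of study; Ontology as a resource

Ontology as a field of study: in philosophy; in AI/KR; in mathematics; in software engineering; in linguistics; in IT

Issues in defining Ontology

Intrinsic meaning; external representation; the Chinese translation of the term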

Ontology and IT/NLP

"[An ontology is] similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical …"

-- Standard Upper Ontology (SUO) Working Group

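"A common-sense knowledge base whose objects of description are the concepts represented by Chinese and English words, and whose basic content is the relationships between concepts and between the attributes of concepts."

-- HowNet (《知网》)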

Typical ontologies

Cyc: http://www.cyc.com
IFF: The IFF Foundation Ontology
WordNet: http://www.cogsci.princeton.edu
EuroWordNet: http://www.hum.uva.nl/ewn/
HowNet: http://www.keenage.com
SUMO: http://ontology.teknowledge.com
EDR: http://www.iijnet.or.jp
VerbNet: http://www.cis.upenn.edu/verbnet/
Prototype (Sinica): http://ckip.iis.sinica.edu.tw/CKIP/ontology/

HowNet vs SUMO/WordNet/VerbNet

SUMO – Suggested Upper Merged Ontology

Mapping WordNet to SUMO

SUMO – Suggested Upper Merged Ontology

SUMO Sources

SUMO Subclass Hierarchy Tree

[Figure: excerpt of the SUMO subclass hierarchy; nodes shown include making, constructing, manufacture, publication, cooking, searching, pursuing, investigating, diagnostic process, social interaction, change of possession, giving, unilateral giving, lending, getting, unilateral getting, borrowing]

Motivation for Mapping

How can a formal ontology be used effectively by those who lack extensive training in logic and mathematics?

How can an ontology be used automatically by applications?

How can we know when an ontology is complete?

Architecture of HowNet

Basic Data (Concept Definitions / Taxonomies)

S-relation Trigger (Browser)

D-relation Trigger (Application Tools)

Basic Data – Sememes

Sememes 2219

Entity 154

thing (physical, mental, fact)

component (part, fitting)

time

space (direction, location)

Event (relation, state, action) 818

Attribute 248

Value 892

Secondary feature 107
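As a quick arithmetic check on the inventory, the five top-level categories add up exactly to the stated total: 154 + 818 + 248 + 892 + 107 = 2219. The short Python sketch below verifies this and prints each category's share of the inventory. The dictionary is simply transcribed from the slide, and the helper name `shares` is illustrative.

```python
# Top-level sememe categories and their counts, transcribed from the slide.
SEMEME_COUNTS = {
    "Entity": 154,
    "Event": 818,
    "Attribute": 248,
    "Value": 892,
    "Secondary feature": 107,
}
TOTAL_SEMEMES = 2219  # total stated on the slide


def shares(counts: dict, total: int) -> dict:
    """Return each category's fraction of the total sememe inventory."""
    return {name: n / total for name, n in counts.items()}


# The categories should partition the inventory exactly.
if sum(SEMEME_COUNTS.values()) != TOTAL_SEMEMES:
    raise ValueError("category counts do not add up to the stated total")

for name, share in shares(SEMEME_COUNTS, TOTAL_SEMEMES).items():
    print(f"{name:<18} {SEMEME_COUNTS[name]:>4} {share:6.1%}")
```

Value and Event sememes together make up roughly three quarters of the inventory.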

Basic Data – Concept Definition

NO.=020957
W_C=大学生
G_C=N
E_C=
W_E=college student
G_E=N
E_E=
DEF={human|人:{study|学习:agent={~},location={InstitutePlace|场所:domain={education|教育},modifier={HighRank|高等},{study|学习:location={~}},{teach|教:location={~}}}}}
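To make the record layout concrete, here is a minimal Python sketch that reads a record into a dictionary and extracts the head sememe of its DEF. It assumes one FIELD=value pair per line, as laid out above. The functions `parse_record`, `head_sememe` and `braces_balanced` are hypothetical helpers, not part of any HowNet API.

```python
import re

# Sample record transcribed from the slide above (one FIELD=value per line).
SAMPLE = """\
NO.=020957
W_C=大学生
G_C=N
E_C=
W_E=college student
G_E=N
E_E=
DEF={human|人:{study|学习:agent={~},location={InstitutePlace|场所:domain={education|教育},modifier={HighRank|高等},{study|学习:location={~}},{teach|教:location={~}}}}}
"""


def parse_record(text: str) -> dict:
    """Split a record into FIELD -> value pairs, cutting each line at its first '='."""
    record = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not FIELD=value
            record[key.strip()] = value.strip()
    return record


def head_sememe(definition: str) -> str:
    """Return the outermost sememe of a DEF, e.g. 'human|人'."""
    match = re.match(r"\{([^:{}]+)", definition)
    if match is None:
        raise ValueError("a DEF should start with '{'")
    return match.group(1)


def braces_balanced(definition: str) -> bool:
    """Check that every '{' in a DEF has a matching '}'."""
    depth = 0
    for ch in definition:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


record = parse_record(SAMPLE)
print(record["W_E"], "->", head_sememe(record["DEF"]))
```

Cutting each line at the first "=" matters here, because the DEF value itself contains many "=" signs.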

Basic Data – Taxonomies

- {thing|万物} {entity|实体:{ExistAppear|存现:existent={~}}}

- {physical|物质} {thing|万物:HostOf={Appearance|外观},{perception|感知:content={~}}}

- {animate|生物} {physical|物质:HostOf={Age|年龄},{alive|活着:experiencer={~}},{die|死:experiencer={~}},{metabolize|代谢:experiencer={~}},{reproduce|生殖:agent={~},PatientProduct={~}}}

- {AnimalHuman|动物} {animate|生物:HostOf={Sex|性别},{AlterLocation|变空间位置:agent={~}},{StateMental|精神状态:experiencer={~}}}

- {human|人} {AnimalHuman|动物:HostOf={Name|姓名}{Wisdom|智慧}{Ability|能力},{think|思考:agent={~}},{speak|说:agent={~}}}
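Each taxonomy entry gives a node followed by its definition, and the head sememe of that definition is read here as the node's parent. The minimal sketch below makes two assumptions: that reading of the notation, and that HostOf attributes are inherited by every hyponym. Under those assumptions it walks the chain from {human|人} up to {entity|实体} and collects attributes along the way. The dictionary layout and function names are illustrative, not a HowNet API.

```python
# node -> (parent, HostOf attributes declared on that node), from the excerpt.
TAXONOMY = {
    "thing|万物": ("entity|实体", []),
    "physical|物质": ("thing|万物", ["Appearance|外观"]),
    "animate|生物": ("physical|物质", ["Age|年龄"]),
    "AnimalHuman|动物": ("animate|生物", ["Sex|性别"]),
    "human|人": ("AnimalHuman|动物", ["Name|姓名", "Wisdom|智慧", "Ability|能力"]),
}


def ancestors(node: str) -> list:
    """Return the chain of parents from the node up to the top of the excerpt."""
    chain = []
    while node in TAXONOMY:
        node = TAXONOMY[node][0]
        chain.append(node)
    return chain


def inherited_attributes(node: str) -> list:
    """Collect HostOf attributes from the node and all of its ancestors.

    Assumption: attributes declared on a node also apply to its hyponyms.
    """
    attrs = []
    while node in TAXONOMY:
        parent, host_of = TAXONOMY[node]
        attrs.extend(host_of)
        node = parent
    return attrs


print(" > ".join(ancestors("human|人")))
print(inherited_attributes("human|人"))
```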

S-relation Trigger -- Browser

D-relation Trigger -- Application Tools

Relevant Concept Field Builder. Cf. the "seed list" (Bonnie Dorr & Tiejun Zhao). Examples: 化学 (chemistry) / 射击 (shooting)

Sense Similarity Calculator

毛衣 (sweater) vs. 手套 (gloves) / 醋 (vinegar)

Chinese Chunk Extractor
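The slide names the Sense Similarity Calculator but does not describe its method. As one possible illustration, the sketch below scores two concepts by their path distance d in a taxonomy using α/(d+α). A measure of this form is used at the sememe level in the word-similarity work by Liu Qun and Li Sujian listed among the applications. The toy hierarchy, the α value of 1.6 and the function names are all assumptions. This is not the calculator's actual algorithm.

```python
ALPHA = 1.6  # smoothing parameter; assumed value

# Hypothetical toy hierarchy (child -> parent); NOT HowNet's actual definitions.
TOY_PARENTS = {
    "sweater": "clothing",
    "gloves": "clothing",
    "vinegar": "food",
    "clothing": "artifact",
    "food": "artifact",
}


def path_to_root(node: str) -> list:
    """List the node followed by all of its ancestors."""
    path = [node]
    while node in TOY_PARENTS:
        node = TOY_PARENTS[node]
        path.append(node)
    return path


def distance(a: str, b: str):
    """Edges between a and b via their lowest common ancestor, or None if unrelated."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    common = next((n for n in path_a if n in path_b), None)
    if common is None:
        return None
    return path_a.index(common) + path_b.index(common)


def similarity(a: str, b: str, alpha: float = ALPHA) -> float:
    """alpha / (d + alpha): 1.0 for identical nodes, falling as the path grows."""
    d = distance(a, b)
    return 0.0 if d is None else alpha / (d + alpha)


for other in ("gloves", "vinegar"):
    print("sweater vs", other, round(similarity("sweater", other), 3))
```

With this toy hierarchy, "sweater" is two edges from "gloves" but four from "vinegar", so the first pair scores higher. This matches the intuition behind the slide's example.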

Applications of HowNet in China and abroad (1)

Semantic Web ontology annotation thesaurus

陈文鋕: Semantic Processing & Semantic Web Service (Institute for Information Industry, Taiwan)

Named Entity Recognition

Tianfang Yao, Wei Ding, Gregor Erbach: CHINERS: A Chinese Named Entity Recognition System for the Sports Domain

Applications of HowNet in China and abroad (2)

Word Sense Disambiguation

Chi-Yung Wang: Knowledge-based Sense Pruning using the HowNet: an Alternative to Word Sense Disambiguation

Wong Ping Wai: A Maximum Entropy Approach to HowNet-Based Chinese Word Sense Disambiguation

Word Similarity Computing

Liu Qun, Li Sujian: Word Similarity Computing Based on HowNet

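Applications of HowNet in China and abroad (3)

Sense Annotation

Dependency Relation Annotation

Li MingQin, Li Juanzi: Building A Large Chinese Corpus Annotated with Semantic Dependency

Cross-language Development

Licensed to the Institute of Information Science, Academia Sinica (Taiwan), for joint development of a HowNet Big5+ version

National Digital Archives Program (NDAP): http://ndap.org.tw/NewsLetter/content.html?subuid=559&uid=26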

Thank you

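Current research trends

Theoretical or philosophical exploration; mapping, linking and merging; research through applications; building common-sense or domain-specific knowledge systems

Some views on building knowledge systems

Theory vs. engineering: put engineering first. Research vs. application: keep applications in view. Distinguish 接轨 ("getting on track" with international work) from 接鬼 (a pun on 接轨: blindly refitting one's work to someone else's model).

Five years ago, someone suggested that we turn HowNet into a WordNet; recently, someone suggested that we revise HowNet's sememes according to SUMO. Cutting HowNet, a qipao, into a two-piece Western skirt suit: that is 接鬼.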

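Chinese WordNet or English HowNet?

For Chinese, there is already a WordNet-like resource called HowNet (《知网》, http://www.keenage.com). It was started independently in 1995 by Mr. Dong Zhendong of mainland China, and it is a Chinese-English/English-Chinese bilingual lexical network. Early versions were open and free of charge; since 2002, when the new version came under the management of the Institute of Software, Chinese Academy of Sciences, it has required a fee. HowNet's approach is distinctive: rather than adopting the architecture of the English WordNet, it uses its own. Moreover, it first defines an ontology of world knowledge and then draws distinctions within that definition. This top-down method differs from the bottom-up method of the English and European wordnets, and it certainly has its merits. Unfortunately, owing to the limited resources and information available at the time, Professor Dong Zhendong and his son Dong Qiang completed HowNet essentially on conviction and enthusiasm, with almost no outside support and without aligning it with related research elsewhere in the world. He and his son spent some seven or eight years on this work. However, there is essentially no architectural basis for linking it to wordnets in other languages, and its upper-level knowledge classification reflects the two men's own judgment: it cannot be called wrong, but it lacks a theoretical foundation and faces problems of interoperability with other systems.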

Records in WordNet / HowNet

Record in WordNet

03592879 06 n 02 watch 0 ticker 1 012 @ 03506835 n 0000 ~ 02187181 n 0000 %p 02529205 n 0000 ~ 02570752 n 0000 %p 02659936 n 0000 ~ 02841320 n 0000 %p 03021820 n 0000 ~ 03104263 n 0000 ~ 03150171 n 0000 ~ 03410656 n 0000 %p 03593482 n 0000 ~ 03636122 n 0000 | a small portable timepiece

Record in HowNet

NO.=007738
W_C=表
G_C=N
E_C=手~,怀~,钟~,电子~,机械~,带钻石的~,这块~不防水
W_E=watch
G_E=N
E_E=
DEF={tool|用具:{tell|告诉:content={time|时间},instrument={~}}}
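The WordNet record is purely positional, which is part of the contrast with HowNet's labeled DEF. The minimal Python parser below follows the data-file layout described in WordNet's database-format documentation:

- synset offset, lexicographer file number, and part of speech;
- a two-digit hexadecimal word count, followed by word/lex_id pairs;
- a three-digit pointer count, followed by four-field pointers;
- the gloss, after "|".

It handles noun-style lines only. Verb lines carry frame fields after the pointers, which this sketch does not read. The `Synset` class and `parse_data_line` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Synset:
    offset: str
    lex_filenum: str
    pos: str
    words: list[str]
    pointers: list[tuple[str, str, str, str]]  # (symbol, offset, pos, source/target)
    gloss: str


def parse_data_line(line: str) -> Synset:
    """Parse one noun-style line of a WordNet data file."""
    fields, _, gloss = line.partition("|")
    tokens = fields.split()
    offset, lex_filenum, pos = tokens[:3]
    w_cnt = int(tokens[3], 16)  # word count: two hexadecimal digits
    i = 4
    words = []
    for _ in range(w_cnt):
        words.append(tokens[i])  # tokens[i + 1] is the word's lex_id
        i += 2
    p_cnt = int(tokens[i])  # pointer count: three decimal digits
    i += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target, target_pos, source_target = tokens[i:i + 4]
        pointers.append((symbol, target, target_pos, source_target))
        i += 4
    return Synset(offset, lex_filenum, pos, words, pointers, gloss.strip())


LINE = (
    "03592879 06 n 02 watch 0 ticker 1 012 @ 03506835 n 0000 ~ 02187181 n 0000 "
    "%p 02529205 n 0000 ~ 02570752 n 0000 %p 02659936 n 0000 ~ 02841320 n 0000 "
    "%p 03021820 n 0000 ~ 03104263 n 0000 ~ 03150171 n 0000 ~ 03410656 n 0000 "
    "%p 03593482 n 0000 ~ 03636122 n 0000 | a small portable timepiece"
)
watch = parse_data_line(LINE)
# "@" marks hypernym pointers, "~" hyponyms, "%p" part meronyms.
hypernyms = [target for symbol, target, _, _ in watch.pointers if symbol == "@"]
print(watch.words, hypernyms, watch.gloss)
```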

Axiom in SUMO / HowNet (1)

See SUMO_buy.doc

Cf. HowNet Event Relation & Role shifting

{buy|买} <----> {obtain|得到} [consequence];

agent OF {buy|买}=possessor OF {obtain|得到};

possession OF {buy|买}=possession OF {obtain|得到}.

{buy|买} (X) <----> {sell|卖} (Y) [mutual implication];

agent OF {buy|买}=target OF {sell|卖};

source OF {buy|买}=agent OF {sell|卖};

possession OF {buy|买}=possession OF {sell|卖};

cost OF {buy|买}=cost OF {sell|卖}.

Axiom in SUMO / HowNet (2)

{buy|买} [entailment] <----> {choose|选择};

agent OF {buy|买}=agent OF {choose|选择};

possession OF {buy|买}=content OF {choose|选择};

source OF {buy|买}=location OF {choose|选择}.

{buy|买} [entailment] <----> {pay|付};

agent OF {buy|买}=agent OF {pay|付};

cost OF {buy|买}=possession OF {pay|付};

source OF {buy|买}=target OF {pay|付}.
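The axioms above are essentially role-mapping tables. The minimal sketch below shows how one filled {buy|买} frame yields the frames of the related events. It assumes a filled event is represented as a plain role-to-filler dictionary; this representation is chosen here for illustration and is not HowNet's own format. The example fillers are hypothetical.

```python
# Role correspondences for {buy|买}, transcribed from the two axiom slides.
# Key: (related event, relation type); value: role of buy -> role of that event.
ROLE_SHIFTS = {
    ("obtain|得到", "consequence"): {"agent": "possessor", "possession": "possession"},
    ("sell|卖", "mutual implication"): {
        "agent": "target", "source": "agent", "possession": "possession", "cost": "cost",
    },
    ("choose|选择", "entailment"): {"agent": "agent", "possession": "content", "source": "location"},
    ("pay|付", "entailment"): {"agent": "agent", "cost": "possession", "source": "target"},
}


def infer_events(buy_frame: dict) -> dict:
    """Derive the frames of related events from one filled {buy|买} frame."""
    inferred = {}
    for (event, relation), mapping in ROLE_SHIFTS.items():
        # Copy each filler across to the corresponding role of the related event.
        frame = {new: buy_frame[old] for old, new in mapping.items() if old in buy_frame}
        inferred[event] = (relation, frame)
    return inferred


# Hypothetical example: Mary buys a watch from a shop for 500 yuan.
buy = {"agent": "Mary", "possession": "watch", "source": "shop", "cost": "500 yuan"}
for event, (relation, frame) in infer_events(buy).items():
    print(f"{event} [{relation}]: {frame}")
```

Keeping the rules as data rather than code means new event relations can be added without touching `infer_events`.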

Thematic Roles in VerbNet / HowNet

See VerbNet_buy.doc

Thematic Roles
Agent[+animate OR +organization]
Asset[+currency]
Beneficiary[+animate OR +organization]
Source[+concrete]
Theme[]

Cf. HowNet Event Role with Typical Actors

{buy|买} {take|取:agent={human|人}{group|群体->},possession={artifact|人工物->},source={human|人}{InstitutePlace|场所},cost={money|货币},beneficiary={human|人}{group|群体->},domain={economy|经济}}
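VerbNet states selectional restrictions as features (+animate, +currency), whereas HowNet lists typical actors as sememes. The sketch below checks a candidate filler against the {buy|买} typical actors. Two things in it are assumptions:

- A trailing "->" is taken to admit the sememe's hyponyms as well.
- The hyponym lists are invented for illustration; company|公司, family|家庭 and clothing|衣物 are hypothetical labels.

```python
# Typical actors of {buy|买}, transcribed from the slide above.
BUY_TYPICAL_ACTORS = {
    "agent": ["human|人", "group|群体->"],
    "possession": ["artifact|人工物->"],
    "source": ["human|人", "InstitutePlace|场所"],
    "cost": ["money|货币"],
    "beneficiary": ["human|人", "group|群体->"],
}

# Hypothetical hyponym lists, for illustration only (not HowNet data).
TOY_HYPONYMS = {
    "group|群体": {"company|公司", "family|家庭"},
    "artifact|人工物": {"tool|用具", "clothing|衣物"},
}


def fits(role: str, concept: str, actors: dict = BUY_TYPICAL_ACTORS) -> bool:
    """Check whether a concept is a typical filler of the given role.

    Assumption: a trailing '->' lets the sememe's hyponyms fill the role too;
    a bare sememe accepts only that exact sememe.
    """
    for spec in actors.get(role, []):
        if spec.endswith("->"):
            head = spec[:-2]
            if concept == head or concept in TOY_HYPONYMS.get(head, set()):
                return True
        elif concept == spec:
            return True
    return False


print(fits("agent", "company|公司"))
print(fits("cost", "tool|用具"))
```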

Components of HowNet

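Taxonomy (specification of the sememe hierarchy)
Roles and Features (specification of roles and features)
Specifications of KDML (the knowledge description language)
Knowledge Database
Event Relations & Role Shifting
Maintenance Tools (maintenance and management tools)
APIs (application interfaces)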

Nature of HowNet

An online knowledge base which reveals the relationships among concepts and the relationships among the attributes of concepts

-- Dong Zhendong, "Knowledge Description: What, How and who?", Proceedings of International Symposium on Electronic Dictionary, Tokyo, 1988, p.18

Theory of HowNet

Knowledge is a system of relationships among concepts and among attributes of concepts.

Everything is constantly changing in a specific time and space, and converts from one state to another. The conversion embodies the change of its attributes.

Guidelines of Design

Computer-oriented

Relationship is the key: revealing relationships is the main objective of HowNet

Based on sememes

Use of KDML

Concepts are defined in a static and isolated way

Relationships are activated in a dynamic way

Concept Definitions in HowNet (1)

医生 (doctor): DEF={human|人:domain={medical|医},HostOf={Occupation|职位},{doctor|医治:agent={~}}}

患者 (patient): DEF={human|人:domain={medical|医},{SufferFrom|罹患:experiencer={~}},{doctor|医治:patient={~}}}

医院 (hospital): DEF={InstitutePlace|场所:{doctor|医治:location={~},content={disease|疾病}},domain={medical|医}}

Concept Definitions in HowNet (2)

病历 (medical record): DEF={document|文书:{record|记录:content={disease|疾病},LocationFin={~}},domain={medical|医}}

健康 (health): DEF={Health|健康:host={AnimalHuman|动物}}

多病 (sickly): DEF={unhealthy|不健}

├ {HealthValue|健康值}
│ ├ {healthy|康健}
│ └ {unhealthy|不健}

Concept Definitions in HowNet (3)

病 (disease): {disease|疾病} {phenomena|现象:{doctor|医治:content={~}},{SufferFrom|罹患:content={~}},RelateTo={medicine|药物}{Health|健康}{HealthValue|健康值},domain={medical|医}}

药 (medicine): {medicine|药物} {artifact|人工物:{doctor|医治:instrument={~}},RelateTo={disease|疾病},domain={medical|医}{chemistry|化学}}

Identity of description in different language structures (1)

W_C=劫
G_C=V
E_C=
W_E=rob
G_E=V
E_E=
DEF={rob|抢}

W_C=飞机
G_C=N
E_C=
W_E=plane
G_E=N
E_E=
DEF={aircraft|飞行器}

Identity of description in different language structures (2)

W_C=劫机
G_C=V
E_C=
W_E=hijack a plane
G_E=V
E_E=
DEF={rob|抢:possession={aircraft|飞行器}}

Identity of description in different language structures (3)

W_C=劫机犯
G_C=N
E_C=
W_E=hijacker
G_E=N
E_E=
DEF={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}}}

Identity of description in different language structures (4)

W_C=抓获劫机犯
G_C=V
E_C=
W_E=catch a hijacker
G_E=V
E_E=
DEF={catch|捉住:patient={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}}}}

Identity of description in different language structures (5)

W_C=机敏地抓获女劫机犯
G_C=V
E_C=
W_E=catch a woman hijacker cleverly
G_E=V
E_E=
DEF={catch|捉住:manner={clever|灵},patient={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}},modifier={female|女}}}
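The point of these slides is that a concept's description stays the same whichever language structure expresses it. The minimal sketch below uses a hypothetical helper `node()` that renders KDML-style strings, and builds each larger DEF out of the smaller ones. As a result, the hijacker definition appears verbatim inside the definitions of "catch a hijacker" and "catch a woman hijacker cleverly". It uses possession={aircraft|飞行器} throughout, following the hijacker definition in slide (3). With this keyword-argument style, Python keywords cannot be used as role names.

```python
SELF = "{~}"  # KDML placeholder for the concept being defined


def node(sememe: str, *events: str, **roles: str) -> str:
    """Render a KDML-style node: {sememe:event,...,role=value,...}."""
    items = list(events) + [f"{role}={value}" for role, value in roles.items()]
    if not items:
        return "{" + sememe + "}"
    return "{" + sememe + ":" + ",".join(items) + "}"


plane = node("aircraft|飞行器")                                            # 飞机
hijack = node("rob|抢", possession=plane)                                  # 劫机
hijacker = node("human|人", node("rob|抢", agent=SELF, possession=plane))  # 劫机犯
catch_hijacker = node("catch|捉住", patient=hijacker)                      # 抓获劫机犯
catch_woman_cleverly = node(                                               # 机敏地抓获女劫机犯
    "catch|捉住",
    manner=node("clever|灵"),
    patient=node("human|人", node("rob|抢", agent=SELF, possession=plane),
                 modifier=node("female|女")),
)

for definition in (plane, hijack, hijacker, catch_hijacker, catch_woman_cleverly):
    print(definition)
```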

Applications of HowNet

1. Semantic tagging

2. WSD, sense pruning

3. Sensitive information detection

4. Information filtering

5. Similarity of words

6. Semantic Web

7. Matching with WordNet

Future work

Construction of resources: English HowNet; Chinese message structure bank; more languages

Developing more APIs and tools

Administration: membership

Appendix: definitions of Ontology (1)

a specification of a conceptualization

the theory of objects and their ties

similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical …

Appendix: definitions of Ontology (2)

the study of what there is, an inventory of what exists

…What we may call ontology is the attempt to say what entities exist. Metaphysics, by contrast, is the attempt to say, of those entities, what they are.

the study of the categories of things that exist or may exist in some domain

The word ontology comes from the Greek ontos for being and logos for word.

Cost for French in EuroWordNet

For the development of French, there were 2 partners: Avignon (AVI) and Memodata (MEM). The following was requested:

                               AVI       MEM
Personnel                    72000     85000
Equipment                     3000         0
Travel & assistance           5000      1500
Consumables & computing       3000       300
Overheads                    16600     17100
Total                        99600    104400

Since Memodata was a private company, only 50% of its request could be funded by the EC. So the total request was:

                               AVI       MEM
Total                        99600     52200

Notes: 1) Validation is not included in this table. It was done by Xerox and Bertin globally for several languages.

2) These amounts constituted a provisional budget corresponding to some 20 000 synsets.
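Recomputing the table is a quick consistency check. The sketch below sums the line items for each partner and applies the 50% rule for Memodata to the stated totals. The AVI items add up to the stated 99600. The MEM items add up to 103900, which is 500 less than the stated 104400. The source does not explain the difference, so the sketch keeps the stated totals when computing the EC request. The dictionary names are illustrative.

```python
# Line items from the table above: (AVI, MEM); currency as in the original table.
BUDGET = {
    "Personnel": (72000, 85000),
    "Equipment": (3000, 0),
    "Travel & assistance": (5000, 1500),
    "Consumables & computing": (3000, 300),
    "Overheads": (16600, 17100),
}
STATED_TOTALS = {"AVI": 99600, "MEM": 104400}
EC_FUNDED_SHARE = {"AVI": 1.0, "MEM": 0.5}  # Memodata: private company, 50% fundable


def item_sums(budget: dict) -> dict:
    """Add up each partner's column of line items."""
    return {
        "AVI": sum(avi for avi, _ in budget.values()),
        "MEM": sum(mem for _, mem in budget.values()),
    }


def ec_request(totals: dict, shares: dict) -> dict:
    """Apply each partner's fundable share to its stated total."""
    return {partner: round(totals[partner] * shares[partner]) for partner in totals}


sums = item_sums(BUDGET)
request = ec_request(STATED_TOTALS, EC_FUNDED_SHARE)
for partner in ("AVI", "MEM"):
    print(partner, "items:", sums[partner], "stated:", STATED_TOTALS[partner],
          "EC request:", request[partner])
print("combined EC request:", sum(request.values()))
```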

Demo of Tools

(1) Relevant Concept Field

(2) Similarity of Words

(3) Chinese Chunk Extractor

(4) Smart Word Finder

Overview of HowNet

Components of HowNet Nature of HowNet Theory of HowNet Guidelines of Design Sememes and Relations

Backup files needed

HowNet Browser (desktop)

Relevant concept field (desktop): "行"

Similarity computing (desktop): National Digital Archives Program (folder "ontology")

Prof. Huang's comment on HowNet (desktop)

Under U32: Taxonomy, Event Relation & Role Shifting

Taxonomy Typical Actors

Papers (Applications about HowNet)