Ontology and HowNet
Dong Zhendong (董振东), Dong Qiang (董强)
[email protected] [email protected] www.keenage.com
Research Centre of Computer & Language Engineering
Chinese Academy of Sciences
Harbin, 2003.08
Ontology is a field of study
Ontology in philosophy
Ontology in AI/KR
Ontology in mathematics
Ontology in software engineering
Ontology in linguistics
Ontology in IT
Ontology and IT/NLP
similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical …
-- Standard Upper Ontology (SUO) Working Group
a common-sense knowledge base that takes the concepts represented by Chinese and English words as its objects of description, and whose basic content is to reveal the relations between concepts and between the attributes of concepts.
-- HowNet (《知网》)
Typical Ontologies
Cyc: http://www.cyc.com
IFF: The IFF Foundation Ontology
WordNet: http://www.cogsci.princeton.edu
EuroWordNet: http://www.hum.uva.nl/ewn/
HowNet: http://www.keenage.com
SUMO: http://ontology.teknowledge.com
EDR: http://www.iijnet.or.jp
VerbNet: http://www.cis.upenn.edu/verbnet/
Prototype (Sinica): http://ckip.iis.sinica.edu.tw/CKIP/ontology/
SUMO Subclass Hierarchy Tree
making
constructing
manufacture
publication
cooking
searching
pursuing
investigating
diagnostic process
social interaction
change of possession
giving
unilateral giving
lending
getting
unilateral getting
borrowing
Motivation for Mapping
How can a formal ontology be used effectively by those who lack extensive training in logic and mathematics?
How can an ontology be used automatically by applications?
How can we know when an ontology is complete?
Architecture of HowNet
Basic Data
(Concept Definitions / Taxonomies)
S-relation Trigger
(Browser)
D-relation Trigger
(Application Tools)
Basic Data – Sememes
Sememes 2219
Entity 154
thing (physical, mental, fact)
component (part, fitting)
time
space (direction, location)
Event (relation, state, action) 818
Attribute 248
Value 892
Secondary feature 107
Basic Data – Concept Definition
NO.=020957
W_C=大学生
G_C=N
E_C=
W_E=college student
G_E=N
E_E=
DEF={human|人:{study|学习:agent={~},location={InstitutePlace|场所:domain={education|教育},modifier={HighRank|高等},{study|学习:location={~}},{teach|教:location={~}}}}}
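The record above is a sequence of KEY=value fields. A minimal sketch of reading one such record into a dictionary, assuming one field per line and the field names shown on the slide; the function name `parse_record` is hypothetical and not part of any HowNet API:

```python
def parse_record(text):
    """Split a HowNet-style record into a {field: value} dict.

    Assumes one KEY=value field per line. Only the first '=' separates
    the key from the value, so any '=' inside DEF is preserved.
    """
    record = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        record[key.strip()] = value.strip()
    return record


# The record from the slide, one field per line.
sample = """\
NO.=020957
W_C=大学生
G_C=N
E_C=
W_E=college student
G_E=N
E_E=
DEF={human|人:{study|学习:agent={~},location={InstitutePlace|场所:domain={education|教育},modifier={HighRank|高等},{study|学习:location={~}},{teach|教:location={~}}}}}
"""

record = parse_record(sample)
print(record["W_C"], "/", record["W_E"], "/", record["G_E"])
```

Empty fields such as E_C come back as empty strings, so a caller can tell "no example given" apart from a missing field.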
Basic Data – Taxonomies
- {thing|万物} {entity|实体:{ExistAppear|存现:existent={~}}}
- {physical|物质} {thing|万物:HostOf={Appearance|外观},{perception|感知:content={~}}}
- {animate|生物} {physical|物质:HostOf={Age|年龄},{alive|活着:experiencer={~}},{die|死:experiencer={~}},{metabolize|代谢:experiencer={~}},{reproduce|生殖:agent={~},PatientProduct={~}}}
- {AnimalHuman|动物} {animate|生物:HostOf={Sex|性别},{AlterLocation|变空间位置:agent={~}},{StateMental|精神状态:experiencer={~}}}
- {human|人} {AnimalHuman|动物:HostOf={Name|姓名}{Wisdom|智慧}{Ability|能力},{think|思考:agent={~}},{speak|说:agent={~}}}
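Each taxonomy entry defines a sememe by naming its parent as the head of the definition, so the chain of hypernyms can be read off mechanically. A minimal sketch, with the parent links and HostOf attributes copied by hand from the five entries above. It assumes that attributes are inherited down the hierarchy, which the slide suggests but does not state; the dictionary and function names are illustrative only:

```python
# Parent links taken from the entries above: the head of each definition
# is the hypernym of the sememe being defined.
PARENT = {
    "thing|万物": "entity|实体",
    "physical|物质": "thing|万物",
    "animate|生物": "physical|物质",
    "AnimalHuman|动物": "animate|生物",
    "human|人": "AnimalHuman|动物",
}

# HostOf attributes introduced at each level, also copied from the entries.
HOST_OF = {
    "physical|物质": ["Appearance|外观"],
    "animate|生物": ["Age|年龄"],
    "AnimalHuman|动物": ["Sex|性别"],
    "human|人": ["Name|姓名", "Wisdom|智慧", "Ability|能力"],
}


def hypernym_chain(sememe):
    """Return the sememe followed by all of its ancestors, nearest first."""
    chain = [sememe]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def inherited_attributes(sememe):
    """Collect HostOf attributes from the root down to the sememe itself."""
    attrs = []
    for node in reversed(hypernym_chain(sememe)):
        attrs.extend(HOST_OF.get(node, []))
    return attrs


print(hypernym_chain("human|人"))
print(inherited_attributes("human|人"))
```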
D-relation Trigger -- Application Tools
Relevant Concept Field Builder
Cf. "seed list", Bonnie Dorr & Tiejun Zhao: "chemistry" / "shooting"
Sense Similarity Calculator
"sweater" vs. "gloves" / "vinegar"
Chinese Chunk Extractor
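To make the similarity example concrete, here is a rough sketch of a path-distance measure of the form alpha / (d + alpha), one common choice for taxonomy-based similarity. It is not claimed to be the calculator's actual algorithm. The toy hierarchy, the English labels and the value alpha = 1.6 are all assumptions made for illustration:

```python
# Hypothetical toy hierarchy (child -> parent); NOT HowNet's real taxonomy.
PARENT = {
    "sweater": "clothing",
    "gloves": "clothing",
    "vinegar": "edible",
    "clothing": "artifact",
    "edible": "artifact",
}


def ancestors(node):
    """Return the node followed by its ancestors, nearest first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_distance(a, b):
    """Number of edges between a and b through their nearest common ancestor."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("no common ancestor")


def similarity(a, b, alpha=1.6):
    """Map a path distance d to a score in (0, 1]; alpha is an assumed constant."""
    return alpha / (path_distance(a, b) + alpha)


print(similarity("sweater", "gloves"))
print(similarity("sweater", "vinegar"))
```

Under these assumptions "sweater" scores closer to "gloves" than to "vinegar", because the first pair meets at "clothing" while the second pair meets only at "artifact".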
Applications of HowNet in China and Abroad (1)
Semantic Web ontology annotation thesaurus
陈文鋕: Semantic Processing && Semantic Web Service (Institute for Information Industry, Taiwan)
Named Entity Recognition
Tianfang Yao, Wei Ding, Gregor Erbach: CHINERS: A Chinese Named Entity Recognition System for the Sports Domain
Applications of HowNet in China and Abroad (2)
Word Sense Disambiguation
Chi-Yung Wang: Knowledge-based Sense Pruning using the HowNet: an Alternative to Word Sense Disambiguation
Wong Ping Wai: A Maximum Entropy Approach to HowNet-Based Chinese Word Sense Disambiguation
Word Similarity Computing
Liu Qun, Li Sujian: Word Similarity Computing Based on HowNet
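Applications of HowNet in China and Abroad (3)
Sense Annotation
Dependency Relation Annotation
Li MingQin, Li Juanzi: Building A Large Chinese Corpus Annotated with Semantic Dependency
Cross-language Development
Licensed to the Institute of Information Science, Academia Sinica (Taiwan) for joint development of the HowNet Big5+ edition
National Digital Archives Program (NDAP): http://ndap.org.tw/NewsLetter/content.html?subuid=559&uid=26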
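Some Views on Building Knowledge Systems
Theory vs. engineering: put engineering first
Research vs. application: keep the focus on application
Tell apart genuine alignment with international practice (接轨, "connecting the tracks") from its homophone 接鬼 ("hooking up with ghosts")
Five years ago someone suggested that we turn HowNet into WordNet
Recently someone suggested that we revise HowNet's sememes to follow SUMO
Recutting HowNet, a qipao, into a two-piece Western skirt suit: that is 接鬼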
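Chinese WordNet or English HowNet?
For Chinese there is already a WordNet-like resource, called HowNet (《知网》, http://www.keenage.com). Dong Zhendong of mainland China began work on it on his own in 1995. It is a bilingual Chinese-English / English-Chinese lexical network. The early versions were open and free of charge. Since 2002, when the new version came under the management of the Institute of Software, Chinese Academy of Sciences, it has required payment. HowNet's approach is distinctive: it does not adopt the architecture of the English WordNet but uses only its own. It first defines a world-knowledge ontology and then draws distinctions within that definition. This top-down method differs from the bottom-up method of the English and European wordnets, and it certainly has its merits. Unfortunately, because of the limits on resources and information at the time, Professor Dong Zhendong and his son Dong Qiang completed HowNet essentially on conviction and enthusiasm, with very little outside support and without aligning with related research elsewhere in the world. He and his son spent about seven or eight years on it. However, there is basically no architectural foundation for linking it with the wordnets of other languages, and its upper-level knowledge classification rests on the two authors' own discretionary judgment. It cannot be called wrong, but it lacks a theoretical basis and faces problems of interoperability with other systems.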
Records in WordNet / HowNet
Record in WordNet
03592879 06 n 02 watch 0 ticker 1 012 @ 03506835 n 0000 ~ 02187181 n 0000 %p 02529205 n 0000 ~ 02570752 n 0000 %p 02659936 n 0000 ~ 02841320 n 0000 %p 03021820 n 0000 ~ 03104263 n 0000 ~ 03150171 n 0000 ~ 03410656 n 0000 %p 03593482 n 0000 ~ 03636122 n 0000 | a small portable timepiece
Record in HowNet
NO.=007738
W_C=表
G_C=N
E_C=手~,怀~,钟~,电子~,机械~,带钻石的~,这块~不防水
W_E=watch
G_E=N
E_E=
DEF={tool|用具:{tell|告诉:content={time|时间},instrument={~}}}
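The WordNet record packs its fields positionally, which is easier to see once it is decoded. A minimal sketch that follows the field order documented for WordNet data files (offset, lexicographer file number, part of speech, hexadecimal word count, word/lex_id pairs, decimal pointer count, four-token pointers, then the gloss after `|`). It handles noun records like the one above and ignores verb frames; the function name is hypothetical:

```python
def parse_wordnet_line(line):
    """Decode one noun record from a WordNet data file into a dict."""
    body, _, gloss = line.partition("|")
    tokens = body.split()
    offset, lex_filenum, ss_type = tokens[0], tokens[1], tokens[2]
    w_cnt = int(tokens[3], 16)          # word count is written in hexadecimal
    pos = 4
    words = []
    for _ in range(w_cnt):
        words.append(tokens[pos])       # tokens[pos + 1] is the lex_id
        pos += 2
    p_cnt = int(tokens[pos])            # pointer count is written in decimal
    pos += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target, target_pos, _src_tgt = tokens[pos:pos + 4]
        pointers.append((symbol, target, target_pos))
        pos += 4
    return {
        "offset": offset,
        "lex_filenum": lex_filenum,
        "pos": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


# The "watch" record from the slide.
line = (
    "03592879 06 n 02 watch 0 ticker 1 012 @ 03506835 n 0000 "
    "~ 02187181 n 0000 %p 02529205 n 0000 ~ 02570752 n 0000 "
    "%p 02659936 n 0000 ~ 02841320 n 0000 %p 03021820 n 0000 "
    "~ 03104263 n 0000 ~ 03150171 n 0000 ~ 03410656 n 0000 "
    "%p 03593482 n 0000 ~ 03636122 n 0000 | a small portable timepiece"
)

synset = parse_wordnet_line(line)
print(synset["words"], len(synset["pointers"]), synset["gloss"])
```

In the pointer list, `@` marks a hypernym, `~` a hyponym and `%p` a part meronym. The WordNet record therefore relates "watch" to other synsets by offset, whereas the HowNet record describes it through sememes and roles in DEF.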
Axiom in SUMO / HowNet (1)
See SUMO_buy.doc
Cf. HowNet Event Relation & Role shifting
{buy|买} <----> {obtain|得到} [consequence];
agent OF {buy|买}=possessor OF {obtain|得到};
possession OF {buy|买}=possession OF {obtain|得到}.
{buy|买} (X) <----> {sell|卖} (Y) [mutual implication];
agent OF {buy|买}=target OF {sell|卖};
source OF {buy|买}=agent OF {sell|卖};
possession OF {buy|买}=possession OF {sell|卖};
cost OF {buy|买}=cost OF {sell|卖}.
Axiom in SUMO / HowNet (2)
{buy|买} [entailment] <----> {choose|选择};
agent OF {buy|买}=agent OF {choose|选择};
possession OF {buy|买}=content OF {choose|选择};
source OF {buy|买}=location OF {choose|选择}.
{buy|买} [entailment] <----> {pay|付};
agent OF {buy|买}=agent OF {pay|付};
cost OF {buy|买}=possession OF {pay|付};
source OF {buy|买}=target OF {pay|付}.
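The role equations on these two slides can be read as a lookup table from the roles of {buy|买} to the roles of a related event. A minimal sketch of applying them, with the correspondences copied from the slides. The table layout, the function name `shift_roles` and the example fillers are assumptions for illustration, not HowNet's own data format:

```python
# Role correspondences copied from the slides:
# (source event, related event) -> {role in source: role in related event}
ROLE_SHIFT = {
    ("buy|买", "obtain|得到"): {
        "agent": "possessor",
        "possession": "possession",
    },
    ("buy|买", "sell|卖"): {
        "agent": "target",
        "source": "agent",
        "possession": "possession",
        "cost": "cost",
    },
    ("buy|买", "choose|选择"): {
        "agent": "agent",
        "possession": "content",
        "source": "location",
    },
    ("buy|买", "pay|付"): {
        "agent": "agent",
        "cost": "possession",
        "source": "target",
    },
}


def shift_roles(event, roles, related_event):
    """Re-label the fillers of `event` with the roles of `related_event`.

    Roles that have no stated correspondence are dropped.
    """
    mapping = ROLE_SHIFT[(event, related_event)]
    return {mapping[role]: filler
            for role, filler in roles.items() if role in mapping}


# Hypothetical fillers for one buying event.
buying = {"agent": "Mary", "source": "bookshop",
          "possession": "book", "cost": "20 yuan"}

print(shift_roles("buy|买", buying, "sell|卖"))
print(shift_roles("buy|买", buying, "pay|付"))
```

From the single buying frame the table yields that the bookshop is the agent of selling with Mary as its target, and that Mary is the agent of paying with the 20 yuan as the possession handed over.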
Thematic Roles in VerbNet / HowNet
See VerbNet_buy.doc
Thematic Roles
Agent[+animate OR +organization]
Asset[+currency]
Beneficiary[+animate OR +organization]
Source[+concrete]
Theme[]
Cf. HowNet Event Role with Typical Actors
│ ├ {buy|买} {take|取:agent={human|人}{group|群体 ->},possession={artifact|人工物 ->},source={human|人}{InstitutePlace|场所},cost={money|货币},beneficiary={human|人}{group|群体 ->},domain={economy|经济}}
Components of HowNet
Taxonomy (specification of the sememe hierarchy)
Roles and Features (specification)
Specifications of KDML (Knowledge Description Markup Language)
Knowledge Database
Event Relations & Role Shifting
Maintenance Tools
APIs (application interfaces)
Nature of HowNet
An online knowledge-base which reveals the relationship among concepts, and the relationship among attributes of concepts
-- Dong Zhendong, "Knowledge Description: What, How and who?", Proceedings of International Symposium on Electronic Dictionary, Tokyo, 1988, p.18
Theory of HowNet
Knowledge is a system of relationships among concepts and among attributes of concepts
Everything is constantly changing in a specific time and space, and converts from one state to another. The conversion embodies the change of its attributes
Guidelines of Design
Computer-oriented
Relationship is the key; to reveal the relationship is the main objective of HowNet
Based on sememes
Use of KDML
Concepts are defined in a static and isolated way
Relationships are activated in a dynamic way
Concept Definitions in HowNet (1)
医生 (doctor): DEF={human|人:domain={medical|医},HostOf={Occupation|职位},{doctor|医治:agent={~}}}
患者 (patient): DEF={human|人:domain={medical|医},{SufferFrom|罹患:experiencer={~}},{doctor|医治:patient={~}}}
医院 (hospital): DEF={InstitutePlace|场所:{doctor|医治:location={~},content={disease|疾病}},domain={medical|医}}
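The definitions above are nested bracket expressions, so they can be turned into a tree with a small recursive-descent parser. The sketch below assumes a simplified grammar inferred only from the examples on these slides, not from the KDML specification: a node is `{head}` or `{head:item,item,...}`, and an item is either `role=` followed by one or more nodes, or a bare nested node. The function and key names are hypothetical:

```python
def parse_def(text):
    """Parse a DEF expression into nested dicts (simplified, assumed grammar)."""
    compact = "".join(text.split())     # the slides contain stray spaces
    node, end = _parse_node(compact, 0)
    if end != len(compact):
        raise ValueError("unexpected trailing text")
    return node


def _parse_node(s, i):
    """Parse one {...} node starting at s[i]; return (node, next index)."""
    if s[i] != "{":
        raise ValueError("expected '{' at position %d" % i)
    i += 1
    start = i
    while s[i] not in ":}":             # the head runs up to ':' or '}'
        i += 1
    node = {"head": s[start:i], "roles": {}, "events": []}
    if s[i] == ":":
        i += 1
        while True:
            if s[i] == "{":             # bare nested node, e.g. an event
                child, i = _parse_node(s, i)
                node["events"].append(child)
            else:                       # role=value, value is one or more nodes
                eq = s.index("=", i)
                role = s[i:eq]
                i = eq + 1
                fillers = []
                while s[i] == "{":
                    child, i = _parse_node(s, i)
                    fillers.append(child)
                node["roles"][role] = fillers
            if s[i] == ",":
                i += 1
            else:
                break
    if s[i] != "}":
        raise ValueError("expected '}' at position %d" % i)
    return node, i + 1


doctor = parse_def(
    "{human|人:domain={medical|医},HostOf={Occupation|职位},"
    "{doctor|医治:agent={~}}}"
)
print(doctor["head"], sorted(doctor["roles"]), doctor["events"][0]["head"])
```

In the parsed tree, the `{~}` filler of `agent` inside `{doctor|医治}` stands for the concept being defined, which is how the definition says that a doctor is the human who performs the curing.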
Concept Definitions in HowNet (2)
病历 (medical record): DEF={document|文书:{record|记录:content={disease|疾病},LocationFin={~}},domain={medical|医}}
健康 (health): DEF={Health|健康:host={AnimalHuman|动物}}
多病 (sickly): DEF={unhealthy|不健}
│ │ ├ {HealthValue|健康值}
│ │ │ ├ {healthy|康健}
│ │ │ └ {unhealthy|不健}
Concept Definitions in HowNet (3)
病 (disease): {disease|疾病} {phenomena|现象:{doctor|医治:content={~}},{SufferFrom|罹患:content={~}},RelateTo={medicine|药物}{Health|健康}{HealthValue|健康值},domain={medical|医}}
药 (medicine): {medicine|药物} {artifact|人工物:{doctor|医治:instrument={~}},RelateTo={disease|疾病},domain={medical|医}{chemistry|化学}}
Identity of description in different language structures (1)
W_C=劫
G_C=V
E_C=
W_E=rob
G_E=V
E_E=
DEF={rob|抢}

W_C=飞机
G_C=N
E_C=
W_E=plane
G_E=N
E_E=
DEF={aircraft|飞行器}
Identity of description in different language structures (2)
W_C=劫机
G_C=V
E_C=
W_E=hijack a plane
G_E=V
E_E=
DEF={rob|抢:possession={aircraft|飞行器}}
Identity of description in different language structures (3)
W_C=劫机犯
G_C=N
E_C=
W_E=hijacker
G_E=N
E_E=
DEF={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}}}
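The three slides so far build a word, a phrase and a derived noun from the same two sememes. A minimal sketch of that composition as plain string construction; the helper `node` is hypothetical and only mimics the bracket notation shown above:

```python
def node(head, *parts):
    """Render a KDML-style node: {head} or {head:part1,part2,...}."""
    return "{" + head + (":" + ",".join(parts) if parts else "") + "}"


# The two atomic definitions from slide (1).
rob = node("rob|抢")
aircraft = node("aircraft|飞行器")

# Slide (2): "hijack a plane" is robbing whose possession is an aircraft.
hijack = node("rob|抢", "possession=" + aircraft)

# Slide (3): a "hijacker" is the human who is the agent of that event.
hijacker = node("human|人",
                node("rob|抢", "agent={~}", "possession=" + aircraft))

print(hijack)
print(hijacker)
```

The same sub-expression `possession={aircraft|飞行器}` appears unchanged whether the surface form is a verb phrase or a noun, which is the point the slide titles make.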
Identity of description in different language structures (4)
W_C=抓获劫机犯
G_C=V
E_C=
W_E=catch a hijacker
G_E=V
E_E=
DEF={catch|捉住:patient={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}}}}
Identity of description in different language structures (5)
W_C=机敏地抓获女劫机犯
G_C=V
E_C=
W_E=catch a woman hijacker cleverly
G_E=V
E_E=
DEF={catch|捉住:manner={clever|灵},patient={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}},modifier={female|女}}}
Applications of HowNet
1. Semantic tagging
2. WSD, Sense Pruning
3. Sensitive information detection
4. Information filtering
5. Similarity of words
6. Semantic Web
7. Match of WordNet
Future work
Construction of resources
English HowNet
Chinese message structure bank
Increase of languages
Developing more APIs and tools
Administration
Membership
Appendix: Definitions of Ontology (1)
a specification of a conceptualization
the theory of objects and their ties
similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical …
Appendix: Definitions of Ontology (2)
the study of what there is, an inventory of what exists
…What we may call ontology is the attempt to say what entities exist. Metaphysics, by contrast, is the attempt to say, of those entities, what they are.
the study of the categories of things that exist or may exist in some domain
The word ontology comes from the Greek ontos for being and logos for word.
Cost for French in EuroWordNet
For the development of the French language, there were 2 partners: Avignon (AVI) and Memodata (MEM). The following was requested:

                             AVI      MEM
Personnel                  72000    85000
Equipment                   3000        0
Travel & assistance         5000     1500
Consumables & computing     3000      300
Overheads                  16600    17100
Total                      99600   104400

Since Memodata was a private company, only 50% of its request could be funded by the EC. So the total of the request was:

                             AVI      MEM
Total                      99600    52200

Notes: 1) Validation is not included in this table. It was done globally for several languages by Xerox and Bertin.
2) These amounts constituted a provisional budget corresponding to some 20 000 synsets.
Demo of Tools
(1) Relevant Concept Field
(2) Similarity of Words
(3) Chinese Chunk Extractor
(4) Smart Word finder
Overview of HowNet
Components of HowNet
Nature of HowNet
Theory of HowNet
Guidelines of Design
Sememes and Relations