Xiao Yanghua (肖仰华)
Fudan University (复旦大学)
kw.fudan.edu.cn
Knowledge Graph-Based Explainable AI: Opportunities and Challenges
Big-data AI, exemplified by deep learning, is advancing rapidly
• Big data gave rise to deep learning
• Large-scale labeled data has made it possible to train deep neural network models
• Large-scale distributed computing platforms such as cloud computing, together with improved hardware such as GPUs and FPGAs, provide the necessary computing infrastructure for deep learning
• Deep learning fed with big data has made remarkable progress
• Machines have reached or even surpassed human-level perception in tasks such as image and speech recognition
• Significant progress has also been made in natural language processing
A CNN with more than 60 million parameters and 650,000 neurons, trained on ImageNet
Deep learning models: neural networks with many layers
They simulate the deep structure of the human brain, imitating the layer-by-layer, progressively abstract cognitive process of humans
They make human-like learning possible, without the expert feature engineering of traditional machine learning
“Black box” Problem of Deep Learning
“If we can't ask … why they do something and get a reasonable response back, people will just put it back on the shelf”
Mark Riedl, Georgia Institute of Technology
How AI detectives are cracking open the black box
of deep learning
By Paul Voosen, Jul. 6, 2017, 2:00 PM
Deep Learning Is a Black Box, but Health Care Won’t Mind
New algorithms are able to diagnose disease as accurately as expert
physicians.
by Monique Brouillette, April 27, 2017
Can we open the black box of AI? Artificial intelligence is everywhere. But before scientists trust it, they first need to understand how machines learn.
by Davide Castelvecchi, 05 October 2016
Open the “black box”: XAI
The Explainable AI program at US DARPA, led by Mr. David Gunning
https://www.darpa.mil/program/explainable-artificial-intelligence
Explainable AI (XAI): A key component of an artificially intelligent system is the ability to explain the decisions, recommendations, predictions or actions made by it and the process through which they are made.
— O. Biran and C. Cotton, Explanation and Justification in Machine Learning: A Survey, IJCAI, 2017
The Explainable AI (XAI) program aims to create a suite of machine learning techniques that:
• Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and
• Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.
Understanding and explanation are the primary missions of AI in the post-deep-learning era
• From merely knowing "what" to knowing "why"
• From perception to cognition
• XAI is a key technology for strengthening trust between humans and machines
• Explaining results
• Explaining processes
• XAI has major application prospects
• Justify agent behavior
• Debug a machine learning model
• Explain medical decision-making
• Persuasive recommendation
“To explain an event is to provide some information about its causal history.”— D. Lewis, Causal explanation, Philosophical Papers, 1986
Explanation
Understanding
Interpretation
Justification
Causation
The emergence of knowledge graphs brings opportunities for XAI
• Knowledge graphs are one of the core technologies of machine intelligence
• Letting search lead directly to answers
• From strings to things
• Precise big-data analytics
• The knowledge base in the machine's "brain"
• Knowledge graphs are rich in entities, concepts, attributes, and relations, which makes explanation possible
A knowledge graph is a large-scale semantic network that expresses entities/concepts and the various semantic relations among them
Massive in scale, semantically rich, well structured, and of high quality
Provides rich background knowledge for semantic understanding
A knowledge representation that can satisfy the cognitive needs of machines
An example fragment of a knowledge graph
“Knowledge is the power in AI system.”— Edward Feigenbaum
Why knowledge graphs are useful for explanation
Why are sharks so scary? Because they are carnivores.
Why can birds fly? Because they have wings.
Why are Lu Han and Guan Xiaotong all over the Internet lately? Because Guan Xiaotong is Lu Han's girlfriend.
Concept
Attribute
Relation
Explanation depends on the basic framework of human cognition; concepts, attributes, and relations are the cornerstones of cognition
"Concepts are the glue that holds our mental world together" — Gregory Murphy
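The three explanation patterns above reduce to looking up facts in a knowledge graph. A minimal sketch in Python, with toy triples and hypothetical predicate names standing in for a real KG:

```python
# Toy knowledge graph triples: (subject, predicate, object).
# The triples and predicate names are illustrative assumptions.
KG = [
    ("shark", "isA", "carnivore"),                # concept
    ("bird", "hasProperty", "wings"),             # attribute
    ("Guan Xiaotong", "girlfriendOf", "Lu Han"),  # relation
]

def explain(entity):
    """Explain an entity by citing the KG facts it participates in."""
    return [f"{s} {p} {o}" for s, p, o in KG if s == entity or o == entity]
```

For example, `explain("shark")` returns the isA fact that backs the "because they are carnivores" answer.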
Explanation with knowledge graph
Explain a concept/category (IJCAI 2016)
Explain a set of entities (IJCAI 2017)
Explain a bag of words (IJCAI 2015)
Explain hyperlinked entities in Wiki (CIKM 2017)
• Extracted by Hearst patterns
• NP such as NP, NP, ..., and|or NP
• such NP as NP,* or|and NP
• NP, NP*, or other NP
• NP, NP*, and other NP
• NP, including NP,* or|and NP
• NP, especially NP,* or|and NP
• domestic animals such as cats and dogs ...
• China is a developing country.
• Life is a box of chocolate.
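A minimal sketch of the first pattern ("NP such as NP, NP, ..., and|or NP") as a regular expression. Real extractors run an NP chunker first; approximating noun phrases with raw text spans, as here, can let sentence prefixes leak into the hypernym:

```python
import re

SUCH_AS = re.compile(r"(.+?) such as (.+)")

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    m = SUCH_AS.search(sentence)
    if not m:
        return []
    hypernym = m.group(1).strip()
    # Split the enumeration on commas and on a trailing "and"/"or".
    items = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+",
                     m.group(2).strip(". "))
    return [(i.strip(), hypernym) for i in items if i.strip()]
```

On the example above, `extract_isa("domestic animals such as cats and dogs")` yields the pairs (cats, domestic animals) and (dogs, domestic animals).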
Probase and Probase+
Probase+: Completing Probase with a Collaborative-Filtering-Based Framework
Idea: if most of the terms similar to c have h as a hypernym, then c is also likely to have hypernym h.
Jiaqing Liang, Yanghua Xiao, et al., Probase+: Inferring Missing Links in Conceptual Taxonomies, TKDE 2017
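The collaborative-filtering idea can be sketched as a neighbor vote over known isA links; the toy data and the Jaccard similarity are assumptions for illustration, not the paper's actual model:

```python
from collections import Counter

# Toy isA store: term -> set of known hypernyms (illustrative data).
ISA = {
    "apple":  {"fruit", "food"},
    "banana": {"fruit", "food"},
    "cherry": {"fruit"},
    "pear":   {"fruit", "food"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_missing(term, k=3):
    """If most of term's k nearest neighbors have hypernym h, infer h."""
    neighbors = sorted((t for t in ISA if t != term),
                       key=lambda t: jaccard(ISA[term], ISA[t]),
                       reverse=True)[:k]
    votes = Counter(h for t in neighbors for h in ISA[t])
    return [h for h, c in votes.items() if c > k / 2 and h not in ISA[term]]
```

Here `infer_missing("cherry")` proposes the missing link (cherry, isA, food), because all three of cherry's neighbors carry that hypernym.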
• DBpedia
• Extracts structured information from Wikipedia
• Makes this information available on the Web under an open license
• Interlinks the DBpedia dataset with other datasets on the Web
• Contributors
• Freie Universität Berlin (Germany)
• Universität Leipzig (Germany)
• OpenLink Software (UK)
• Linking Open Data
• CN-DBpedia: a Chinese counterpart of DBpedia
• Developed by Knowledge Works at Fudan
• Rich in structured information about entities
• Contains many categories and tags for entities
DBPedia and CN-DBPedia
CN-DBpedia @kw.fudan.edu.cn
What are the Defining Features of a Category? Defining features are assumed to establish the necessary and sufficient conditions that characterize the meaning of the category.
• Any entity with the defining features should belong to the category
• Any entity belonging to the category must have the defining features
Explain a Concept/Category
Problem: How do we understand a concept/category?
Example: Bachelor: Sex = man, Marital status = unmarried
E.g., Category "Jay Chou albums"
Defining features: {(Type, album), (Singer, Jay Chou)}
Non-defining features: {(Type, album), (Singer, Jay Chou), (genre, Pop music)}; {(Type, single), (Singer, Jay Chou)}
Bo Xu, et al, Learning Defining Features for Categories. (IJCAI 2016)
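The necessary-and-sufficient conditions above translate directly into two subset checks. A sketch over a toy entity store (the entities and feature pairs are illustrative, not the paper's data):

```python
# Toy entity -> feature set, using (attribute, value) pairs.
ENTITIES = {
    "Fantasy": {("Type", "album"), ("Singer", "Jay Chou"), ("genre", "Pop")},
    "The Eight Dimensions": {("Type", "album"), ("Singer", "Jay Chou")},
    "Nunchucks": {("Type", "single"), ("Singer", "Jay Chou")},
    "Thriller": {("Type", "album"), ("Singer", "Michael Jackson")},
}

def is_defining(feature_set, category_members):
    """Sufficient: every entity with the features is a member.
    Necessary: every member has the features."""
    sufficient = all(e in category_members
                     for e, f in ENTITIES.items() if feature_set <= f)
    necessary = all(feature_set <= ENTITIES[e] for e in category_members)
    return sufficient and necessary
```

With the "Jay Chou albums" members, {(Type, album), (Singer, Jay Chou)} passes both checks, while {(Singer, Jay Chou)} fails sufficiency (it also covers the single) and adding a genre feature fails necessity.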
• The search space of candidate feature sets is exponential
• Use frequent pattern mining to find candidate defining feature sets that are sufficiently frequent
• Repeat until no new DFs can be found
• Step 1: Use a score function to find the DFs of some categories
• Step 2 & 3: Use a rule-based method to derive more DFs of categories
• Step 4: Populate DBpedia using the DFs of categories discovered so far
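The frequent-pattern pruning step can be sketched with a simple subset counter; features are abbreviated to strings here, and the support threshold and subset-size cap are assumed parameters:

```python
from itertools import combinations
from collections import Counter

def candidate_dfs(member_features, min_support=1.0, max_size=2):
    """Keep only feature subsets shared by at least min_support of the
    members, pruning the exponential space of candidate defining features."""
    n = len(member_features)
    counts = Counter()
    for feats in member_features:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(feats), size):
                counts[combo] += 1
    return {frozenset(c) for c, cnt in counts.items() if cnt / n >= min_support}
```

Candidates produced this way are then tested against the necessity and sufficiency conditions; subsets that are rare among a category's members can never be necessary, which is what justifies the pruning.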
Solution and Results
How to measure the goodness of a set of features
Challenge and Solutions
Solution Framework
Results
We finally obtain 60,247 new C-DFs with an average quality score of 2.82
Explain a Set of Entities
Problem: Given a set of entities, can we understand its concept and recommend the most related entity?
Applications: E-commerce: if a user is searching for a Samsung S6 and an iPhone 6, what should we recommend, and why?
Yi Zhang, et al, Entity suggestion with conceptual explanation, (IJCAI 2017)
• Problems:
• The right concept does not necessarily exist
• We can find China, Russia, Brazil, and India under the topic "developing country", but there is no exact topic "BRIC".
• A concept may cover many irrelevant instances
• Under the topic "developing country" there are many other countries, which makes it difficult to find the most related entities.
• The best concept is in most cases implicit
• The information in Probase is not clean enough
Explain a Set of Entities
A naive solution: use a taxonomy, such as Probase, to find the nearest common ancestor
Model 1: use concepts as hidden variables and penalize concepts with too many member entities
Model 2: choose the entity whose union with the query entities best preserves the concept distribution of the query
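A sketch of the Model 1 intuition: score each candidate concept by how well it covers the query while penalizing overly broad concepts. The concept table and the log-size penalty are illustrative assumptions, not the paper's exact model:

```python
import math

# Toy concept -> member-entity table (Probase-style, illustrative).
CONCEPTS = {
    "smartphone": {"samsung s6", "iphone 6", "huawei p9", "pixel"},
    "developing country": {"china", "brazil", "india", "mexico", "egypt"},
    "product": {"samsung s6", "iphone 6", "huawei p9", "pixel", "tv", "fridge"},
}

def score_concept(concept, query):
    """Reward covering the query; penalize large (overly broad) concepts."""
    members = CONCEPTS[concept]
    return len(query & members) / len(query) / math.log(len(members) + 1)

def suggest(query):
    """Explain the query with its best concept and suggest a new member."""
    best = max(CONCEPTS, key=lambda c: score_concept(c, query))
    return best, sorted(CONCEPTS[best] - query)[0]
```

For the query {samsung s6, iphone 6}, "smartphone" beats the broader "product", and the suggestion is another smartphone, which is exactly the explanation a naive common ancestor would miss.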
Results
• Challenge: how to measure the “goodness” of the labels we assign to a bag of words
• Coverage: the conceptual labels should cover as many words and phrases in the input as possible
• Minimality: the number of conceptual labels should be as small as possible
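The coverage-minimality trade-off is essentially a set-cover problem, so a greedy sketch captures the idea; the word-to-concept table is illustrative, and the actual models in the papers are probabilistic:

```python
# Toy concept -> covered words (illustrative isA data).
CONCEPT_WORDS = {
    "asian country": {"china", "japan", "india", "korea"},
    "meal":          {"dinner", "lunch", "food"},
    "child":         {"child", "girl"},
    "wedding":       {"bride", "groom", "dress", "celebration"},
}

def label_words(words):
    """Greedy set cover: few concepts, high coverage, noise left unlabeled."""
    uncovered, labels = set(words), []
    while uncovered:
        best = max(CONCEPT_WORDS,
                   key=lambda c: len(CONCEPT_WORDS[c] & uncovered))
        gain = CONCEPT_WORDS[best] & uncovered
        if not gain:
            break  # remaining words match no concept: treat them as noise
        labels.append(best)
        uncovered -= gain
    return labels
```

On the example below, {dinner, lunch, food, child, girl} is labeled [meal, child], and a stray word such as "bullet" is simply skipped.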
Applications
• Topic labeling
• A topic is a bag of words that does not have explicit semantics
• Conceptual labeling turns each topic into a small set of meaningful concepts
• Language understanding
• Verb role labeling
• Can summarize the verb eat's direct objects (apple, breakfast, pork, beef, bullet) into a small set of concepts, such as fruit, meal, meat, bullet
Explain a bag of words
Problem: Given a bag of words, can we infer what the article is talking about?
Example:
china, japan, india, korea -> asian country
dinner, lunch, food, child, girl -> meal, child
bride, groom, dress, celebration -> wedding
Solution
Minimum description length: the best concepts capture the regularities of the words as much as possible, enabling us to compress the data as much as possible
Noise tolerance: {apple, banana, breakfast, dinner, pork, beef, bullet}
Integrating attribute information: {population, president, location} triggers the concept country
• Our models fully exploit the attributes of concepts to generate better labels
• Our solutions find the minimum number of concepts needed to label a bag of words
• Most conceptual labels are specific enough
• Noise words are ignored
Results
Explain hyperlinked entities in Wikipedia
• Problem: can we explain why a hyperlink occurs in a certain Wikipedia page?
2017/10/17
Solution and results
• Solution: a clustering-then-labeling framework
• Clustering: K-means++ with G-means-like seed selection
• Labeling: a maximal-least-ancestor model run over a taxonomy constructed from wiki categories
• Result
• 1.5M navboxes, with user satisfaction scores close to those of existing manually built navboxes in Wikipedia
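The labeling step can be sketched as a lowest-common-ancestor search over a category taxonomy. The toy parent table is an assumption; the actual system runs a maximal-least-ancestor model over wiki categories:

```python
# Toy taxonomy: category -> parent (wiki-category style, illustrative).
PARENT = {
    "Jay Chou albums": "albums",
    "albums": "works",
    "Jay Chou songs": "songs",
    "songs": "works",
}

def ancestors(cat):
    """Chain from a category up to the taxonomy root."""
    chain = [cat]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def label_cluster(entity_cats):
    """Label a cluster of entities with their lowest common ancestor."""
    common = set(ancestors(entity_cats[0]))
    for cat in entity_cats[1:]:
        common &= set(ancestors(cat))
    for cat in ancestors(entity_cats[0]):  # first chain is depth-ordered
        if cat in common:
            return cat
    return None
```

A cluster mixing "Jay Chou albums" and "Jay Chou songs" entities is labeled with the more general "works", while a pure album cluster keeps its specific label.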
Challenges
• Our explanation of "explanation" and understanding of "understanding" are still very limited
• How do humans understand the world and explain phenomena? Our knowledge of human understanding and explanation at the philosophical, cognitive, and psychological levels is still very limited, which severely constrains our ability to endow machines with these capabilities.
• Explanation in Artificial Intelligence: Insights from the Social Sciences, Tim Miller
• Acquiring large-scale commonsense knowledge and applying it in XAI
• Current AI (and XAI in particular) generally lacks commonsense understanding, and this lack of commonsense is a major bottleneck for AI research that must be overcome as soon as possible. Commonsense understanding is a core problem of general AI and a strategic high ground for AI's development; research on large-scale commonsense acquisition and understanding should therefore be launched without delay.
• Large Scale Commonsense Knowledge Acquisition and Understanding
• New machine learning models that deeply integrate data-driven (neural) and knowledge-guided (symbolic) approaches
• How to organically integrate symbolic knowledge into data-driven statistical learning models is a major open problem in AI; it is the key to reducing deep learning's dependence on labeled samples and breaking through the performance ceiling of machine learning models.
• 当知识图谱遇见深度学习 (When Knowledge Graphs Meet Deep Learning), 中国人工智能通信
References
• Chenhao Xie, Jiaqing Liang, Lihan Chen, Yanghua Xiao*, Hanghang Tong, Kezun Zhang, Haixun Wang and Wei Wang, Automatic Navbox Generation by Interpretable Clustering over Linked Entities, (CIKM 2017)
• Yi Zhang, Yanghua Xiao*, Seung-won Hwang, Haixun Wang, X. Sean Wang, Entity Suggestion with Conceptual Explanation, (IJCAI 2017)
• Jiaqing Liang, Sheng Zhang, Yanghua Xiao*, How to Keep a Knowledge Base Synchronized with Its Encyclopedia Source, (IJCAI 2017)
• Bo Xu, Yong Xu, Jiaqing Liang, Chenhao Xie, Bin Liang, Wanyun Cui and Yanghua Xiao*, CN-DBpedia: A Never-Ending Chinese Knowledge Extraction System, (IEA/AIE 2017)
• Jiaqing Liang, Yi Zhang, Yanghua Xiao*, Haixun Wang, Wei Wang, Probase+: Inferring Missing Links in Conceptual Taxonomies, Transactions on Knowledge and Data Engineering (TKDE 2017)
• Wanyun Cui, Yanghua Xiao*, Haixun Wang, Yangqiu Song, Seung-won Hwang, Wei Wang, KBQA: Learning Question Answering over QA Corpora and Knowledge Bases, (VLDB 2017)
• Jiaqing Liang, Yi Zhang, Yanghua Xiao*, Haixun Wang, Wei Wang and Pinpin Zhu, On the Transitivity of Hypernym-hyponym Relations in Data-Driven Lexical Taxonomies, (AAAI 2017)
• Jiaqing Liang, Yanghua Xiao*, Yi Zhang, Seung-won Hwang and Haixun Wang, Graph-based Wrong IsA Relation Detection in a Large-scale Lexical Taxonomy, (AAAI 2017)
• Xiangyan Sun, Yanghua Xiao*, Haixun Wang, Wei Wang, On Conceptual Labeling of a Bag of Words, (IJCAI 2015)
• Xiangyan Sun, Haixun Wang, Yanghua Xiao*, Zhongyuan Wang, Syntactic Parsing of Web Queries, (EMNLP 2016)
• Bo Xu, Chenhao Xie, Yi Zhang, Yanghua Xiao*, Haixun Wang and Wei Wang, Learning Defining Features for Categories, (IJCAI 2016)
• Wanyun Cui, Yanghua Xiao*, Wei Wang, KBQA: An Online Template Based Question Answering System over Freebase, (IJCAI 2016)
• Wanyun Cui, Xiyou Zhou, Hangyu Lin, Yanghua Xiao*, Seung-won Hwang, Haixun Wang and Wei Wang, Verb Pattern: A Probabilistic Semantic Representation on Verbs, (AAAI 2016)
Summary
• Big-data AI, exemplified by deep learning, has made enormous progress
• The opacity and uninterpretability of deep learning have become major obstacles to its further development
• Understanding and explanation are the core tasks of AI in the post-deep-learning era
• Knowledge graphs provide brand-new opportunities for explainable AI
• The difficulty of defining "explanation", commonsense acquisition and application, and neural-symbolic integration pose great challenges for XAI
Thank YOU!
• Our LAB: Knowledge Works at Fudan University
• http://kw.fudan.edu.cn