Xiao Yanghua (肖仰华)
Fudan University (复旦大学)
kw.fudan.edu.cn
Knowledge Graph-Based Explainable AI: Opportunities and Challenges
Big-data AI, exemplified by deep learning, is advancing rapidly
• Big data gave rise to deep learning
• Large-scale labeled data has made it possible to train deep neural network models
• Large-scale distributed computing platforms such as cloud computing, together with improved hardware such as GPUs and FPGAs, provide the necessary computing infrastructure for deep learning
• Deep learning fed with big data has made remarkable progress
• Machines have reached or even surpassed human-level perception in tasks such as image and speech recognition
• Significant progress has also been made in natural language processing
A CNN with more than 60 million parameters and 650,000 neurons, trained on ImageNet
Deep learning models: neural networks with many layers
They simulate the deep structure of the human brain, imitating the layer-by-layer, progressively abstract cognitive process of humans
They make human-like learning possible, without the expert feature engineering of traditional machine learning
“Black box” Problem of Deep Learning
“If we can't ask … why they do something and get a reasonable response back, people will just put it back on the shelf”
Mark Riedl, Georgia Institute of Technology
How AI detectives are cracking open the black box
of deep learning
By Paul Voosen, Jul. 6, 2017, 2:00 PM
Deep Learning Is a Black Box, but Health Care Won’t Mind
New algorithms are able to diagnose disease as accurately as expert
physicians.
by Monique Brouillette, April 27, 2017
Can we open the black box of AI? Artificial intelligence is everywhere. But before scientists trust it, they first need to understand how machines learn.
by Davide Castelvecchi, 05 October 2016
Open the “black box”: XAI
The Explainable AI program at US DARPA, led by Mr. David Gunning
https://www.darpa.mil/program/explainable-artificial-intelligence
Explainable AI (XAI): A key component of an artificially intelligent system is the ability to explain the decisions, recommendations, predictions or actions made by it and the process through which they are made.
— O. Biran and C. Cotton, Explanation and Justification in Machine Learning: A Survey, IJCAI, 2017
The Explainable AI (XAI) program aims to create a suite of machine learning techniques that:
• Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and
• Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.
Understanding and explanation are the primary missions of AI in the post-deep-learning era
• From merely knowing "what" to knowing "why"
• From perception to cognition
• XAI is a key technology for strengthening trust between humans and machines
• Explaining results
• Explaining processes
• XAI has major application prospects
• Justify agent behavior
• Debug a machine learning model
• Explain medical decision-making
• Persuasive recommendation
“To explain an event is to provide some information about its causal history.”— D. Lewis, Causal explanation, Philosophical Papers, 1986
Explanation
Understanding
Interpretation
Justification
Causation
The emergence of knowledge graphs brings opportunities for XAI
• Knowledge graphs are one of the core technologies of machine intelligence
• Letting search lead directly to answers
• From strings to things
• Precise big-data analytics
• The knowledge base in the machine's "brain"
• Knowledge graphs are rich in entities, concepts, attributes, and relations, which makes explanation possible
A knowledge graph is a large-scale semantic network that expresses entities/concepts and the various semantic relations among them
Massive in scale, semantically rich, well structured, and of high quality
Provides rich background knowledge for semantic understanding
A knowledge representation that can satisfy the cognitive needs of machines
An example fragment of a knowledge graph
“Knowledge is the power in AI system.”— Edward Feigenbaum
Why knowledge graphs are useful for explanation
Why are sharks so scary? Because they are carnivores.
Why can birds fly? Because they have wings.
Why are Lu Han and Guan Xiaotong all over the Internet lately? Because Guan Xiaotong is Lu Han's girlfriend.
Concept
Attribute
Relation
Explanation depends on the basic framework of human cognition; concepts, attributes, and relations are the cornerstones of cognition
"Concepts are the glue that holds our mental world together" — Gregory Murphy
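The three explanation patterns above reduce to looking up facts in a knowledge graph. A minimal sketch in Python, with toy triples and hypothetical predicate names standing in for a real KG:

```python
# Toy knowledge graph triples: (subject, predicate, object).
# The triples and predicate names are illustrative assumptions.
KG = [
    ("shark", "isA", "carnivore"),                # concept
    ("bird", "hasProperty", "wings"),             # attribute
    ("Guan Xiaotong", "girlfriendOf", "Lu Han"),  # relation
]

def explain(entity):
    """Explain an entity by citing the KG facts it participates in."""
    return [f"{s} {p} {o}" for s, p, o in KG if s == entity or o == entity]
```

For example, `explain("shark")` returns the isA fact that backs the "because they are carnivores" answer.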
Explanation with knowledge graph
Explain a concept/category (IJCAI 2016)
Explain a set of entities (IJCAI 2017)
Explain a bag of words (IJCAI 2015)
Explain hyperlinked entities in Wiki (CIKM 2017)
• Extracted by Hearst patterns
• NP such as NP, NP, ..., and|or NP
• such NP as NP,* or|and NP
• NP, NP*, or other NP
• NP, NP*, and other NP
• NP, including NP,* or|and NP
• NP, especially NP,* or|and NP
• domestic animals such as cats and dogs ...
• China is a developing country.
• Life is a box of chocolate.
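A minimal sketch of the first pattern ("NP such as NP, NP, ..., and|or NP") as a regular expression. Real extractors run an NP chunker first; approximating noun phrases with raw text spans, as here, can let sentence prefixes leak into the hypernym:

```python
import re

SUCH_AS = re.compile(r"(.+?) such as (.+)")

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    m = SUCH_AS.search(sentence)
    if not m:
        return []
    hypernym = m.group(1).strip()
    # Split the enumeration on commas and on a trailing "and"/"or".
    items = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+",
                     m.group(2).strip(". "))
    return [(i.strip(), hypernym) for i in items if i.strip()]
```

On the example above, `extract_isa("domestic animals such as cats and dogs")` yields the pairs (cats, domestic animals) and (dogs, domestic animals).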
Probase and Probase+
Probase+: Completing Probase with a Collaborative-Filtering-Based Framework
Idea: if most of the terms similar to c have h as a hypernym, then c is also likely to have hypernym h.
Jiaqing Liang, Yanghua Xiao, et al., Probase+: Inferring Missing Links in Conceptual Taxonomies, TKDE 2017
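The collaborative-filtering idea can be sketched as a neighbor vote over known isA links; the toy data and the Jaccard similarity are assumptions for illustration, not the paper's actual model:

```python
from collections import Counter

# Toy isA store: term -> set of known hypernyms (illustrative data).
ISA = {
    "apple":  {"fruit", "food"},
    "banana": {"fruit", "food"},
    "cherry": {"fruit"},
    "pear":   {"fruit", "food"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_missing(term, k=3):
    """If most of term's k nearest neighbors have hypernym h, infer h."""
    neighbors = sorted((t for t in ISA if t != term),
                       key=lambda t: jaccard(ISA[term], ISA[t]),
                       reverse=True)[:k]
    votes = Counter(h for t in neighbors for h in ISA[t])
    return [h for h, c in votes.items() if c > k / 2 and h not in ISA[term]]
```

Here `infer_missing("cherry")` proposes the missing link (cherry, isA, food), because all three of cherry's neighbors carry that hypernym.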
• DBpedia
• Extracts structured information from Wikipedia
• Makes this information available on the Web under an open license
• Interlinks the DBpedia dataset with other datasets on the Web
• Contributors
• Freie Universität Berlin (Germany)
• Universität Leipzig (Germany)
• OpenLink Software (UK)
• Linking Open Data
• CN-DBpedia: a Chinese counterpart of DBpedia
• Developed by Knowledge Works at Fudan
• Rich in structured information about entities
• Contains many categories and tags for entities
DBPedia and CN-DBPedia
CN-DBpedia @kw.fudan.edu.cn
What are the Defining Features of a Category? Defining features are assumed to establish the necessary and sufficient conditions that characterize the meaning of the category.
• Any entity with the defining features should belong to the category
• Any entity belonging to the category must have the defining features
Explain a Concept/Category
Problem: How do we understand a concept/category?
Example: Bachelor: Sex = man, Marital status = unmarried
E.g., Category "Jay Chou albums"
Defining features: {(Type, album), (Singer, Jay Chou)}
Non-defining features: {(Type, album), (Singer, Jay Chou), (genre, Pop music)}; {(Type, single), (Singer, Jay Chou)}
Bo Xu, et al, Learning Defining Features for Categories. (IJCAI 2016)
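The necessary-and-sufficient conditions above translate directly into two subset checks. A sketch over a toy entity store (the entities and feature pairs are illustrative, not the paper's data):

```python
# Toy entity -> feature set, using (attribute, value) pairs.
ENTITIES = {
    "Fantasy": {("Type", "album"), ("Singer", "Jay Chou"), ("genre", "Pop")},
    "The Eight Dimensions": {("Type", "album"), ("Singer", "Jay Chou")},
    "Nunchucks": {("Type", "single"), ("Singer", "Jay Chou")},
    "Thriller": {("Type", "album"), ("Singer", "Michael Jackson")},
}

def is_defining(feature_set, category_members):
    """Sufficient: every entity with the features is a member.
    Necessary: every member has the features."""
    sufficient = all(e in category_members
                     for e, f in ENTITIES.items() if feature_set <= f)
    necessary = all(feature_set <= ENTITIES[e] for e in category_members)
    return sufficient and necessary
```

With the "Jay Chou albums" members, {(Type, album), (Singer, Jay Chou)} passes both checks, while {(Singer, Jay Chou)} fails sufficiency (it also covers the single) and adding a genre feature fails necessity.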
• The search space of candidate feature sets is exponential
• Use frequent pattern mining to find candidate defining feature sets that are sufficiently frequent
• Repeat until no new DFs can be found
• Step 1: Use a score function to find the DFs of some categories
• Step 2 & 3: Use a rule-based method to derive more DFs of categories
• Step 4: Populate DBpedia using the DFs of categories discovered so far
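The frequent-pattern pruning step can be sketched with a simple subset counter; features are abbreviated to strings here, and the support threshold and subset-size cap are assumed parameters:

```python
from itertools import combinations
from collections import Counter

def candidate_dfs(member_features, min_support=1.0, max_size=2):
    """Keep only feature subsets shared by at least min_support of the
    members, pruning the exponential space of candidate defining features."""
    n = len(member_features)
    counts = Counter()
    for feats in member_features:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(feats), size):
                counts[combo] += 1
    return {frozenset(c) for c, cnt in counts.items() if cnt / n >= min_support}
```

Candidates produced this way are then tested against the necessity and sufficiency conditions; subsets that are rare among a category's members can never be necessary, which is what justifies the pruning.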
Solution and Results
How to measure the goodness of a set of features
Challenge and Solutions
Solution Framework
Results
We finally obtain 60,247 new C-DFs with an average quality score of 2.82
Explain a Set of Entities
Problem: Given a set of entities, can we understand its concept and recommend the most related entity?
Applications: E-commerce: if a user is searching for a Samsung S6 and an iPhone 6, what should we recommend, and why?
Yi Zhang, et al, Entity suggestion with conceptual explanation, (IJCAI 2017)
• Problems:
• The right concept does not necessarily exist
• We can find China, Russia, Brazil, and India under the topic "developing country", but there is no exact topic "BRIC".
• A concept may cover many irrelevant instances
• Under the topic "developing country" there are many other countries, which makes it difficult to find the most related entities.
• The best concept is in most cases implicit
• The information in Probase is not clean enough
Explain a Set of Entities
A naive solution: use a taxonomy, such as Probase, to find the nearest common ancestor
Model 1: use concepts as hidden variables and penalize concepts with too many member entities
Model 2: choose the entity whose union with the query entities best preserves the concept distribution of the query
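A sketch of the Model 1 intuition: score each candidate concept by how well it covers the query while penalizing overly broad concepts. The concept table and the log-size penalty are illustrative assumptions, not the paper's exact model:

```python
import math

# Toy concept -> member-entity table (Probase-style, illustrative).
CONCEPTS = {
    "smartphone": {"samsung s6", "iphone 6", "huawei p9", "pixel"},
    "developing country": {"china", "brazil", "india", "mexico", "egypt"},
    "product": {"samsung s6", "iphone 6", "huawei p9", "pixel", "tv", "fridge"},
}

def score_concept(concept, query):
    """Reward covering the query; penalize large (overly broad) concepts."""
    members = CONCEPTS[concept]
    return len(query & members) / len(query) / math.log(len(members) + 1)

def suggest(query):
    """Explain the query with its best concept and suggest a new member."""
    best = max(CONCEPTS, key=lambda c: score_concept(c, query))
    return best, sorted(CONCEPTS[best] - query)[0]
```

For the query {samsung s6, iphone 6}, "smartphone" beats the broader "product", and the suggestion is another smartphone, which is exactly the explanation a naive common ancestor would miss.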
Results
• Challenge: how to measure the “goodness” of the labels we assign to a bag of words
• Coverage: the conceptual labels should cover as many words and phrases in the input as possible
• Minimality: the number of conceptual labels should be as small as possible
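The coverage-minimality trade-off is essentially a set-cover problem, so a greedy sketch captures the idea; the word-to-concept table is illustrative, and the actual models in the papers are probabilistic:

```python
# Toy concept -> covered words (illustrative isA data).
CONCEPT_WORDS = {
    "asian country": {"china", "japan", "india", "korea"},
    "meal":          {"dinner", "lunch", "food"},
    "child":         {"child", "girl"},
    "wedding":       {"bride", "groom", "dress", "celebration"},
}

def label_words(words):
    """Greedy set cover: few concepts, high coverage, noise left unlabeled."""
    uncovered, labels = set(words), []
    while uncovered:
        best = max(CONCEPT_WORDS,
                   key=lambda c: len(CONCEPT_WORDS[c] & uncovered))
        gain = CONCEPT_WORDS[best] & uncovered
        if not gain:
            break  # remaining words match no concept: treat them as noise
        labels.append(best)
        uncovered -= gain
    return labels
```

On the example below, {dinner, lunch, food, child, girl} is labeled [meal, child], and a stray word such as "bullet" is simply skipped.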
Applications
• Topic labeling
• A topic is a bag of words that does not have explicit semantics
• Conceptual labeling turns each topic into a small set of meaningful concepts
• Language understanding
• Verb role labeling
• Can summarize the verb eat's direct objects (apple, breakfast, pork, beef, bullet) into a small set of concepts, such as fruit, meal, meat, bullet
Explain a bag of words
Problem: Given a bag of words, can we infer what the article is talking about?
Example:
china, japan, india, korea -> asian country
dinner, lunch, food, child, girl -> meal, child
bride, groom, dress, celebration -> wedding
Solution
Minimum description length: the best concepts capture the regularities of the words as much as possible, enabling us to compress the data as much as possible
Noise tolerance: {apple, banana, breakfast, dinner, pork, beef, bullet}
Integrating attribute information: {population, president, location} triggers the concept country
• Our models fully exploit the attributes of concepts to generate better labels
• Our solutions find the minimum number of concepts needed to label a bag of words
• Most conceptual labels are specific enough
• Noise words are ignored
Results
Explain hyperlinked entities in Wikipedia
• Problem: can we explain why a hyperlink occurs in a certain Wikipedia page?
2017/10/17
Solution and results
• Solution: a clustering-then-labeling framework
• Clustering: K-means++ with G-means-like seed selection
• Labeling: a maximal-least-ancestor model run over a taxonomy constructed from wiki categories
• Result
• 1.5M navboxes, with user satisfaction scores close to those of existing manually built navboxes in Wikipedia
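The labeling step can be sketched as a lowest-common-ancestor search over a category taxonomy. The toy parent table is an assumption; the actual system runs a maximal-least-ancestor model over wiki categories:

```python
# Toy taxonomy: category -> parent (wiki-category style, illustrative).
PARENT = {
    "Jay Chou albums": "albums",
    "albums": "works",
    "Jay Chou songs": "songs",
    "songs": "works",
}

def ancestors(cat):
    """Chain from a category up to the taxonomy root."""
    chain = [cat]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def label_cluster(entity_cats):
    """Label a cluster of entities with their lowest common ancestor."""
    common = set(ancestors(entity_cats[0]))
    for cat in entity_cats[1:]:
        common &= set(ancestors(cat))
    for cat in ancestors(entity_cats[0]):  # first chain is depth-ordered
        if cat in common:
            return cat
    return None
```

A cluster mixing "Jay Chou albums" and "Jay Chou songs" entities is labeled with the more general "works", while a pure album cluster keeps its specific label.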
Challenges
• Our explanation of "explanation" and understanding of "understanding" are still very limited
• How do humans understand the world and explain phenomena? Our knowledge of human understanding and explanation at the philosophical, cognitive, and psychological levels is still very limited, which severely constrains our ability to endow machines with these capabilities.
• Explanation in Artificial Intelligence: Insights from the Social Sciences, Tim Miller
• Acquiring large-scale commonsense knowledge and applying it in XAI
• Current AI (and XAI in particular) generally lacks commonsense understanding, and this lack of commonsense is a major bottleneck for AI research that must be overcome as soon as possible. Commonsense understanding is a core problem of general AI and a strategic high ground for AI's development; research on large-scale commonsense acquisition and understanding should therefore be launched without delay.
• Large Scale Commonsense Knowledge Acquisition and Understanding
• New machine learning models that deeply integrate data-driven (neural) and knowledge-guided (symbolic) approaches
• How to organically integrate symbolic knowledge into data-driven statistical learning models is a major open problem in AI; it is the key to reducing deep learning's dependence on labeled samples and breaking through the performance ceiling of machine learning models.
• 当知识图谱遇见深度学习 (When Knowledge Graphs Meet Deep Learning), 中国人工智能通信
References
• Chenhao Xie, Jiaqing Liang, Lihan Chen, Yanghua Xiao*, Hanghang Tong, Kezun Zhang, Haixun Wang and Wei Wang, Automatic Navbox Generation by Interpretable Clustering over Linked Entities, (CIKM 2017)
• Yi Zhang, Yanghua Xiao*, Seung-won Hwang, Haixun Wang, X. Sean Wang, Entity Suggestion with Conceptual Explanation, (IJCAI 2017)
• Jiaqing Liang, Sheng Zhang, Yanghua Xiao*, How to Keep a Knowledge Base Synchronized with Its Encyclopedia Source, (IJCAI 2017)
• Bo Xu, Yong Xu, Jiaqing Liang, Chenhao Xie, Bin Liang, Wanyun Cui and Yanghua Xiao*, CN-DBpedia: A Never-Ending Chinese Knowledge Extraction System, (IEA/AIE 2017)
• Jiaqing Liang, Yi Zhang, Yanghua Xiao*, Haixun Wang, Wei Wang, Probase+: Inferring Missing Links in Conceptual Taxonomies, Transactions on Knowledge and Data Engineering (TKDE 2017)
• Wanyun Cui, Yanghua Xiao*, Haixun Wang, Yangqiu Song, Seung-won Hwang, Wei Wang, KBQA: Learning Question Answering over QA Corpora and Knowledge Bases, (VLDB 2017)
• Jiaqing Liang, Yi Zhang, Yanghua Xiao*, Haixun Wang, Wei Wang and Pinpin Zhu, On the Transitivity of Hypernym-hyponym Relations in Data-Driven Lexical Taxonomies, (AAAI 2017)
• Jiaqing Liang, Yanghua Xiao*, Yi Zhang, Seung-won Hwang and Haixun Wang, Graph-based Wrong IsA Relation Detection in a Large-scale Lexical Taxonomy, (AAAI 2017)
• Xiangyan Sun, Yanghua Xiao*, Haixun Wang, Wei Wang, On Conceptual Labeling of a Bag of Words, (IJCAI 2015)
• Xiangyan Sun, Haixun Wang, Yanghua Xiao*, Zhongyuan Wang, Syntactic Parsing of Web Queries, (EMNLP 2016)
• Bo Xu, Chenhao Xie, Yi Zhang, Yanghua Xiao*, Haixun Wang and Wei Wang, Learning Defining Features for Categories, (IJCAI 2016)
• Wanyun Cui, Yanghua Xiao*, Wei Wang, KBQA: An Online Template Based Question Answering System over Freebase, (IJCAI 2016)
• Wanyun Cui, Xiyou Zhou, Hangyu Lin, Yanghua Xiao*, Seung-won Hwang, Haixun Wang and Wei Wang, Verb Pattern: A Probabilistic Semantic Representation on Verbs, (AAAI 2016)
Summary
• Big-data AI, exemplified by deep learning, has made enormous progress
• The opacity and uninterpretability of deep learning have become major obstacles to its further development
• Understanding and explanation are the core tasks of AI in the post-deep-learning era
• Knowledge graphs provide brand-new opportunities for explainable AI
• The difficulty of defining "explanation", commonsense acquisition and application, and neural-symbolic integration pose great challenges for XAI
Thank YOU!
• Our LAB: Knowledge Works at Fudan University
• http://kw.fudan.edu.cn