Copyright © 2012~ JNUE

The Power of Knowledge!! And Linked Open Data

Knowledge Extraction from Text

2014.1.24

김평 ([email protected])

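How Do We Extract Knowledge?

Linked Data creation

• Knowledge acquisition: the process of extracting the needed knowledge from various kinds of knowledge sources and organizing it structurally

DBMS

• Select what to build (identify the service to be built and the resources on hand)
• Analyze the data around service scenarios
• Conceptualize (classes, properties)
• Convert (DBMS -> RDF)
• Validate -> publish

Text

• Additional work is needed to extract knowledge from unstructured documents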
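
As a rough illustration of the conversion step (DBMS -> RDF), the sketch below reads rows from an in-memory SQLite table and prints them as N-Triples. The table, its columns, and the `http://example.org/` namespace are hypothetical. A real conversion would follow the classes and properties chosen during conceptualization.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not from the slides
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_ntriples(conn, table, key, class_name):
    """Turn every row of `table` into N-Triples lines (one subject per row)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        # Each row becomes an instance of the chosen class.
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{class_name}> .")
        # Each remaining non-null column becomes a property with a literal value.
        for col, value in record.items():
            if col == key or value is None:
                continue
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{BASE}{col}> "{literal}" .')
    return triples


# Toy database standing in for the existing DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', 'Seoul')")
for line in rows_to_ntriples(conn, "person", "id", "Person"):
    print(line)
```

Mapping one row to one subject and one column to one property is the simplest possible choice. The validation step would then check the generated triples before publication.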

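How Do We Extract Knowledge?

Related technologies

Data mining

• The task of extracting useful information from potentially valuable data generated in many forms, such as e-commerce records and web logs
• Targets structured data, such as the data in a database
• Many algorithms (decision trees, neural networks, association rules) have been developed for finding associations between features and for generating rules

Text mining

• A mining technique that extracts patterns or relationships from unstructured text data written in natural language
• Draws on natural language processing, information extraction, visualization, databases, and machine learning
• Studied in many applications, including brand monitoring, opinion mining, and QA systems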
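
Of the data mining algorithms named on this slide, association rules are the easiest to show in a few lines. The sketch below computes support and confidence for a rule A -> B over a toy list of transactions. The transactions and item names are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A and B) / support(A): how often B appears when A does."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


# Invented shopping baskets.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk"},
]
print(support(transactions, {"milk", "bread"}))       # 2 of 4 transactions
print(confidence(transactions, {"milk"}, {"bread"}))  # 2 of the 3 milk transactions
```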

How Do We Extract Knowledge?

Knowledge Extraction from Text (KET), NIPS 2013 (Neural Information Processing Systems Foundation)

Text understanding is an old yet-unsolved AI problem consisting of a number of nontrivial steps. The critical step in solving the problem is knowledge acquisition from text, i.e. a transition from a non-formalized text into a formalized actionable language (i.e. capable of reasoning).

Other steps in the text understanding pipeline include linguistic processing, reasoning, text generation, search, question answering etc. which are more or less solved to the degree which allows composition of a text understanding service.

On the other hand, we know that knowledge acquisition, as the key bottleneck, can be done by humans, while automating of the process is still out of reach in its full breadth.

Knowledge Extraction and Services

Text understanding and knowledge acquisition

• AI research group: computational linguistics, machine learning, probabilistic & logical reasoning, and semantic web
• Machine learning is a branch of artificial intelligence that develops algorithms and techniques that let computers learn

Use of machine learning

• Carnegie Mellon University
• Cycorp
• IBM Research
• IDIAP Research Institute
• Jozef Stefan Institute
• KU Leuven (Katholieke Universiteit Leuven)
• Max Planck Institute
• MIT Media Lab
• University of Washington
• Vulcan Inc.

Read the Web

A Carnegie Mellon University research project (since January 2010)

NELL: Never-Ending Language Learning

First, it attempts to "read," or extract facts from text found in hundreds of millions of web pages (e.g., playsInstrument(George_Harrison, guitar)).

Second, it attempts to improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately.

So far, NELL has accumulated over 50 million candidate beliefs by reading the web, and it is considering these at different levels of confidence. NELL has high confidence in 2,069,313 of these beliefs.

http://rtw.ml.cmu.edu/rtw/

http://rtw.ml.cmu.edu/rtw/kbbrowser/
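
The facts quoted above have the form relation(arg1, arg2), and each candidate belief is held at some level of confidence. A minimal sketch of that bookkeeping follows. The confidence scores, the 0.9 threshold, and the second example fact are invented, and this is not NELL's actual data format or code.

```python
import re
from typing import NamedTuple

# Matches facts written as relation(arg1, arg2).
FACT_RE = re.compile(r"^(\w+)\((\w+),\s*(\w+)\)$")


class Belief(NamedTuple):
    relation: str
    arg1: str
    arg2: str
    confidence: float


def parse_belief(text, confidence):
    """Parse 'relation(arg1, arg2)' into a Belief; raise ValueError otherwise."""
    match = FACT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a binary fact: {text!r}")
    return Belief(*match.groups(), confidence)


def high_confidence(beliefs, threshold=0.9):
    """Keep only the candidate beliefs at or above the (assumed) threshold."""
    return [b for b in beliefs if b.confidence >= threshold]


candidates = [
    parse_belief("playsInstrument(George_Harrison, guitar)", 0.97),  # made-up score
    parse_belief("playsInstrument(George_Harrison, piano)", 0.55),   # made-up fact and score
]
for belief in high_confidence(candidates):
    print(belief.relation, belief.arg1, belief.arg2)
```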

OpenCyc (1)

Cyc: the world's largest and most complete general knowledge base and commonsense reasoning engine.

• rich domain modeling
• semantic data integration
• text understanding
• domain-specific expert systems
• game AIs

http://www.cyc.com/platform/opencyc

• ~239,000 terms (up from ~177,000 terms in the previous release)
• ~2,093,000 triples (up from ~1,500,000 in the previous release)
• ~69,000 owl:sameAs links to external (non-Cyc) semantic data namespaces: http://www.cyc.com/vocabulary/basics

http://sw.opencyc.org/

OpenCyc (2)

Semantic Construction Grammar

Semantic Construction Grammar: Michael Witbrock (2012.1.18)

Watson

IBM: Watson understands natural language, breaking down the barrier between people and machines.

The Science Behind an Answer

• Question Analysis (2:11)
• Hypothesis Generation (2:45)
• Hypothesis & Evidence Scoring (3:19)
• Final Merging & Ranking (4:17)
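
The four stages listed above can be pictured as a pipeline. The toy sketch below is not Watson's implementation. The tiny evidence table, the keyword-overlap scoring, and all function names are invented, purely to show how each stage hands its result to the next.

```python
import re

# Hypothetical evidence store: candidate answer -> supporting passages.
EVIDENCE = {
    "guitar": ["George Harrison played lead guitar in the Beatles"],
    "drums": ["Ringo Starr played drums in the Beatles"],
}
STOPWORDS = {"what", "did", "the", "in", "a", "of"}


def words(text):
    """Lowercase alphabetic tokens of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def analyze_question(question):
    """Question analysis: reduce the question to its content keywords."""
    return words(question) - STOPWORDS


def generate_hypotheses(keywords):
    """Hypothesis generation: every answer with a passage sharing a keyword."""
    return [
        answer
        for answer, passages in EVIDENCE.items()
        if any(keywords & words(p) for p in passages)
    ]


def score_hypothesis(answer, keywords):
    """Evidence scoring: best keyword overlap over the answer's passages."""
    return max(len(keywords & words(p)) for p in EVIDENCE[answer])


def answer_question(question):
    """Final merging and ranking: order hypotheses by evidence score."""
    keywords = analyze_question(question)
    scored = [(score_hypothesis(h, keywords), h) for h in generate_hypotheses(keywords)]
    return [h for _, h in sorted(scored, reverse=True)]


print(answer_question("What did George Harrison play in the Beatles?"))
```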

Deep Learning for NLP (1)

Deep Learning: a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.

http://ninacsmith.com/3CLearning/Ninas3CTools/ConstructiveTools/DeeporShallow.aspx

Deep Learning for NLP (2)

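The algorithm at the center of current AI discussion is the learning algorithm based on deep neural networks. By giving future ICT devices human-like cognition and judgment, it makes possible machines that advance their own intelligence. Google, MIT, Gartner, and others selected it as a technology to watch in 2013.

In its May/June 2013 issue, MIT Technology Review introduced "deep learning", an artificial intelligence system developed by Google, as one of "10 technologies that will expand human potential". The system is said to learn as a person does and to develop its language ability on its own.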

Deep Learning for NLP (3)

http://deeplearning.net/ Demo (http://deeplearning.net/demos/)

Stanford's sentiment analysis demo using a recursive neural network (sentiment analysis of movie reviews): http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

XLike (1)

Jožef Stefan Institute, Slovenia (FP7): Cross-lingual Knowledge Extraction

Two key open research problems:

• to extract and integrate formal knowledge from multilingual texts with cross-lingual knowledge bases
• to adapt linguistic techniques and crowdsourcing to deal with irregularities in informal language used primarily in social media

XLike (2)

Demo

• Newsfeed: clean stream of semantically enriched news articles
• Multilingual Language Processing: web services for multilingual language processing
• Cross-lingual Document Linking: demo of cross-lingual similarity search
• News Data Visualization: interactive interface to Newsfeed data enriched with XLike technologies

SemEval

SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems.

http://en.wikipedia.org/wiki/File:SemEval_framework.jpg

Spatial Role Labeling

Spatial relationships between objects http://en.wikipedia.org/wiki/SemEval#Semantic_evaluation_tasks

YAGO2s (1)

Max-Planck-Institut Informatik YAGO2s

Huge semantic knowledge base, derived from Wikipedia, WordNet, and GeoNames. Currently, YAGO2s has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.

The accuracy of YAGO has been manually evaluated, proving a confirmed accuracy of 95%. Every relation is annotated with its confidence value.

YAGO combines the clean taxonomy of WordNet with the richness of the Wikipedia category system, assigning the entities to more than 350,000 classes.

YAGO is an ontology that is anchored in time and space. YAGO attaches a temporal dimension and a spatial dimension to many of its facts and entities.

In addition to a taxonomy, YAGO has thematic domains such as "music" or "science" from WordNet Domains.

YAGO2s (2)

YAGO2s Demo

YAGO as Linked Open Data

YAGO2 is part of the linked data cloud. We are linked directly to DBpedia. You can download these links.

• sameAs-links between the classes of DBpedia and YAGO2s
• sameAs-links between the individuals of YAGO2s and the YAGO-based classes of DBpedia
• subClassOf-links between the classes of YAGO2s and the manual ontology classes of DBpedia: download as TSV (with precision), download as RDF/TTL (cut at 60% precision). These links have been computed automatically by the PARIS project and are not 100% accurate.
• subPropertyOf-links between the relations of YAGO2s and the manual ontology properties of DBpedia: download as TSV (with precision), download as RDF/TTL (cut at 40% precision). These links have been computed automatically by the PARIS project. They are not perfect, but of very reasonable quality.
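
The "cut at 60% precision" idea amounts to thresholding automatically scored links. The sketch below assumes a three-column TSV (source, target, precision) with invented rows and prefixed names. The slides do not give the actual column layout of the YAGO2s download files, so treat the format as hypothetical.

```python
import csv
import io

# Hypothetical TSV content: source class, target class, estimated precision.
SAMPLE_TSV = (
    "yago:wordnet_person\tdbpedia-owl:Person\t0.98\n"
    "yago:wordnet_city\tdbpedia-owl:Settlement\t0.72\n"
    "yago:wordnet_song\tdbpedia-owl:Work\t0.41\n"
)


def links_above(tsv_text, cutoff):
    """Return (source, target) pairs whose precision is at least `cutoff`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [(src, dst) for src, dst, prec in reader if float(prec) >= cutoff]


def to_turtle(links, predicate="rdfs:subClassOf"):
    """Render the kept links as Turtle-style statements (prefix declarations omitted)."""
    return [f"{src} {predicate} {dst} ." for src, dst in links]


for statement in to_turtle(links_above(SAMPLE_TSV, 0.60)):
    print(statement)
```

Lowering the cutoff keeps more links at the cost of more wrong ones. That trade-off is why the TSV files ship with precision values while the RDF/TTL files are already cut.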

ConceptNet (1)

ConceptNet is a semantic network containing lots of things computers should know about the world, especially when understanding text written by people.

ConceptNet (2)

ConceptNet: a freely available commonsense knowledgebase and natural-language-processing toolkit which supports many practical textual-reasoning tasks over real-world documents right out-of-the-box (without additional statistical training), including:

• topic-gisting (e.g. a news article containing the concepts “gun,” “convenience store,” “demand money” and “make getaway” might suggest the topics “robbery” and “crime”)
• affect-sensing (e.g. this email is sad and angry)
• analogy-making (e.g. “scissors,” “razor,” “nail clipper,” and “sword” are perhaps like a “knife” because they are all “sharp,” and can be used to “cut something”)
• text summarization
• contextual expansion
• causal projection
• cold document classification
• other context-oriented inferences
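
The news-article example (gun, convenience store, demand money, make getaway suggesting robbery and crime) can be mimicked with a toy semantic network. The sketch below is not ConceptNet's API or data. The edges and the voting scheme are invented to illustrate how concepts found in a text can vote for related topics.

```python
from collections import Counter

# Hypothetical concept -> related-topic edges (not real ConceptNet data).
RELATED_TO = {
    "gun": ["robbery", "crime", "hunting"],
    "convenience store": ["robbery", "shopping"],
    "demand money": ["robbery", "crime"],
    "make getaway": ["robbery", "crime"],
}


def gist_topics(concepts, top_n=2):
    """Let each concept vote for its related topics; return the top_n topics."""
    votes = Counter()
    for concept in concepts:
        # Concepts missing from the network simply cast no votes.
        votes.update(RELATED_TO.get(concept, []))
    return [topic for topic, _ in votes.most_common(top_n)]


article_concepts = ["gun", "convenience store", "demand money", "make getaway"]
print(gist_topics(article_concepts))
```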

Open Information Extraction (1)

University of Washington AI

Get answers to natural-language questions!

• How can a computer accumulate a massive body of knowledge?
• What will Web search engines look like in ten years?

To address the questions above, the Open IE project has been developing a Web-scale information extraction system that reads arbitrary text from any domain on the Web, extracts meaningful information and stores in a unified knowledge base for efficient querying. In contrast to traditional information extraction, the Open Information Extraction paradigm attempts to overcome the knowledge acquisition bottleneck by extracting a large number of relations at once.

Open Information Extraction (2)

Demo

• Demo: TextRunner extracted over 500,000,000 assertions from 100 million Web pages.
• Software: ReVerb Open Information Extraction Software and additional information.
• Data: Horn-clause inference rules learned by the Sherlock system.
• Demo: Selectional Preferences from Web Text compute admissible argument values for a relation.
• Data: 10,000 Functional Relations learned from Web Text predict the functionality of a phrase.
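
Open IE assertions can be pictured as (argument, relation phrase, argument) triples. One thing that can be estimated from such triples is whether a relation phrase behaves like a function. The sketch below uses invented assertions and a simple heuristic, the share of first arguments that have exactly one second argument. It is not the method behind the data set listed above.

```python
from collections import defaultdict

# Invented (arg1, relation phrase, arg2) assertions for illustration.
ASSERTIONS = [
    ("Paris", "is the capital of", "France"),
    ("Seoul", "is the capital of", "South Korea"),
    ("Alice", "visited", "Paris"),
    ("Alice", "visited", "Seoul"),
    ("Bob", "visited", "Paris"),
]


def functionality(assertions, relation):
    """Share of arg1 values that map to exactly one distinct arg2."""
    targets = defaultdict(set)
    for arg1, rel, arg2 in assertions:
        if rel == relation:
            targets[arg1].add(arg2)
    if not targets:
        raise ValueError(f"no assertions for relation {relation!r}")
    single = sum(1 for values in targets.values() if len(values) == 1)
    return single / len(targets)


print(functionality(ASSERTIONS, "is the capital of"))  # every arg1 has one arg2
print(functionality(ASSERTIONS, "visited"))            # Alice has two, Bob has one
```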

SILK

SILK (Semantic Inferencing on Large Knowledge)

SILK is the newest part of Vulcan Inc.'s Project Halo

http://silk.semwebcentral.org/talk-silk-ruleml2011.pdf

Wolfram|Alpha

Making the world’s knowledge computable http://www.wolframalpha.com/examples/

Knowledge Extraction

the creation of knowledge from structured and unstructured sources (Wikipedia)

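Conclusion

What steps are needed for LOD to spread, and what are the obstacles?

Who, what, and how????

How should it be built and spread?

The long and difficult road to automating knowledge...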
