KAIST Education for the World, Research for the Future
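Task and Context Extraction and Representation
for Human Experience Sharing
2012. 11. 29
Jihee Ryu
Web Science and Engineering Major
Information Retrieval and Natural Language Processing Lab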
Why Human Experience Sharing?
© 2012 IR&NLP Lab. All rights reserved. 2
Necessity of Experiential Problem Solving Knowledge
1. Loosen lug nuts on tire.
2. Install spare tire.
User Context Info
[On U.S. highway]
[1 year driving experience]
[Heading to New York]
[Female]
user
A. Change a Flat Tire When You Are a Woman Alone
1. Call AAA.
2. Be placed on “hold”.
B. Change a Tire like a Real Woman
Experience Mining
Building a Relational Knowledge about Experiences
Event | People | Place | Time
Play Soccer | Yongho, … | Expo Park | 2011-08-10
Play Baseball | Chulsoo, … | Gapchun Park | 2009-09-02
…

Event (Type) | People (Type) | Place (Type) | Time (Type)
(Sport) | (student) | (Park) | (Summer)
…
Experiential Sentences &
Context
Experiential Knowledge
Web
Experiential Knowledge Distillation
Context-anchored
Automatic extraction
Aggregation & abstraction
From What?
Various types of open contents on the Web!
How-to articles
Blog posts
Microblog posts
Human Experiential KB
Human Task mining
Event Context mining
Place Semantics mining
Human Task Mining
Human Task Model
Topic
Goal
Action
Object Time Location
hasTopic
hasNextAction
hasObject hasTime hasLocation
hasAction
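The task model above can be sketched as a small data structure. This is only an illustration: the slide lists the node and relation names but not which node each relation connects, so the attachment of hasTopic and hasAction to Goal, and of hasObject, hasTime, hasLocation and hasNextAction to Action, is an assumption, as are the class names and the topic value "Food".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    verb: str
    obj: Optional[str] = None                # hasObject
    time: Optional[str] = None               # hasTime
    location: Optional[str] = None           # hasLocation
    next_action: Optional["Action"] = None   # hasNextAction


@dataclass
class Goal:
    name: str
    topic: str                                            # hasTopic
    actions: List[Action] = field(default_factory=list)   # hasAction

    def add_action(self, action: Action) -> None:
        """Append an action and link the previous action to it."""
        if self.actions:
            self.actions[-1].next_action = action
        self.actions.append(action)


# Hypothetical instance; "Food" is an assumed topic label.
goal = Goal(name="Make Omelet Soup", topic="Food")
goal.add_action(Action(verb="place", obj="water", location="saucepan"))
goal.add_action(Action(verb="boil", obj="onion", time="several minutes"))
```

Keeping hasNextAction as a link between actions, rather than relying only on list order, mirrors the relation named in the model.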
Human Task Extraction
Title: How to Make Omelet Soup
Step 1: Place the water or canned chicken broth in a large saucepan. Boil the sweet yellow onion for several minutes.
Step 2: Add the powdered chicken broth along with the canned mushrooms. Boil the soup for a few more minutes, and then add the chopped green onion.
Step 3: Drop the eggs into the simmering broth a few minutes before you're ready to serve the omelet soup.

Goal: Make Omelet Soup
Ingredients: water, broth, onion, soup, egg
Action Sequence: (place, water), (place, broth), (boil, onion), (add, broth), (boil, soup), (add, onion), (drop, egg)
Hybrid Extraction Method
Sentences
-> Syntactic Patterns: retrieve and apply a rule
   Matched? Yes -> Extract verb and ingredients
   Matched? No -> CRFs Model: select the best label sequence
      Prob. > threshold? Yes -> Extract verb and ingredients
Examples
Eat fruit every day. -> (eat, fruit)
Turn off the car. -> (turn off, car)
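The two-stage flow above (try a syntactic rule first, fall back to a statistical tagger and accept its output only above a probability threshold) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the regular expression, the verb list, the threshold of 0.8 and the `crf_stub` placeholder are all assumptions made for illustration, and a real system would plug a trained CRF tagger into the `tagger` argument.

```python
import re
from typing import Callable, Optional, Tuple

Pair = Tuple[str, str]

# Toy syntactic pattern (assumption): a verb, optionally with a particle,
# followed by an optional determiner and a noun.
PARTICLES = r"(?:off|on|up|down|out)"
IMPERATIVE = re.compile(
    rf"^(?P<verb>[A-Za-z]+(?: {PARTICLES}\b)?) (?:(?:the|a|an|your) )?(?P<obj>[A-Za-z]+)",
    re.IGNORECASE,
)
KNOWN_VERBS = {"eat", "turn off", "add", "boil", "drop", "place"}  # hypothetical lexicon


def apply_rule(sentence: str) -> Optional[Pair]:
    """Return (verb, object) if the toy imperative pattern matches."""
    match = IMPERATIVE.match(sentence.strip())
    if match and match.group("verb").lower() in KNOWN_VERBS:
        return match.group("verb").lower(), match.group("obj").lower()
    return None


def crf_stub(sentence: str) -> Tuple[Optional[Pair], float]:
    """Placeholder for a trained CRF tagger: returns (pair, probability)."""
    return None, 0.0


def extract(
    sentence: str,
    tagger: Callable[[str], Tuple[Optional[Pair], float]] = crf_stub,
    threshold: float = 0.8,
) -> Optional[Pair]:
    pair = apply_rule(sentence)
    if pair is not None:              # Matched? Yes
        return pair
    pair, prob = tagger(sentence)     # Matched? No: fall back to the model
    if pair is not None and prob > threshold:
        return pair
    return None


print(extract("Eat fruit every day."))
print(extract("Turn off the car."))
```

Running the rule stage first keeps the high-precision patterns in control and leaves only unmatched sentences to the statistical model.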
Next Challenging Issues
A large fraction of sentences (more than 40%) in how-to instructions are not imperative sentences.
Difficulties arising from variations in writing
Scoping ambiguity: E.g. Clear or glitter nail polish should go on the nails.
Anaphora: E.g. Make it fun and unique
Condition: E.g. If your computers are only a few years old
Ellipsis: E.g. So why don't you?
Implicit meaning: E.g. Studying improves grades. (Study hard!)
Grammatical mistake: E.g. IM a friend! (Make a friend relationship in an instant messenger)
Case percentage in all the clauses in 30 sample documents
Case | Percentage
Scoping ambiguity | 13.9%
Anaphora | 13.1%
Condition | 11.9%
Ellipsis | 1.9%
Implicit meaning | 1.3%
Grammatical mistake | 1.3%
Etc. | 56.6%
Feature Sets
Feature Type | Feature Name | Feature Values
Syntactic Features | Clause Type | main, subordinate
Syntactic Features | Person | 1st person, 2nd person, 3rd person
Syntactic Features | Auxiliary Verb | will, shall, can, may, must, able to, …
Syntactic Features | Voice | active, passive, n/a
Syntactic Features | Tense | past, present, future
Syntactic Features | Polarity | negated, non-negated

Feature Type | Feature Name | Examples
Modality Features | Obligation | You have to ask about the car.
Modality Features | Permission | You can search for the world weather.
Modality Features | Explanation | The cost for delivery is already included.
Modality Features | Supposition | You will have access to the weather.
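One way to feed such features to a classifier is a one-hot encoding per clause. The sketch below is an assumption about representation, not the encoding used in the study: the key names, the `encode` helper and the hand-assigned values for the example clause are hypothetical, and the auxiliary-verb feature is left out because its value list is elided on the slide.

```python
from typing import Dict, List

# Value inventories copied from the feature table (auxiliary verb omitted).
SYNTACTIC_FEATURES: Dict[str, List[str]] = {
    "clause_type": ["main", "subordinate"],
    "person": ["1st", "2nd", "3rd"],
    "voice": ["active", "passive", "n/a"],
    "tense": ["past", "present", "future"],
    "polarity": ["negated", "non-negated"],
}
MODALITY_FEATURES: List[str] = ["obligation", "permission", "explanation", "supposition"]


def encode(clause: Dict[str, str]) -> List[int]:
    """One-hot encode a clause described by feature-name -> value pairs."""
    vector: List[int] = []
    for name, values in SYNTACTIC_FEATURES.items():
        vector.extend(1 if clause.get(name) == value else 0 for value in values)
    vector.extend(1 if clause.get("modality") == m else 0 for m in MODALITY_FEATURES)
    return vector


# "You have to ask about the car." with hand-assigned (assumed) feature values.
clause = {
    "clause_type": "main",
    "person": "2nd",
    "voice": "active",
    "tense": "present",
    "polarity": "non-negated",
    "modality": "obligation",
}
vector = encode(clause)
print(vector)
```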
Result: Actionable Clause Detection
Task | Used Feature Sets | F1 (NB) | F1 (DT) | F1 (SVM)
Actionable Clause Detection | Syntactic Features (micro only) | 0.933 | 0.942 | 0.948
Actionable Clause Detection | + Modality Features (micro & macro) | 0.862 | 0.963 | 0.966
NB: Naïve Bayes; DT: Decision Tree; SVM: Support Vector Machines
Bridge to Semantic Web
AcTN knowledge representation YAGO knowledge representation
Changing Data Representation
Current Form
Refined tabular data records [plain text]
Ultimate Target Form
Well-designed ontology entries
[well-formed RDF]
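The move from tabular records to RDF can be illustrated by turning one record into N-Triples statements. This is a sketch under stated assumptions: the `http://example.org/task/` namespace, the record columns (`goal`, `step`, `verb`, `object`) and the `verb` property are hypothetical; only the hasAction and hasObject relation names come from the task model.

```python
from typing import Dict, List

BASE = "http://example.org/task/"  # hypothetical namespace


def slug(text: str) -> str:
    """Turn a label into a URI-friendly identifier."""
    return text.strip().lower().replace(" ", "_")


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_ntriples(record: Dict[str, str]) -> List[str]:
    """Convert one tabular action record into N-Triples lines."""
    goal = f"<{BASE}{slug(record['goal'])}>"
    action = f"<{BASE}{slug(record['goal'])}/action{record['step']}>"
    return [
        f"{goal} <{BASE}hasAction> {action} .",
        f"{action} <{BASE}verb> {literal(record['verb'])} .",
        f"{action} <{BASE}hasObject> {literal(record['object'])} .",
    ]


# Hypothetical record in the "refined tabular" form.
record = {"goal": "Make Omelet Soup", "step": "1", "verb": "place", "object": "water"}
triples = record_to_ntriples(record)
for line in triples:
    print(line)
```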
Event Context Mining
What is an Event?
Events are defined as situations that happen. They may be:
Punctual (examples 1-2) or lasting for a period of time (examples 3-4)
States in which something holds true (example 5)
Examples
(1) Ferdinand Magellan, a Portuguese explorer, first reached the islands in search of spices.
(2) A fresh flow of lava, gas and debris erupted there Saturday.
(3) 11,024 people were evacuated to 18 disaster relief centers.
(4) “We’re expecting a major eruption,” he said in a telephone interview early today.
(5) Israel has been scrambling to buy more masks abroad, after a shortage of several hundred thousand gas masks.
Event Expressions
An event may be expressed in the following forms
Type | Example
Verb | A fresh flow of lava, gas and debris erupted there Saturday.
Noun | Israel will ask the United States to delay a military strike against Iraq until the Jewish state is fully prepared for a possible Iraqi attack.
Adjective | A Philippine volcano, dormant for six centuries, began exploding with searing gases, thick ash and deadly debris.
Predicative clause | “There is no reason why we would not be prepared,” Mordechai told the Yediot Ahronot daily.
Prepositional phrase | All 75 people on board the Aeroflot Airbus died.
Feature Sets
Basic Features
Named entity (NE) tags and an indication of whether the target noun is prenominal or not.
Lexical Semantic Features (LS)
The set of target nouns’ lemmas and their WordNet hypernyms
Dependency-based Features (DF)
Nouns become events if they occur with a certain surrounding context, namely, syntactic dependencies
Dependency-based Features sometimes need to be combined with Lexical Semantic Features
Comparing with Previous Work
An improvement of about 0.22 in precision and 0.09 in recall over the state of the art.
Metric | Llorens et al. (2010) | Proposed Method
Precision | 0.727 | 0.95
Recall | 0.483 | 0.577
F1 | 0.584 | 0.718
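As a quick arithmetic check, the reported figures can be related through the standard F1 formula, F1 = 2PR / (P + R). The sketch assumes that 0.95 and 0.577 are the precision and recall of the proposed method and that 0.727 and 0.483 are those of Llorens et al. (2010); this reading is consistent with the stated improvements of about 0.22 and 0.09.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Chart values, assigned to the two systems as assumed in the lead-in.
proposed = {"precision": 0.95, "recall": 0.577}
baseline = {"precision": 0.727, "recall": 0.483}

proposed_f1 = round(f1(**proposed), 3)
precision_gain = round(proposed["precision"] - baseline["precision"], 3)
recall_gain = round(proposed["recall"] - baseline["recall"], 3)

print(proposed_f1)     # 0.718
print(precision_gain)  # 0.223
print(recall_gain)     # 0.094
```

The recomputed F1 for the proposed method matches the 0.718 shown in the chart.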
Place Semantics Mining
Place Semantics
As GPS-enabled mobile devices have come into wide use, location-based services have gained popularity
But it is hard to provide appropriate context-aware services when the system uses only the user’s location, i.e. GPS coordinates (latitude, longitude)
In contrast to a location, a place is a space to which people attach meaning
If we know the meaning of a place, its Place Semantics, we can provide much more suitable services to users
Motivation
Scenario
Recently, Lena moved to Korea from the USA. She knows nothing about Korean culture and geography because she has never been outside the USA before.
Are there any places similar to Brooklyn Bowl, which I often visited to relieve stress?
How about Olympic Bowling Alley?
No, thanks! It’s NOT the place I wanted.
Brooklyn Bowl is a bowling alley in New York City. People go bowling, have parties, drink beer and hold music events at Brooklyn Bowl.
Place Semantics Mining
People leave texts about why they visit and what they do when they check in at a place on Foursquare
We can learn how places are perceived from those texts
We apply LDA to extract Place Semantics
A document is composed of the texts written at a place.
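The document construction described above, one document per place made of all texts written there, can be sketched as a grouping step that would precede topic modelling. The sample texts, the tokeniser and the function name are hypothetical; the LDA step itself is not shown and would be run on the resulting documents with a topic-modelling library.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_place_documents(tips: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Merge all texts left at the same place into one tokenised document."""
    documents: Dict[str, List[str]] = defaultdict(list)
    for place, text in tips:
        documents[place].extend(re.findall(r"[a-z']+", text.lower()))
    return dict(documents)


# Hypothetical check-in texts, given as (place, text) pairs.
tips = [
    ("Brooklyn Bowl", "Great place to bowl and drink beer"),
    ("Brooklyn Bowl", "Came for the music show"),
    ("XL Night Club", "Party after work"),
]
docs = build_place_documents(tips)
print(docs["XL Night Club"])
```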
Place
“text”
Similarity between Two Places
Are there any places similar to Brooklyn Bowl, which I often visited to relieve stress?
How about XL Night Club?
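[Figure: topic distributions of check-in texts for Brooklyn Bowl and XL Night Club]
Topics: Have a party & Drink beer; Enjoy a music show; After work; Eat food; Watch sports game; Others
Chart values: 32%, 27%, 18%, 11%, 7%, 5% (one chart); 41%, 26%, 30%, 3% (other chart)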
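Once each place is described by a topic distribution, two places can be compared by a vector similarity. The slides do not say which measure was used, so cosine similarity here is an assumption, and the three distributions are made-up illustrations rather than the chart values.

```python
import math
from typing import Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Hypothetical distributions over four shared topics.
place_a = [0.5, 0.3, 0.2, 0.0]
place_b = [0.4, 0.4, 0.1, 0.1]
place_c = [0.0, 0.1, 0.1, 0.8]

print(cosine_similarity(place_a, place_b) > cosine_similarity(place_a, place_c))
```

With such a measure, a place whose visitors do similar things ranks above one that merely shares a category name.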
Concluding Remarks
Application of Our Results
Semantic Annotation
Adds diversity and richness to text processing
Thank you!
Jihee Ryu ([email protected])
http://jihee.kr
IR&NLP Lab
http://ir.kaist.ac.kr
Yoonjae Jeong ([email protected])
Sung-Hyon Myaeng ([email protected])
http://ir.kaist.ac.kr/member/professor/
Eunyoung Kim ([email protected])
Reference
1) Jung, Y., Ryu, J., Kim, K., Myaeng, S.H.: Automatic Construction of a Large-Scale Situation Ontology by Mining How-to Instructions from the Web. Web Semantics: Science, Services and Agents on the World Wide Web (2010)
2) Ryu, J., Jung, Y., Kim, K., Myaeng, S.H.: Automatic Extraction of Human Activity Knowledge from Method-Describing Web Articles. 1st Workshop on Automated Knowledge Base Construction (2010)
3) Park, K.C., Jeong, Y., Myaeng, S.H.: Detecting Experiences from Weblogs. 48th Annual Meeting of the Association for Computational Linguistics (2010)
4) Ryu, J., Jung, Y., Myaeng, S.H.: Actionable Clause Detection from Non-imperative Sentences in How-to Instructions: A Step for Actionable Information Extraction. 15th International Conference on Text, Speech and Dialogue (2012)
5) Jeong, Y., Myaeng, S.H.: Using Syntactic Dependencies and WordNet Classes for Noun Event Recognition. Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web in conjunction with the 11th International Semantic Web Conference 2012 (2012)
6) Carter, E., Donald, J.: Space and place: theories of identity and location. Lawrence & Wishart Ltd. (1993)
Data Collection: How-to Articles
General How-to Articles
1,850,725 articles from eHow & 109,781 articles from wikiHow
eHow Category Group | # doc | wikiHow Category Group | # doc
Computers & Software, Internet | 323,289 | Computers, Electronics | 18,265
Home Building & Design & Safety | 307,277 | Family Life, Home, Pets, Relationships | 18,220
Culture, Holidays, Hobbies, Weddings | 238,143 | Hobbies, Holidays, Travel | 14,514
Business, Investment, Personal Finance | 153,458 | Health, Sports | 14,161
Arts, Entertainment, Music | 149,426 | Youth | 9,161
Family, Parenting, Pets, Plants | 135,909 | Personal Care, Style | 7,031
Cars, Car Repair | 108,386 | Education, Communications | 6,775
Healthcare, Fitness, Sports | 103,758 | Finance, Business, Work | 6,729
Education, Careers, Employment | 103,717 | Food, Entertaining | 6,099
Electronics | 101,403 | Arts, Entertainment | 5,151
Food, Recipes | 63,553 | Cars, Vehicles | 2,316
Fashion, Beauty | 62,406 | Philosophy, Religion | 1,359
Total (as of December 2011) | 1,850,725 | Total (as of December 2011) | 109,781