DYNAMICALLY SUPPORTING UNEXPLORED DOMAINS IN CONVERSATIONAL INTERACTIONS
BY ENRICHING SEMANTICS WITH NEURAL WORD EMBEDDINGS
Yun-Nung (Vivian) Chen and Alexander I. Rudnicky
1. The Task
• We propose an unsupervised approach for acquiring open-domain knowledge based on a user's verbal request.
• We use structured knowledge to extract slot types as semantic seeds to obtain domain-related information, and retrieve the most relevant applications without supervision.
• We enable the system to properly react to users' queries based on their open-domain requests, such as by providing relevant applications or by suggesting that users install applications that support uncovered domains.
P(Q | A): the probability that the user speaks the query Q to request launching the application A.
Semantic parsing performs well on a generic domain, but cannot recognize domain-specific named entities.
Motivations
o A typical SDS needs a predefined task domain that supports specific functionality; it is not able to
dynamically support functions provided by newly installed or not yet installed apps.
o Structured knowledge resources are available (e.g. Freebase, Wikipedia, FrameNet) and may
provide semantic information that allows new functionality to be linked into the domain.
o Neural word embeddings can provide semantic knowledge via unsupervised training.
Approaches
1. Generating semantic seeds by using knowledge resources
2. Enriching the semantics with neural word embeddings
3. Retrieving relevant applications, or dynamically suggesting that users install applications that support the new domain's functionality.
Results
o Compared to the original queries, using the Freebase knowledge resource (which provides sufficient information about named entities) to extract slot types for enriching query semantics achieves 25% and 18% relative improvements in MAP and P@5, respectively.
2. Framework
• Domain: the 13 application types accessed most frequently from Google Play
o Speech data were collected from users whose intents were described pictorially (WER = 19.8%)
• Main idea: Types of slots help determine semantic meaning of the utterance for expanding domain knowledge.
In an open domain, with spoken queries, how can we dynamically and effectively provide the
corresponding functions to fulfill users’ requests?
Approach (ASR)                      MAP     P@5
Original Query                      25.50   34.97
Embedding-Enriched                  30.42   40.72
Type-Embed.-Enriched: Frame         30.11   39.59
Type-Embed.-Enriched: Wikipedia     30.74   40.82
Type-Embed.-Enriched: Freebase      32.02   41.23
Type-Embed.-Enriched: Hand-Craft    34.91   45.03
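From this table, the relative improvements of the Freebase-enriched queries over the original queries are (32.02 − 25.50) / 25.50 ≈ 25.6% for MAP and (41.23 − 34.97) / 34.97 ≈ 17.9% for P@5, which correspond to the 25% and 18% figures reported in the Results above.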
Example query: "play lady gaga's bad romance"
[Framework figure: the query utterance is processed by (1) Semantic Seed Generation, producing semantic seeds (slot types); (2) Semantics Enrichment, which combines frame-semantic parsing and entity linking against structured knowledge (Wikipedia, Freebase) with word embeddings in an enrichment process; and (3) the Retrieval Process, in which a ranking model scores application data (vendor descriptions, e.g., Pandora: singer, songwriter, song, music, ...) and returns ranked applications.]
3. Semantic Seed Generation
• Frame Type from Semantic Parsing
  Q: compose an email to alex
  Frame: text creation (frame target LU: compose; frame element LU: an email)
  Frame: contacting (frame target LU: email)
  → S_frm(Q): frame-based semantic seeds
• Entity Type from Linked Structured Knowledge
  o Wikipedia Page Linking
    Q: play lady gaga's bad romance
    "… is an American singer, songwriter, and actress." (Lady Gaga)
    "… is a song by American singer …" (Bad Romance)
    → S_wk(Q): Wikipedia-based semantic seeds
  o Freebase List Linking
    Q: play lady gaga's bad romance
    Linked entity types: celebrity, composition, canonical version, musical recording, …
    → S_fb(Q): Freebase-based semantic seeds
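As a concrete illustration of the Wikipedia-based seed extraction above, here is a minimal sketch under stated assumptions: the entities are already linked, the first description sentences are hard-coded stand-ins for the linked Wikipedia pages, and the copular-pattern extraction and helper names are illustrative choices rather than the authors' implementation.

```python
import re

# Hard-coded stand-ins for first sentences of linked Wikipedia pages (illustrative only).
LINKED_DESCRIPTIONS = {
    "lady gaga": "Stefani Joanne Angelina Germanotta, known as Lady Gaga, "
                 "is an American singer, songwriter, and actress.",
    "bad romance": '"Bad Romance" is a song by American singer Lady Gaga.',
}

def wikipedia_seeds(entity: str) -> set[str]:
    """Pull type nouns from the copular clause of a linked description sentence."""
    text = LINKED_DESCRIPTIONS.get(entity, "")
    match = re.search(r"\bis (?:an?|the) (.+?)(?: by |\.)", text)
    if not match:
        return set()
    # Split an enumeration like "American singer, songwriter, and actress"
    # and keep only the head noun of each item.
    seeds = set()
    for item in re.split(r",| and ", match.group(1)):
        item = item.strip()
        if item:
            seeds.add(item.split()[-1].lower())
    return seeds

if __name__ == "__main__":
    entities = ["lady gaga", "bad romance"]
    s_wk = set().union(*(wikipedia_seeds(e) for e in entities))
    print(s_wk)  # e.g. {'singer', 'songwriter', 'actress', 'song'}
```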
o Enriching semantics improves performance by incorporating domain-specific knowledge.
o Freebase results are better than the embedding-enriched method when |Q'| > 50, especially for P@5, showing that slot types from Freebase can effectively and efficiently expand domain-specific knowledge.
o The hand-crafted mapping shows that correct slot types offer better understanding, and indicates the remaining room for improvement.
4. Semantics Enrichment
• Main idea: Use distributed word embeddings to obtain semantically related knowledge for each word.
1) Train word embeddings on the application vendor descriptions.
2) For each semantic seed, extract the most related words using the trained embeddings.
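A minimal sketch of steps 1) and 2) under stated assumptions: gensim's word2vec (4.x API) stands in for whichever embedding toolkit was actually used, the three tokenized sentences are a toy substitute for the vendor descriptions, and the seed words are hypothetical.

```python
from gensim.models import Word2Vec

# Toy stand-in for tokenized application vendor descriptions (illustrative only).
descriptions = [
    "pandora internet radio lets you stream music from your favorite singer".split(),
    "listen to songs albums and playlists from any artist or songwriter".split(),
    "send text messages and chat with your contacts".split(),
]

# 1) Train word embeddings on the vendor descriptions.
model = Word2Vec(descriptions, vector_size=100, window=5, min_count=1, workers=1, seed=1)

# 2) For each semantic seed, extract the most related words from the embedding space.
def enrich(seeds, topn=5):
    enriched = {}
    for seed in seeds:
        if seed in model.wv:  # skip seeds missing from the description vocabulary
            enriched[seed] = [w for w, _ in model.wv.most_similar(seed, topn=topn)]
    return enriched

print(enrich({"singer", "music"}))  # hypothetical seeds from S(Q)
```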
5. Retrieval Process
• Query Reformulation (Q')
  o Embedding-Enriched Query: integrates words similar to all words in Q
  o Type-Embedding-Enriched Query: additionally adds words similar to the semantic seeds S(Q)
• Ranking Model
  o Main idea: retrieve the applications that are more likely to support the user's request based on their vendor descriptions.
  o Words with higher similarity tend to occur in common contexts in the embedding training data, e.g., "text" → "message", "msg".
  o P(x | A): the probability that a word x from Q' occurs in the application A (estimated from its vendor description).
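A minimal sketch of the ranking idea under stated assumptions: each application's vendor description is treated as a unigram model, the enriched query Q' is scored by summing log P(x | A) over its words, and the add-one smoothing, helper names, and two-app toy data are illustrative choices rather than the exact model on the poster.

```python
import math
from collections import Counter

def description_counts(description: str):
    """Unigram counts over one application's vendor description."""
    tokens = description.lower().split()
    return Counter(tokens), len(tokens)

def log_p_query_given_app(enriched_query, counts, total, vocab_size, alpha=1.0):
    """log P(Q' | A) as a sum of log P(x | A), with add-alpha smoothing (assumption)."""
    return sum(
        math.log((counts[x] + alpha) / (total + alpha * vocab_size))
        for x in enriched_query
    )

def rank_apps(enriched_query, app_descriptions):
    """Return application names sorted by how likely they generate the enriched query."""
    vocab = {w for d in app_descriptions.values() for w in d.lower().split()}
    scored = []
    for app, desc in app_descriptions.items():
        counts, total = description_counts(desc)
        scored.append((log_p_query_given_app(enriched_query, counts, total, len(vocab)), app))
    return [app for _, app in sorted(scored, reverse=True)]

apps = {  # toy vendor descriptions (illustrative only)
    "Pandora": "stream music radio from any singer songwriter or song you love",
    "Maps": "turn by turn navigation directions and address search",
}
print(rank_apps(["play", "singer", "song", "music"], apps))  # Pandora should rank first
```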
6. Experiments
[Figure: pictorial intent descriptions shown to users, e.g., "Lady Gaga - Bad Romance", "Trailer of Iron Man 3", a contact named "Alex", "CMU", and an English-to-Chinese translation of "university".]
The 13 application domains:
1. music listening
2. video watching
3. make a phone call
4. video chat
5. send an email
6. text
7. post to social websites
8. share the photo
9. share the video
10. navigation
11. address request
12. translation
13. read the book
[Figure: MAP (0.22-0.38) and P@5 (0.25-0.49) versus the number of words per query (0-200) for the Baseline, Embedding-Enriched, and Type-Embedding-Enriched (Frame, Wikipedia, Freebase, Hand-crafted) queries.]
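For reference, a minimal sketch of how the two reported metrics (MAP and P@5) can be computed from ranked application lists; the function names and the tiny example data are illustrative assumptions, not the evaluation code used for the poster.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked applications that are relevant."""
    return sum(1 for app in ranked[:k] if app in relevant) / k

def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant application appears."""
    hits, precisions = 0, []
    for rank, app in enumerate(ranked, start=1):
        if app in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked list, relevant set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy example (not the poster's data): two queries with their ranked apps.
runs = [
    (["Pandora", "Spotify", "Maps"], {"Pandora", "Spotify"}),
    (["Gmail", "Maps", "Chrome"], {"Maps"}),
]
print(mean_average_precision(runs))                 # 0.75
print(precision_at_k(runs[0][0], runs[0][1], k=5))  # 0.4
```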
7. Conclusions
The application with a higher P(Q | A) is more likely to support the user's desired functions.
Contact: [email protected] | vivian.ynchen