On building more human query answering systems
Yannis Velegrakis, University of Trento
2
Users get frustrated when they do not get what they want
They do not know how to tell you what they want, but they can point to it
3
THE EMPTY ANSWER PROBLEM
4
CARDB
Car with ABS, DSL, Manual
5
Dealing with the Empty Answer Problem
• Ranking results based on user preferences: IR [Baeza11], Databases [Chaudhuri04]
• Query relaxation: modify some of the query conditions [Mishra09]
  Non-interactive
  (-) Suggests all the modifications together, but modifications are not all equal and independent
  (-) Does not take user feedback into account
• Users have preferences
6
INTERACTIVE QUERY RELAXATION
7
Interactive Query Relaxation
Suggesting one relaxation at a time
• Good for naïve users
• Good for small devices
• Good for interactive cases (orders through telephone)
Take user feedback into account
Model user preferences
Optimization-centric relaxation suggestions
• User-centric (effort, relevance)
• System-centric (profit)
8
CARDB{ }
Cars with ABS, DSL & Manual Transmission do not exist. Would you like to see cars with only …
The proposal will depend on:
• Likelihood to be accepted
• Belief that it will give an answer
• Results will be preferred
• Effectiveness of the proposal
• Objective function + Acceptance likelihood
9
INTERACTIVE QUERY RELAXATION
Principled approach: optimization-based, using a probabilistic framework over a wide variety of application-dependent objective functions
Different from solutions to the many-answers problem
10
CARDB{ }
Cars with ABS, DSL & Manual Trans. do not exist. Would you like to see cars with only ABS and Manual Transmission? With only DSL? With only ABS and DSL?
11
The Challenges
Exponential number of relaxations
Modeling of the user preferences
System encoding of the different objective functions
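To make the first challenge concrete, here is a minimal Python sketch (not from the slides) that enumerates every relaxation of a conjunctive query. It assumes a relaxation simply drops a non-empty subset of the query conditions; the function name `relaxations` is hypothetical.

```python
from itertools import combinations

def relaxations(conditions):
    """Yield every relaxed query obtained by dropping a non-empty subset of conditions.

    ASSUMPTION: a relaxation is modeled as removing conditions from a conjunction.
    """
    n = len(conditions)
    for k in range(1, n + 1):
        for dropped in combinations(conditions, k):
            yield tuple(c for c in conditions if c not in dropped)

query = ("ABS", "DSL", "Manual")
candidates = list(relaxations(query))
# A query with n conditions has 2**n - 1 such relaxations,
# counting the fully relaxed (empty) query.
```

With 3 conditions there are 7 candidates; with 10 conditions there are already 1023, which is why exhaustively exploring all sequences of proposals becomes expensive.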
12
The Probabilistic Framework
Having asked the query Q, what is the probability that a user will accept to see the answers of a query Q’ (with Q’ being a relaxation of Q)?
13
The Probabilistic Framework
Probability of accepting relaxation Q’ of Q
• User belief that an answer will be found in the database: Prior
• Likelihood the user will like the answers of the relaxed query: Pref
Probability to reject a relaxation Q’ of Q
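The formulas on this slide did not survive extraction. As a rough illustration only, the sketch below combines the two named factors by a plain product and takes rejection as the complement of acceptance. The product is an assumption made here, not the framework's confirmed definition, and both function names are hypothetical.

```python
def accept_probability(prior, pref):
    """Probability that the user accepts relaxation Q' of Q.

    ASSUMPTION: Prior (belief that an answer exists) and Pref (likelihood the
    user will like the answers) are combined by a simple product.
    """
    if not (0.0 <= prior <= 1.0 and 0.0 <= pref <= 1.0):
        raise ValueError("prior and pref must be probabilities in [0, 1]")
    return prior * pref

def reject_probability(prior, pref):
    """Rejection is the complement of acceptance."""
    return 1.0 - accept_probability(prior, pref)

p_yes = accept_probability(prior=0.5, pref=0.6)
p_no = reject_probability(prior=0.5, pref=0.6)
```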
14
The Relaxation Tree
Query: (ABS, DSL, Manual)
Notation for each condition:
• 1 for not considered yet
• # for hard constraints
• ? for the question
• - for relaxed conditions
Node types: Decision Node, Interaction Node
15
The Probabilistic Framework
Probability to reject a relaxation
Cost for an Interaction Node
Cost for a Decision Node …
16
Different objective functions
Maximize profit
• Pref: favors solutions with highest values of individual tuples
• A static function
Maximize answer relevance
• Pref: favors solutions with most relevant tuples to original query
• Semi-dynamic function (computed only once with the user query)
Minimize user effort
• Pref: favors solutions with least number of user interactions
• Fully dynamic function (changes at every relaxation)
17
Probabilistic Framework Implementation
Probability of accepting relaxation Q’ of Q
• User belief that an answer will be found in the database: Prior
  Estimated with Iterative Proportional Fitting (IPF)
• Likelihood the user will like the answers of the relaxed query: Pref
  Any tuple scoring function
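For readers unfamiliar with IPF, the following is a minimal two-dimensional sketch of the general technique: it rescales a seed table until its row and column sums match given marginals. How the system applies IPF to estimate Prior is not shown on the slide; the attribute names and marginal values below are made up for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Fit a 2-D table to target row/column sums by alternating rescaling."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns match after the last step; stop once rows match too.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table

# Hypothetical marginals: P(ABS) = 0.6, P(Manual) = 0.7.
# Rows: ABS yes/no. Columns: Manual yes/no.
joint = ipf([[1.0, 1.0], [1.0, 1.0]], [0.6, 0.4], [0.7, 0.3])
```

With a uniform seed the fitted table is the independence estimate, so `joint[0][0]` comes out as 0.6 × 0.7 = 0.42. A seed built from sample data would preserve the correlations observed in that sample.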
18
The Min-Effort Relaxation Tree
[Figure: relaxation tree for Query: (ABS, DSL, Manual), annotated with node costs (0, 1, 2) and branch probabilities (0.3, 0.7)]
19
Full Tree Solution (FullTree)
Idea:
• Fully create the tree in memory
• Compute the cost of all nodes in a bottom-up fashion
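The bottom-up computation can be sketched as a recursive post-order traversal, which evaluates children before their parents. The sketch assumes the min-effort objective, that a choice node costs one interaction plus the probability-weighted cost of its branches, and that a relaxation node takes the cheapest of its children. The class names and the example tree are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Leaf:
    cost: float = 0.0  # no further interaction needed

@dataclass
class ChoiceNode:
    # The user answers one proposal; each branch is (probability, child).
    branches: List[Tuple[float, object]]
    interaction_cost: float = 1.0  # ASSUMPTION: each question costs one unit of effort

@dataclass
class RelaxationNode:
    # The system picks which relaxation to propose next.
    children: List[object]

def cost(node):
    """Post-order evaluation: children are costed before their parent."""
    if isinstance(node, Leaf):
        return node.cost
    if isinstance(node, ChoiceNode):
        return node.interaction_cost + sum(p * cost(c) for p, c in node.branches)
    if isinstance(node, RelaxationNode):
        return min(cost(c) for c in node.children)
    raise TypeError(f"unknown node type: {type(node).__name__}")

one_more_question = ChoiceNode([(1.0, Leaf())])
tree = RelaxationNode([
    ChoiceNode([(0.3, Leaf()), (0.7, one_more_question)]),  # expected effort 1.7
    ChoiceNode([(0.7, Leaf()), (0.3, one_more_question)]),  # expected effort 1.3
])
best = cost(tree)
```

For a profit or relevance objective the `min` would become a `max`, with leaf values standing for the payoff rather than the effort.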
20
Fast Optimal Solution
Idea:
• Expand level by level
• Prune non-optimal relaxations in advance
• Use an upper and a lower bound of the cost function
• Prune using upper/lower bound reasoning
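The bound reasoning can be illustrated with a small sketch for a minimization objective: a candidate whose lower bound exceeds the smallest upper bound among its siblings can never be the optimum, so it is discarded without being expanded. The function name `prune` and the bound values are hypothetical; how the actual bounds are derived is not shown on the slide.

```python
def prune(bounds):
    """Keep only the candidates that may still be optimal (minimum cost).

    bounds maps a candidate name to a (lower, upper) pair on its true cost.
    """
    best_upper = min(upper for _, upper in bounds.values())
    return {
        name: (lower, upper)
        for name, (lower, upper) in bounds.items()
        if lower <= best_upper
    }

# Made-up bounds for the three possible first relaxations.
candidates = {
    "drop ABS": (1.0, 1.5),
    "drop DSL": (1.2, 2.0),
    "drop Manual": (1.8, 2.5),
}
kept = prune(candidates)
```

Here "drop Manual" is pruned because its best case (1.8) is worse than the worst case of "drop ABS" (1.5).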
21
Fast Optimal Solution (Min-Effort)
Prune!!!
22
Cost Distributed Relaxation Solution
Idea: node cost approximated by a probability distribution (convolution)
• Relaxation nodes: min/max distribution of cost
• Choice nodes: sum distribution of cost
Use the cost distribution instead of the actual cost
• Construct the first L levels of the tree
• Expand the branch with the biggest probability of being the optimal
Approximate algorithm
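As an illustration of the two operations named above, the sketch below represents a cost distribution as a dictionary from cost value to probability and, assuming the two costs are independent, computes the distribution of their sum (a discrete convolution) and of their minimum. The representation and the example values are assumptions made here.

```python
from collections import defaultdict

def sum_dist(a, b):
    """Distribution of X + Y for independent X ~ a, Y ~ b (discrete convolution)."""
    out = defaultdict(float)
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] += px * py
    return dict(out)

def min_dist(a, b):
    """Distribution of min(X, Y) for independent X ~ a, Y ~ b."""
    out = defaultdict(float)
    for x, px in a.items():
        for y, py in b.items():
            out[min(x, y)] += px * py
    return dict(out)

# Made-up cost distributions for two subtrees.
a = {1: 0.5, 2: 0.5}
b = {0: 0.3, 1: 0.7}
total = sum_dist(a, b)
cheapest = min_dist(a, b)
```

A max distribution follows the same pattern with `max(x, y)` in place of `min(x, y)`.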
23
Cost Distributed Relaxation Solution
1. Compute, for each child, the probability that its cost is smaller than that of its siblings
2. Choose the child with the highest probability
Example: for siblings n1 and n2 with Pr(n1 < n2) = 0.6 and Pr(n2 < n1) = 0.4, expand n1.
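The two steps can be sketched as follows, assuming sibling costs are independent discrete random variables stored as dictionaries from cost to probability. The distributions for n1 and n2 are made up so that they reproduce the 0.6 / 0.4 split on the slide; the function names are hypothetical.

```python
def prob_smallest(name, dists):
    """Probability that node `name` has a strictly smaller cost than every sibling."""
    total = 0.0
    for x, px in dists[name].items():
        others_larger = 1.0
        for other, dist in dists.items():
            if other != name:
                others_larger *= sum(py for y, py in dist.items() if y > x)
        total += px * others_larger
    return total

def pick_child(dists):
    """Step 2: choose the child most likely to be the cheapest."""
    return max(dists, key=lambda name: prob_smallest(name, dists))

# ASSUMPTION: illustrative distributions, not taken from the slides.
siblings = {
    "n1": {1: 0.6, 3: 0.4},
    "n2": {2: 1.0},
}
p1 = prob_smallest("n1", siblings)
p2 = prob_smallest("n2", siblings)
chosen = pick_child(siblings)
```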
24
User Effort Comparison
• CDR close to optimal
• Random and Greedy produce 1.5 more relaxations
25
Query Time
Exponential behaviour
Efficient for small queries
1.4 sec for query size 10!!!
26
User Study
Users prefer interactive systems to relaxations all at once
Better quality answers
27
Users get frustrated when they do not get what they want
They do not know how to tell you what they want
32
EXEMPLAR QUERIES
33
[Figure: knowledge graph fragment. Nodes: Google, YouTube, Menlo Park, S. Mateo, California, Business, IT Companies, Search Engines. Edge labels: acquired, isA, activity, foundedIn, in, of.]
User Query: Google YouTube Menlo Park
34
[Figure: the same knowledge graph, extended with the nodes Yahoo!, del.icio.us, S. Clara, S. Clara County, GM, Opel, Flint, Genesee, Michigan and Auto Industry. Edge labels: acquired, isA, activity, foundedIn, in, of. Two answers, A1 and A2, are marked alongside the user query.]
User Query: Google YouTube Menlo Park
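One simple reading of an exemplar query is "find other groups of entities connected the same way as my example". The sketch below illustrates that reading on a tiny triple store. The specific edges (which entity acquired which, and where each was founded) are assumptions made here for illustration, since the figure's layout was lost; the function name `similar_answers` is hypothetical, and this is not the authors' matching algorithm.

```python
# ASSUMPTION: illustrative edges using the node and edge labels from the figure.
TRIPLES = {
    ("Google", "acquired", "YouTube"),
    ("Google", "foundedIn", "Menlo Park"),
    ("Yahoo!", "acquired", "del.icio.us"),
    ("Yahoo!", "foundedIn", "S. Clara"),
    ("GM", "acquired", "Opel"),
    ("GM", "foundedIn", "Flint"),
}

def similar_answers(triples, example):
    """Return (x, y, z) with x acquired y and x foundedIn z, excluding the example."""
    acquired, founded = {}, {}
    for subject, predicate, obj in triples:
        if predicate == "acquired":
            acquired.setdefault(subject, []).append(obj)
        elif predicate == "foundedIn":
            founded.setdefault(subject, []).append(obj)
    answers = []
    for subject in sorted(acquired):
        for target in sorted(acquired[subject]):
            for place in sorted(founded.get(subject, [])):
                candidate = (subject, target, place)
                if candidate != example:
                    answers.append(candidate)
    return answers

answers = similar_answers(TRIPLES, ("Google", "YouTube", "Menlo Park"))
```

The user never writes the pattern "x acquired y, x foundedIn z"; it is implied by the example, which is the point of querying by pointing.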
35
Users get frustrated when they do not get what they want
They do not know how to tell you what they want, but they can point to it
Empty Answer Problem
Exemplar Queries
Both works were presented at VLDB 2014 and demonstrated at SIGMOD 2014
36
Controversy Detection in User Generated Content
Hahom Mellese, MSc Student
Davide Mottin, PhD candidate
New Models for Query Answering
Large Scale Social Data Analytics: User Profiling, Event Profiling
Matteo Lissandrini, PhD candidate
Dimitra Papadimitriou, PhD candidate
Goal-based Search in short documents
Daniele Foroni, PhD candidate
Big Data Quality: Distributed multi-matching
Come to our ICDE15 Tutorial
See our Wikipedia ICDE15 Paper
The Data & Information Management group @ the University of Trento
Prof. Yannis Velegrakis, Group Leader
He is graduating
37
Giovanni Frigo, BSc Student
Keyword Query on Graphs
Sabeur Aridhi, Post-doc
Distributed Graph Processing, K-core decomposition
Paolo Sottovia, PhD candidate
Massive Information Extraction from Social Media
Claudio D’Amico, MSc Student
Information Collection at Large Scale
Cristian Conconi, PhD Student
Information Extraction and Large Scale Graph Processing
Martin Brugnara, MSc Student
High Level Process and Data Management Systems
38
Thank you for your attention!
Questions?
More at http://www.disi.unitn.eu/~velgias
Group Page: http://db.disi.unitn.eu
• D. Mottin, M. Lissandrini, D. Papadimitriou, Y. Velegrakis and T. Palpanas, "Unleashing the Power of Information Graphs", SIGMOD Record, 7(?), 2014.
• D. Mottin, M. Lissandrini, Y. Velegrakis and T. Palpanas, "Exemplar Queries: Give me an Example of What You Need", Proceedings of VLDB, 7(5), 2014.
• D. Mottin, A. Marascu, S. Basu Roy, G. Das, T. Palpanas and Y. Velegrakis, "A Probabilistic Optimization Framework for the Empty-Answer Problem", Proceedings of VLDB, 6(14), 2013.
• T. Palpanas and Y. Velegrakis, "dbTrento: The Data and Information Management group at the University of Trento", SIGMOD Record, 41(3), 2012.
• D. Mottin, M. Lissandrini, Y. Velegrakis and T. Palpanas, "Searching with XQ: the eXemplar Query Search Engine", In Proceedings of SIGMOD, 2014.
• D. Mottin, A. Marascu, S. Basu Roy, G. Das, T. Palpanas and Y. Velegrakis, "IQR: An Interactive Query Relaxation System for the Empty-Answer Problem", In Proceedings of SIGMOD, 2014.