On building more human query answering systems

Yannis Velegrakis
University of Trento

Users get frustrated when they do not get what they want

They do not know how to tell you what they want, but they can point

THE EMPTY ANSWER PROBLEM

Query on CARDB: cars with ABS, DSL, and Manual transmission

Dealing with the Empty Answer Problem

Ranking results based on user preferences
• IR [Baeza11]
• Databases [Chaudhuri04]

Query relaxation
• Modify some of the query conditions [Mishra09]
• Non-interactive
• (-) Suggests all the modifications together, but modifications are not all equal and independent
• (-) Does not take user feedback into account, although users have preferences

INTERACTIVE QUERY RELAXATION

Interactive Query Relaxation

Suggesting one relaxation at a time
• Good for naïve users
• Good for small devices
• Good for interactive cases (orders through telephone)

Take user feedback into account

Model user preferences

Optimization-centric relaxation suggestions
• User-centric (effort, relevance)
• System-centric (profit)

CARDB returns { }

Cars with ABS, DSL & Manual Transmission do not exist. Would you like to see cars with only …

The proposal will depend on:

• Likelihood to be accepted

• Belief that it will give an answer

• Results will be preferred

• Effectiveness of the proposal

• Objective function + Acceptance likelihood

INTERACTIVE QUERY RELAXATION

Principled approach: optimization-based, using a probabilistic framework over a wide variety of application-dependent objective functions

Different from solutions to the many-answers problem

CARDB returns { }

Cars with ABS, DSL & Manual Trans. do not exist. Would you like to see cars with only:
• ABS and Manual Transmission?
• DSL?
• ABS and DSL?

The Challenges

Exponential number of relaxations

Modeling of the user preferences

System encoding of the different objective functions
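To make the first challenge concrete, here is a minimal sketch that enumerates every way of dropping conditions from the running example query. It assumes a relaxation simply removes a non-empty subset of conditions; the function name `relaxations` is hypothetical and not taken from the talk.

```python
from itertools import combinations


def relaxations(conditions):
    """Yield (dropped, kept) pairs for every non-empty subset of dropped conditions.

    ASSUMPTION: a relaxation is modeled as removing conditions outright.
    """
    for k in range(1, len(conditions) + 1):
        for dropped in combinations(conditions, k):
            kept = tuple(c for c in conditions if c not in dropped)
            yield dropped, kept


query = ("ABS", "DSL", "Manual")
all_relaxations = list(relaxations(query))

# A query with n conditions has 2**n - 1 such relaxations.
print(len(all_relaxations))
for dropped, kept in all_relaxations:
    print("drop", dropped, "-> keep", kept)
```

With three conditions the list is short, but the count doubles with every added condition, which is why exhaustive enumeration stops being practical for larger queries.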

The Probabilistic Framework

Having asked the query Q, what is the probability that a user will accept to see the answers of a query Q’ (with Q’ being a relaxation of Q)?

The Probabilistic Framework

Probability of accepting relaxation Q’ of Q
• User belief that an answer will be found in the database: Prior
• Likelihood the user will like the answers of the relaxed query: Pref

Probability to reject a relaxation Q’ of Q
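The formulas for these probabilities did not survive in the slide text. The sketch below shows one plausible way to combine the two named ingredients, under the loudly stated assumption that Prior and Pref are both values in [0, 1] and that the user rejects a proposal only when neither of them is convincing. The combination rule and the function names are assumptions, not the talk's definition.

```python
def reject_probability(prior: float, pref: float) -> float:
    """Probability that the user rejects a proposed relaxation.

    ASSUMPTION: rejection is modeled as (1 - prior) * (1 - pref);
    the slide names Prior and Pref but does not show how they combine.
    """
    for name, value in (("prior", prior), ("pref", pref)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (1.0 - prior) * (1.0 - pref)


def accept_probability(prior: float, pref: float) -> float:
    """Complement of the rejection probability."""
    return 1.0 - reject_probability(prior, pref)


print(accept_probability(0.5, 0.5))
```

Under this rule a proposal is certain to be accepted as soon as either factor equals 1, and certain to be rejected only when both are 0.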

The Relaxation Tree

Query: (ABS, DSL, Manual)

Node labels:
• 1 for a condition not considered yet
• # for a hard constraint
• ? for the condition in the question
• - for a relaxed condition

Node types: Decision Node, Interaction Node

The Probabilistic Framework

Probability to reject a relaxation

Cost for an Interaction Node

Cost for a Decision Node …

Different objective functions

Maximize profit
• Pref: favors solutions with highest values of individual tuples
• A static function

Maximize answer relevance
• Pref: favors solutions with most relevant tuples to original query
• Semi-dynamic function (computed only once with the user query)

Minimize user effort
• Pref: favors solutions with least number of user interactions
• Fully dynamic function (changes at every relaxation)

Probabilistic Framework Implementation

Probability of accepting relaxation Q’ of Q
• User belief that an answer will be found in the database: Prior, estimated with Iterative Proportional Fitting (IPF)
• Likelihood the user will like the answers of the relaxed query: Pref, which can be any tuple scoring function
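The slide only names IPF, so the following is a sketch of the generic two-dimensional IPF procedure rather than the talk's implementation. It rescales a seed table until its row and column sums match given marginals. The marginals used here (share of cars with ABS, share with manual transmission) are hypothetical numbers chosen for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Alternately rescale rows and columns of `seed` to match target marginals."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Row step: scale each row so that it sums to its target.
        for i in range(len(table)):
            total = sum(table[i])
            if total > 0:
                factor = row_targets[i] / total
                table[i] = [v * factor for v in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j in range(len(col_targets)):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = col_targets[j] / total
                for row in table:
                    row[j] *= factor
        # Stop once the row sums still agree after the column step.
        if all(abs(sum(r) - t) < tol for r, t in zip(table, row_targets)):
            break
    return table


# Hypothetical marginals: rows = ABS (yes, no), columns = Manual (yes, no).
joint = ipf([[1.0, 1.0], [1.0, 1.0]], [0.3, 0.7], [0.4, 0.6])
for row in joint:
    print([round(v, 4) for v in row])
```

With a uniform seed the result is the product of the marginals, that is, the joint distribution under independence; a seed built from observed co-occurrences would preserve the correlations present in the data.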

The Min-Effort Relaxation Tree

[Figure: the min-effort relaxation tree for the query (ABS, DSL, Manual), with nodes and branches annotated with numeric labels (0, 1, 2, 0.3, 0.7)]

Full Tree Solution (FullTree)

Idea:
• Fully create the tree in memory
• Compute the cost of all nodes in a bottom-up fashion
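A minimal sketch of the bottom-up cost computation for the min-effort objective, under stated assumptions: nodes where the system chooses a proposal take the minimum over their children, nodes where the user answers cost one interaction plus the probability-weighted cost of the outcomes, and leaves cost nothing. The class names (`Leaf`, `Propose`, `Ask`) and the tree shape are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """The interaction ends here (an answer was found or nothing is left to relax)."""
    cost: float = 0.0


@dataclass
class Propose:
    """The system picks the cheapest of several candidate proposals."""
    options: List["Node"]


@dataclass
class Ask:
    """The user answers one proposal; outcome probabilities sum to 1."""
    outcomes: List[Tuple[float, "Node"]]


Node = Union[Leaf, Propose, Ask]


def expected_effort(node: Node) -> float:
    """Post-order recursion: children are evaluated before their parent."""
    if isinstance(node, Leaf):
        return node.cost
    if isinstance(node, Propose):
        return min(expected_effort(child) for child in node.options)
    # ASSUMPTION: every question costs exactly one unit of user effort.
    return 1.0 + sum(p * expected_effort(child) for p, child in node.outcomes)


one_more_question = Propose([Ask([(1.0, Leaf())])])
tree = Propose([
    Ask([(0.7, Leaf()), (0.3, one_more_question)]),
    Ask([(0.3, Leaf()), (0.7, one_more_question)]),
])
print(expected_effort(tree))
```

The first proposal wins here because it is more likely to end the interaction after a single question.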

Fast Optimal Solution

Idea:
• Expand level by level
• Prune non-optimal relaxations in advance

Use upper and lower bounds on the cost function

Prune using upper/lower bound reasoning
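The bound reasoning can be sketched as follows for a minimization objective: a candidate whose lower bound already exceeds the best upper bound among its siblings can never be optimal and is discarded. How the talk derives the bounds is not shown on the slide, so the bound values and the function name `prune_by_bounds` are hypothetical.

```python
def prune_by_bounds(candidates):
    """Keep only candidates that can still be optimal when minimizing cost.

    `candidates` maps a label to a (lower_bound, upper_bound) pair.
    """
    best_upper = min(upper for _, upper in candidates.values())
    return {
        label: bounds
        for label, bounds in candidates.items()
        if bounds[0] <= best_upper
    }


# Hypothetical cost bounds for three candidate relaxations.
bounds = {
    "drop ABS": (1.0, 1.5),
    "drop DSL": (1.2, 2.0),
    "drop Manual": (1.8, 2.6),
}
kept = prune_by_bounds(bounds)
print(sorted(kept))
```

Here "drop Manual" is pruned because its best case (1.8) is worse than the worst case of "drop ABS" (1.5), so its subtree never needs to be expanded.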

Fast Optimal Solution (Min-Effort)

Prune!!!

Cost Distributed Relaxation Solution

Idea: node costs are approximated by a probability distribution (convolution)
• Relaxation nodes: min/max distribution of cost
• Choice nodes: sum distribution of cost

Use the cost distribution instead of the actual cost
• Construct the first L levels of the tree
• Expand the branch with the highest probability of being optimal

Approximate algorithm

Cost Distributed Relaxation Solution

1. Compute the probability that the cost is smaller than the siblings' cost

2. Choose the child with the highest probability

Example: for siblings n1 and n2 with Pr(n1<n2) = 0.6 and Pr(n2<n1) = 0.4, expand n1
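The two steps can be sketched with discrete cost distributions stored as `{cost: probability}` dictionaries, assuming the sibling costs are independent. The distributions below are hypothetical and were chosen only so that they reproduce the 0.6 / 0.4 split of the example; `convolve` shows how the distribution of a sum of two costs would be obtained.

```python
def prob_less(a, b):
    """Pr(X < Y) for independent discrete costs X ~ a and Y ~ b."""
    return sum(pa * pb for x, pa in a.items() for y, pb in b.items() if x < y)


def convolve(a, b):
    """Distribution of X + Y for independent discrete costs X ~ a and Y ~ b."""
    out = {}
    for x, pa in a.items():
        for y, pb in b.items():
            out[x + y] = out.get(x + y, 0.0) + pa * pb
    return out


# Hypothetical cost distributions of two sibling nodes.
n1 = {1: 0.6, 3: 0.4}
n2 = {2: 1.0}

p12 = prob_less(n1, n2)
p21 = prob_less(n2, n1)
to_expand = "n1" if p12 >= p21 else "n2"
print(p12, p21, to_expand)

# Sum of two independent costs, each 1 or 2 with equal probability.
total = convolve({1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5})
print(total)
```

With more than two siblings the pairwise probabilities are no longer enough to get the exact probability of being the cheapest, so this sketch deliberately stays with the two-sibling case.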

User Effort Comparison

• CDR is close to optimal

• Random and Greedy produce 1.5 more relaxations

Query Time

Exponential behaviour

Efficient for small queries

1.4 sec for query size 10!!!

User Study

Users prefer interactive systems to relaxations all at once

Better quality answers

Users get frustrated when they do not get what they want

They do not know how to tell you what they want

EXEMPLAR QUERIES

[Figure: knowledge graph fragment. Node labels: Google, YouTube, Menlo Park, S. Mateo, California, Business, IT Companies, Search Engines. Edge labels: acquired, isA, activity, foundedIn, in, of.]

User Query: Google YouTube Menlo Park

[Figure: the same knowledge graph, extended. Node labels: Google, YouTube, Menlo Park, S. Mateo, California, Business, IT Companies, Search Engines, Auto Industry, Yahoo!, del.icio.us, GM, Opel, Flint, S. Clara, S. Clara County, Genesee, Michigan. Edge labels: acquired, isA, activity, foundedIn, in, of. Two parts of the graph are labeled A1 and A2.]

User Query: Google YouTube Menlo Park
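To illustrate the idea of answering by example, the sketch below treats the user query as a small sample structure and looks for other parts of the graph with the same shape. Only node and edge labels survive from the figure, so which nodes each edge connects is an assumption here, and exact structural matching is a simplification chosen for brevity. The names `TRIPLES` and `match` are hypothetical.

```python
# ASSUMPTION: edge endpoints are guessed; the figure text only lists labels.
TRIPLES = {
    ("Google", "acquired", "YouTube"),
    ("Google", "foundedIn", "Menlo Park"),
    ("Yahoo!", "acquired", "del.icio.us"),
    ("Yahoo!", "foundedIn", "S. Clara"),
    ("GM", "acquired", "Opel"),
    ("GM", "foundedIn", "Flint"),
}


def match(pattern, triples, binding=None):
    """Yield mappings from example nodes to graph nodes that preserve every edge."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    (s, p, o), rest = pattern[0], pattern[1:]
    for ts, tp, to in triples:
        if tp != p:
            continue
        # An example node that is already bound must map to the same graph node.
        if binding.get(s, ts) != ts or binding.get(o, to) != to:
            continue
        new = {**binding, s: ts, o: to}
        # Distinct example nodes must map to distinct graph nodes.
        if len(set(new.values())) != len(new):
            continue
        yield from match(rest, triples, new)


example = [
    ("Google", "acquired", "YouTube"),
    ("Google", "foundedIn", "Menlo Park"),
]
answers = sorted(
    (b["Google"], b["YouTube"], b["Menlo Park"])
    for b in match(example, TRIPLES)
    if b["Google"] != "Google"  # skip the example itself
)
print(answers)
```

Each answer plays the same role as the example: a company, a company it acquired, and the place where it was founded.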

Users get frustrated when they do not get what they want: Empty Answer Problem

They do not know how to tell you what they want, but they can point: Exemplar Queries

Both works were presented at VLDB 2014 and demonstrated at SIGMOD 2014

The Data & Information Management group @ the University of Trento

Prof. Yannis Velegrakis, Group Leader

Controversy Detection in User Generated Content

Hahom Mellese, MSc Student

Davide Mottin, PhD candidate

New Models for Query Answering

Large Scale Social Data Analytics: User Profiling, Event Profiling

Matteo Lissandrini, PhD candidate

Dimitra Papadimitriou, PhD candidate

Goal-based Search in short documents

Daniele Foroni, PhD candidate

Big Data Quality: Distributed multi-matching

Come to our ICDE15 Tutorial

See our Wikipedia ICDE15 Paper

He is graduating

The Data & Information Management group @ the University of Trento

Prof. Yannis Velegrakis, Group Leader

Giovanni Frigo, BSc Student

Keyword Query on Graphs

Sabeur Aridhi, Post-doc

Distributed Graph Processing: K-core decomposition

Paolo Sottovia, PhD candidate

Massive Information Extraction from Social Media

Claudio D’Amico, MSc Student

Information Collection at Large Scale

Cristian Conconi, PhD Student

Information Extraction and Large Scale Graph Processing

Martin Brugnara, MSc Student

High Level Process and Data Management Systems

Thank you for your attention!

Questions?

More at http://www.disi.unitn.eu/~velgias

Group Page: http://db.disi.unitn.eu

• D. Mottin, M. Lissandrini, D. Papadimitriou, Y. Velegrakis and T. Palpanas, "Unleashing the Power of Information Graphs", SIGMOD Record, 7(?), 2014.

• D. Mottin, M. Lissandrini, Y. Velegrakis and T. Palpanas, "Exemplar Queries: Give me an Example of What You Need", Proceedings of VLDB, 7(5), 2014.

• D. Mottin, A. Marascu, S. Basu Roy, G. Das, T. Palpanas and Y. Velegrakis, "A Probabilistic Optimization Framework for the Empty-Answer Problem", Proceedings of VLDB, 6(14), 2013.

• T. Palpanas and Y. Velegrakis, "dbTrento: The Data and Information Management group at the University of Trento", SIGMOD Record, 41(3), 2012.

• D. Mottin, M. Lissandrini, Y. Velegrakis and T. Palpanas, "Searching with XQ: the eXemplar Query Search Engine", Proceedings of SIGMOD, 2014.

• D. Mottin, A. Marascu, S. Basu Roy, G. Das, T. Palpanas and Y. Velegrakis, "IQR: An Interactive Query Relaxation System for the Empty-Answer Problem", Proceedings of SIGMOD, 2014.