On building more human query answering systems


Yannis Velegrakis, University of Trento


Users get frustrated when they do not get what they want

They do not know how to tell you what they want, but they can point to examples of it


THE EMPTY ANSWER PROBLEM


CARDB

Cars with ABS, DSL, Manual transmission


Dealing with the Empty Answer Problem

Ranking results based on user preferences: IR [Baeza11], databases [Chaudhuri04]

Query relaxation: modify some of the query conditions [Mishra09]

(-) Non-interactive: suggests all the modifications together

But modifications are neither equal nor independent

(-) Does not take user feedback into account

Users have preferences


INTERACTIVE QUERY RELAXATION


Interactive Query Relaxation

Suggest one relaxation at a time:

• Good for naïve users

• Good for small devices

• Good for interactive settings (e.g., orders over the telephone)

Take user feedback into account

Model user preferences

Optimization-centric relaxation suggestions:

• User-centric (effort, relevance)

• System-centric (profit)


CARDB

Cars with ABS, DSL & Manual Transmission do not exist. Would you like to see cars with only …

The proposal will depend on:

• Likelihood of being accepted

• Belief that it will return an answer

• Likelihood that its results will be preferred

• Effectiveness of the proposal

• Objective function + Acceptance likelihood


INTERACTIVE QUERY RELAXATION

Principled approach: optimization-based, using a probabilistic framework over a wide variety of application-dependent objective functions

Different from solutions to the many-answers problem


CARDB

Cars with ABS, DSL & Manual Trans. do not exist. Would you like to see cars with only ABS and Manual Transmission? (Next proposal: ABS and DSL?)


The Challenges

Exponential number of relaxations

Modeling of the user preferences

System encoding of the different objective functions
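To make the first challenge concrete, here is a minimal Python sketch that enumerates the relaxations of a conjunctive query. It assumes, for illustration only, that a relaxation simply drops one or more conditions (the empty query included); under that assumption a query with n conditions has 2^n - 1 relaxations, so 10 conditions already give 1,023. The function name `relaxations` is hypothetical.

```python
from itertools import combinations

def relaxations(conditions):
    """Yield every sub-query obtained by dropping at least one condition.

    Assumption (illustrative): a relaxation removes conditions from a
    conjunctive query; the empty query (all conditions dropped) is included.
    """
    n = len(conditions)
    # Keep n-1 conditions first, then n-2, ..., down to 0.
    for k in range(n - 1, -1, -1):
        for kept in combinations(conditions, k):
            yield kept

query = ("ABS", "DSL", "Manual")
rels = list(relaxations(query))
print(len(rels))  # 7 == 2**3 - 1
for r in rels:
    print(r)
```

Ordering the output from the least to the most relaxed query mirrors the intuition that users usually prefer to give up as little as possible.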


The Probabilistic Framework

Given that the user asked query Q, what is the probability that the user will agree to see the answers of a query Q’ (where Q’ is a relaxation of Q)?



Probability of accepting relaxation Q’ of Q, which depends on:

• User belief that an answer will be found in the database: Prior

• Likelihood that the user will like the answers of the relaxed query: Pref

Probability of rejecting a relaxation Q’ of Q


The Relaxation Tree

Query: (ABS, DSL, Manual)

Node types: Decision Node, Interaction Node

Legend: 1 = not considered yet; # = hard constraint; ? = the question; - = relaxed condition



Probability of rejecting a relaxation

Cost for an Interaction Node

Cost for a Decision Node …
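The cost formulas for these nodes did not survive extraction. Purely as a hedged illustration, the Python sketch below shows one plausible min-effort instantiation, assuming that the user can only accept or reject, that each question costs one interaction, that an interaction node costs one plus the expected cost of the two possible answers, and that a decision node takes the minimum over its children (in line with the min/max rule for relaxation nodes stated on the Cost Distributed Relaxation slide). All function names and numbers are hypothetical.

```python
def reject_probability(p_accept):
    # Assumption: the user can only answer yes or no.
    return 1.0 - p_accept

def interaction_cost(p_accept, cost_if_yes, cost_if_no):
    # Hypothetical min-effort rule: one interaction for asking, plus the
    # expected cost of whatever follows the user's answer.
    return 1.0 + p_accept * cost_if_yes + reject_probability(p_accept) * cost_if_no

def decision_cost(child_costs):
    # The system proposes the relaxation with the cheapest expected cost.
    return min(child_costs)

# Two candidate proposals: (Pr(accept), cost after "yes", cost after "no").
options = [(0.3, 0.0, 1.0), (0.7, 0.0, 2.0)]
costs = [interaction_cost(p, c_yes, c_no) for p, c_yes, c_no in options]
print(costs)                 # approximately [1.7, 1.6]
print(decision_cost(costs))  # approximately 1.6
```

For a profit-style objective the decision node would take the maximum instead of the minimum.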


Different objective functions

Maximize profit. Pref: favors solutions with the highest values of individual tuples; a static function

Maximize answer relevance. Pref: favors solutions with the tuples most relevant to the original query; a semi-dynamic function (computed only once, with the user query)

Minimize user effort. Pref: favors solutions with the fewest user interactions; a fully dynamic function (changes at every relaxation)
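The objectives differ mainly in what the Pref score depends on. The Python sketch below illustrates the first two on a hypothetical three-tuple CARDB: a static profit score that looks only at stored tuple values (assumed here to be the price of the best answer tuple) and a semi-dynamic relevance score computed once against the original query (assumed here to be the best fraction of original conditions an answer tuple satisfies). The tuples, the `price` attribute and both scoring rules are illustrative assumptions; the fully dynamic effort objective depends on the relaxation tree and is not a per-tuple score.

```python
# Hypothetical toy CARDB (illustrative values only).
CARS = [
    {"id": 1, "ABS": True,  "DSL": False, "Manual": True,  "price": 18000},
    {"id": 2, "ABS": True,  "DSL": True,  "Manual": False, "price": 25000},
    {"id": 3, "ABS": False, "DSL": True,  "Manual": True,  "price": 15000},
]

def answers(conditions):
    """Tuples satisfying every condition of a conjunctive query."""
    return [t for t in CARS if all(t[c] for c in conditions)]

def pref_profit(tuples):
    # Static: depends only on stored values, never on the query.
    return max((t["price"] for t in tuples), default=0)

def pref_relevance(tuples, original):
    # Semi-dynamic: computed once, against the user's original query.
    if not tuples:
        return 0.0
    return max(sum(t[c] for c in original) / len(original) for t in tuples)

original = ("ABS", "DSL", "Manual")
print(answers(original))  # [] : the empty-answer case
for relaxed in [("ABS", "DSL"), ("ABS", "Manual"), ("DSL", "Manual")]:
    found = answers(relaxed)
    print(relaxed, pref_profit(found), round(pref_relevance(found, original), 2))
```

With these toy values the profit score favours proposing to drop Manual, while the relevance score rates all three single-condition relaxations equally, which shows how the chosen objective changes the proposal.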


Probabilistic Framework Implementation

Probability of accepting relaxation Q’ of Q:

• User belief that an answer will be found in the database (Prior): Iterative Proportional Fitting (IPF)

• Likelihood that the user will like the answers of the relaxed query (Pref): any tuple scoring function
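The slide does not say how IPF is applied to estimate Prior. As a generic illustration of the technique itself, the Python sketch below fits a two-way table (for example, hypothetical proportions of cars with/without ABS against with/without DSL) to given row and column marginals by alternately rescaling rows and columns. The function name and the example marginals are assumptions.

```python
def ipf(seed, row_targets, col_targets, max_iters=100, tol=1e-9):
    """Iterative Proportional Fitting for a 2-D table.

    Alternately rescales the rows and columns of `seed` so that its row sums
    match `row_targets` and its column sums match `col_targets`.
    The two target vectors must have the same total.
    """
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iters):
        # Row step: make each row sum equal to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Column step: make each column sum equal to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows still match too.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table

# Hypothetical marginals: Pr(ABS) = 0.6 and Pr(DSL) = 0.3, uniform seed.
joint = ipf([[1, 1], [1, 1]], row_targets=[0.6, 0.4], col_targets=[0.3, 0.7])
print(joint)  # approximately [[0.18, 0.42], [0.12, 0.28]]
```

With a uniform seed IPF returns the independence table; a seed taken from observed data would instead preserve that data's interaction structure while matching the target marginals.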


The Min-Effort Relaxation Tree

[Figure: min-effort relaxation tree for Query: (ABS, DSL, Manual), annotated with node costs (0, 1, 2) and answer probabilities (0.3, 0.7)]


Full Tree Solution (FullTree)

Idea: fully materialize the tree in memory, then compute the cost of all nodes bottom-up
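A minimal Python sketch of the FullTree idea follows, under loudly stated assumptions: the acceptance probabilities in `P_ACCEPT`, the `has_answer` stand-in for "the relaxed query returns tuples", and the min-effort cost rule (decision node = minimum over children; interaction node = 1 + expected cost of the accept and reject branches) are all hypothetical. Accepting a proposal drops the condition; rejecting it turns the condition into a hard constraint. The whole tree is built first, then a post-order pass computes costs bottom-up.

```python
from dataclasses import dataclass, field

# Hypothetical toy inputs (not from the slides).
P_ACCEPT = {"ABS": 0.3, "DSL": 0.7, "Manual": 0.5}

def has_answer(kept):
    return len(kept) <= 2  # assumption: dropping any one condition yields tuples

@dataclass
class Node:
    kind: str                    # "decision", "interaction" or "leaf"
    label: str = ""              # condition proposed for relaxation
    p_accept: float = 0.0        # used by interaction nodes only
    children: list = field(default_factory=list)
    cost: float = 0.0

def build(kept, hard):
    """Materialize the whole relaxation tree for the current query state."""
    candidates = sorted(kept - hard)
    if has_answer(kept) or not candidates:
        return Node("leaf")      # answer found, or nothing left to propose
    decision = Node("decision")
    for c in candidates:
        ask = Node("interaction", label=c, p_accept=P_ACCEPT[c])
        ask.children = [build(kept - {c}, hard),   # user accepts: c is dropped
                        build(kept, hard | {c})]   # user rejects: c becomes hard
        decision.children.append(ask)
    return decision

def compute_costs(node):
    """Post-order (bottom-up) pass computing the expected number of interactions."""
    for child in node.children:
        compute_costs(child)
    if node.kind == "decision":
        node.cost = min(ch.cost for ch in node.children)
    elif node.kind == "interaction":
        yes, no = node.children
        node.cost = 1 + node.p_accept * yes.cost + (1 - node.p_accept) * no.cost
    return node.cost

root = build(frozenset(P_ACCEPT), frozenset())
print(compute_costs(root))  # approximately 1.45
best = min(root.children, key=lambda ch: ch.cost)
print(best.label)           # DSL
```

Materializing every node is what makes this approach exponential in the number of conditions, which motivates the pruning and approximation strategies that follow.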


Fast Optimal Solution

Idea: expand level by level, pruning non-optimal relaxations in advance

Use upper and lower bounds of the cost function

Prune using upper/lower bound reasoning
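The pruning rule can be illustrated for a minimization objective such as user effort: if some sibling is guaranteed to cost at most U, any candidate whose lower bound exceeds U can never be optimal and need not be expanded. The Python sketch below applies that rule to hypothetical (lower, upper) bounds; how the bounds themselves are computed is not shown on the slide and is left out here.

```python
def prune(children):
    """Drop candidates that cannot be optimal under a minimization objective.

    `children` is a list of (label, lower_bound, upper_bound) on expected cost.
    A child whose lower bound exceeds the smallest upper bound among its
    siblings is dominated: some sibling is guaranteed to be at least as cheap.
    """
    best_upper = min(upper for _, _, upper in children)
    return [child for child in children if child[1] <= best_upper]

# Hypothetical bounds for the three first-level proposals.
candidates = [("ABS", 1.2, 2.0), ("DSL", 0.8, 1.5), ("Manual", 1.6, 2.4)]
print([label for label, _, _ in prune(candidates)])  # ['ABS', 'DSL']
```

For a maximization objective such as profit the rule is mirrored: a child is dropped when its upper bound falls below the largest lower bound among its siblings.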


Fast Optimal Solution (Min-Effort)

Prune!!!


Cost Distributed Relaxation Solution

Idea: approximate each node's cost by a probability distribution (convolution)

• Relaxation nodes: min/max distribution of the cost

• Choice nodes: sum distribution of the cost

Use the cost distribution instead of the actual cost:

• Construct the first L levels of the tree

• Expand the branch with the highest probability of being optimal

Approximate Algorithm
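The two combination rules can be sketched for discrete cost distributions. Assuming, for illustration, that child costs are independent random variables, the distribution of a sum is the convolution of the children's distributions, and the distribution of a minimum is obtained by enumerating pairs of outcomes. Distributions are represented as {cost: probability} dictionaries; the function names and example values are hypothetical.

```python
from collections import defaultdict
from itertools import product

def convolve(d1, d2):
    """Distribution of X + Y for independent discrete X ~ d1 and Y ~ d2."""
    out = defaultdict(float)
    for (x, px), (y, py) in product(d1.items(), d2.items()):
        out[x + y] += px * py
    return dict(out)

def dist_min(d1, d2):
    """Distribution of min(X, Y) for independent discrete X and Y."""
    out = defaultdict(float)
    for (x, px), (y, py) in product(d1.items(), d2.items()):
        out[min(x, y)] += px * py
    return dict(out)

a = {1: 0.5, 2: 0.5}
b = {1: 0.5, 3: 0.5}
print(convolve(a, b))  # {2: 0.25, 4: 0.25, 3: 0.25, 5: 0.25}
print(dist_min(a, b))  # {1: 0.75, 2: 0.25}
```

A max distribution, for maximization objectives, is obtained the same way by replacing `min(x, y)` with `max(x, y)`.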


Cost Distributed Relaxation Solution

1. Compute the probability that a child's cost is smaller than that of its siblings

2. Choose the child with the highest probability

For sibling nodes n1 and n2: Pr(n1 < n2) = 0.6, Pr(n2 < n1) = 0.4

Expand n1!
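Steps 1 and 2 can be sketched in Python for discrete cost distributions, again assuming that the siblings' costs are independent. The probability that a child is strictly the cheapest is the sum, over its possible costs x, of Pr(cost = x) times the probability that every sibling costs more than x. The two distributions below are invented so that the result reproduces the 0.6 / 0.4 split on the slide; the function names are hypothetical.

```python
def prob_cheapest(i, dists):
    """Pr(cost of child i is strictly smaller than every sibling's cost).

    `dists` is a list of {cost: probability} dictionaries; the children's
    costs are assumed independent. Ties count for no child, so the values
    for all children may sum to less than 1.
    """
    total = 0.0
    for x, px in dists[i].items():
        term = px
        for j, d in enumerate(dists):
            if j != i:
                term *= sum(py for y, py in d.items() if y > x)
        total += term
    return total

def child_to_expand(dists):
    """Step 2: pick the child most likely to be the cheapest."""
    probs = [prob_cheapest(i, dists) for i in range(len(dists))]
    best = max(range(len(dists)), key=lambda i: probs[i])
    return best, probs

n1 = {1: 0.6, 3: 0.4}  # hypothetical cost distribution of n1
n2 = {2: 1.0}          # hypothetical cost distribution of n2
best, probs = child_to_expand([n1, n2])
print(best, probs)  # 0 [0.6, 0.4]
```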


User Effort Comparison

• CDR is close to optimal

• Random and Greedy produce 1.5 more relaxations


Query Time

Exponential behaviour

Efficient for small queries

1.4 s for a query of size 10!


User Study

Users prefer interactive systems to being shown all relaxations at once

Better quality answers



They do not know how to tell you what they want


32

EXEMPLAR QUERIES


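[Figure: knowledge-graph fragment around the example entities, with nodes Google, YouTube, Menlo Park, Business, IT Companies, Search Engines, S. Mateo and California, connected by edges labelled acquired, isA, activity, foundedIn, in and of]

User Query: Google YouTube Menlo Park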


[Figure: the same graph extended with structurally similar entities (Yahoo!, del.icio.us, S. Clara, S. Clara County; GM, Opel, Flint, Genesee, Michigan; Auto Industry), with the matching structures marked as answers A1 and A2]

User Query: Google YouTube Menlo Park
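The idea of answering an example rather than a formal query can be sketched with a deliberately simplified stand-in: take the edges that connect the example entities, turn the entities into variables, and search the graph for other entity tuples connected by the same edge labels. This exact pattern matching is a substitution made for illustration, not the matching or ranking method of the exemplar-query work, and the toy edge list is assembled from node and edge labels in the figure rather than from a real knowledge base. All function names are hypothetical.

```python
# Toy graph assembled from labels in the figure (illustrative, not a real KB).
EDGES = [
    ("Google", "acquired", "YouTube"),
    ("Google", "foundedIn", "Menlo Park"),
    ("Yahoo!", "acquired", "del.icio.us"),
    ("Yahoo!", "foundedIn", "S. Clara"),
    ("GM", "acquired", "Opel"),
    ("GM", "foundedIn", "Flint"),
]

def example_pattern(example, edges):
    """Edges among the example entities, with each entity replaced by a variable."""
    var = {entity: f"?{i}" for i, entity in enumerate(example)}
    return [(var[s], p, var[o]) for s, p, o in edges if s in var and o in var]

def match(pattern, edges, n_vars):
    """Backtracking search for injective bindings that satisfy every pattern edge.

    Assumes every example entity occurs in at least one pattern edge.
    """
    results = []

    def extend(binding, k):
        if k == len(pattern):
            results.append(tuple(binding[f"?{i}"] for i in range(n_vars)))
            return
        s, p, o = pattern[k]
        for es, ep, eo in edges:
            if ep != p:
                continue
            b = dict(binding)
            if b.setdefault(s, es) != es or b.setdefault(o, eo) != eo:
                continue  # conflicts with an earlier binding
            if len(set(b.values())) != len(b):
                continue  # two variables bound to the same entity
            extend(b, k + 1)

    extend({}, 0)
    return results

example = ("Google", "YouTube", "Menlo Park")
pattern = example_pattern(example, EDGES)
answers = [t for t in match(pattern, EDGES, len(example)) if t != example]
print(answers)  # [('Yahoo!', 'del.icio.us', 'S. Clara'), ('GM', 'Opel', 'Flint')]
```

The two answers correspond to the two similar structures highlighted in the figure; a practical system would also need to tolerate partial matches and rank them, which this exact-match sketch does not attempt.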



They do not know how to tell you what they want, but they can point to examples of it

Empty Answer Problem

Exemplar Queries

Both works were presented at VLDB 2014 and demonstrated at SIGMOD 2014


Controversy Detection in User Generated Content

Hahom Mellese, MSc Student

Davide Mottin, PhD candidate

New Models for Query Answering

Large-Scale Social Data Analytics: User Profiling, Event Profiling

Matteo Lissandrini, PhD candidate

Dimitra Papadimitriou, PhD candidate

Goal-based Search in short documents

Daniele Foroni, PhD candidate

Big Data Quality, Distributed multi-matching

Come to our ICDE15 Tutorial

See our Wikipedia ICDE15 Paper

The Data & Information Management group @ the University of Trento

Prof. Yannis Velegrakis, Group Leader

He is graduating


Giovanni Frigo, BSc Student

Keyword Query on Graphs

Sabeur Aridhi, Post-doc

Distributed Graph Processing, K-core decomposition

Paolo Sottovia, PhD candidate

Massive Information Extraction from Social Media


Claudio D’Amico, MSc Student

Information Collection at Large Scale

Cristian Conconi, PhD Student

Information Extraction and Large Scale Graph Processing

Martin Brugnara, MSc Student

High Level Process and Data Management Systems


Thank you for your attention!

Questions?

More at http://www.disi.unitn.eu/~velgias

Group Page: http://db.disi.unitn.eu

• D. Mottin, M. Lissandrini, D. Papadimitriou, Y. Velegrakis and T. Palpanas, "Unleashing the Power of Information Graphs", SIGMOD Record, 7(?), 2014.

• D. Mottin, M. Lissandrini, Y. Velegrakis and T. Palpanas, "Exemplar Queries: Give me an Example of What You Need", Proceedings of VLDB, 7(5), 2014.

• D. Mottin, A. Marascu, S. Basu Roy, G. Das, T. Palpanas and Y. Velegrakis, "A Probabilistic Optimization Framework for the Empty-Answer Problem", Proceedings of VLDB, 6(14), 2013.

• T. Palpanas and Y. Velegrakis, "dbTrento: The Data and Information Management group at the University of Trento", SIGMOD Record, 41(3), 2012.

• D. Mottin, M. Lissandrini, Y. Velegrakis and T. Palpanas, "Searching with XQ: the eXemplar Query Search Engine", In Proceedings of SIGMOD, 2014.

• D. Mottin, A. Marascu, S. Basu Roy, G. Das, T. Palpanas and Y. Velegrakis, "IQR: An Interactive Query Relaxation System for the Empty-Answer Problem", In Proceedings of SIGMOD, 2014.


http://disi.unitn.it/~velgias/docs/MottinLPVP14.pdf

http://disi.unitn.it/~velgias/docs/MottinLVP14.pdf


http://disi.unitn.it/~velgias/docs/MottinMBDPV13.pdf



http://disi.unitn.it/~velgias/docs/PalpanasV12.pdf



http://disi.unitn.it/~velgias/docs/MottinLVP14b.pdf


