Ranking in DB

Laks V.S. Lakshmanan

Dept. of CS

UBC

04/21/23 2

Why ranking in query answering? 1/3

• Multimedia data, fuzzy querying: e.g., “find top 2 red objects with a soft texture”.

Two ranked lists, one per query condition:

List 1          List 2
Obj  Score      Obj  Score
D    0.85       A    0.9
B    0.80       D    0.8
A    0.75       C    0.4
E    0.65       B    0.3
C    0.60       E    0.1

Combine scores → overall score.
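A minimal sketch of the naïve way to answer the query above: read both ranked lists completely, combine each object's two scores, sort, and keep the top 2. The slide does not say which combination function is used; `min` (a common fuzzy conjunction) is an assumption here, and the names `list1`, `list2` and `naive_top_k` are illustrative only.

```python
# Scores from the two ranked lists on the slide. Which query condition each
# list belongs to is not stated, so they are simply called list1 and list2.
list1 = {"D": 0.85, "B": 0.80, "A": 0.75, "E": 0.65, "C": 0.60}
list2 = {"A": 0.9, "D": 0.8, "C": 0.4, "B": 0.3, "E": 0.1}


def naive_top_k(lists, k, combine=min):
    """Score every object from every list, then sort: no early termination."""
    objects = set().union(*lists)
    overall = {obj: combine(lst[obj] for lst in lists) for obj in objects}
    # Highest overall score first; ties broken by object name.
    ranked = sorted(overall.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


top2 = naive_top_k([list1, list2], k=2)
print(top2)  # [('D', 0.8), ('A', 0.75)]
```

The cost of this approach grows with the full length of every list, which is what the algorithms on the later slides try to avoid.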

Why ranking? 2/3

• IR: “find top 5 documents relevant to `computational’, `neuroscience’ and `brain theory’”.
  – IR systems maintain full-text indexes: inverted lists of docs w.r.t. each keyword.
  – Same Q/A paradigm as before.
• Buying a home: several criteria (price, location, area, #BRs, school district). ORDER BY query in SQL.
• Finding hotels while traveling.

Why ranking? 3/3

• Data stream, e.g., of network flow data: “find 10 users with the max. BW consumption and max. #packets communicated”.
  – Score may be a complex aggregation of these two measures.
• In a social net, find 5 items tagged as most relevant to “lawn mowing” and belonging to users socially close to the seeker.
• And now, find top-k recs (recommender systems).
• etc.
• Fagin et al.: pioneering papers in PODS ’96, PODS ’01, JCSS 2003. Has burgeoned into a field now.
• Focus on the middleware algorithm which, given a score combination function, computes top-k answers by probing different subsystems (or ranked lists).

Computational model

• Naïve method.
• How to compute top-K efficiently?
• Access methods:
  – Sorted access (sequential access) [SA].
  – Random access [RA].
• Diff. optimization metrics:
  – Overall running time of algorithm.
  – cost(SA) < cost(RA): minimize RAs.
  – RA not possible#: avoid RAs.
  – Combined optimization.
• Has led to a variety of algorithms.
• Memory vs. disk model.
• For the most part, assume score agg. is a monotone function; use SUM in examples.

#: typical in IR systems.

Fagin’s Algorithm (FA)

• m lists sorted by descending scores.
• Access (SA) all lists in parallel.
  – For each new object x seen, fetch its scores from the other lists by RA. Overall score t(x) = t(x1, …, xm). Store (obj, score) in set Y.
  – Remember each object seen (under SA) in all lists in set H.
• Repeat until |H| >= K.
• Sort Y in descending order of scores, breaking ties arbitrarily, and output top K.
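The steps above can be sketched in a few lines of Python. This is an illustrative in-memory version, not the authors' code: random access is simulated with a dictionary per list, t defaults to SUM, scores are rounded to two decimals to sidestep floating-point noise, and the names `parse`, `LISTS` and `fagin` are made up for the example. The data are the four lists L1 to L4 used in the deck's running example.

```python
def parse(row):
    """Turn 'H:0.95 B:0.90' into [('H', 0.95), ('B', 0.9)]."""
    pairs = (item.split(":") for item in row.split())
    return [(obj, float(score)) for obj, score in pairs]


# The four lists of the running example, each in descending score order.
LISTS = [parse(r) for r in (
    "H:0.95 B:0.90 E:0.85 C:0.80 G:0.75 I:0.70 D:0.65 A:0.60 J:0.55 F:0.50",
    "J:1.00 C:0.95 G:0.85 H:0.80 E:0.75 B:0.75 F:0.60 A:0.50 D:0.40 I:0.30",
    "C:0.95 J:0.80 D:0.70 H:0.65 G:0.60 B:0.55 I:0.50 E:0.45 F:0.40 A:0.30",
    "E:1.00 G:0.95 H:0.90 B:0.85 D:0.80 C:0.70 A:0.65 I:0.55 F:0.45 J:0.30",
)]


def fagin(lists, k, t=sum):
    """FA as described on the slide; returns (top-k, depth reached)."""
    lookup = [dict(lst) for lst in lists]   # stands in for random access
    seen_in = {}                            # obj -> lists where seen under SA
    y = {}                                  # set Y: obj -> overall score
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):     # one sorted access per list
            obj, _ = lst[depth]
            if obj not in y:                # new object: RA into every list
                y[obj] = round(t(lk[obj] for lk in lookup), 2)
            seen_in.setdefault(obj, set()).add(i)
        # Set H: objects seen under SA in all lists.
        h = [o for o, where in seen_in.items() if len(where) == len(lists)]
        if len(h) >= k:
            break
    # Ties are broken by object name, one arbitrary choice among many.
    ranked = sorted(y.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k], depth + 1


top4, depth = fagin(LISTS, k=4)
print(depth, top4)
```

On these lists with K = 4 the sketch stops at depth 6 and returns C (3.40), H (3.30), G (3.15) and B (3.05); E also scores 3.05 and loses only because of the alphabetical tie-break.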

Example of FA

Four lists sorted by descending score; t = SUM, K = 4.

Depth  L1       L2       L3       L4
1      H(0.95)  J(1.00)  C(0.95)  E(1.00)
2      B(0.90)  C(0.95)  J(0.80)  G(0.95)
3      E(0.85)  G(0.85)  D(0.70)  H(0.90)
4      C(0.80)  H(0.80)  H(0.65)  B(0.85)
5      G(0.75)  E(0.75)  G(0.60)  D(0.80)
6      I(0.70)  B(0.75)  B(0.55)  C(0.70)
7      D(0.65)  F(0.60)  I(0.50)  A(0.65)
8      A(0.60)  A(0.50)  E(0.45)  I(0.55)
9      J(0.55)  D(0.40)  F(0.40)  F(0.45)
10     F(0.50)  I(0.30)  A(0.30)  J(0.30)

Y = answers seen in >= 1 list (unsorted), with their overall scores.
H = answers seen (under SA) in all 4 lists.

• Depth 1 (SA sees H, J, C, E): Y gains H = 3.30, J = 2.65, C = 3.40, E = 3.05. Set H = {}.
• Depth 2 (B, C, J, G): Y gains B = 3.05, G = 3.15. Set H = {}.
• Depth 3 (E, G, D, H): Y gains D = 2.55. Set H = {}.
• Depth 4 (C, H, H, B): nothing new in Y. Set H = {H}.
• Depth 5 (G, E, G, D): nothing new in Y. Set H = {H, G}.
• Depth 6 (I, B, B, C): Y gains I = 2.05. Set H = {H, G, B, C}.

|H| = 4.

FA Example concluded

• A, F: not seen in any list. Yet, we are sure they can’t make it to top-4. Why?
• Based on where the cursors are now, what’s the max. possible score for A, F?
• What assumptions are being made about t()?
• FA is shown to be optimal with very high probability [Fagin: PODS 1996].
• But can be beaten by other algorithms on specific inputs.
• What about buffer size?
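A worked answer to the cursor question, assuming t = SUM and the cursor positions at depth 6, where FA stopped in the example (last scores seen under SA: 0.70 on L1, 0.75 on L2, 0.55 on L3, 0.70 on L4). Any object not yet seen under sorted access, such as A or F, can score at most:

```latex
t(A),\; t(F) \;\le\; 0.70 + 0.75 + 0.55 + 0.70 \;=\; 2.70 \;<\; 3.05
```

Here 3.05 is the lowest overall score among the four objects seen in all lists (C: 3.40, H: 3.30, G: 3.15, B: 3.05). The bound holds only because t is monotone: raising any single list score can never lower the overall score.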

Threshold Algorithm

• Do parallel SA on all m lists.
• For each object x seen under SA in a list, fetch its scores from other lists by RA and compute overall score.
  – If |Buffer| < K, add x to Buffer.
  – Else if score(x) <= k-th score in buffer, toss x.
  – Else replace bottom of buffer with (x, score(x)) and re-sort.
• Threshold := t(worst score seen on L1, …, worst score seen on Lm), i.e., t applied to the last score read under SA on each list.
• Stop when threshold <= k-th score in buffer.
• Output the top-K objects & scores (in buffer).
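A minimal in-memory sketch of the steps above, under stated assumptions: random access is simulated with one dictionary per list, t defaults to SUM, scores are rounded to two decimals, and the buffer is a min-heap so that its bottom entry is always at index 0. The `scored` set, which avoids repeating random accesses for an object already handled, is extra bookkeeping added for the sketch. The names `parse`, `LISTS` and `threshold_algorithm` are illustrative; the data are the four example lists L1 to L4.

```python
import heapq


def parse(row):
    """Turn 'H:0.95 B:0.90' into [('H', 0.95), ('B', 0.9)]."""
    pairs = (item.split(":") for item in row.split())
    return [(obj, float(score)) for obj, score in pairs]


# The four lists of the running example, each in descending score order.
LISTS = [parse(r) for r in (
    "H:0.95 B:0.90 E:0.85 C:0.80 G:0.75 I:0.70 D:0.65 A:0.60 J:0.55 F:0.50",
    "J:1.00 C:0.95 G:0.85 H:0.80 E:0.75 B:0.75 F:0.60 A:0.50 D:0.40 I:0.30",
    "C:0.95 J:0.80 D:0.70 H:0.65 G:0.60 B:0.55 I:0.50 E:0.45 F:0.40 A:0.30",
    "E:1.00 G:0.95 H:0.90 B:0.85 D:0.80 C:0.70 A:0.65 I:0.55 F:0.45 J:0.30",
)]


def threshold_algorithm(lists, k, t=sum):
    """TA sketch; returns (buffer sorted best-first, depth, final threshold)."""
    lookup = [dict(lst) for lst in lists]   # stands in for random access
    buffer = []                             # min-heap of (score, obj), size <= k
    scored = set()                          # objects already scored (assumption)
    for depth in range(len(lists[0])):
        for lst in lists:                   # one sorted access per list
            obj, _ = lst[depth]
            if obj in scored:
                continue
            scored.add(obj)
            score = round(t(lk[obj] for lk in lookup), 2)
            if len(buffer) < k:
                heapq.heappush(buffer, (score, obj))
            elif score > buffer[0][0]:      # beats the k-th score: replace bottom
                heapq.heapreplace(buffer, (score, obj))
            # otherwise score <= k-th score: toss
        # Threshold = t over the last score read under SA on each list.
        threshold = round(t(lst[depth][1] for lst in lists), 2)
        if len(buffer) == k and threshold <= buffer[0][0]:
            break
    return sorted(buffer, reverse=True), depth + 1, threshold


top4, depth, threshold = threshold_algorithm(LISTS, k=4)
print(depth, threshold, top4)
```

On the example lists with K = 4 the sketch stops at depth 5 with threshold 2.90, one level earlier than FA, and its buffer never holds more than K entries. It keeps E rather than B at 3.05, which is an equally valid way to break the tie.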

TA Example

Same four lists L1–L4 as in the FA example; t = SUM, K = 4.
Threshold bar: T = t(x1, x2, x3, x4), where xi is the last score seen under SA on Li.

Depth  x1    x2    x3    x4    T     Newly scored                      Tossed (X)
1      0.95  1.00  0.95  1.00  3.90  H 3.30, J 2.65, C 3.40, E 3.05    (none)
2      0.90  0.95  0.80  0.95  3.60  B 3.05, G 3.15                    2.65 (J), one of the two 3.05 entries
3      0.85  0.85  0.70  0.90  3.30  D 2.55                            2.55 (D)
4      0.80  0.80  0.65  0.85  3.10  (none)                            (none)
5      0.75  0.75  0.60  0.80  2.90  (none)                            (none)

At depth 5, T = 2.90 <= 3.05 (the k-th score in the buffer) ==> can stop!
Buffer at termination: 3.40 (C), 3.30 (H), 3.15 (G), 3.05 (B or E; ties broken arbitrarily).

TA Remarks

TA is Instance Optimal

TA IO Proof (contd.)

Proof (concluded)

No Random Access Algorithm (NRA)

• What if cost(RA) > cost(SA), or RA isn’t allowed at all?
• Do SA on all lists in parallel. At depth d:
  – Maintain worst scores x1, …, xm (the last score seen under SA on each list).
  – Let x be any object seen so far, say in lists {1, …, i} but not yet in lists i+1, …, m.
    • Best(x) = t(x1, …, xi, xi+1, …, xm), using x’s own scores for lists 1, …, i and the current worst scores for lists i+1, …, m.
    • Worst(x) = t(x1, …, xi, 0, …, 0), using x’s own scores for lists 1, …, i and 0 elsewhere.
  – TopK contains K objects with max Worst scores at depth d. Break ties using Best. M = k-th Worst score in TopK.
  – Object y is viable if Best(y) > M.
• Stop when TopK contains >= K distinct objects and no object outside TopK is viable. Return TopK.
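A minimal sketch of the bookkeeping above, using sorted access only. It assumes t = SUM with scores in [0, 1], rounds to two decimals, and treats a not-yet-seen object as one more candidate whose Best is t of the current worst scores; the names `parse`, `LISTS`, `nra`, `worst`, `best` and `m_k` are illustrative. The data are the four example lists L1 to L4.

```python
def parse(row):
    """Turn 'H:0.95 B:0.90' into [('H', 0.95), ('B', 0.9)]."""
    pairs = (item.split(":") for item in row.split())
    return [(obj, float(score)) for obj, score in pairs]


# The four lists of the running example, each in descending score order.
LISTS = [parse(r) for r in (
    "H:0.95 B:0.90 E:0.85 C:0.80 G:0.75 I:0.70 D:0.65 A:0.60 J:0.55 F:0.50",
    "J:1.00 C:0.95 G:0.85 H:0.80 E:0.75 B:0.75 F:0.60 A:0.50 D:0.40 I:0.30",
    "C:0.95 J:0.80 D:0.70 H:0.65 G:0.60 B:0.55 I:0.50 E:0.45 F:0.40 A:0.30",
    "E:1.00 G:0.95 H:0.90 B:0.85 D:0.80 C:0.70 A:0.65 I:0.55 F:0.45 J:0.30",
)]


def nra(lists, k, t=sum):
    """NRA sketch; returns ([(obj, Worst, Best), ...], depth reached)."""
    m = len(lists)
    known = {}                                   # obj -> {list index: score}
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):          # sorted access only
            obj, score = lst[depth]
            known.setdefault(obj, {})[i] = score
        last = [lst[depth][1] for lst in lists]  # worst score seen per list

        def worst(o):                            # unknown scores count as 0
            return round(t(known[o].get(i, 0.0) for i in range(m)), 2)

        def best(o):                             # unknown scores count as last[i]
            return round(t(known[o].get(i, last[i]) for i in range(m)), 2)

        # Max Worst first, ties broken by Best, then by name.
        ranked = sorted(known, key=lambda o: (-worst(o), -best(o), o))
        topk, rest = ranked[:k], ranked[k:]
        if len(topk) == k:
            m_k = worst(topk[-1])                # M on the slide
            viable = [o for o in rest if best(o) > m_k]
            unseen_viable = round(t(last), 2) > m_k
            if not viable and not unseen_viable:
                break
    return [(o, worst(o), best(o)) for o in topk], depth + 1


top4, depth = nra(LISTS, k=4)
print(depth, top4)
```

On the example lists with K = 4 the sketch stops at depth 8, deeper than TA's depth 5. At depth 6 it has M = 3.05 while J (Best 3.20) and E (Best 3.15) are still viable, so it has to keep reading; that extra depth is the price of avoiding random access.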

NRA Example

Same four lists L1–L4 as in the FA example; t = SUM. Each entry is [Worst, Best] for the object at the given depth; a dash means the object has not been seen yet.

Obj  Depth 1       Depth 2       Depth 3       Depth 4       Depth 5       Depth 6
H    [0.95, 3.90]  [0.95, 3.65]  [1.85, 3.40]  [3.30, 3.30]  [3.30, 3.30]  [3.30, 3.30]
J    [1.00, 3.90]  [1.80, 3.65]  [1.80, 3.55]  [1.80, 3.45]  [1.80, 3.35]  [1.80, 3.20]
C    [0.95, 3.90]  [1.90, 3.75]  [1.90, 3.65]  [2.70, 3.55]  [2.70, 3.50]  [3.40, 3.40]
E    [1.00, 3.90]  [1.00, 3.65]  [1.85, 3.40]  [1.85, 3.30]  [2.60, 3.20]  [2.60, 3.15]
B    -             [0.90, 3.60]  [0.90, 3.35]  [1.75, 3.20]  [1.75, 3.10]  [3.05, 3.05]
G    -             [0.95, 3.60]  [1.80, 3.35]  [1.80, 3.25]  [3.15, 3.15]  [3.15, 3.15]
D    -             -             [0.70, 3.30]  [0.70, 3.15]  [1.50, 3.00]  [1.50, 2.95]
I    -             -             -             -             -             [0.70, 2.70]

NRA Features

• What sort of t() do we need to assume, for NRA to work correctly?

• How large can the buffers get?

• How does the amount of bookkeeping compare with TA?

• NRA is instance optimal over algo’s not making RA (and of course, not making wild guesses).

Combined optimization

• What if we are told cost(RA) = .cost(SA)?

• Can we find algo’s better than NRA and TA in this case?

• Combined algorithm = CA. (See Fagin et al.’s paper for details.)

Worrying about I/O cost

• Based on Bast et al. VLDB 2006.

• Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.

• Blocks sorted by itemID; across blocks still in desc. score order.

Inverted Block Index (IBI) Algorithm.

• What is an IBI?

A Motivating Example

List 1         List 2         List 3
Doc17 : 0.8    Doc25 : 0.7    Doc83 : 0.9
Doc78 : 0.2    Doc38 : 0.5    Doc17 : 0.7
·              Doc14 : 0.5    Doc61 : 0.3
·              Doc83 : 0.5    ·
·              ·              ·
               Doc17 : 0.2
               ·

Round 1 (SA on 1,2,3)
  Doc17 : [0.8 , 2.4]
  Doc25 : [0.7 , 2.4]
  Doc83 : [0.9 , 2.4]
  unseen: ≤ 2.4

Round 2 (SA on 1,2,3)
  Doc17 : [1.5 , 2.0]
  Doc25 : [0.7 , 1.6]
  Doc83 : [0.9 , 1.6]
  unseen: ≤ 1.4

Round 3 (SA on 2,2,3!)
  Doc17 : [1.5 , 2.0]
  Doc83 : [1.4 , 1.6]
  unseen: ≤ 1.0

Round 4 (RA for Doc17)
  Doc17 : 1.7
  all others < 1.7
  done!

Note deviation from round-robin.

IBI Algorithm

• Same setting as NRA/CA, except use IBI.
• Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveASHot (SHASH) items (S = dk+1, …, dk+q).
• pos_i = current cursor position on list Li.
• high_i = score in Li at the current cursor position (upper-bounds the score of unseen items).
• For items d in S:
  – E(d): which attr scores are known.
  – E~(d): which attr scores are unknown.
  – Worst(d) = total score from E(d).
  – Best(d) = Worst(d) + Σ {high_i | i ∈ E~(d)}. (Exactly as Fagin.)

IBI Algorithm (contd.)

• In each round, compute:
  – min-k = min{Worst(d) | d ∈ T}.
  – bestscore that any unseen doc can have = sum of all high_i’s.
  – For dj ∈ S: def_j = min-k - Worst(dj). [Denotes deficit below qualification level for top-k.]
• T sorted in desc. Worst(); S sorted in desc. Best(). [Sorting on (score, ItemID) for fast processing.]
• Invariant: min-k >= max{Worst(d) | d ∈ S}.
• Termination: when min-k >= max{Best(d) | d ∈ S}.
• Can remove an obj from S whenever its Best <= min-k; stop when S = {}.
• Early termination AND minimal bookkeeping are BOTH important for performance.
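One round of this bookkeeping can be sketched as follows. This is a hedged illustration, not the IBI implementation: it covers only the Worst/Best/min-k arithmetic and the pruning of S, with SUM as the aggregation. The function name `bookkeeping`, the 0-based list indexes, and the choice K = 1 are assumptions; the sample state is taken from the motivating example after its third round, with Doc25 kept in to show the pruning.

```python
def bookkeeping(known, high, k):
    """Compute T, min-k, the pruned S, deficits, and the termination flag."""

    def worst(d):                       # total of the known scores E(d)
        return round(sum(known[d].values()), 2)

    def best(d):                        # Worst + high_i for each unknown list
        missing = [i for i in range(len(high)) if i not in known[d]]
        return round(worst(d) + sum(high[i] for i in missing), 2)

    by_worst = sorted(known, key=lambda d: (-worst(d), d))
    top = by_worst[:k]                  # T, sorted by descending Worst
    min_k = worst(top[-1])
    # S keeps only items whose Best still exceeds min-k.
    candidates = [d for d in by_worst[k:] if best(d) > min_k]
    candidates.sort(key=lambda d: (-best(d), d))
    deficits = {d: round(min_k - worst(d), 2) for d in candidates}
    # Done when S is empty and no unseen doc can beat min-k either.
    done = not candidates and round(sum(high), 2) <= min_k
    return top, min_k, candidates, deficits, done


# State after round 3 of the motivating example; list indexes are 0-based.
known = {
    "Doc17": {0: 0.8, 2: 0.7},
    "Doc25": {1: 0.7},
    "Doc83": {2: 0.9, 1: 0.5},
}
high = [0.2, 0.5, 0.3]                  # scores at the three cursor positions

state = bookkeeping(known, high, k=1)
print(state)  # (['Doc17'], 1.5, ['Doc83'], {'Doc83': 0.1}, False)
```

Doc25 drops out because its Best (0.7 + 0.2 + 0.3 = 1.2) no longer exceeds min-k = 1.5, while Doc83 stays in S with Best 1.6 and a deficit of 0.1.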

More on IBI Framework

• Instead of scheduling SAs using RR, use a differential approach for diff. lists based on expected score reductions at future cursor positions (Knapsack).

• Do SA*RA*.

• Order RAs based on estimated Prob[dj can get into top-k answers].