Ranking in DB
Laks V.S. Lakshmanan
Dept. of CS
UBC
04/21/23 2
Why ranking in query answering? 1/3
• Multimedia data – fuzzy querying: e.g., “find top 2 red objects with a soft texture”.
• One ranked list of (Obj, Score) per predicate:

Obj  Score      Obj  Score
D    0.85       A    0.9
B    0.80       D    0.8
A    0.75       C    0.4
E    0.65       B    0.3
C    0.60       E    0.1

• Combine scores → overall score.
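To make "combine scores" concrete, here is a minimal sketch that ranks the five objects from the two lists above. The slide does not say which combination function is used, nor which list belongs to "red" and which to "soft texture"; the names `scores_1`, `scores_2` and the choice of `min` (a common fuzzy AND) are assumptions for illustration only.

```python
# Scores copied from the two ranked lists on the slide.
scores_1 = {"D": 0.85, "B": 0.80, "A": 0.75, "E": 0.65, "C": 0.60}
scores_2 = {"A": 0.9, "D": 0.8, "C": 0.4, "B": 0.3, "E": 0.1}


def top_k(score_tables, k, combine=min):
    """Rank objects by a combined score; `combine` is an assumed choice."""
    objects = set.intersection(*(set(t) for t in score_tables))
    overall = {o: combine(t[o] for t in score_tables) for o in objects}
    # Sort by descending overall score, then by object id for determinism.
    return sorted(overall.items(), key=lambda p: (-p[1], p[0]))[:k]


result = top_k([scores_1, scores_2], k=2)
print(result)  # [('D', 0.8), ('A', 0.75)]
```

With SUM instead of `min`, A and D would tie at 1.65, so the top-2 set is the same under either choice.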

Why ranking? 2/3
• IR: “find top 5 documents relevant to ‘computational’, ‘neuroscience’ and ‘brain theory’”.
  – IR systems maintain full-text indexes: inverted lists of docs w.r.t. each keyword.
  – Same Q/A paradigm as before.
• Buying a home: several criteria – price, location, area, #BRs, school district. ORDER BY query in SQL.
• Finding hotels while traveling.

Why ranking? 3/3
• Data stream, e.g., of network flow data: “find 10 users with the max. BW consumption and max. #packets communicated”.
  – Score may be a complex aggregation of these two measures.
• In a social net, find 5 items tagged as most relevant to “lawn mowing” and belonging to users socially close to the seeker.
• And now, find top-k recs (recommender systems).
• etc.
• Fagin et al. – pioneering papers PODS’96, 01, JCSS 2003. Burgeoned into a field now.
• Focus on middleware algorithm, which, given a score combo. function, computes top-k answers by probing diff. subsystems (or ranked lists).

Computational model
• Naïve method.
• How to compute top-K efficiently?
• Access methods:
  – Sorted access (sequential access) [SA].
  – Random access [RA].
• Diff. optimization metrics:
  – Overall running time of algorithm.
  – cost(SA) < cost(RA): minimize RAs.
  – RA not possible#: avoid RAs.
  – Combined optimization.
• Has led to a variety of algorithms.
• Memory vs. disk model.
• For the most part, assume score agg. is a monotone function; use SUM in examples.
#: typical in IR systems.
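A minimal sketch of the access model and the naïve method, assuming each subsystem can be wrapped as an in-memory ranked list. The class `RankedList`, its counters, and `naive_top_k` are hypothetical names introduced here; the data are the two small lists from the multimedia slide, and SUM is the aggregate as stated above.

```python
class RankedList:
    """One subsystem's list, sorted by descending score, with access counters."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: -e[1])
        self.index = dict(entries)   # backs random access
        self.cursor = 0
        self.sa_count = 0
        self.ra_count = 0

    def sorted_access(self):
        """SA: return the next (object, score) pair, or None when exhausted."""
        if self.cursor >= len(self.entries):
            return None
        entry = self.entries[self.cursor]
        self.cursor += 1
        self.sa_count += 1
        return entry

    def random_access(self, obj):
        """RA: look up the score of a given object directly."""
        self.ra_count += 1
        return self.index[obj]


def naive_top_k(lists, k):
    """Naive method: read every list to the end, aggregate with SUM, sort."""
    totals = {}
    for lst in lists:
        entry = lst.sorted_access()
        while entry is not None:
            obj, score = entry
            totals[obj] = totals.get(obj, 0.0) + score
            entry = lst.sorted_access()
    # Round to 2 decimals so equal sums compare equal despite float error.
    rounded = {o: round(s, 2) for o, s in totals.items()}
    return sorted(rounded.items(), key=lambda p: (-p[1], p[0]))[:k]


lists = [
    RankedList([("D", 0.85), ("B", 0.80), ("A", 0.75), ("E", 0.65), ("C", 0.60)]),
    RankedList([("A", 0.9), ("D", 0.8), ("C", 0.4), ("B", 0.3), ("E", 0.1)]),
]
result = naive_top_k(lists, k=2)
print(result, sum(l.sa_count for l in lists))
```

The naïve method always pays one SA per entry in every list (10 here), which is the cost the algorithms on the following slides try to avoid.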

Fagin’s Algorithm (FA)
• m lists sorted by descending scores.
• Access (SA) all lists in parallel.
  – For each new object x seen, fetch its scores from the other lists by RA. Overall score t(x) = t(x1, …, xm). Store (obj, score) in set Y.
  – Remember each object that has been seen (under SA) in all lists in set H.
• Repeat until |H| >= K.
• Sort Y in descending order of scores, breaking ties arbitrarily, and output top K.
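The steps above can be sketched as follows. This is an illustration under assumptions, not the authors' code: lists are in-memory and of equal length, every object occurs in every list, RA is simulated by a dictionary lookup, t is SUM, and ties are broken by object id. The four lists are the ones used in the deck's FA example.

```python
L1 = [("H", 0.95), ("B", 0.90), ("E", 0.85), ("C", 0.80), ("G", 0.75),
      ("I", 0.70), ("D", 0.65), ("A", 0.60), ("J", 0.55), ("F", 0.50)]
L2 = [("J", 1.00), ("C", 0.95), ("G", 0.85), ("H", 0.80), ("E", 0.75),
      ("B", 0.75), ("F", 0.60), ("A", 0.50), ("D", 0.40), ("I", 0.30)]
L3 = [("C", 0.95), ("J", 0.80), ("D", 0.70), ("H", 0.65), ("G", 0.60),
      ("B", 0.55), ("I", 0.50), ("E", 0.45), ("F", 0.40), ("A", 0.30)]
L4 = [("E", 1.00), ("G", 0.95), ("H", 0.90), ("B", 0.85), ("D", 0.80),
      ("C", 0.70), ("A", 0.65), ("I", 0.55), ("F", 0.45), ("J", 0.30)]
LISTS = [L1, L2, L3, L4]


def fagin_algorithm(lists, k, agg=sum):
    m = len(lists)
    index = [dict(lst) for lst in lists]   # simulated random access
    seen_in = {}                           # object -> ids of lists where seen under SA
    Y = {}                                 # object -> overall score
    H = set()                              # objects seen under SA in all m lists
    depth = 0
    while len(H) < k and depth < len(lists[0]):
        for i, lst in enumerate(lists):
            obj, _ = lst[depth]            # sorted access at the current depth
            if obj not in Y:               # new object: fetch all its scores
                Y[obj] = round(agg(index[j][obj] for j in range(m)), 2)
            seen_in.setdefault(obj, set()).add(i)
            if len(seen_in[obj]) == m:
                H.add(obj)
        depth += 1
    top = sorted(Y.items(), key=lambda p: (-p[1], p[0]))[:k]
    return top, depth, H, Y


top, depth, H, Y = fagin_algorithm(LISTS, k=4)
print(top, depth, sorted(H))
```

On these lists the sketch stops at depth 6 with H, G, B and C in set H, and it never computes a score for A or F.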

Example of FA
   L1        L2        L3        L4
H(0.95)   J(1.00)   C(0.95)   E(1.00)
B(0.90)   C(0.95)   J(0.80)   G(0.95)
E(0.85)   G(0.85)   D(0.70)   H(0.90)
C(0.80)   H(0.80)   H(0.65)   B(0.85)
G(0.75)   E(0.75)   G(0.60)   D(0.80)
I(0.70)   B(0.75)   B(0.55)   C(0.70)
D(0.65)   F(0.60)   I(0.50)   A(0.65)
A(0.60)   A(0.50)   E(0.45)   I(0.55)
J(0.55)   D(0.40)   F(0.40)   F(0.45)
F(0.50)   I(0.30)   A(0.30)   J(0.30)
Y: answers seen in >=1 list, kept unsorted (one slot per object A, …, J).
H: answers seen (under SA) in all 4 lists.

Example of FA (contd.): same lists L1–L4, K = 4, t = SUM
• Depth 1 (SA reaches H, J, C, E; RA fetches their other scores): Y = {H: 3.30, J: 2.65, C: 3.40, E: 3.05}. Set H = {}.
• Depth 2 (new objects B, G): Y = {B: 3.05, C: 3.40, E: 3.05, G: 3.15, H: 3.30, J: 2.65}. Set H = {}.
• Depth 3 (new object D): Y adds D: 2.55. Set H = {}.
• Depth 4: object H has now been seen in all 4 lists. Set H = {H}.
• Depth 5: Set H = {H, G}.
• Depth 6 (new object I): Y adds I: 2.05. Set H = {H, G, B, C}, so |H| = 4. Stop.

FA Example concluded
• A, F – not seen in any list. Yet, we are sure they can’t make it to top-4. Why?
• Based on where the cursors are now, what’s the max. possible score for A, F?
• What assumptions are being made about t()?
• FA is shown to be optimal with very high probability [Fagin: PODS 1996].
• But can be beaten by other algorithms on specific inputs.
• What about buffer size?
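One way to check the claim about A and F numerically: since each list is sorted in descending order, an object not yet seen under SA can score at most the current cursor value in every list, and a monotone t (SUM here) turns those per-list caps into a cap on the overall score. The helper name `unseen_upper_bound` is hypothetical; the lists are those of the FA example and the cursors are assumed to sit at depth 6, where FA stopped.

```python
L1 = [("H", 0.95), ("B", 0.90), ("E", 0.85), ("C", 0.80), ("G", 0.75),
      ("I", 0.70), ("D", 0.65), ("A", 0.60), ("J", 0.55), ("F", 0.50)]
L2 = [("J", 1.00), ("C", 0.95), ("G", 0.85), ("H", 0.80), ("E", 0.75),
      ("B", 0.75), ("F", 0.60), ("A", 0.50), ("D", 0.40), ("I", 0.30)]
L3 = [("C", 0.95), ("J", 0.80), ("D", 0.70), ("H", 0.65), ("G", 0.60),
      ("B", 0.55), ("I", 0.50), ("E", 0.45), ("F", 0.40), ("A", 0.30)]
L4 = [("E", 1.00), ("G", 0.95), ("H", 0.90), ("B", 0.85), ("D", 0.80),
      ("C", 0.70), ("A", 0.65), ("I", 0.55), ("F", 0.45), ("J", 0.30)]
LISTS = [L1, L2, L3, L4]


def unseen_upper_bound(lists, depth, agg=sum):
    """Best overall score any object not yet seen under SA could still have."""
    bottoms = [lst[depth - 1][1] for lst in lists]   # score at each cursor
    return round(agg(bottoms), 2)


bound = unseen_upper_bound(LISTS, depth=6)
kth_score = 3.05   # 4th-best overall score in Y when FA stops (from the slides)
print(bound, bound < kth_score)  # 2.7 True
```

The bound 0.70 + 0.75 + 0.55 + 0.70 = 2.70 is below 3.05, so neither A nor F can enter the top 4; the argument relies only on t being monotone.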

Threshold Algorithm (TA)
• Do parallel SA on all m lists.
• For each object x seen under SA in a list, fetch its scores from the other lists by RA and compute its overall score.
  – If |Buffer| < K, add x to Buffer;
  – Else if score(x) <= K-th score in Buffer, toss x;
  – Else replace bottom of Buffer with (x, score(x)) & re-sort.
• Threshold := t(worst score seen on L1, …, worst score seen on Lm).
• Stop when threshold <= K-th score in Buffer.
• Output the top-K objects & scores (in Buffer).
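A sketch of TA under the same assumptions as before: in-memory equal-length lists, RA simulated by dictionary lookup, t = SUM, scores rounded to two decimals. The function name, the returned threshold trace, and the tie-breaking rule (evict the largest object id among equal bottom scores) are choices made for this illustration.

```python
L1 = [("H", 0.95), ("B", 0.90), ("E", 0.85), ("C", 0.80), ("G", 0.75),
      ("I", 0.70), ("D", 0.65), ("A", 0.60), ("J", 0.55), ("F", 0.50)]
L2 = [("J", 1.00), ("C", 0.95), ("G", 0.85), ("H", 0.80), ("E", 0.75),
      ("B", 0.75), ("F", 0.60), ("A", 0.50), ("D", 0.40), ("I", 0.30)]
L3 = [("C", 0.95), ("J", 0.80), ("D", 0.70), ("H", 0.65), ("G", 0.60),
      ("B", 0.55), ("I", 0.50), ("E", 0.45), ("F", 0.40), ("A", 0.30)]
L4 = [("E", 1.00), ("G", 0.95), ("H", 0.90), ("B", 0.85), ("D", 0.80),
      ("C", 0.70), ("A", 0.65), ("I", 0.55), ("F", 0.45), ("J", 0.30)]
LISTS = [L1, L2, L3, L4]


def threshold_algorithm(lists, k, agg=sum):
    m = len(lists)
    index = [dict(lst) for lst in lists]       # simulated random access
    buffer = {}                                # object -> score, at most k entries
    thresholds = []                            # threshold after each depth
    for depth in range(len(lists[0])):         # assumes equal-length lists
        row = [lst[depth] for lst in lists]    # one sorted access per list
        for obj, _ in row:
            if obj in buffer:
                continue
            score = round(agg(index[j][obj] for j in range(m)), 2)
            if len(buffer) < k:
                buffer[obj] = score
                continue
            # Bottom of buffer: lowest score, largest id among ties.
            bottom = max(buffer, key=lambda o: (-buffer[o], o))
            if score > buffer[bottom]:
                del buffer[bottom]
                buffer[obj] = score
        threshold = round(agg(s for _, s in row), 2)
        thresholds.append(threshold)
        if len(buffer) >= k and threshold <= min(buffer.values()):
            break
    top = sorted(buffer.items(), key=lambda p: (-p[1], p[0]))
    return top, thresholds


top, thresholds = threshold_algorithm(LISTS, k=4)
print(top)
print(thresholds)
```

Unlike FA, the buffer never holds more than K entries. An object tossed earlier may be rescored when it shows up again in another list, but it is tossed again because the K-th score never decreases.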

TA Example (lists L1–L4 as in the FA example; K = 4, t = SUM)
Threshold Bar: T = t(x1, x2, x3, x4), where xi is the worst score seen so far on Li.
Depth  x1    x2    x3    x4    T     Buffer (top 4 so far)
1      0.95  1.00  0.95  1.00  3.90  C 3.40, H 3.30, E 3.05, J 2.65
2      0.90  0.95  0.80  0.95  3.60  C 3.40, H 3.30, G 3.15, 3.05 (J 2.65 X, one 3.05 X)
3      0.85  0.85  0.70  0.90  3.30  unchanged (D 2.55 X)
4      0.80  0.80  0.65  0.85  3.10  unchanged
5      0.75  0.75  0.60  0.80  2.90  unchanged
T = 2.90 <= 3.05 (4th score in buffer) ==> can stop!

TA Remarks
TA is Instance Optimal
TA IO Proof [proof slides; their content is not preserved in the text]

No Random Access Algorithm (NRA)
• What if RA costs more than SA, or RA isn’t allowed at all?
• Do SA on all lists in parallel. At depth d:
  – Maintain the worst (last seen) scores x1, …, xm, one per list.
  – Let x be any object seen so far, say in lists {1, …, i}, with scores s1, …, si.
    • Best(x) = t(s1, …, si, xi+1, …, xm).
    • Worst(x) = t(s1, …, si, 0, …, 0).
  – TopK contains the K objects with max Worst scores at depth d. Break ties using Best. M = K-th Worst score in TopK.
  – Object y is viable if Best(y) > M.
• Stop when TopK contains >= K distinct objects and no object outside TopK is viable. Return TopK.
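A sketch of NRA on the deck's four lists, assuming t = SUM, equal-length in-memory lists, and scores rounded to two decimals so that equal sums compare equal. Treating a not-yet-seen object as viable when t(x1, …, xm) > M is how this sketch reads "no object outside TopK is viable"; the function name and the final tie-break by object id are assumptions.

```python
L1 = [("H", 0.95), ("B", 0.90), ("E", 0.85), ("C", 0.80), ("G", 0.75),
      ("I", 0.70), ("D", 0.65), ("A", 0.60), ("J", 0.55), ("F", 0.50)]
L2 = [("J", 1.00), ("C", 0.95), ("G", 0.85), ("H", 0.80), ("E", 0.75),
      ("B", 0.75), ("F", 0.60), ("A", 0.50), ("D", 0.40), ("I", 0.30)]
L3 = [("C", 0.95), ("J", 0.80), ("D", 0.70), ("H", 0.65), ("G", 0.60),
      ("B", 0.55), ("I", 0.50), ("E", 0.45), ("F", 0.40), ("A", 0.30)]
L4 = [("E", 1.00), ("G", 0.95), ("H", 0.90), ("B", 0.85), ("D", 0.80),
      ("C", 0.70), ("A", 0.65), ("I", 0.55), ("F", 0.45), ("J", 0.30)]
LISTS = [L1, L2, L3, L4]


def nra(lists, k):
    m = len(lists)
    known = {}                                  # object -> {list id: score seen}
    ranked = []
    for depth in range(len(lists[0])):          # assumes equal-length lists
        for i, lst in enumerate(lists):
            obj, score = lst[depth]             # sorted access only
            known.setdefault(obj, {})[i] = score
        bottom = [lst[depth][1] for lst in lists]

        def worst(o):
            # Unknown scores count as 0.
            return round(sum(known[o].values()), 2)

        def best(o):
            # Unknown scores are capped by the current bottom of each list.
            missing = sum(bottom[i] for i in range(m) if i not in known[o])
            return round(worst(o) + missing, 2)

        ranked = sorted(known, key=lambda o: (-worst(o), -best(o), o))
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        M = worst(top[-1])                      # K-th Worst score in TopK
        unseen_best = round(sum(bottom), 2)     # cap for objects never seen
        if unseen_best <= M and all(best(o) <= M for o in rest):
            return top, depth + 1
    return ranked[:k], len(lists[0])


top, depth = nra(LISTS, k=4)
print(top, depth)
```

On these lists the sketch keeps going past depth 6, because E still has Best = 3.15 > M = 3.05 there, and stops at depth 8 once E's last score is read.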

NRA Example (lists L1–L4 as in the FA example; each entry is [Worst, Best] at that depth)
Obj  Depth 1       Depth 2       Depth 3       Depth 4       Depth 5       Depth 6
B    -             [0.90, 3.60]  [0.90, 3.35]  [1.75, 3.20]  [1.75, 3.10]  [3.05, 3.05]
C    [0.95, 3.90]  [1.90, 3.75]  [1.90, 3.65]  [2.70, 3.55]  [2.70, 3.50]  [3.40, 3.40]
D    -             -             [0.70, 3.30]  [0.70, 3.15]  [1.50, 3.00]  [1.50, 2.95]
E    [1.00, 3.90]  [1.00, 3.65]  [1.85, 3.40]  [1.85, 3.30]  [2.60, 3.20]  [2.60, 3.15]
G    -             [0.95, 3.60]  [1.80, 3.35]  [1.80, 3.25]  [3.15, 3.15]  [3.15, 3.15]
H    [0.95, 3.90]  [0.95, 3.65]  [1.85, 3.40]  [3.30, 3.30]  [3.30, 3.30]  [3.30, 3.30]
I    -             -             -             -             -             [0.70, 2.70]
J    [1.00, 3.90]  [1.80, 3.65]  [1.80, 3.55]  [1.80, 3.45]  [1.80, 3.35]  [1.80, 3.20]

NRA Features
• What sort of t() do we need to assume, for NRA to work correctly?
• How large can the buffers get?
• How does the amount of bookkeeping compare with TA?
• NRA is instance optimal over algorithms that make no RAs (and, of course, no wild guesses).

Combined optimization
• What if we are told that cost(RA) is a given multiple of cost(SA)?
• Can we find algorithms better than NRA and TA in this case?
• Combined algorithm = CA. (See Fagin et al.’s paper for details.)
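To see why the RA/SA cost ratio matters, here is a small cost-model sketch. It assumes total cost = (#SA) · cost(SA) + (#RA) · cost(RA). The access counts are illustrative: they come from tracing the deck's four-list example with K = 4, with TA stopping at depth 5 and issuing m − 1 RAs for every sorted access, and NRA stopping at depth 8 (the NRA depth is from a hand trace, not from the slides). All names are hypothetical.

```python
def middleware_cost(num_sa, num_ra, cost_sa=1.0, cost_ra=1.0):
    """Total access cost under a linear cost model (an assumption)."""
    return num_sa * cost_sa + num_ra * cost_ra


M_LISTS = 4
TA_DEPTH, NRA_DEPTH = 5, 8            # illustrative stopping depths

ta_sa = M_LISTS * TA_DEPTH            # 20 sorted accesses
ta_ra = ta_sa * (M_LISTS - 1)         # 60 random accesses (no caching)
nra_sa = M_LISTS * NRA_DEPTH          # 32 sorted accesses, no RA


def cheaper(ratio):
    """Which algorithm is cheaper when cost(RA) = ratio * cost(SA)?"""
    ta = middleware_cost(ta_sa, ta_ra, 1.0, ratio)
    nra = middleware_cost(nra_sa, 0, 1.0, ratio)
    return "TA" if ta < nra else "NRA"


for ratio in (0.1, 1.0, 10.0):
    print(ratio, cheaper(ratio))
```

Under these counts TA only wins when an RA costs less than a fifth of an SA, which is the kind of trade-off a combined algorithm is meant to navigate.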

Worrying about I/O cost
• Based on Bast et al. VLDB 2006.
• Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.
• Within each block, entries are sorted by itemID; across blocks, still in desc. score order.
• Inverted Block Index (IBI) Algorithm.
• What is an IBI?

A Motivating Example
List 1         List 2         List 3
Doc17 : 0.8    Doc25 : 0.7    Doc83 : 0.9
Doc78 : 0.2    Doc38 : 0.5    Doc17 : 0.7
·              Doc14 : 0.5    Doc61 : 0.3
·              Doc83 : 0.5    ·
·              ·              ·
·              Doc17 : 0.2    ·

Round 1 (SA on 1,2,3): Doc17 : [0.8 , 2.4]; Doc25 : [0.7 , 2.4]; Doc83 : [0.9 , 2.4]; unseen: ≤ 2.4
Round 2 (SA on 1,2,3): Doc17 : [1.5 , 2.0]; Doc25 : [0.7 , 1.6]; Doc83 : [0.9 , 1.6]; unseen: ≤ 1.4
Round 3 (SA on 2,2,3!): Doc17 : [1.5 , 2.0]; Doc83 : [1.4 , 1.6]; unseen: ≤ 1.0
Round 4 (RA for Doc17): Doc17 : 1.7; all others < 1.7; done!
Note deviation from round-robin.

IBI Algorithm
• Same setting as NRA/CA, except use IBI.
• Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveAShot (SHASH) items (S = dk+1, …, dk+q).
• Pos_i = curr cursor position on list Li.
• high_i = score in Li at curr cursor position (upper bounds score of unseen items).
• For items d in S:
  – Which attr scores are known: E(d).
  – Which attr scores are unknown: E~(d).
  – Worst(d) = total score from E(d).
  – Best(d) = Worst(d) + Σ {high_i | i ∈ E~(d)}. (Exactly as Fagin.)

IBI Algorithm (contd.)
• In each round, compute:
  – min-k = min{Worst(d) | d ∈ T}.
  – bestscore that any unseen doc can have = sum of all high_i’s.
  – For dj ∈ S: def_j = min-k – Worst(dj). [denotes deficit below qualification level for top-k.]
• T sorted in desc. Worst(); S sorted in desc. Best(). [sorting on (score, ItemID) for fast processing.]
• Invariant: min-k >= max{Worst(d) | d ∈ S}.
• Termination: when min-k >= max{Best(d) | d ∈ S}.
• Can remove an obj from S whenever its Best <= min-k. Stop when S = {}.
• Early termination AND minimal bookkeeping are BOTH important for performance.
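The per-round bookkeeping can be sketched on the three-list motivating example (a top-1 query, which is how the rounds read). The function `ibi_round`, the dictionary layout, and folding the unseen-document bound into the stop test are assumptions for illustration; this is not the scheduling logic of Bast et al., only the Worst/Best/min-k arithmetic from the slides.

```python
def ibi_round(known, high, k):
    """One round of bookkeeping: returns (T, S, min_k, deficits, done).

    known: doc -> {list id: score already read}
    high:  list id -> score at the current cursor position
    """
    def worst(d):
        return round(sum(known[d].values()), 2)

    def best(d):
        gap = sum(h for i, h in high.items() if i not in known[d])
        return round(worst(d) + gap, 2)

    by_worst = sorted(known, key=lambda d: (-worst(d), d))
    T = by_worst[:k]                              # current top-k by Worst
    min_k = worst(T[-1])
    # Keep only candidates that still have a shot (Best > min-k).
    S = sorted((d for d in by_worst[k:] if best(d) > min_k),
               key=lambda d: (-best(d), d))
    deficits = {d: round(min_k - worst(d), 2) for d in S}
    unseen_best = round(sum(high.values()), 2)    # bound for unseen docs
    done = not S and unseen_best <= min_k
    return T, S, min_k, deficits, done


# State after Round 3 of the motivating example.
known = {
    "Doc17": {1: 0.8, 3: 0.7}, "Doc25": {2: 0.7}, "Doc83": {3: 0.9, 2: 0.5},
    "Doc78": {1: 0.2}, "Doc38": {2: 0.5}, "Doc14": {2: 0.5}, "Doc61": {3: 0.3},
}
high = {1: 0.2, 2: 0.5, 3: 0.3}
round3 = ibi_round(known, high, k=1)
print(round3)

# Round 4: one RA fetches Doc17's score in list 2.
known["Doc17"][2] = 0.2
round4 = ibi_round(known, high, k=1)
print(round4)
```

After Round 3 only Doc83 remains in S, with a deficit of 0.1; after the RA in Round 4, min-k rises to 1.7, S empties, and the query is done.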

More on IBI Framework
• Instead of scheduling SAs using round-robin (RR), use a differential approach for diff. lists based on expected score reductions at future cursor positions (Knapsack).
• Do SA*RA*, i.e., a run of SAs followed by a run of RAs.
• Order RAs based on estimated Prob[dj can get into top-k answers].