Ramakrishnan Srikant
Sugato Basu
Ni Wang
Daryl Pregibon
Estimating Relevance of Search Engine Results
Use CTR (click-through rate) data:
Pr(click) = Pr(examination) × Pr(click | examination)
Need user browsing models to estimate Pr(examination); Pr(click | examination) is the relevance.
Notation
Φ(i) : result at position i
Click event: Ci = 1 if the user clicked on Φ(i), 0 otherwise
Examination event: Ei = 1 if the user examined Φ(i), 0 otherwise
Examination Hypothesis
Richardson et al, WWW 2007:
Pr(Ci = 1) = Pr(Ei = 1) Pr(Ci = 1 | Ei = 1)
αi = Pr(Ei = 1) : position bias. Depends solely on position; can be estimated by looking at the CTR of the same result in different positions.
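The position-bias estimate described above can be sketched in a few lines. This is a minimal illustration, not the paper's estimator: it assumes we have logged CTRs for the same results at several positions, and all data values and names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical log: (result_id, position, impressions, clicks).
observations = [
    ("ad_a", 1, 1000, 100), ("ad_a", 2, 1000, 60),
    ("ad_b", 1, 500, 50),   ("ad_b", 3, 500, 15),
    ("ad_c", 2, 800, 24),   ("ad_c", 3, 800, 12),
]

def estimate_position_bias(observations):
    """Estimate alpha_i relative to position 1 (alpha_1 = 1) by averaging,
    over results shown at both positions, the ratio of each result's CTR
    at position i to its CTR at position 1."""
    ctr = defaultdict(dict)  # result -> {position: ctr}
    for result, pos, imps, clicks in observations:
        ctr[result][pos] = clicks / imps
    ratios = defaultdict(list)
    for result, by_pos in ctr.items():
        if 1 not in by_pos:
            continue  # need a position-1 CTR for this result as the anchor
        for pos, c in by_pos.items():
            ratios[pos].append(c / by_pos[1])
    return {pos: sum(r) / len(r) for pos, r in sorted(ratios.items())}

alpha = estimate_position_bias(observations)
```

Note that this simple ratio averaging itself leans on the examination hypothesis: it assumes a result's relevance does not change with its position.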
Using Prior Clicks
[Figure: two example result lists R1–R5. With clicks on R1 and R3, Pr(E5 | C1, C3) = 0.5; with a click on R1 only, Pr(E5 | C1) = 0.3.]
Examination depends on prior clicks:
• Cascade model
• Dependent click model (DCM)
• User browsing model (UBM) [Dupret & Piwowarski, SIGIR 2008]: more general and more accurate than Cascade and DCM; conditions Pr(examination) on the closest prior click.
• Bayesian browsing model (BBM) [Liu et al, KDD 2009]: same user behavior model as UBM; uses a Bayesian paradigm for relevance.
User browsing model (UBM)
Use the position of the closest prior click to predict Pr(examination):
Pr(Ei = 1 | C1:i-1) = αi βi,p(i)
Pr(Ci = 1 | C1:i-1) = Pr(Ei = 1 | C1:i-1) Pr(Ci = 1 | Ei = 1)
αi : position bias; p(i) : position of closest prior click.
Prior clicks don’t affect relevance.
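The two UBM equations can be turned into a tiny calculator. This is a sketch, with made-up αi, βi,p(i), and relevance tables (all parameter values are hypothetical; p = 0 encodes "no prior click"):

```python
# Hypothetical UBM parameters for a 3-position result list.
alpha = {1: 1.0, 2: 0.6, 3: 0.3}  # position bias alpha_i
# beta[(i, p)]: effect on examination of the closest prior click at p (0 = none).
beta = {(1, 0): 1.0,
        (2, 0): 1.0, (2, 1): 1.2,
        (3, 0): 1.0, (3, 1): 1.1, (3, 2): 1.5}
relevance = {"r1": 0.30, "r2": 0.20, "r3": 0.25}  # r = Pr(C=1 | E=1)

def ubm_click_prob(i, results, prior_clicks):
    """Pr(Ci = 1 | C_1:i-1) = alpha_i * beta_{i,p(i)} * r_{phi(i)},
    where p(i) is the position of the closest prior click (0 if none)."""
    p = max([q for q in prior_clicks if q < i], default=0)
    exam = alpha[i] * beta[(i, p)]  # Pr(Ei = 1 | C_1:i-1)
    return exam * relevance[results[i - 1]]

results = ["r1", "r2", "r3"]
p3_no_click = ubm_click_prob(3, results, prior_clicks=[])   # no prior clicks
p3_after_c2 = ubm_click_prob(3, results, prior_clicks=[2])  # prior click at 2
```

With these numbers, a prior click at position 2 raises the predicted click probability at position 3 (β3,2 = 1.5 > 1), while the relevance term is unchanged, exactly the "prior clicks affect only examination" assumption.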
Other Related Work
Examination depends on prior clicks and prior relevance:
• Click chain model (CCM)
• General click model (GCM)
Post-click models:
• Dynamic Bayesian model
• Session utility model
Constant Relevance Assumption
Cascade model, DCM, UBM, BBM, CCM, GCM all implicitly assume:
• Relevance is independent of prior clicks.
• Relevance is constant across query instances.
Query = “Canon S90”. Aggregate relevance: relevance to a query.
Query instance = “Canon S90” for a specific user at a specific point in time. Instance relevance: relevance to a query instance.
User Intent
The query string does not fully capture user intent.

Prior clicks signal relevance... not just Pr(examination).
Testing Constant Relevance
If we know that Pr(examination) ≈ 1:
• Relevance ≈ CTR
• Test whether relevance is independent of prior clicks.
When is Pr(examination) ≈ 1?
• Users scan from top to bottom.
• If there is a click below position i, then Pr(Ei) ≈ 1.
Statistical Power of the Test
Restrict to query instances where Pr(examination) ≈ 1 (there is a click below position i), and partition them:
S0: no clicks above i
S1: click above i
For a specific position i:
LHS = Clicks(S1) / Predicted clicks(S1)
RHS = Clicks(S0) / Predicted clicks(S0)
If relevance is independent of prior clicks, we expect Lift = LHS / RHS ≈ 1.
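The lift statistic is straightforward to compute from logged data. A minimal sketch, assuming per-instance observed clicks and model-predicted click probabilities at position i, already restricted to instances with a click below i (the toy data here is invented purely to exercise the function):

```python
def lift(instances):
    """instances: list of (clicked, predicted, click_above) tuples for
    position i, restricted to instances with a click below i, so that
    Pr(examination) ~ 1.  Returns
    [Clicks(S1)/Predicted clicks(S1)] / [Clicks(S0)/Predicted clicks(S0)]."""
    def obs_over_pred(group):
        clicks = sum(c for c, _, _ in group)
        predicted = sum(p for _, p, _ in group)
        return clicks / predicted
    s1 = [x for x in instances if x[2]]       # click above position i
    s0 = [x for x in instances if not x[2]]   # no click above position i
    return obs_over_pred(s1) / obs_over_pred(s0)

# Toy data: instances in S1 click twice as often as the model predicts,
# instances in S0 click exactly as predicted -> lift of 2.
data = ([(1, 0.25, True)] * 50 + [(0, 0.25, True)] * 50 +
        [(1, 0.25, False)] * 25 + [(0, 0.25, False)] * 75)
```

A lift well above 1, as reported on the next slide, means instances with a prior click above i get more clicks at i than a prior-clicks-only-affect-examination model predicts.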
The data speaks…
• Over all configs: Lift = 2.69 ± 0.05 (99% confidence interval).
• Graph shows the config with 3 top ads and 8 rhs ads. T2 = 2nd top ad, R3 = 3rd rhs ad, etc.
Pure Relevance
Max Examination
Joint Relevance Examination (JRE)
Pure Relevance
Any change in Pr(Ci = 1) when conditioned on other clicks is solely due to a change in instance relevance.
The number of clicks on other results is used as a signal of instance relevance; does not use the position of other clicks, only the count.
Yields identical aggregate relevance estimates as the baseline model (which does not use co-click information).
Pr(Ci = 1 | C≠i , Ei = 1) = rΦ(i) δn(i)
n(i) = number of clicks in other positions
Max Examination
Like UBM/BBM, but also uses information about clicks below position i: Pr(examination) ≈ 1 if there is a click below i.
UBM/BBM: Pr(Ei = 1 | C1:i-1) = αi βi,p(i)
Max-examination: Pr(Ei = 1 | C≠i) = αi βi,e(i)
p(i) = position of closest prior click
e(i) = i, if there is a click below i; p(i), if there is no click below position i
Joint Relevance Examination (JRE)
Combines the features of the pure-relevance and max-examination models. Allows CTR changes to be caused both by changes in examination and by changes in instance relevance.
Pr(Ei = 1 | C≠i) = αi βi,e(i)
Pr(Ci = 1 | C≠i , Ei = 1) = rΦ(i) δn(i)
Pr(Ci = 1 | C≠i) = Pr(Ei = 1 | C≠i) Pr(Ci = 1 | Ei = 1, C≠i)
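Putting the JRE equations together for a single position: a sketch combining the max-examination switch e(i) with the co-click multiplier δn(i). All parameter values below are hypothetical, chosen only to make the mechanics concrete.

```python
# Hypothetical JRE parameters for position i = 2 in a 3-position list.
alpha = {2: 0.6}                                # position bias alpha_i
beta = {(2, 0): 1.0, (2, 1): 1.2, (2, 2): 2.0}  # beta[(i, e(i))]
delta = {0: 1.0, 1: 1.3, 2: 1.5}                # delta[n(i)]: co-click signal
r = 0.2                                         # r_phi(i): aggregate relevance

def jre_click_prob(i, clicked_positions):
    """Pr(Ci=1 | C_!=i) = alpha_i * beta_{i,e(i)} * r_phi(i) * delta_{n(i)}."""
    others = [p for p in clicked_positions if p != i]
    n = len(others)                       # n(i): clicks at other positions
    below = any(p > i for p in others)
    prior = max([p for p in others if p < i], default=0)
    e = i if below else prior             # e(i): max-examination switch
    return alpha[i] * beta[(i, e)] * r * delta[n]

p_none = jre_click_prob(2, [])    # no other clicks
p_below = jre_click_prob(2, [3])  # a click below position 2
```

A click below position 2 raises the prediction through both channels: β2,2 (examination is near-certain) and δ1 (the co-click signals higher instance relevance).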
Predicting CTR
Models:
• Baseline: Google's production system for predicting relevance of sponsored results. Does not use co-click information.
• Compare to UBM/BBM, max examination, pure relevance, and JRE.
Data: 10% sample of a week of data; 50-50 split between training and testing.
Absolute Error, Log Likelihood, Squared Error
[Charts comparing the models on each metric; baseline: Google’s production system.]
Predicting Relevance vs. Predicting CTR
If model A is more accurate than model B at predicting CTR, wouldn’t A also be better at predicting (aggregate) relevance?
Counter-Example
CTR: pure-relevance >> max-examination >> baseline
Relevance: either
  pure-relevance == baseline >> max-examination
or
  max-examination >> pure-relevance == baseline
Intuition
Predicting CTR: get the product, Pr(examination) × relevance, right.
Predicting relevance: need to correctly assign credit between examination and relevance.
Incorrectly assigning credit can improve CTR prediction while making relevance estimates less accurate.
Predicting Relevance
Run an experiment on live traffic:
• Sponsored results are ranked by bid × relevance.
• More accurate relevance estimates should result in higher CTR and revenue, since results with higher relevance are placed in positions with higher Pr(examination).
Baseline/pure-relevance had better revenue and CTR than max-examination. The results were statistically significant.
Conclusions
• Changes in CTR when conditioned on other clicks are also due to instance relevance, not just examination.
• New user browsing models that incorporate this insight are more accurate.
• Evaluating user browsing models solely via offline analysis of CTR prediction can be problematic; use human ratings or live experiments.
Future Work
• What about organic search results?
• Quantitatively assigning credit between instance relevance and examination; the features are correlated.
• Generalize pure-relevance and JRE to incorporate information about the relevance of prior results, or the satisfaction of the user with the prior clicked results.
Scan order
[Backup slide figure.]