7/25/2019 Dueling Bandits
1/68
The Dueling Bandits Problem
Yisong Yue
Outline
Brief Overview of Multi-Armed Bandits
  Sequential Experimental Design
Dueling Bandits
  Mathematical properties
  Connections to other problems
  Algorithmic Principles
New Directions & Ongoing Research
Multi-Armed Bandit Problem (stochastic version)
K actions (aka arms or bandits)
Each action has an average reward $\mu_k$, unknown to us
  Assume WLOG that $\mu_1$ is largest
For t = 1, ..., T:
  Algorithm chooses action a(t)
  Receives random reward y(t), with expectation $\mu_{a(t)}$
Goal: minimize $T\mu_1 - (\mu_{a(1)} + \mu_{a(2)} + \dots + \mu_{a(T)})$ (Regret)
  $T\mu_1$: expected reward if we had perfect information to start
  $\mu_{a(1)} + \dots + \mu_{a(T)}$: expected reward of the algorithm
Algorithm only receives feedback on chosen action
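The regret expression above can be made concrete with a small simulation. This is a minimal sketch, not part of the original slides: the epsilon-greedy strategy, the function names, and the mean rewards in `mu` are all illustrative assumptions.

```python
import random


def expected_regret(mu, actions):
    """Return T * max(mu) minus the summed mean reward of the chosen arms."""
    best = max(mu)
    return sum(best - mu[a] for a in actions)


def epsilon_greedy(mu, horizon, epsilon=0.1, seed=0):
    """Play Bernoulli arms with means `mu`; return the list of chosen arms."""
    rng = random.Random(seed)
    counts = [0] * len(mu)
    totals = [0.0] * len(mu)
    actions = []
    for _ in range(horizon):
        if 0 in counts or rng.random() < epsilon:
            arm = rng.randrange(len(mu))  # explore
        else:
            # exploit the arm with the best empirical average
            arm = max(range(len(mu)), key=lambda k: totals[k] / counts[k])
        # feedback is observed only for the chosen arm
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        actions.append(arm)
    return actions


if __name__ == "__main__":
    mu = [0.6, 0.5, 0.3]  # hypothetical mean rewards, unknown to the algorithm
    actions = epsilon_greedy(mu, horizon=1000)
    print("regret:", expected_regret(mu, actions))
```

Always playing the best arm gives zero regret; every pull of a suboptimal arm adds its gap to the best mean.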
Example: Interactive Personalization
Topics (columns): Celebrity, Economy, Politics, Sports, World ("--" = no feedback yet)

Step                 Average Likes         # Shown          Total likes
Show Sports          --  --  --  --  --    0  0  0  1  0    0
Sports: not liked    --  --  --  0   --    0  0  0  1  0    0
Show Politics        --  --  --  0   --    0  0  1  1  0    0
Politics: liked      --  --  1   0   --    0  0  1  1  0    1
Show World           --  --  1   0   --    0  0  1  1  1    1
World: not liked     --  --  1   0   0     0  0  1  1  1    1
Show Economy         --  --  1   0   0     0  1  1  1  1    1
Economy: liked       --  1   1   0   0     0  1  1  1  1    2
...
What should Algorithm Recommend?
Topics (columns): Celebrity, Economy, Politics, Sports, World
Average Likes:  --   0.44  0.4   0.33  0.2
# Shown:        0    25    10    15    20
Total likes: 24
  Exploit: Economy
  Explore: Celebrity
  Best: Politics
How to Optimally Balance Explore/Exploit Tradeoff?
  Characterized by the Multi-Armed Bandit Problem
Opportunity cost of not knowing preferences
"No-regret" if $R(T)/T \to 0$
The Motivating Problem
Slot Machine = One-Armed Bandit
Each Arm Has Different Payoff
Goal: Minimize regret from pulling suboptimal arms
Image source: http://research.microsoft.com/en-us/projects/bandits/
Many Applications
Online Advertising
Search Engines
Recommender Systems
Personalized Clinical Treatment
Sequential Experimental Design
Experimental Design
How to split trials to collect information
Static Experimental Design
  Standard practice (pre-planned)
  Treatment, Placebo, Treatment, Placebo, Treatment, ...
http://en.wikipedia.org/wiki/Design_of_experiments
Sequential Experimental Design
Adapt experiments based on outcomes
  Treatment, Placebo, Treatment, Treatment, ..., Treatment
Sequential Experimental Design as Interactive Personalization
...et = total # of positive outcomes
Sequential Experimental Design Matters
Two Cousins, Two Paths: Thomas McLaughlin, left, was given a promising experimental drug to treat his lethal skin cancer in a medical trial; Brandon Ryan had to go without it.
http://www.nytimes.com/2010/09/19/health/research/19trial.html
What if Rewards aren't Directly Measurable?
Evaluating using Click Data
Click
Interpretation 1: Result 2 is good. (Absolute)
Interpretation 2: Result 2 is better than Result 1. (Relative / Preference)
Evaluating using Click Data
Retrieval Function A    Retrieval Function B
Which is better?
Click  Click  Click
Analogy to Sensory Testing
(Hypothetical) taste experiment: [image] vs [image]
  Natural usage context
Experiment 1: Absolute Metrics
  3 cans, 3 cans, 2 cans (Total: 8 cans)
  1 can, 5 cans, 3 cans (Total: 9 cans)
  Very Thirsty!
Analogy to Sensory Testing
(Hypothetical) taste experiment: [image] vs [image]
  Natural usage context
Experiment 2: Relative Metrics
  2 - 1, 3 - 0, 2 - 0, 1 - 0, 4 - 1, 2 - 1
  All 6 prefer Pepsi
Interleaving (Taste Test in Search) [Radlinski et al. 2008]
Ranking A
  1. Napa Valley - The authority for lodging... www.napavalley.com
  2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries
  3. Napa Valley College www.napavalley.edu/homex.asp
  4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681
  5. Napa Valley Wineries and Wine www.napavintners.com
  6. Napa Country, California - Wikipedia en.wikipedia.org/wiki/Napa_Valley
Ranking B
  1. Napa Country, California - Wikipedia en.wikipedia.org/wiki/Napa_Valley
  2. Napa Valley - The authority for lodging... www.napavalley.com
  3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
  4. Napa Valley Hotels - Bed and Breakfast... www.napalinks.com
  5. NapaValley.org www.napavalley.org
  6. The Napa Valley Marathon www.napavalleymarathon.org
Presented Ranking (each result is credited to A or B)
  1. Napa Valley - The authority for lodging... www.napavalley.com
  2. Napa Country, California - Wikipedia en.wikipedia.org/wiki/Napa_Valley
  3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
  4. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries
  5. Napa Valley Hotels - Bed and ...
Click, Click: B wins!
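The interleaving step shown above can be sketched in code. This is a hedged illustration in the style of team-draft interleaving, one of the methods associated with [Radlinski et al. 2008]; the function names and the click-crediting rule below are assumptions made for this example, not taken from the slides.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Return (presented, team_a, team_b) for two rankings of document ids."""
    presented, team_a, team_b = [], set(), set()

    def next_unused(ranking):
        # highest-ranked document not yet placed in the presented list
        return next((d for d in ranking if d not in team_a and d not in team_b), None)

    while True:
        cand_a, cand_b = next_unused(ranking_a), next_unused(ranking_b)
        if cand_a is None or cand_b is None:
            break
        # the team with fewer picks goes next; a coin flip breaks ties
        a_picks = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5
        )
        doc, team = (cand_a, team_a) if a_picks else (cand_b, team_b)
        presented.append(doc)
        team.add(doc)
    return presented, team_a, team_b


def winner(clicked, team_a, team_b):
    """Credit each click to the team that contributed the clicked document."""
    a = sum(d in team_a for d in clicked)
    b = sum(d in team_b for d in clicked)
    return "A" if a > b else "B" if b > a else "tie"
```

Each interleaved query yields one noisy pairwise comparison ("A wins", "B wins", or a tie), which is the kind of relative feedback the rest of the deck builds on.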
Deployment on Yahoo! Search Engine
Comparing Two Ranking Functions
[Figure: Disagreement Probability vs. Queries, for Interleaving and Absolute Metrics]
Interleaving is more sensitive and more reliable
[Chapelle, Joachims, Radlinski, Yue, TOIS 2012]
Interleave A vs B
            Left wins   Right wins
  A vs B    0           1
  A vs C    0           0
  B vs C    0           0
Interleave A vs C
            Left wins   Right wins
  A vs B    0           1
  A vs C    0           1
  B vs C    0           0
Interleave B vs C
            Left wins   Right wins
  A vs B    0           1
  A vs C    0           1
  B vs C    0           1
Interleave A vs C
            Left wins   Right wins
  A vs B    0           1
  A vs C    1           1
  B vs C    0           1
Goal: Maximize total User Utility
  Exploit: run C (interleave C with itself)
  Explore: interleave A vs B
  Best: A (interleave A with itself)
How to interact optimally?
Dueling Bandits Problem
Example Pairwise Preferences
      A      B      C      D      E      F
A     0      0.03   0.04   0.06   0.10   0.11
B    -0.03   0      0.03   0.05   0.0?   0.11
C    -0.04  -0.03   0      0.04   0.07   0.0?
D    -0.06  -0.05  -0.04   0      0.05   0.07
E    -0.10  -0.0?  -0.07  -0.05   0      0.03
F    -0.11  -0.11  -0.0?  -0.07  -0.03   0
Values are Pr(row > col) - 0.5
Utility function may not exist
How to define regret?
Dueling Bandits Problem (with Josef Broder, Robert Kleinberg and Thorsten Joachims)
K bandits $b_1, \dots, b_K$
Each iteration: compare (duel) two bandits
  Observe (noisy) outcome
  Requires Dueling Mechanism
Cost function (regret):
  $R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$
  $(b_t, b_t')$ are the two bandits chosen
  $b^*$ is the overall best one
  (How much human user preferred $b^*$ over chosen bandits)
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
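The regret definition above can be evaluated directly once a preference matrix is fixed. The following is a minimal sketch; the 3-bandit matrix `PREF` and the helper names are hypothetical and serve only to illustrate the formula.

```python
def condorcet_winner(pref):
    """Return the bandit that beats every other with probability > 1/2, or None."""
    k = len(pref)
    for i in range(k):
        if all(pref[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None


def dueling_regret(pref, duels):
    """Sum of P(b* > b_t) + P(b* > b_t') - 1 over the chosen pairs (b_t, b_t')."""
    best = condorcet_winner(pref)
    if best is None:
        raise ValueError("no overall best bandit in this preference matrix")
    return sum(pref[best][b] + pref[best][c] - 1.0 for b, c in duels)


# Hypothetical preference matrix: PREF[i][j] = P(bandit i beats bandit j).
PREF = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
```

Dueling the best bandit against itself, `(0, 0)`, contributes zero regret, while dueling bandits 1 and 2 contributes 0.6 + 0.7 - 1 = 0.3.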
Dueling Bandits Problem
Values are Pr(row > col) - 0.5
$R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$
Modeling Assumptions
$P(b_i > b_j) = 1/2 + \epsilon_{ij}$ (distinguishability)
Strong Stochastic Transitivity
  For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$
  Monotonicity property
Stochastic Triangle Inequality
  For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \le \epsilon_{ij} + \epsilon_{jk}$
  Diminishing returns property
Satisfied by many standard models
  E.g., Logistic / Bradley-Terry
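Both assumptions can be checked numerically on a given preference matrix. The sketch below builds a Bradley-Terry matrix from hypothetical strengths and tests every ordered triple; the function names and the tolerance are illustrative choices, not from the slides.

```python
from itertools import combinations


def bradley_terry(strengths):
    """Preference matrix with P(i beats j) = s_i / (s_i + s_j)."""
    return [[si / (si + sj) for sj in strengths] for si in strengths]


def check_assumptions(pref, order, tol=1e-12):
    """`order` lists bandits from best to worst; returns (sst_ok, sti_ok)."""
    def eps(i, j):
        return pref[i][j] - 0.5

    sst = sti = True
    # combinations() keeps the given order, so i is better than j, j better than k
    for i, j, k in combinations(order, 3):
        e_ij, e_jk, e_ik = eps(i, j), eps(j, k), eps(i, k)
        sst &= e_ik >= max(e_ij, e_jk) - tol  # strong stochastic transitivity
        sti &= e_ik <= e_ij + e_jk + tol      # stochastic triangle inequality
    return sst, sti
```

For example, `check_assumptions(bradley_terry([4.0, 2.0, 1.0]), [0, 1, 2])` reports that both properties hold for that matrix.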
Strong Stochastic Transitivity
      A      B      C      D      E      F
A     0      0.03   0.04   0.06   0.10   0.11
B    -0.03   0      0.03   0.05   0.0?   0.11
C    -0.04  -0.03   0      0.04   0.07   0.0?
D    -0.06  -0.05  -0.04   0      0.05   0.07
E    -0.10  -0.0?  -0.07  -0.05   0      0.03
F    -0.11  -0.11  -0.0?  -0.07  -0.03   0
Values are Pr(row > col) - 0.5
Monotonic along rows and columns: $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$
Stochastic Triangle Inequality
      A      B      C      D      E      F
A     0      0.03   0.04   0.06   0.10   0.11
B    -0.03   0      0.03   0.05   0.0?   0.11
C    -0.04  -0.03   0      0.04   0.07   0.0?
D    -0.06  -0.05  -0.04   0      0.05   0.07
E    -0.10  -0.0?  -0.07  -0.05   0      0.03
F    -0.11  -0.11  -0.0?  -0.07  -0.03   0
Values are Pr(row > col) - 0.5
Red <= Blue + Green: $\epsilon_{ik} \le \epsilon_{ij} + \epsilon_{jk}$
Aside: Confidence Intervals
True preference
Current Estimate
Hoeffding's Inequality:
Desired Error Tolerance
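The slide's own formula did not survive extraction. As a stand-in, here is the standard two-sided Hoeffding interval for an empirical win rate; the constants and the function names are textbook assumptions and may differ from what the slide showed.

```python
import math


def hoeffding_radius(n, delta):
    """Radius r such that P(|p_hat - p| >= r) <= delta after n duels."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def confidence_interval(wins, n, delta):
    """Interval around the empirical win rate, clipped to [0, 1]."""
    p_hat = wins / n
    r = hoeffding_radius(n, delta)
    return max(0.0, p_hat - r), min(1.0, p_hat + r)


def significantly_better(wins, n, delta):
    """True once the whole interval lies above 1/2."""
    lo, _ = confidence_interval(wins, n, delta)
    return lo > 0.5
```

The radius shrinks like $1/\sqrt{n}$: quadrupling the number of duels halves the interval width, so a 60% win rate is not significant after 100 duels at `delta = 0.05` but is after 1600.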
Example
[Figure: snapshots at t=100, t=400, t=1600]
Explore-then-Exploit
Decompose into 2 Phases
Explore Phase
  Identify the best bandit w.h.p.
  Minimize incurred regret
Exploit Phase
  Play best bandit vs itself
  Incurs no regret
Connection to Tournaments
Each pair "duels" until statistical significance
  Aka Noisy Tournament
  Guarantees finding best bandit w.h.p.
Tournament is Bad
Each pair "duels" until statistical significance
Problem: two equally bad bandits
Analogy: Hypothetical Soccer Tournament
  A team wins when it has a 3-goal lead
  Audience prefers good teams play ("regret")
  Two (nearly) equally bad teams will play for a long time
Other Explore Strategies
Tournament
  E.g., tennis [Feige et al. 1994]
Champion
  E.g., boxing [Yue, Broder, Kleinberg, Joachims 2009]
Swiss
  E.g., group rounds in World Cup [Yue, Joachims 2011]
Champion (Interleaved Filter)
Champion duels each challenger (round robin)
  Until statistical significance
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
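The champion strategy above can be sketched as follows. This is a simplified illustration, not the published Interleaved Filter algorithm: the confidence parameter `delta = 1/(T K^2)`, the radius `sqrt(log(1/delta)/n)`, the tie-breaking rule, and all names are assumptions made for this example.

```python
import math
import random


def interleaved_filter(pref, horizon, seed=0):
    """Simplified champion-vs-challengers sketch; returns (champion, duels_used)."""
    rng = random.Random(seed)
    k = len(pref)
    delta = 1.0 / (horizon * k * k)  # assumed confidence parameter
    champion = rng.randrange(k)
    challengers = [b for b in range(k) if b != champion]
    wins = {b: 0 for b in challengers}  # champion's wins against challenger b
    rounds = 0                          # duels per challenger vs current champion
    duels = 0
    while challengers and duels + len(challengers) <= horizon:
        for b in challengers:  # one round-robin pass
            wins[b] += rng.random() < pref[champion][b]
        rounds += 1
        duels += len(challengers)
        radius = math.sqrt(math.log(1.0 / delta) / rounds)
        # Drop challengers that the champion beats with statistical significance.
        challengers = [b for b in challengers if wins[b] / rounds - radius <= 0.5]
        # A challenger that beats the champion with significance takes over.
        better = [b for b in challengers if wins[b] / rounds + radius < 0.5]
        if better:
            champion = min(better, key=lambda b: wins[b])
            challengers = [b for b in challengers if b != champion]
            wins = {b: 0 for b in challengers}
            rounds = 0
    return champion, duels
```

Once a single bandit remains, the exploit phase would duel it against itself for the rest of the horizon. The sketch resets all statistics whenever the champion changes, which keeps the code short but discards information.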
Champion is Good
Regret per champion bounded
  Quickly replaced if bad
  Comparisons until elimination: $O\left(\min\left\{\frac{1}{\epsilon_{ic}^2}, \frac{1}{\epsilon_{1i}^2}\right\} \log T\right)$
  Regret per comparison: $\epsilon_{1i} + \epsilon_{1c}$
  Regret of challenger/champion pair: $O\left(\frac{1}{\epsilon_{1i}} \log T\right)$
  Leverage Transitivity & Triangle Inequality
Regret per Champion: $O\left(\frac{K}{\epsilon} \log T\right)$
  K: remaining bandits
  $\epsilon$: margin between best bandit and rest
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Champion is Good
Regret per champion bounded
  Quickly replaced if bad
  Regret per Champion: $O\left(\frac{K}{\epsilon} \log T\right)$
    K: remaining bandits
    $\epsilon$: margin between best bandit and rest
Sequence of champions as a random walk
  [Figure: bandits ordered toward "Better"; one of the better bandits will become next champion]
  Log K rounds to arrive at best
Regret: $E[R_T] = O\left(\frac{K}{\epsilon} \log T\right)$ (Optimal Regret Guarantee!)
  K: bandits
  $\epsilon$: margin between best bandit and others
  T: time horizon
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Swiss (Beat the Mean)
Each iteration:
  Duel random pair (Dueling Mechanism)
  Eliminate bandit w/ worst record
    Record = win rate vs "mean" bandit
  Remove duels with eliminated bandit from all remaining records (only a fraction of all records)
Related to action elimination algorithms [Even-Dar et al. 2006]
Regret until next removal: O(... log T)
[Yue, Joachims, ICML 2011]
Swiss is Better
Champion has high variance
  Depends on initial champion
Swiss offers low-variance alternative
  Successively eliminate worst bandit
Regret: $R_T = O\left(\frac{K}{\epsilon} \log T\right)$
  Optimal regret w/ high probability
[Yue, Joachims, ICML 2011]
[Figure: Regret vs. Bandits]
Tournament performs poorly (in some experiments >10000x worse)
Swiss has lower variance than Champion
[Yue, Joachims, ICML 2011]
General Algorithmic Structure (most existing DB algorithms follow this structure)
First bandit is a pivot (anchor)
  Interleaved Filter: Champion
  Beat the Mean: Uniform (all remaining bandits)
Second bandit explores relative to pivot
  Interleaved Filter: Round Robin
  Beat the Mean: Randomized Round Robin
Update pivot periodically
This structure makes it easier to analyze regret
  E.g., spend more time exploring good pivots
More Recent Results
Relaxing Strong Transitivity: $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$
Beyond Condorcet Winner
  Borda Winner
  von Neumann Winner
  Copeland Winner
Anytime algorithms
  Original work required knowing T a priori
More sophisticated dueling mechanisms
  Including off-policy evaluation
Adversarial Setting
Contextual Setting
Ongoing Work: Dependent Arms (with Yanan Sui, Vincent Zhuang and Joel Burdick)
Suppose K is very large (possibly infinite)
  But arms have dependency structure
  E.g., $P(a > b) \approx P(a > b')$ if b similar to b'
  Measure similarity using kernel
Initial results:
  Asymptotically optimal but impractical algorithm: degrees of freedom of kernel/manifold
Personalized Clinical Treatment (with Yanan Sui, Vincent Zhuang and Joel Burdick)
SCI Patient
[Image: Medtronic human array, labeled 4 mm and 10 mm. Image source: williamcapicottomd.com]
Each patient is unique
[garbled number] possible configurations!
Centralized vs Distributed
Most DB algorithms are centralized
  Single algorithm controls choice of both bandits
What about distributed algorithms?
  Two algorithms, each controlling one bandit
  "Sparring" [Ailon, Karnin and Joachims, ICML 2014]
  Each algorithm plays a standard MAB algorithm
  Recently analyzed [Dudik, Schapire and Slivkins ...]
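The distributed setup above can be illustrated with two independent learners, each choosing one side of the duel. This sketch plugs in UCB1 as the "standard MAB algorithm"; that choice, the 0/1 reward assignment, and all names are assumptions for illustration rather than the construction analyzed in the cited papers.

```python
import math
import random


class UCB1:
    """Textbook UCB1 over k arms with rewards in [0, 1]."""

    def __init__(self, k):
        self.counts = [0] * k
        self.sums = [0.0] * k
        self.t = 0

    def select(self):
        self.t += 1
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm  # play every arm once first
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def sparring(pref, horizon, seed=0):
    """Two learners each pick one bandit per duel; returns the list of duels."""
    rng = random.Random(seed)
    left, right = UCB1(len(pref)), UCB1(len(pref))
    duels = []
    for _ in range(horizon):
        a, b = left.select(), right.select()
        left_wins = rng.random() < pref[a][b]
        left.update(a, 1.0 if left_wins else 0.0)   # winner gets reward 1
        right.update(b, 0.0 if left_wins else 1.0)  # loser gets reward 0
        duels.append((a, b))
    return duels
```

Neither learner sees the other's choice; each only observes whether its own bandit won, so from its point of view the opponent is part of a changing environment.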
Dueling Bandits as a Zero-Sum Game
Rows: Player 1; columns: Player 2
      A      B      C      D      E      F
A     0      0.03   0.04   0.06   0.10   0.11
B    -0.03   0      0.03   0.05   0.0?   0.11
C    -0.04  -0.03   0      0.04   0.07   0.0?
D    -0.06  -0.05  -0.04   0      0.05   0.07
E    -0.10  -0.0?  -0.07  -0.05   0      0.03
F    -0.11  -0.11  -0.0?  -0.07  -0.03   0
Values are Pr(row > col) - 0.5
Basic Setting: Single Dominant Strategy
Regret = Opportunity Cost to Social Welfare
Ongoing Work: Learning in Games (with Sid Barman and Katrina Ligett)
Centralized algorithms = "coordination"
  Settings that benefit from minimal coordination
Low Rank Matrix:
  Two algorithms coordinate on exploration
  Small initial phase
Summary: Dueling Bandits Problem
Elicits preference feedback
  Motivated by human-centric personalization
Characterizes explore/exploit tradeoff
Connections to noisy tournaments
Connections to learning in games
References
The K-armed Dueling Bandits Problem, by Yisong Yue, Josef Broder, Robert Kleinberg and Thorsten Joachims, COLT 2009
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, by Yisong Yue and Thorsten Joachims, ICML 2009
Beat the Mean Bandit, by Yisong Yue and Thorsten Joachims, ICML 2011
Reusing Historical Interaction Data for Faster Online Learning to Rank for IR, by Katja Hofmann, Anne Schuth, Shimon Whiteson and Maarten de Rijke, WSDM 2013
Generic Exploration and K-armed Voting Bandits, by Tanguy Urvoy, Fabrice Clerot, Raphael Feraud and Sami Naamane, ICML 2013
Reducing Dueling Bandits to