AdaGrad+RDA in 30 Minutes
echizen_tm Oct.11, 2014
Agenda: Introduction (3p) / Stochastic Gradient Descent (2p) / Deriving AdaGrad+RDA (6p) / The name AdaGrad+RDA (3p) / Summary (1p)
Introduction (1/3)
A classifier predicts a label for an input. The label can come from a binary set {…, …}, a larger finite set {…, …, …, …, …}, or a set of numeric values {10, 20, 30, 40, …}.
Introduction (2/3)
Input: a feature vector x. Model: a weight vector w.
Compute the score y = \sum_i x_i w_i; if y > 0, predict class A, otherwise predict the other class.
Introduction (3/3)
Example: x = {…: 1, …: 1, …: 1, …: 1}, w = {…: 1, …: 1, …: 1, …: -1}
y = 1*1 + 1*1 + 1*1 + 1*(-1) = 2 > 0
so this input is classified as positive (t = 1); the other class is negative (t = -1).
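The scoring rule above can be checked directly. The values follow the slide's example; the original feature names were lost, so plain positions are used:

```python
# Score a single input with a linear classifier: y = sum_i x_i * w_i.
x = [1, 1, 1, 1]    # feature vector from the example (names omitted)
w = [1, 1, 1, -1]   # weight vector from the example
y = sum(xi * wi for xi, wi in zip(x, w))
print(y)  # 2, and y > 0, so the input is classified as positive (t = 1)
```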
Stochastic Gradient Descent (1/2)
We want a w such that w \cdot x predicts the label t. The error of w on an example (x, t) is measured by a loss function, for example:
hinge loss: f(w, x, t) = \max(0, 1 - t \sum_i x_i w_i)
squared loss: f(w, x, t) = \frac{1}{2} (t - \sum_i x_i w_i)^2
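As a sketch, the two losses can be written directly (the function names are mine, not from the slides):

```python
def hinge_loss(w, x, t):
    # f(w, x, t) = max(0, 1 - t * sum_i x_i w_i)
    y = sum(xi * wi for xi, wi in zip(x, w))
    return max(0.0, 1.0 - t * y)

def squared_loss(w, x, t):
    # f(w, x, t) = (1/2) * (t - sum_i x_i w_i)^2
    y = sum(xi * wi for xi, wi in zip(x, w))
    return 0.5 * (t - y) ** 2

# On the introduction's example (y = 2, t = 1) the hinge loss is 0
# (the margin is satisfied) and the squared loss is 0.5.
```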
Stochastic Gradient Descent (2/2)
SGD minimizes the loss by updating w one example at a time, stepping against the gradient (η is the learning rate):

w = 0; for ((x,t) in X) { w -= η ∇f(w, x, t); }

Gradients of the two losses:
hinge loss: \partial f(w, x, t) / \partial w_i = -t x_i (when the loss is positive)
squared loss: \partial f(w, x, t) / \partial w_i = -(t - \sum_j x_j w_j) x_i
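The loop above can be sketched with the hinge-loss gradient; the learning rate and the toy data are illustrative assumptions:

```python
def sgd(data, dim, eta=0.1):
    """One pass of SGD with the hinge loss f = max(0, 1 - t * w.x)."""
    w = [0.0] * dim
    for x, t in data:
        y = sum(xi * wi for xi, wi in zip(x, w))
        if 1.0 - t * y > 0:  # loss is active, gradient is nonzero
            for i in range(dim):
                w[i] -= eta * (-t * x[i])  # df/dw_i = -t * x_i
    return w

# Two toy examples: the first feature marks the positive class,
# the second marks the negative class, the third appears in both.
data = [([1, 0, 1], 1), ([0, 1, 1], -1)]
w = sgd(data, dim=3)  # w[0] > 0 and w[1] < 0 after the pass
```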
AdaGrad+RDA (1/6)
SGD is simple, but methods such as AROW and SCW converge faster. AdaGrad+RDA keeps SGD's simplicity while adapting its step size per feature like AROW and SCW.
Difference between SGD and AdaGrad+RDA:
SGD: uses the s-th example x_s to update w_s into w_{s+1}.
AdaGrad+RDA: uses the gradients of all examples 0..s to recompute w_{s+1} from scratch.
AdaGrad+RDA (2/6)
AdaGrad+RDA chooses w_{s+1} as the minimizer of the regret:

R(w_{s+1}) = \sum_i g_i w_{s+1,i} + \lambda \|w_{s+1}\|_1 + \frac{1}{2\eta} \sum_i h_i w_{s+1,i}^2

where

g_i = \frac{1}{s} \sum_{j=0}^{s} \partial f(w_j, x_j, t_j) / \partial w_{j,i}

h_i = \sqrt{ \sum_{j=0}^{s} \left( \partial f(w_j, x_j, t_j) / \partial w_{j,i} \right)^2 }
AdaGrad+RDA (3/6)
The regret involves the quantities w, g, h and the hyperparameters λ and η.
w_{s+1}: the weight vector after round s+1.
g, h: running statistics of the gradients of the loss f.
Given g and h, minimizing the regret yields the new w:

R(w_{s+1}) = \sum_i g_i w_{s+1,i} + \lambda \|w_{s+1}\|_1 + \frac{1}{2\eta} \sum_i h_i w_{s+1,i}^2
AdaGrad+RDA (4/6)
g and h look complicated, but both are simple functions of the per-round gradients: g_i is the average of the gradients seen so far, and h_i is the root of the sum of their squares.

g_i = \frac{1}{s} \sum_{j=0}^{s} \partial f(w_j, x_j, t_j) / \partial w_{j,i}

h_i = \sqrt{ \sum_{j=0}^{s} \left( \partial f(w_j, x_j, t_j) / \partial w_{j,i} \right)^2 }

Each round only adds the new gradient \partial f(w_j, x_j, t_j) / \partial w_{j,i} to the running sums, so both are cheap to maintain online.
AdaGrad+RDA (5/6)
Because R is convex, setting \partial R(w) / \partial w = 0 gives a closed-form minimizer w = r(η, g, h), so the algorithm becomes:

w = 0; for ((x,t) in X) { update g(w,x,t); update h(w,x,t); w = r(η, g, h); }
AdaGrad+RDA (6/6)
Solving \partial R(w) / \partial w = 0 component-wise gives w = r(η, g, h):

|g_i| \le \lambda \rightarrow w_i = 0
g_i > \lambda \rightarrow w_i = -\eta (g_i - \lambda) / h_i
g_i < -\lambda \rightarrow w_i = -\eta (g_i + \lambda) / h_i

The L1 term keeps features whose averaged gradient stays below λ at exactly zero, so w is sparse.
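Putting the pieces together: a sketch of the whole method under the definitions above (hinge loss, g as the averaged gradient, h as the root of the summed squared gradients). η, λ, and the toy data are illustrative assumptions, not values from the slides:

```python
import math

def adagrad_rda(data, dim, eta=1.0, lam=0.1):
    """One pass of AdaGrad+RDA with the hinge loss."""
    w = [0.0] * dim
    grad_sum = [0.0] * dim     # g_i = grad_sum[i] / s
    grad_sq_sum = [0.0] * dim  # h_i = sqrt(grad_sq_sum[i])
    s = 0
    for x, t in data:
        s += 1
        y = sum(xi * wi for xi, wi in zip(x, w))
        if 1.0 - t * y > 0:  # hinge loss active: df/dw_i = -t * x_i
            for i in range(dim):
                grad = -t * x[i]
                grad_sum[i] += grad
                grad_sq_sum[i] += grad * grad
        for i in range(dim):  # w = r(eta, g, h), the closed-form minimizer
            g = grad_sum[i] / s
            h = math.sqrt(grad_sq_sum[i])
            if h == 0.0 or abs(g) <= lam:
                w[i] = 0.0  # L1 term keeps this weight at exactly zero
            elif g > lam:
                w[i] = -eta * (g - lam) / h
            else:  # g < -lam
                w[i] = -eta * (g + lam) / h
    return w

data = [([1, 0, 1], 1), ([0, 1, 1], -1)]
w = adagrad_rda(data, dim=3)
```

On this toy data the third feature appears in both classes, so its averaged gradient cancels to 0 and the L1 threshold zeroes its weight, the sparsity RDA is known for.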
The name AdaGrad+RDA (1/3)
Why is it called AdaGrad+RDA?
AdaGrad = Adaptive Gradient: the step size adapts to the gradients seen so far, as in AROW and SCW.
RDA = Regularized Dual Averaging. Regularized: an L1 regularization term (yields a sparse w). Dual Averaging: the loss is replaced by the average of the past gradients.
The name AdaGrad+RDA (2/3)
Each term of the regret corresponds to a part of the name:

R(w_{s+1}) = \sum_i g_i w_{s+1,i} + \lambda \|w_{s+1}\|_1 + \frac{1}{2\eta} \sum_i h_i w_{s+1,i}^2

first term: loss term (averaged gradients) → Dual Averaging
second term: regularization term → Regularized
third term: proximal term → Adaptive Gradient
The name AdaGrad+RDA (3/3)
The first term arises from averaging the linearized losses: choosing w_{s+1} to do as well as possible against all past rounds is

\max_{w_{s+1}} \sum_{j=0}^{s} \langle f'_j, w_j - w_{s+1} \rangle
= \max_{w_{s+1}} \left( \sum_{j=0}^{s} \langle f'_j, w_j \rangle - \sum_{j=0}^{s} \langle f'_j, w_{s+1} \rangle \right)
= \min_{w_{s+1}} \sum_{j=0}^{s} \langle f'_j, w_{s+1} \rangle
= \min_{w_{s+1}} \left\langle \sum_{j=0}^{s} f'_j / s, w_{s+1} \right\rangle   (dividing by s does not change the minimizer)
= \min_{w_{s+1}} \langle g, w_{s+1} \rangle = \min_{w_{s+1}} \sum_i g_i w_{s+1,i}
Summary (1/1)
SGD and AdaGrad+RDA were both implemented; the AdaGrad+RDA code is available at https://github.com/echizentm/AdaGrad.
References:
- Duchi et al. (2010) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Xiao (2010) Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization