masahiro-suzuki
Variational Dropout and the Local Reparameterization Trick
Diederik P. Kingma, Tim Salimans and Max Welling
Submitted on 8 Jun 2015 (arXiv)
7/17
Dropout = local reparameterization trick
Outline
1. EM algorithm
2. Stochastic gradient variational Bayes (SGVB)
3. Variational Dropout and the Local Reparameterization Trick
EM algorithm
q(z): an approximate distribution over the latent variables z
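Latent variable model: z is drawn from p(z), and x is generated from p(x|z).
z is a latent (unobserved) variable.
Marginal likelihood: p(x) = ∫ p(z) p(x|z) dz
Posterior: p(z|x) = p(x|z) p(z) / p(x). The EM algorithm needs this posterior, but for complex models it is intractable because p(x) is.
We therefore approximate the posterior p(z|x) with a distribution q(z|x).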
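To make p(x) and p(z|x) concrete, here is a minimal sketch on a toy conjugate model that is our own assumption (not from the slides): p(z) = N(0, 1) and p(x|z) = N(z, s²). Here the integral has a closed form, p(x) = N(0, 1 + s²), so the numerical integral and Bayes' rule can be checked; with neural-network likelihoods no such closed form exists, which is why q(z|x) is introduced.

```python
import math

def normal_pdf(v, mean, var):
    """Density of N(mean, var) evaluated at v."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def marginal_numeric(x, obs_var, lo=-10.0, hi=10.0, n=20_000):
    """p(x) = integral of p(z) p(x|z) dz by the midpoint rule; p(z) = N(0, 1), p(x|z) = N(z, obs_var)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        total += normal_pdf(z, 0.0, 1.0) * normal_pdf(x, z, obs_var)
    return total * h

def posterior_density(z, x, obs_var):
    """Bayes' rule: p(z|x) = p(x|z) p(z) / p(x)."""
    return normal_pdf(x, z, obs_var) * normal_pdf(z, 0.0, 1.0) / marginal_numeric(x, obs_var)

x, obs_var = 1.5, 0.5
# Closed forms for this toy model: p(x) = N(0, 1 + s^2), p(z|x) = N(x / (1 + s^2), s^2 / (1 + s^2)).
print(marginal_numeric(x, obs_var), normal_pdf(x, 0.0, 1.0 + obs_var))
print(posterior_density(1.0, x, obs_var), normal_pdf(1.0, x / (1.0 + obs_var), obs_var / (1.0 + obs_var)))
```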
reparameterization trick
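$$
\begin{aligned}
\mathcal{L}(\theta,\phi;x) &= \int q(z|x)\log\frac{p(x,z)}{q(z|x)}\,dz \\
&= \int q(z|x)\log\frac{p(z)\,p(x|z)}{q(z|x)}\,dz \\
&= \int q(z|x)\log\frac{p(z)}{q(z|x)}\,dz + \int q(z|x)\log p(x|z)\,dz \\
&= -\int q(z|x)\log\frac{q(z|x)}{p(z)}\,dz + \int q(z|x)\log p(x|z)\,dz \\
&= -D_{KL}\big(q(z|x)\,\|\,p(z)\big) + \mathbb{E}_{q(z|x)}[\log p(x|z)] \qquad (6)
\end{aligned}
$$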
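As a sanity check on (6), a minimal sketch for a toy model of our own choosing (not from the slides): p(z) = N(0, 1), p(x|z) = N(z, s²) and a Gaussian q = N(μ, σ²). Both terms of (6) then have closed forms. When q is the exact posterior, which for this model is N(x/(1 + s²), s²/(1 + s²)), the bound equals log p(x); for any other q it is strictly smaller.

```python
import math

def kl_gauss_std_normal(mu, var):
    """D_KL(N(mu, var) || N(0, 1)) in closed form."""
    return 0.5 * (mu ** 2 + var - 1.0 - math.log(var))

def expected_loglik(mu, var, x, obs_var):
    """E_q[log N(x; z, obs_var)] for q(z) = N(mu, var), in closed form."""
    return -0.5 * math.log(2.0 * math.pi * obs_var) - ((x - mu) ** 2 + var) / (2.0 * obs_var)

def elbo(mu, var, x, obs_var):
    """Equation (6): -KL(q || p(z)) + E_q[log p(x|z)]."""
    return -kl_gauss_std_normal(mu, var) + expected_loglik(mu, var, x, obs_var)

x, obs_var = 1.5, 0.5
# log p(x) for this model, where p(x) = N(0, 1 + s^2).
log_px = -0.5 * math.log(2.0 * math.pi * (1.0 + obs_var)) - x ** 2 / (2.0 * (1.0 + obs_var))
print(elbo(1.0, 1.0 / 3.0, x, obs_var), log_px)  # q = exact posterior: the two values agree
print(elbo(0.0, 1.0, x, obs_var) < log_px)       # any other q gives a strictly lower bound
```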
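To optimize the lower bound we need expectations of the form E_{q(z|x)}[f(z)], which in general cannot be computed analytically. Stochastic gradient variational Bayes (SGVB) approximates them by Monte Carlo:
1. Draw L samples {z^(l)}, l = 1, ..., L, from q(z|x).
2. Average f(z^(l)) over the samples.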
$$
\mathbb{E}_{q(z|x)}[f(z)] \simeq \frac{1}{L}\sum_{l=1}^{L} f(z^{(l)}) \qquad (7)
$$
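A minimal sketch of estimator (7), assuming (for illustration only) q = N(μ, σ²) and f(z) = z², for which the exact answer μ² + σ² is known:

```python
import random

def mc_expectation(f, sample_q, num_samples, rng):
    """Equation (7): E_q[f(z)] is approximated by (1/L) * sum_l f(z^(l)) with z^(l) ~ q."""
    return sum(f(sample_q(rng)) for _ in range(num_samples)) / num_samples

rng = random.Random(0)
mu, sigma = 1.0, 0.5
estimate = mc_expectation(lambda z: z * z, lambda r: r.gauss(mu, sigma), 200_000, rng)
print(estimate, mu ** 2 + sigma ** 2)  # the exact value is 1.25
```

The estimate is unbiased, but because the samples are drawn from q itself, it cannot be differentiated with respect to the parameters of q directly; this is what the reparameterization addresses.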
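Instead of sampling z ~ q(z|x) directly, we write z as a differentiable transformation g(ε, x) of an auxiliary noise variable ε (z = g(ε, x)), where ε ~ p(ε) does not depend on the variational parameters. The expectation in (7) can then be rewritten as: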
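$$
\begin{aligned}
\mathbb{E}_{q(z|x,\phi)}[f(z)] &= \int q(z|x,\phi)\,f(z)\,dz \\
&= \int p(\epsilon)\,f(z)\,d\epsilon \qquad \big(q(z|x,\phi)\,dz = p(\epsilon)\,d\epsilon\big) \\
&= \int p(\epsilon)\,f(g(\epsilon,x))\,d\epsilon \\
&= \mathbb{E}_{p(\epsilon)}[f(g(\epsilon,x))] \simeq \frac{1}{L}\sum_{l=1}^{L} f(g(\epsilon^{(l)},x)), \quad \epsilon^{(l)} \sim p(\epsilon) \qquad (8)
\end{aligned}
$$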
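A minimal sketch of why (8) matters, assuming a Gaussian q with z = g(ε) = μ + σε and ε ~ N(0, 1) (the usual Gaussian choice; in an amortized model μ and σ would be computed from x, which we omit). Since p(ε) does not depend on μ or σ, the gradient moves inside the expectation and is estimated by sampling ε. With f(z) = z², E[f(z)] = μ² + σ², so the exact gradients are 2μ and 2σ.

```python
import random

def reparam_grads(df, mu, sigma, num_samples, rng):
    """Pathwise gradients of E_{N(mu, sigma^2)}[f(z)] using z = g(eps) = mu + sigma * eps."""
    g_mu = g_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)   # eps ~ p(eps) = N(0, 1), independent of mu and sigma
        z = mu + sigma * eps        # z = g(eps)
        g_mu += df(z)               # chain rule with dz/dmu = 1
        g_sigma += df(z) * eps      # chain rule with dz/dsigma = eps
    return g_mu / num_samples, g_sigma / num_samples

rng = random.Random(0)
g_mu, g_sigma = reparam_grads(lambda z: 2.0 * z, 1.0, 0.5, 200_000, rng)
print(g_mu, g_sigma)  # exact gradients of mu^2 + sigma^2 here: 2.0 and 1.0
```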
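Applying this to the lower bound (5), we obtain the estimator \mathcal{L}^A(q, \theta; x):

$$
\mathcal{L}^A(q,\theta;x) = \frac{1}{L}\sum_{l=1}^{L}\Big(\log p(x, z^{(l)}|\theta) - \log q(z^{(l)}|x,\phi)\Big), \quad z^{(l)} = g(\epsilon^{(l)},x),\ \epsilon^{(l)} \sim p(\epsilon) \qquad (9)
$$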
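A minimal sketch of estimator (9) on a toy model that is our assumption (p(z) = N(0, 1), p(x|z) = N(z, s²), Gaussian q = N(μ, σ²) reparameterized as z = μ + σε). A useful check: if q is the exact posterior, log p(x, z) − log q(z) equals log p(x) for every z, so the estimator has zero variance.

```python
import math
import random

def log_normal(v, mean, var):
    """log N(v; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (v - mean) ** 2 / (2.0 * var)

def elbo_estimator_a(x, mu, var, obs_var, num_samples, rng):
    """Equation (9): (1/L) * sum_l [log p(x, z^(l)) - log q(z^(l))], z^(l) = mu + sqrt(var) * eps^(l)."""
    total = 0.0
    for _ in range(num_samples):
        z = mu + math.sqrt(var) * rng.gauss(0.0, 1.0)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, obs_var)  # log p(z) + log p(x|z)
        total += log_joint - log_normal(z, mu, var)
    return total / num_samples

rng = random.Random(0)
print(elbo_estimator_a(1.5, 1.0, 1.0 / 3.0, 0.5, 10, rng), log_normal(1.5, 0.0, 1.5))  # exact posterior
```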
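This is stochastic gradient variational Bayes (SGVB). SGVB can also be applied to (6): the KL term in (6) can often be computed analytically, so only the second term of (6) has to be estimated by sampling.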
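A minimal sketch of SGVB applied to (6), again on an assumed toy model (p(z) = N(0, 1), p(x|z) = N(z, s²), q = N(μ, σ²)): the KL term is computed in closed form and only the expected log-likelihood is sampled.

```python
import math
import random

def log_normal(v, mean, var):
    """log N(v; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (v - mean) ** 2 / (2.0 * var)

def elbo_estimator_b(x, mu, var, obs_var, num_samples, rng):
    """SGVB on (6): analytic -KL(N(mu, var) || N(0, 1)) plus a Monte Carlo estimate of E_q[log p(x|z)]."""
    neg_kl = -0.5 * (mu ** 2 + var - 1.0 - math.log(var))
    recon = 0.0
    for _ in range(num_samples):
        z = mu + math.sqrt(var) * rng.gauss(0.0, 1.0)  # reparameterized sample
        recon += log_normal(x, z, obs_var)
    return neg_kl + recon / num_samples

rng = random.Random(0)
estimate = elbo_estimator_b(1.5, 0.5, 0.5, 0.5, 100_000, rng)
print(estimate)
```

Removing the sampled KL part typically lowers the variance of the estimate, though that is not guaranteed for every model.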
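Stochastic gradient variational Bayes (SGVB)
In (6), the first term (the KL divergence) acts as a regularizer, and the second term corresponds to the (negative) reconstruction error.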
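SGVB with minibatches: for a dataset of N datapoints, the lower bound is estimated from minibatches of M datapoints.
If the minibatch is large enough (e.g. M = 100), L = 1 sample per datapoint is sufficient.
The parameters are then optimized with SGD.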
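A minimal sketch of the minibatch estimate: the sum over a random minibatch of M per-datapoint bounds is rescaled by N/M, which makes it an unbiased estimate of the full-data sum. The per-datapoint values below are hypothetical placeholders.

```python
import random

def minibatch_estimate(per_point_values, batch_size, rng):
    """(N / M) * sum of M per-datapoint bound estimates drawn without replacement."""
    n = len(per_point_values)
    batch = rng.sample(per_point_values, batch_size)
    return n / batch_size * sum(batch)

values = [i % 7 for i in range(1000)]   # hypothetical per-datapoint lower-bound values
rng = random.Random(0)
avg = sum(minibatch_estimate(values, 100, rng) for _ in range(5000)) / 5000
print(avg, sum(values))                 # the average of many minibatch estimates approaches the full sum
```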
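SGVB algorithm (minibatch version)
1. Draw a random minibatch of M datapoints.
2. Draw noise samples ε ~ p(ε).
3. Compute the gradient of the minibatch estimate of the lower bound.
4. Update the parameters with this gradient (e.g. SGD).
5. Repeat steps 1 to 4 until convergence.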
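A minimal sketch of the minibatch SGVB loop, under assumptions of our own: a single global latent z with p(z) = N(0, 1), observations p(x_i|z) = N(z, s²), q(z) = N(μ, σ²) parameterized by (μ, log σ), L = 1, and a hand-derived gradient instead of automatic differentiation. For this conjugate model the exact posterior is known, so the result can be checked.

```python
import math
import random

def sgvb_fit(data, obs_var=1.0, batch_size=5, steps=20_000, lr0=0.01, seed=0):
    """Minibatch SGVB with L = 1 for q(z) = N(mu, sigma^2), prior N(0, 1), likelihood p(x_i|z) = N(z, obs_var)."""
    rng = random.Random(seed)
    n = len(data)
    mu, rho = 0.0, -2.0                           # rho = log(sigma), so sigma stays positive
    for t in range(steps):
        batch = rng.sample(data, batch_size)      # random minibatch of M datapoints
        eps = rng.gauss(0.0, 1.0)                 # one noise sample eps ~ N(0, 1)
        sigma = math.exp(rho)
        z = mu + sigma * eps                      # reparameterized sample z = g(eps)
        # Gradient of (N/M) * sum_i log p(x_i|z) with respect to z.
        dlik_dz = (n / batch_size) * sum(x - z for x in batch) / obs_var
        # Analytic gradient of -KL(q || p(z)) plus the sampled data term (dz/dmu = 1, dz/drho = sigma * eps).
        g_mu = -mu + dlik_dz
        g_rho = 1.0 - sigma ** 2 + dlik_dz * sigma * eps
        lr = lr0 / (1.0 + t / 1000.0)             # decaying step size
        mu += lr * g_mu                           # gradient ascent on the lower bound
        rho += lr * g_rho
    return mu, math.exp(rho)

data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(20)]   # hypothetical dataset, N = 20
mu, sigma = sgvb_fit(data)
exact_mu = sum(data) / (1 + len(data))                 # exact posterior mean (obs_var = 1)
exact_sigma = math.sqrt(1.0 / (1 + len(data)))         # exact posterior standard deviation
print(mu, exact_mu, sigma, exact_sigma)
```

The fixed step budget stands in for "until convergence"; a real implementation would use automatic differentiation and an adaptive optimizer.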
SGVB
Reparameterization trick: z is written as a differentiable function of x and auxiliary noise
SGVB
Hinton
MCMC1
Deep Learning
Variational Dropout and the Local Reparameterization Trick
KL
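Variance of the SGVB estimator: as with SGD, the convergence of SGVB depends on the variance of its gradient estimates, which in turn depends on the minibatch size M.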
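The dependence on M can be made explicit. Writing the minibatch estimate as (N/M) Σ L_i with per-datapoint terms L_i, and assuming (for simplicity) that all L_i share the same variance and the same pairwise covariance, elementary algebra gives:

$$
\operatorname{Var}\Big[\frac{N}{M}\sum_{i=1}^{M} L_i\Big]
= \frac{N^2}{M^2}\Big(\sum_{i}\operatorname{Var}[L_i] + \sum_{i\neq j}\operatorname{Cov}[L_i,L_j]\Big)
= N^2\Big(\frac{1}{M}\operatorname{Var}[L_i] + \frac{M-1}{M}\operatorname{Cov}[L_i,L_j]\Big)
$$

The variance term shrinks as 1/M, but the covariance term does not. It is nonzero when all datapoints in the minibatch share the same weight noise, which is the situation the local reparameterization trick is designed to avoid.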
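Local reparameterization trick
The weights are written as W = f(ε, φ), where ε is random noise.
If a single noise sample (one W) is shared by the whole minibatch, Cov[L_i, L_j] ≠ 0; sampling a separate ε for every datapoint makes the covariance 0, but is expensive.
The local reparameterization trick applies the reparameterization to the local activations instead of the global weights (as the ordinary reparameterization trick does), which also makes the covariance 0.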
Example: a fully connected layer with 1000 inputs and 1000 outputs, applied to a minibatch of M examples.
[Figure: B = AW with input A (M × 1000), weights W (1000 × 1000) and activations B (M × 1000), shown without and with the local reparameterization trick]
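With the local reparameterization trick, the activations B are sampled directly instead of W: the covariance between examples becomes 0, and only M × 1000 random numbers are needed instead of M × 1000 × 1000.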
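A minimal sketch of the trick for one linear layer, assuming a fully factorized Gaussian posterior over the weights, w_kj ~ N(θ_kj, σ²_kj), and hypothetical tiny sizes (M = 2, 3 inputs, 2 outputs instead of 1000 × 1000). Each activation b_mj is then Gaussian with mean γ_mj = Σ_k a_mk θ_kj and variance δ_mj = Σ_k a²_mk σ²_kj, so it can be sampled directly with one draw per activation instead of one draw per weight per example.

```python
import math
import random
import statistics

def sample_b_naive(A, theta, weight_var, rng):
    """Draw a separate W ~ N(theta, weight_var) (elementwise) for each row a_m of A; return (B, #draws)."""
    K, J = len(theta), len(theta[0])
    B, draws = [], 0
    for a in A:
        W = [[rng.gauss(theta[k][j], math.sqrt(weight_var[k][j])) for j in range(J)] for k in range(K)]
        draws += K * J
        B.append([sum(a[k] * W[k][j] for k in range(K)) for j in range(J)])
    return B, draws

def sample_b_local(A, theta, weight_var, rng):
    """Local reparameterization: sample b_mj ~ N(gamma_mj, delta_mj) directly; return (B, #draws)."""
    K, J = len(theta), len(theta[0])
    B, draws = [], 0
    for a in A:
        row = []
        for j in range(J):
            gamma = sum(a[k] * theta[k][j] for k in range(K))             # mean of b_mj
            delta = sum(a[k] ** 2 * weight_var[k][j] for k in range(K))   # variance of b_mj
            row.append(gamma + math.sqrt(delta) * rng.gauss(0.0, 1.0))
            draws += 1
        B.append(row)
    return B, draws

# Hypothetical tiny layer: M = 2 examples, K = 3 inputs, J = 2 outputs.
A = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]
theta = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
weight_var = [[0.04, 0.01], [0.09, 0.04], [0.01, 0.25]]

rng = random.Random(0)
naive = [sample_b_naive(A, theta, weight_var, rng)[0][0][0] for _ in range(20_000)]
local = [sample_b_local(A, theta, weight_var, rng)[0][0][0] for _ in range(20_000)]
print(statistics.fmean(naive), statistics.fmean(local))          # both near gamma_00 = 1.5
print(statistics.pvariance(naive), statistics.pvariance(local))  # both near delta_00 = 0.41
```

Both samplers give the same marginal for each activation, and draws for different examples stay independent, but the local version uses M × J random numbers rather than M × K × J.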
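Local reparameterization trick and dropout
Dropout multiplies each input by noise that is 0 or 1, with dropout rate p.
Independent weight noise: Gaussian dropout uses noise drawn from N(1, α) instead (α = p/(1 − p)); the activations b are then Gaussian.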
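Wang and Manning (2013) exploited this to sample B directly from its Gaussian marginal.
Since B = AW, sampling B directly instead of W is exactly the local reparameterization trick.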
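A minimal sketch of the Gaussian dropout connection for a single activation. Assumptions: inverted binary dropout (kept units scaled by 1/(1 − p), so the noise has mean 1 and variance p/(1 − p)), and hypothetical input and weight values. Multiplying the inputs by N(1, α) noise and sampling b directly from N(Σ a_k θ_k, α Σ a_k² θ_k²) give the same distribution for b.

```python
import math
import random
import statistics

def gaussian_dropout_alpha(p):
    """Noise variance alpha = p / (1 - p), matching binary dropout with drop rate p."""
    return p / (1.0 - p)

def binary_dropout_noise(p, rng):
    """Inverted binary dropout: 0 with probability p, else 1 / (1 - p); mean 1, variance p / (1 - p)."""
    return 0.0 if rng.random() < p else 1.0 / (1.0 - p)

def b_input_noise(a, theta_col, alpha, rng):
    """Gaussian dropout on the inputs: b = sum_k a_k * xi_k * theta_k with xi_k ~ N(1, alpha)."""
    return sum(ak * rng.gauss(1.0, math.sqrt(alpha)) * tk for ak, tk in zip(a, theta_col))

def b_local(a, theta_col, alpha, rng):
    """Same activation sampled directly: b ~ N(sum a_k theta_k, alpha * sum a_k^2 theta_k^2)."""
    gamma = sum(ak * tk for ak, tk in zip(a, theta_col))
    delta = alpha * sum((ak * tk) ** 2 for ak, tk in zip(a, theta_col))
    return rng.gauss(gamma, math.sqrt(delta))

p = 0.5
alpha = gaussian_dropout_alpha(p)                   # 1.0 for p = 0.5
rng = random.Random(0)
a, theta_col = [1.0, -2.0, 0.5], [0.3, 0.1, -0.4]   # hypothetical inputs and one weight column
xi = [binary_dropout_noise(p, rng) for _ in range(20_000)]
via_inputs = [b_input_noise(a, theta_col, alpha, rng) for _ in range(20_000)]
via_local = [b_local(a, theta_col, alpha, rng) for _ in range(20_000)]
print(statistics.pvariance(xi), alpha)
print(statistics.pvariance(via_inputs), statistics.pvariance(via_local))  # both near 0.17
```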
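Correlated weight noise: noise shared by all outputs of an input unit corresponds to correlated noise on the weights W (each row of W is scaled by the same noise variable).
Here too the noise is applied to B via the local reparameterization trick rather than sampled on W.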
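Dropout posterior: the approximate posterior over the weights implied by the dropout noise.
KL term: for the variational objective to match dropout, the KL divergence must not depend on θ; this holds for the scale-invariant log-uniform prior, p(log|w|) ∝ const.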
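Under the log-uniform prior the KL term depends only on α, and Kingma et al. approximate it with a polynomial in α. The sketch below uses the coefficients as we recall them from the paper, up to an additive constant; treat them as an assumption to verify against the original before use. It shows that −KL increases with α on the range evaluated, so maximizing the lower bound favors more noise.

```python
import math

# Polynomial coefficients as recalled from Kingma et al. (2015); verify against the paper.
C1, C2, C3 = 1.16145124, -1.50204118, 0.58629921

def neg_kl_approx(alpha):
    """Approximate -KL(q(w) || log-uniform prior) as a function of alpha, up to an additive constant."""
    return 0.5 * math.log(alpha) + C1 * alpha + C2 * alpha ** 2 + C3 * alpha ** 3

for alpha in (0.1, 0.5, 1.0):
    print(alpha, neg_kl_approx(alpha))
```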
Methods compared:
- standard binary dropout
- Gaussian dropout type A
- Gaussian dropout type B
- variational dropout type A
- variational dropout type B
Dataset: MNIST
Network: fully connected, 3 layers of rectified linear units (ReLUs); dropout rate p = 0.2 for the input layer and p = 0.5 for the hidden layers; early stopping
variational dropout type B
dropout
dropout
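SGVB training time per epoch
SGVB without the local reparameterization trick: 1635 sec
SGVB with the local reparameterization trick: 7.4 sec
The local reparameterization trick is roughly 200 times faster.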
A2KL
The local reparameterization trick translates uncertainty about global parameters into local noise that is independent across datapoints.
Using the local reparameterization trick, Gaussian dropout generalizes to variational dropout, in which the dropout rates can be learned.