tomomi-moriyama
2017/10/3
Distral: Robust Multitask Reinforcement Learning [DeepMind, 2017/7/13, arXiv:1707.04175]
✤ DisTraL: Distill and Transfer Learning
✤ DeepMind Lab, Asynchronous Advantage Actor-Critic (A3C)
✤ Distral
✤ KL divergence
✤ "semantically meaningful"
✤ semantic
✤ KL, entropy
✤ ❶ { distillation, soften, temperature }30 …
https://www.youtube.com/watch?v=eZdOkDtYMoo
CS231n 2017 Spring Lecture 15 | Efficient Methods and Hardware for Deep Learning
50:47~ Model Distillation
✤ ❷A3C { Asynchronous, Actor-Critic, Advantage}
p50~53
TensorFlow
https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2
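The "soften" and "temperature" keywords in ❶ can be illustrated with a temperature-scaled softmax. This is a minimal sketch, not code from the slides or the paper: the function name `softmax_with_temperature` and the logit values are hypothetical, chosen only to show that a higher temperature flattens the distribution a student would be trained on.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher logits (illustrative values only).
teacher_logits = [4.0, 1.0, 0.5]

hard = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=5.0)

print("T=1:", [round(p, 3) for p in hard])
print("T=5:", [round(p, 3) for p in soft])
```

The ranking of the classes is unchanged at the higher temperature; only the gap between them shrinks, which is what makes the softened targets more informative about the non-maximal classes.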
DisTraL vs A3C, A3C Multitask
arXiv:1612.03801
[Figure: architectures of A3C, A3C Multitask, and DisTraL; labels: π0, π1 … πn, πi, f, h, KL]
✤ 2col. : 1col. KL divergence
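The labels f, h, π0 and πi correspond to the two-column parameterization described in the Distral paper (arXiv:1707.04175): the distilled policy is a softmax over h, and each task policy is a softmax over α·h + β·f. The sketch below assumes that form; the function names and all numeric values are hypothetical and are not taken from the slides.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distilled_policy(h):
    """pi_0(a|s): softmax over the shared column's outputs h."""
    return softmax(h)


def task_policy(h, f, alpha, beta):
    """pi_i(a|s): softmax over alpha * h + beta * f (two-column form)."""
    return softmax([alpha * hk + beta * fk for hk, fk in zip(h, f)])


# Hypothetical per-action outputs for a single state (3 actions).
h = [2.0, 0.0, -1.0]       # shared column (assumed values)
f_zero = [0.0, 0.0, 0.0]   # a task column that contributes nothing
f_task = [0.0, 3.0, 0.0]   # a task column that prefers action 1

pi0 = distilled_policy(h)
pi_same = task_policy(h, f_zero, alpha=1.0, beta=1.0)
pi_i = task_policy(h, f_task, alpha=0.5, beta=1.0)

print("pi_0:", [round(p, 3) for p in pi0])
print("pi_i:", [round(p, 3) for p in pi_i])
```

With a zero task column and α = 1 the task policy coincides with π0, so the task column only has to learn the task-specific deviation from the shared behaviour.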
regularize: KL divergences
α = 0.5
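For reference, the regularized objective can be written as follows. This is restated from the Distral paper (arXiv:1707.04175), not recovered from the slide itself; the symbols $c_{\mathrm{KL}}$, $c_{\mathrm{Ent}}$, $\gamma$ and $R_i$ follow the paper's notation.

```latex
\begin{align}
J(\pi_0, \{\pi_i\}_{i=1}^{n})
  &= \sum_i \mathbb{E}_{\pi_i}\!\left[ \sum_{t \ge 0}
       \gamma^t R_i(a_t, s_t)
       - c_{\mathrm{KL}}\, \gamma^t \log \frac{\pi_i(a_t \mid s_t)}{\pi_0(a_t \mid s_t)}
       - c_{\mathrm{Ent}}\, \gamma^t \log \pi_i(a_t \mid s_t) \right] \\
  &= \sum_i \mathbb{E}_{\pi_i}\!\left[ \sum_{t \ge 0}
       \gamma^t R_i(a_t, s_t)
       + \frac{\gamma^t \alpha}{\beta} \log \pi_0(a_t \mid s_t)
       - \frac{\gamma^t}{\beta} \log \pi_i(a_t \mid s_t) \right],
\qquad
\alpha = \frac{c_{\mathrm{KL}}}{c_{\mathrm{KL}} + c_{\mathrm{Ent}}},\quad
\beta = \frac{1}{c_{\mathrm{KL}} + c_{\mathrm{Ent}}}.
\end{align}
```

Under these definitions, α = 0.5 corresponds to weighting the KL term and the entropy term equally ($c_{\mathrm{KL}} = c_{\mathrm{Ent}}$).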
distill: πi, π0 (θ0)
temperature: soft hyper-parameter
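The distill step can be pictured as fitting π0 to a task policy πi by gradient descent on KL(πi ‖ π0). The sketch below is a simplification under stated assumptions: a single state, a single fixed task policy, and π0 = softmax(h) with h standing in for the parameters θ0. The names `distill_step` and `kl`, the learning rate, and the probabilities are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def distill_step(h, pi_i, lr=0.5):
    """One gradient-descent step on KL(pi_i || softmax(h)) with respect to h."""
    pi_0 = softmax(h)
    # d/dh_k KL(pi_i || pi_0) = pi_0[k] - pi_i[k]
    return [hk - lr * (p0 - pi) for hk, p0, pi in zip(h, pi_0, pi_i)]


# Hypothetical task policy for one state (illustrative values only).
pi_i = [0.7, 0.2, 0.1]
h = [0.0, 0.0, 0.0]  # pi_0 starts uniform

kl_before = kl(pi_i, softmax(h))
for _ in range(200):
    h = distill_step(h, pi_i)
kl_after = kl(pi_i, softmax(h))

print("KL before:", round(kl_before, 4))
print("KL after: ", kl_after)
```

The update direction is proportional to πi − π0, so π0 moves toward the task policy and stops changing once the two agree.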
DeepMind Lab
arXiv:1612.03801
3D, first-person
✤ 9
✤ 4
✤ 9 x 4
step 1e8
Mazes (fθ)
✤ Distral
✤ 1col 2col
✤ entropy
✤ A3C multitask A3C 2 col. Distral
✤ Distral
step 1e8
Mazes (hθ)
step 1e8
✤ KL 2col. KL+ent 2 col.
✤ Distral
✤ α
Navigation
✤ Get
✤ Distral
[A3C]
step 1e8
Laser-tag
✤ entropy
✤ A3C
✤ Distral 1col.
✤ Distral
step 1e8
greedy
entropy