[RL Reading Group] Distral: Robust Multitask Reinforcement Learning

2017/10/3

Distral: Robust Multitask Reinforcement Learning [DeepMind, 2017/7/13, arXiv:1707.04175]


✤ DisTraL = Distill & Transfer Learning

✤

DeepMind Lab

Asynchronous Advantage Actor-Critic (A3C)


✤ Distral


✤ KL divergence


✤ "semantically meaningful"

✤ semantic [ ]


✤ KL, entropy
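As a reminder of the two quantities named on this slide, here is a minimal sketch for discrete action distributions. The function names and the example distributions are illustrative assumptions, not taken from the paper or the slides.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_a p(a) log p(a), in nats."""
    return -sum(pa * math.log(pa) for pa in p if pa > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum_a p(a) log(p(a) / q(a)).

    Assumes q(a) > 0 wherever p(a) > 0.
    """
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)

# Hypothetical task policy and shared policy over 3 actions.
task_policy = [0.7, 0.2, 0.1]
shared_policy = [0.4, 0.4, 0.2]

kl_to_shared = kl_divergence(task_policy, shared_policy)
task_entropy = entropy(task_policy)
```

A KL penalty pulls the task policy toward the shared policy, while an entropy bonus keeps it from collapsing onto a single action.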


✤ ❶ { distillation, soften, temperature }30 …

https://www.youtube.com/watch?v=eZdOkDtYMoo

CS231n 2017 Spring Lecture 15 | Efficient Methods and Hardware for Deep Learning

50:47~ Model Distillation
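The "soften" and "temperature" keywords refer to dividing logits by a temperature before the softmax, so that a teacher's output distribution becomes less peaked. A minimal sketch, assuming plain Python lists for logits; the function name and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for three classes.
teacher_logits = [4.0, 1.0, 0.5]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=5.0)
```

With the higher temperature, the probability mass spreads more evenly over the classes, which is what gives the student more information about the non-maximal outputs.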

✤ ❷A3C { Asynchronous, Actor-Critic, Advantage}

p50~53

TensorFlow

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2

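The "Advantage" in A3C is the gap between a bootstrapped discounted return and the critic's value estimate. A minimal sketch of that computation for one rollout, assuming list inputs; the function name, argument names, and numbers are illustrative and not taken from the linked tutorial.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute A_t = R_t - V(s_t) for one rollout.

    R_t is the discounted return from step t, bootstrapped with the
    critic's value estimate for the state that follows the rollout.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return [ret - v for ret, v in zip(returns, values)]

# Hypothetical 3-step rollout that ends in a terminal state.
adv = n_step_advantages(
    rewards=[0.0, 0.0, 1.0],
    values=[0.5, 0.6, 0.7],
    bootstrap_value=0.0,
    gamma=0.9,
)
```

The actor is then updated with these advantages as weights on the log-probability gradients, while the critic regresses toward the returns.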



DisTraL vs


arXiv:1612.03801

[Figure: architectures compared: A3C, A3C Multitask, DisTraL. Labels: shared policy π0, task policies π1, πn (πi), columns f and h, KL]

✤ 2col. : 1col. KL divergence
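In the two-column variant described in the Distral paper, the task policy combines the logits of the shared column h with the logits of a task-specific column f, weighted by α and β. A minimal sketch under that reading; the function names and example logits are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def two_column_policy(h_shared, f_task, alpha, beta):
    """Task policy pi_i = softmax(alpha * h + beta * f) over actions."""
    combined = [alpha * h + beta * f for h, f in zip(h_shared, f_task)]
    return softmax(combined)

# Hypothetical logits over 3 actions.
h = [1.0, 0.0, -1.0]   # shared column, defines pi_0
f = [0.0, 2.0, 0.0]    # task-specific column

pi0 = softmax(h)
pi_i = two_column_policy(h, f, alpha=0.5, beta=1.0)
```

When the task column outputs zeros and α = 1, the task policy reduces to the shared policy π0, which is one way to see how the shared column acts as a prior for each task.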


[Equation slides: objective with terms labelled "regularize:", "KL divergences:", and "distill:"; temperature: soft hyper-parameter; symbols θ0, πi, π0; α=0.5]
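For reference, the regularized objective these slides annotate can be written as follows, following Eq. (1) of the Distral paper (arXiv:1707.04175). The notation (c_KL, c_Ent, α, β) is the paper's; the equations on the slides themselves did not survive, so this is a restatement from the paper rather than a transcription of the slide.

```latex
J(\pi_0, \{\pi_i\}_{i=1}^{n})
  = \sum_i \mathbb{E}_{\pi_i}\!\left[ \sum_{t \ge 0}
      \gamma^t R_i(a_t, s_t)
      - c_{\mathrm{KL}}\, \gamma^t \log \frac{\pi_i(a_t \mid s_t)}{\pi_0(a_t \mid s_t)}
      - c_{\mathrm{Ent}}\, \gamma^t \log \pi_i(a_t \mid s_t)
    \right]
  = \sum_i \mathbb{E}_{\pi_i}\!\left[ \sum_{t \ge 0}
      \gamma^t R_i(a_t, s_t)
      + \frac{\gamma^t \alpha}{\beta} \log \pi_0(a_t \mid s_t)
      - \frac{\gamma^t}{\beta} \log \pi_i(a_t \mid s_t)
    \right],
\qquad
\alpha = \frac{c_{\mathrm{KL}}}{c_{\mathrm{KL}} + c_{\mathrm{Ent}}},
\quad
\beta = \frac{1}{c_{\mathrm{KL}} + c_{\mathrm{Ent}}}.
```

With α = 0.5 the KL and entropy coefficients are equal; β plays the role of an inverse temperature on the task policy.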

DeepMind Lab


arXiv:1612.03801

3D, first-person

✤ 9

✤ 4

✤ 9 x 4

step 1e8

Mazes (fθ)

✤ Distral

✤ 1col 2col

✤ entropy

✤ A3C multitask A3C 2 col. Distral

✤ Distral


step 1e8

Mazes (hθ)


step 1e8

✤ KL 2col. KL+ent 2 col.

✤ Distral

✤ α

Navigation


✤ Get


✤ Distral

[A3C]

step 1e8

Laser-tag


✤ entropy

✤ A3C

✤ Distral 1col.

✤ Distral

step 1e8

greedy

entropy
