2017/10/3 Distral: Robust Multitask Reinforcement Learning [DeepMind, 2017/7/13, arXiv:1707.04175]

[RL Reading Group] Distral: Robust Multitask Reinforcement Learning



Page 2

DisTraL: Distill and Transfer Learning

DeepMind Lab

Asynchronous Advantage Actor-Critic (A3C)


Page 3

✤ Distral

✤ ( )

✤ KL divergence

✤ “semantically meaningful”

✤ semantic [ ]

✤ KL entropy( )
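The two quantities named in this list, KL divergence and entropy, can be illustrated for discrete action distributions. The following is a minimal sketch in plain Python; the function names and the example probabilities are assumptions made for illustration and do not come from the slides.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Terms with p_i == 0 contribute nothing; q_i is assumed to be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


# Hypothetical action probabilities for one state with three actions.
task_policy = [0.7, 0.2, 0.1]    # a task-specific policy
shared_policy = [0.4, 0.4, 0.2]  # a shared (distilled) policy

kl = kl_divergence(task_policy, shared_policy)  # how far the task policy strays from the shared one
ent = entropy(task_policy)                      # how spread out the task policy is
print(f"KL = {kl:.4f}, entropy = {ent:.4f}")
```

A KL of zero means the two policies agree exactly on that state, and a larger entropy means the task policy spreads probability over more actions.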

Page 4

✤ ❶ { distillation, soften, temperature }30 …

https://www.youtube.com/watch?v=eZdOkDtYMoo

CS231n 2017 Spring Lecture 15 | Efficient Methods and Hardware for Deep Learning

50:47~ Model Distillation

✤ ❷A3C { Asynchronous, Actor-Critic, Advantage}

p50~53

TensorFlow

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2

Page 5

DisTraL vs

arXiv:1612.03801

[Figure: architecture diagrams for A3C, A3C Multitask, and DisTraL. Labels: π0, π1, πn, πi, f, h, KL.]
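One possible reading of the labels h, f, π0, and πi in this diagram follows the two-column parameterization described in the Distral paper (arXiv:1707.04175): a shared network h defines the distilled policy π0, and each task policy πi adds a task-specific network f on top of h. The sketch below assumes that form, with α and β treated as scalar hyperparameters; the function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distilled_policy(h):
    """pi_0(a|s) = softmax(h(a|s)), using only the shared column h."""
    return softmax(h)


def task_policy(h, f, alpha, beta):
    """pi_i(a|s) = softmax(alpha * h(a|s) + beta * f_i(a|s)), combining both columns."""
    return softmax([alpha * hj + beta * fj for hj, fj in zip(h, f)])


# Hypothetical per-action outputs of the two columns for a single state.
h_logits = [1.0, 0.0, -1.0]  # shared column
f_logits = [0.0, 0.0, 5.0]   # task-specific column

print(distilled_policy(h_logits))
print(task_policy(h_logits, f_logits, alpha=0.5, beta=1.0))
```

With f set to zero and α = 1, the task policy reduces to π0, which is one way to see how the shared column carries common behavior across tasks.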

Page 6

✤ 2col. : 1col. KL divergence

Page 7

regularize:

[Figure: panels labeled α=0.5]

KL divergences:
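As a hedged sketch of what this regularization does at each step, assume the objective from the Distral paper (arXiv:1707.04175), in which each task's reward is reduced by a KL term toward π0 and increased by an entropy term, with coefficients c_KL and c_Ent. Writing α = c_KL / (c_KL + c_Ent) and β = 1 / (c_KL + c_Ent), the per-step regularized reward becomes R + (α/β) log π0 − (1/β) log πi. The function and argument names below are hypothetical.

```python
import math


def regularized_reward(reward, logp_task, logp_shared, c_kl, c_ent):
    """Per-step reward with KL and entropy regularization folded in.

    Equivalent to: reward - c_kl * log(pi_i / pi_0) - c_ent * log(pi_i),
    rewritten with alpha = c_kl / (c_kl + c_ent) and beta = 1 / (c_kl + c_ent).
    """
    alpha = c_kl / (c_kl + c_ent)
    beta = 1.0 / (c_kl + c_ent)
    return reward + (alpha / beta) * logp_shared - (1.0 / beta) * logp_task


# Hypothetical values: the sampled action has probability 0.5 under the task
# policy and 0.25 under the shared policy; c_kl = c_ent = 0.5 gives alpha = 0.5.
r = regularized_reward(
    reward=1.0,
    logp_task=math.log(0.5),
    logp_shared=math.log(0.25),
    c_kl=0.5,
    c_ent=0.5,
)
print(r)
```

Choosing c_KL = c_Ent yields α = 0.5, the value shown on the slide; α = 1 would drop the entropy term and leave only the KL regularization.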

Page 8

distill:

temperature: soft hyper-parameter

[Figure: labels θ0, πi, π0, α=0.5]
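The role of temperature as a softening hyperparameter can be illustrated with a generic temperature-scaled softmax, as used in model distillation. This is a sketch of the general idea only, not the specific equation on this slide; the function name and the example logits are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical action preferences
sharp = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=5.0)
print(sharp)
print(soft)
```

As the temperature grows, the output approaches a uniform distribution; as it shrinks toward zero, the output approaches a greedy choice of the highest-scoring action.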

Page 9

DeepMind Lab

arXiv:1612.03801

3D, first-person

Page 10

✤ 9

✤ 4

✤ 9 x 4

step 1e8

Page 11

Mazes (fθ)

✤ Distral

✤ 1col 2col

✤ entropy

✤ A3C multitask A3C 2 col. Distral

✤ Distral

step 1e8

Page 12

Mazes (hθ)

step 1e8

✤ KL 2col. KL+ent 2 col.

✤ Distral

✤ α

Page 13

Navigation

✤ Get

✤ Distral

[A3C]

step 1e8

Page 14

Laser-tag

✤ entropy

✤ A3C

✤ Distral 1col.

✤ Distral

step 1e8

greedy

entropy
