Continuous Control with Deep Reinforcement Learning
ICLR 2016, Timothy P. Lillicrap, et al. (Google DeepMind)
Presenter: Hyemin Ahn



Introduction

2016. 4. 15. CPSLAB (EECS)

Another work on Deep Learning + Reinforcement Learning from Google DeepMind!

It extends their Deep Q-Network (DQN), which handles discrete action spaces, to continuous action spaces.

Results: Preview

Reinforcement Learning: Overview

Agent: How can we formalize our behavior?


Wow. So scare. Such gun. So many bullets. Nice suit btw.


MDP


Reinforcement Learning: Q-Learning

Q-learning finds the greedy policy $\mu(s) = \arg\max_a Q(s, a)$.
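As a rough illustration of what Q-learning with a greedy policy looks like when the action set is finite, here is a minimal tabular sketch. The function names, the two-action toy problem, and the values of `alpha` and `gamma` are hypothetical and chosen only for this example; they do not come from the slides.

```python
from collections import defaultdict


def greedy_action(q, state, actions):
    """Greedy policy: pick the action with the largest Q(s, a) in a finite set."""
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """One tabular Q-learning step toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy problem with two discrete actions.
actions = ["left", "right"]
q = defaultdict(float)  # every Q(s, a) starts at 0.0
q_learning_update(q, state=0, action="right", reward=1.0,
                  next_state=1, actions=actions)
print(greedy_action(q, 0, actions))
```

Both the update and the greedy policy rely on a `max` over an enumerable list of actions, which is exactly the step that becomes problematic once actions are continuous.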

Let us consider a deterministic policy instead of a stochastic one.

Can we do this in a continuous action space?

Reinforcement Learning: Continuous Space?

Problem 1: How can we know this real value?


How can we find $\arg\max_a Q(s, a)$ in a continuous action space?
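One way to see why the maximization is hard is to count how many candidates a naive discretization produces. The sketch below is simple arithmetic; the helper name and the example of a 7-joint arm with three levels per joint are illustrative assumptions.

```python
def num_discrete_actions(levels_per_dim, action_dims):
    """Number of joint actions when each action dimension is cut into levels."""
    return levels_per_dim ** action_dims


# Illustrative case: a 7-degree-of-freedom arm, coarsest grid of 3 levels per joint.
coarse = num_discrete_actions(3, 7)
# A finer grid of 11 levels per joint grows much faster.
fine = num_discrete_actions(11, 7)
print(coarse, fine)
```

The count grows exponentially with the number of action dimensions, so enumerating actions to take a maximum stops being practical very quickly.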

Anyway, let us assume that we know $Q(s, a)$.

Silver, David, et al. "Deterministic policy gradient algorithms." ICML. 2014.

The gradient of the policy's performance can be defined as,
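For reference, the deterministic policy gradient from the cited work is usually written in the form below. The parameter symbols $\theta^\mu$ (actor) and $\theta^Q$ (critic) follow the DDPG paper's notation and are assumed here rather than taken from the slide.

```latex
\nabla_{\theta^\mu} J \approx
\mathbb{E}_{s}\!\left[
  \nabla_a Q(s, a \mid \theta^Q)\big|_{a = \mu(s \mid \theta^\mu)}
  \;\nabla_{\theta^\mu}\, \mu(s \mid \theta^\mu)
\right]
```

Read as a chain rule: the critic says how the value changes with the action, and the actor says how the action changes with its parameters, so no explicit maximization over actions is needed.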

Problem 2: How can we successfully explore this action space?

Objective: Learn $Q$ and $\mu$ in a continuous action space!

Problem of $\mu$: How can we successfully explore this action space?

Problem of $Q$: How can we know this real value?

Both are neural networks.

The authors suggest using additional target networks $Q'$ and $\mu'$.
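A minimal sketch of how such target networks can be kept close to the learned networks is a "soft" update, where each target parameter moves a small fraction `tau` toward its online counterpart. Parameters are plain lists of floats here for simplicity; the function name and the default `tau=0.001` are assumptions for illustration, not values stated on the slides.

```python
def soft_update(target_params, online_params, tau=0.001):
    """Return target parameters moved a fraction tau toward the online ones."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Toy example with a deliberately large tau so the effect is visible.
online = [1.0, -2.0]
target = [0.0, 0.0]
target = soft_update(target, online, tau=0.5)
print(target)
```

With a small `tau` the targets change slowly, which is the point: the values the critic regresses toward stay nearly fixed from one update to the next.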

DDPG (Deep DPG) Algorithm


Our objective


Target: explored reward + discounted sum of future rewards predicted via the target policy network
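The target described above can be sketched as a one-line computation: the observed reward plus the discounted value that the target critic assigns to the action chosen by the target actor in the next state. The callables `target_actor` and `target_critic` below are hypothetical stand-ins for the target networks, and `gamma` is an assumed discount factor.

```python
def td_target(reward, next_state, target_actor, target_critic, gamma=0.99):
    """Compute y = r + gamma * Q'(s', mu'(s')) for one transition."""
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


# Toy stand-ins for the target networks (not real neural networks).
def target_actor(state):
    return 0.5 * state


def target_critic(state, action):
    return state + action


y = td_target(reward=1.0, next_state=2.0,
              target_actor=target_actor, target_critic=target_critic,
              gamma=0.5)
print(y)
```

The critic is then trained to reduce the squared difference between this target and its own estimate for the sampled state and action.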



Assume these target networks are right, i.e., that they give the real target values.

Exploration
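One common way to explore with a deterministic policy is to add noise to the action it outputs. The DDPG paper reports using an Ornstein-Uhlenbeck process for temporally correlated noise; the sketch below is a simple discretized version for a scalar action. The class and function names, the `dt` step, and the toy policy are assumptions made for this example.

```python
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process for a scalar action."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, seed=None):
        self.theta = theta      # pull strength back toward the mean
        self.sigma = sigma      # scale of the random perturbation
        self.mu = mu            # long-run mean of the noise
        self.dt = dt            # time step of the discretization
        self.rng = random.Random(seed)
        self.x = mu             # current noise value

    def sample(self):
        """Advance the process by one step and return the new noise value."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def explore(policy, state, noise):
    """Exploratory action: deterministic policy output plus noise."""
    return policy(state) + noise.sample()


# Hypothetical deterministic policy used only for this demonstration.
def toy_policy(state):
    return 2.0 * state


noise = OUNoise(seed=0)
print(explore(toy_policy, 1.0, noise))
```

Because the noise is added outside the policy, the exploration scheme can be changed without touching the learning rule.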

Experiment: Results
