0415_seminar_DeepDPG


Continuous Control with Deep Reinforcement Learning
ICLR 2016
Timothy P. Lillicrap, et al. (Google DeepMind)
Presenter : Hyemin Ahn


Introduction
2016. 4. 15. CPSLAB (EECS)

Another work on Deep Learning + Reinforcement Learning from Google DeepMind!

It extends their Deep Q-Network (DQN), which handles discrete action spaces, to continuous action spaces.

Results : Preview

Reinforcement Learning : overview

Agent
How can we formalize the agent's behavior?






MDP



Reinforcement Learning : Q-Learning

Q-learning is finding μ(s) = argmax_a Q(s, a), the greedy policy

Let us consider a deterministic policy instead of a stochastic one
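To make the greedy policy concrete, here is a minimal tabular sketch, assuming a small discrete problem (3 states, 2 actions) that is not from the slides. The function names, the learning rate `alpha` and the discount `gamma` are illustrative assumptions. It shows why the greedy step is easy when actions can be enumerated: the argmax is a lookup over a finite row.

```python
import numpy as np

def greedy_policy(q_table: np.ndarray) -> np.ndarray:
    """For every state, return the index of the action with the largest Q-value."""
    return np.argmax(q_table, axis=1)

def q_learning_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (td_target - q_table[s, a])
    return q_table

# Hypothetical toy problem: 3 states, 2 discrete actions, all Q-values start at 0.
q = np.zeros((3, 2))
q = q_learning_update(q, s=0, a=1, r=1.0, s_next=2)
policy = greedy_policy(q)  # action 1 is now preferred in state 0
```

With a continuous action there is no finite row to scan, which is the difficulty the following slides address.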

Can we do this in continuous action space?

Reinforcement Learning : continuous space?

Problem 1: How can we know this real value?

Reinforcement Learning : continuous space?

How can we find argmax_a Q(s, a) in a continuous action space?

Anyway, if we assume that we know Q(s, a),

Silver, David, et al. "Deterministic policy gradient algorithms." ICML. 2014.

The gradient of the policy's performance can be defined as,
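For reference, the deterministic policy gradient can be written as below. This is a reconstruction in the notation of the DDPG paper, not a copy of the slide's own equation: θ^μ and θ^Q denote the actor and critic parameters, and ρ^β is the state distribution under the behavior policy.

```latex
\nabla_{\theta^{\mu}} J \approx
\mathbb{E}_{s \sim \rho^{\beta}}
\left[
  \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{a = \mu(s \mid \theta^{\mu})}
  \; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})
\right]
```

Read as a chain rule: the critic says how Q changes with the action, and the actor says how the action changes with its parameters.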

Problem 2: How can we successfully explore this action space?

Reinforcement Learning : continuous space?

Objective : Learn Q(s, a) and μ(s) in a continuous action space!

Problem of μ: How can we successfully explore this action space?

Problem of Q: How can we know this real value?

Both are neural networks

The authors suggest using additional target networks Q′ and μ′
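A minimal sketch of how target networks can track the learned networks, assuming the soft update θ′ ← τθ + (1 − τ)θ′ described in the DDPG paper. Parameters are represented here as plain NumPy arrays, and the function name and the value of `tau` in the usage lines are illustrative assumptions.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.001):
    """Move each target parameter a small step (tau) toward its online counterpart."""
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, online_params)]

# Hypothetical parameters: one weight matrix and one bias vector per network.
online = [np.ones((2, 2)), np.ones(2)]
target = [np.zeros((2, 2)), np.zeros(2)]

# A large tau is used here only so the effect is visible after one step.
target = soft_update(target, online, tau=0.1)
```

Because the targets change slowly, the values the critic regresses toward stay nearly fixed between updates, which tends to stabilize learning.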

DDPG (Deep DPG) Algorithm


Our objective



Explored reward + sum of future rewards from target policy network




Assume the target networks give the right (real) values
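The critic's regression target, "reward + future rewards from the target networks", can be sketched as y = r + γ Q′(s′, μ′(s′)). In the sketch below the two `toy_*` functions are hypothetical stand-ins for the target actor and target critic, and the `dones` mask for terminal transitions is an added assumption, not something shown on the slides.

```python
import numpy as np

def critic_targets(rewards, next_states, dones, target_actor, target_critic, gamma=0.99):
    """Compute y = r + gamma * Q'(s', mu'(s')), with no bootstrap on terminal steps."""
    next_actions = target_actor(next_states)            # mu'(s')
    next_q = target_critic(next_states, next_actions)   # Q'(s', mu'(s'))
    return rewards + gamma * (1.0 - dones) * next_q

def critic_loss(q_values, targets):
    """Mean squared error between the critic's predictions and the targets."""
    return np.mean((targets - q_values) ** 2)

# Hypothetical stand-ins for the target networks (1-D state, 1-D action).
def toy_target_actor(states):
    return -0.5 * states

def toy_target_critic(states, actions):
    return -(states[:, 0] ** 2) - (actions[:, 0] ** 2)

rewards = np.array([1.0, 0.0])
next_states = np.array([[2.0], [1.0]])
dones = np.array([0.0, 1.0])  # the second transition ends the episode

y = critic_targets(rewards, next_states, dones,
                   toy_target_actor, toy_target_critic, gamma=0.9)
loss = critic_loss(np.array([-3.0, 0.5]), y)
```

The targets are treated as fixed labels: gradients flow only through the critic's own prediction, not through Q′ or μ′.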

Exploration
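The DDPG paper explores by adding temporally correlated noise from an Ornstein-Uhlenbeck process to the actor's output. The sketch below is one simple discretization of that idea. The class name, the default `theta`, `sigma` and `dt`, and the action bounds of [-1, 1] are assumptions for illustration.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck noise: a random walk that is pulled back toward zero."""

    def __init__(self, size, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(size)

    def sample(self):
        # Mean-reverting drift plus a Gaussian step scaled by sqrt(dt).
        drift = -self.theta * self.state * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape)
        self.state = self.state + drift + diffusion
        return self.state

def explore(action, noise, low=-1.0, high=1.0):
    """Perturb the deterministic action and clip it to the assumed action bounds."""
    return np.clip(action + noise.sample(), low, high)

noise = OUNoise(size=2)
noisy_action = explore(np.array([0.3, -0.9]), noise)
```

Since the policy is deterministic, this added noise is the only source of exploration. Successive samples are correlated, so the agent tends to keep pushing in one direction for several steps.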

Experiment : Results

Engineering
