Continuous Control with Deep Reinforcement Learning
ICLR 2016
Timothy P. Lillicrap, et al. (Google DeepMind)
Presenter: Hyemin Ahn
Introduction
2016. 4. 15. CPSLAB (EECS)
Another work on Deep Learning + Reinforcement Learning from Google DeepMind!
It extends their Deep Q-Network (DQN), which handles discrete action spaces, to continuous action spaces.
Results: Preview
Reinforcement Learning: Overview
Agent
How can we formalize our behavior?
MDP (Markov Decision Process)
Reinforcement Learning: Q-Learning
Q-learning is finding $\mu(s) = \arg\max_a Q(s, a)$, the greedy policy.
Let us consider a deterministic policy instead of a stochastic one.
Can we do this in a continuous action space?
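To make the greedy policy concrete, here is a minimal tabular sketch for a finite action set. The function names, the dictionary-based table, and the values of `alpha` and `gamma` are illustrative assumptions, not taken from the slides. The `max` over actions is the step that has no direct counterpart when actions are continuous.

```python
def greedy_action(q_table, state, actions):
    """Return the action with the highest Q-value in `state` (first one on ties)."""
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """One tabular Q-learning step toward r + gamma * max_a' Q(s', a')."""
    # Enumerating every action is only possible for a finite action set.
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]


# Hypothetical two-action problem with a single observed transition.
actions = ["left", "right"]
q = {}
q_learning_update(q, "s0", "right", 1.0, "s1", actions)
best = greedy_action(q, "s0", actions)
```

With a continuous action the enumeration inside `max` is no longer available, which is the difficulty the following slides address.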
Reinforcement Learning: Continuous Space?
Problem 1: How can we know this real value?
How can we find $\arg\max_a Q(s, a)$ in a continuous action space?
Anyway, let us assume that we know $Q(s, a)$.
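Silver, David, et al. "Deterministic policy gradient algorithms." ICML. 2014.
The gradient of the policy's performance can be defined as
$$\nabla_{\theta^\mu} J \approx \mathbb{E}_{s}\left[ \nabla_a Q(s, a \mid \theta^Q)\big|_{a=\mu(s)} \, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu) \right]$$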
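The deterministic policy gradient is a chain rule: the gradient of Q with respect to the action, evaluated at the policy's action, times the gradient of the policy with respect to its parameters, averaged over states. The toy sketch below assumes a scalar linear policy `theta * s` and a known critic `Q(s, a) = -(a - 2s)^2`. Both are hypothetical choices made so the gradients can be written by hand, and are not from the slides.

```python
def policy(theta, state):
    """Hypothetical deterministic policy mu(s) = theta * s."""
    return theta * state


def dmu_dtheta(theta, state):
    """Gradient of the policy output with respect to theta."""
    return state


def dq_da(state, action):
    """Gradient of the assumed critic Q(s, a) = -(a - 2 s)^2 with respect to a."""
    return -2.0 * (action - 2.0 * state)


def dpg_gradient(theta, states):
    """Average of dQ/da (at a = mu(s)) times dmu/dtheta over sampled states."""
    total = 0.0
    for s in states:
        a = policy(theta, s)
        total += dq_da(s, a) * dmu_dtheta(theta, s)
    return total / len(states)


# Gradient ascent on theta; the assumed critic is maximized by a = 2 s.
states = [0.5, 1.0, 1.5]
theta = 0.0
for _ in range(200):
    theta += 0.1 * dpg_gradient(theta, states)
```

In this toy setting `theta` moves toward 2, the value at which the policy outputs the action the assumed critic rates highest in every state.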
Problem 2: How can we successfully explore this action space?
Objective: Learn $Q(s, a)$ and $\mu(s)$ in a continuous action space!
Problem of $\mu$: How can we successfully explore this action space?
Problem of $Q$: How can we know this real value?
Both are neural networks.
The authors suggest using additional target networks $Q'$ and $\mu'$.
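One common way to maintain target networks, and the one described in the DDPG paper, is a soft update in which the target parameters slowly track the learned parameters. The sketch below represents each network as a flat list of floats, which is a simplifying assumption. The function name and the loop are illustrative.

```python
def soft_update(target_params, online_params, tau=0.001):
    """Return new target parameters: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Hypothetical one-parameter network: the target drifts toward the online value.
online = [1.0]
target = [0.0]
for _ in range(1000):
    target = soft_update(target, online, tau=0.001)
```

A small `tau` keeps the targets nearly fixed from one step to the next, which trades learning speed for stability.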
DDPG (Deep DPG) Algorithm
Our objective
Explored reward + sum of future rewards from target policy network
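The critic's regression target can be sketched as the observed reward plus the discounted value that the target critic assigns to the target policy's action in the next state. In the code below, the tuple layout of a transition, the `done` flag, and the lambda networks are assumptions made for illustration.

```python
def td_targets(batch, target_actor, target_critic, gamma=0.99):
    """Compute y = r + gamma * Q'(s', mu'(s')) for each transition.

    Each transition is (state, action, reward, next_state, done); the `done`
    flag (an assumption here) drops the bootstrap term at episode ends.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            next_action = target_actor(next_state)
            targets.append(reward + gamma * target_critic(next_state, next_action))
    return targets


def critic_loss(batch, critic, targets):
    """Mean squared error between the targets and the critic's predictions."""
    errors = [(y - critic(s, a)) ** 2
              for (s, a, _r, _s2, _d), y in zip(batch, targets)]
    return sum(errors) / len(errors)


# Hypothetical stand-ins for the target networks and the learned critic.
target_actor = lambda s: 2.0 * s
target_critic = lambda s, a: s + a
critic = lambda s, a: 0.0

batch = [(0.0, 0.0, 1.0, 1.0, False), (1.0, 0.5, 2.0, 0.0, True)]
ys = td_targets(batch, target_actor, target_critic, gamma=0.5)
loss = critic_loss(batch, critic, ys)
```

Because the targets come from the slowly changing target networks, the critic regresses toward values that do not shift with every gradient step.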
Assume these are right, real target networks
Exploration
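Because the policy is deterministic, exploration has to be added from outside, typically as noise on the selected action. The DDPG paper uses temporally correlated noise from an Ornstein-Uhlenbeck process. The sketch below is a minimal scalar version in which the class name, the `x0`, `dt`, and `seed` parameters, and the clipping bounds are assumptions.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Scalar OU process: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, x0=0.0, dt=1.0, seed=None):
        self.theta = theta
        self.sigma = sigma
        self.mu = mu
        self.dt = dt
        self.x = x0
        self.rng = random.Random(seed)

    def sample(self):
        """Advance the process by one step and return the new noise value."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def explore(policy_action, noise, low=-1.0, high=1.0):
    """Add exploration noise to the policy's action and clip to the valid range."""
    return max(low, min(high, policy_action + noise.sample()))


# Hypothetical usage: perturb a constant policy output for a few steps.
noise = OrnsteinUhlenbeckNoise(seed=0)
noisy_actions = [explore(0.3, noise) for _ in range(5)]
```

Consecutive samples are correlated, so the perturbation pushes in a similar direction over several steps instead of averaging out immediately.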
Experiment: Results