Continuous Deep Q-Learning with Model-based Acceleration
2016 ICML. S. Gu, T. Lillicrap, I. Sutskever, S. Levine.
Presenter: Hyemin Ahn
HAYA!
Introduction
2016. 11. 18. CPSLAB (EECS)
Yet another improved work in deep reinforcement learning.
The authors try to incorporate the advantages of both model-free and model-based reinforcement learning.
Results: Preview
Reinforcement Learning: Overview
Agent: how can we formalize our behavior?
MDP (Markov Decision Process)
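The MDP formulation underlies the classic tabular Q-learning update, Q(s, a) ← Q(s, a) + α(r + γ maxₐ′ Q(s′, a′) − Q(s, a)). A minimal sketch for intuition (the toy state/action sizes are illustrative, not from the slides):

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One off-policy Q-learning update on a tabular Q-function:
    the TD target uses the greedy value of the next state."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy MDP with 3 states and 2 actions.
Q = np.zeros((3, 2))
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
```

The max over next actions is what makes this discrete-action update hard to apply directly in continuous action spaces, which motivates the NAF construction later in the talk.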
Reinforcement Learning: Model-Free?
Continuous Q-Learning with Normalized Advantage Functions
How did the authors learn a parameterized Q-function with deep learning when the state-action domain is continuous?
They suggest using a neural network that separately outputs a value-function term and an advantage term.
Of these, the least novel components are the value/advantage decomposition of Q(s, a) and the use of locally-adapted linear-Gaussian dynamics.
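The normalized advantage function constrains the advantage term to be quadratic in the action, A(s, a) = −½ (a − μ(s))ᵀ P(s) (a − μ(s)) with P(s) = L(s)L(s)ᵀ positive-definite, so μ(s) is always the greedy action. A numpy sketch of how the Q-value is assembled (the V, μ, and L values stand in for network outputs):

```python
import numpy as np

def naf_q_value(V, mu, L, a):
    """Q(s, a) = V(s) + A(s, a), where the normalized advantage is
    A(s, a) = -1/2 (a - mu)^T P (a - mu), and P = L L^T (L lower-triangular
    with positive diagonal) keeps P positive-definite."""
    P = L @ L.T
    d = a - mu
    return V - 0.5 * d @ P @ d

# The advantage is zero at a = mu, so mu maximizes Q analytically.
V, mu = 2.0, np.array([0.5, -0.5])
L = np.array([[1.0, 0.0], [0.2, 1.0]])
q_at_mu = naf_q_value(V, mu, L, mu)       # equals V
q_off = naf_q_value(V, mu, L, mu + 1.0)   # strictly less than V
```

Because the argmax over actions is μ(s) in closed form, the discrete-action max in the Q-learning target is no longer needed, which is what makes Q-learning tractable in continuous action spaces here.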
Trick: assume that we have a target network and an experience container (replay buffer).
But we don't know the target!
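Since the true target is unknown, the standard trick is a slowly-updated target network: the TD target y = r + γ V′(s′) is computed with target parameters θ′ that track the online parameters by Polyak averaging, θ′ ← τθ + (1 − τ)θ′. A sketch with plain parameter dicts (the names are illustrative):

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging of target-network parameters:
    theta' <- tau * theta + (1 - tau) * theta'."""
    return {k: tau * online_params[k] + (1 - tau) * target_params[k]
            for k in online_params}

target = {"w": np.array([0.0, 0.0])}
online = {"w": np.array([1.0, 2.0])}
target = soft_update(target, online, tau=0.5)  # w -> [0.5, 1.0]
```

Keeping τ small makes the target change slowly, which stabilizes the bootstrapped regression targets.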
Accelerating Learning with Imagination Rollouts
The sample complexity of model-free algorithms tends to be high when using high-dimensional function approximators. To reduce the sample complexity and accelerate the learning phase, how about using good exploratory behavior from trajectory optimization?
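One way to read "imagination rollouts": start from states drawn from real experience, roll a learned dynamics model forward a few steps under some exploratory policy, and mix the synthetic transitions into the replay buffer. A sketch under those assumptions (all function names here are hypothetical placeholders, not the paper's API):

```python
def imagination_rollouts(dynamics_fn, reward_fn, policy_fn, start_states,
                         horizon=5):
    """Generate short synthetic on-model rollouts from real start states;
    the returned (s, a, r, s') tuples can be added to the replay buffer."""
    synthetic = []
    for s in start_states:
        for _ in range(horizon):
            a = policy_fn(s)
            s_next = dynamics_fn(s, a)
            synthetic.append((s, a, reward_fn(s, a), s_next))
            s = s_next
    return synthetic

# Toy 1-D example: dynamics s' = s + a under a constant policy a = 1.
rollouts = imagination_rollouts(
    dynamics_fn=lambda s, a: s + a,
    reward_fn=lambda s, a: -abs(s),
    policy_fn=lambda s: 1.0,
    start_states=[0.0, 2.0],
    horizon=3,
)
```

Short horizons limit how far model error can compound, which is why the rollouts are kept brief rather than simulating whole episodes.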
Experiment: Results