Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Reinforcement LearningOpensource to Baselines
Member
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
2
● 한국생산기술연구원○ 로보틱스 및 재활로봇 연구
● KAIST 연구원○ IoT 시스템 개발 & 강화학습
연구
● 플랜아이 연구원○ 추천 시스템 & 강화학습 연구
● 한양대학교 석사과정○ 강화학습 시뮬레이션 구축○ 속도 예측 연구
● 호서대학교 로봇 자동화공학과 학부생○ 임베디드, ROS, 제어, 강화학습○ 강화학습을 로봇에 적용하는
연구
Index
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
3
● Reinforcement Learning
● Policy based Reinforcement Learning
● Value based Reinforcement Learning
● Famous open source for Reinforcement Learning
● Why?
● Requirements
● Supported algorithms
● To do List
● How to use
● License
Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
4
Policy base Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
5
Policy base Reinforcement Learning
Probability to select action(a) at state(s)
Policy base Reinforcement Learning
Policy base Reinforcement Learning
State Distribution
Policy base Reinforcement Learning
State Distribution
Policy
Policy base Reinforcement Learning
State Distribution
Policy
Reward
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
Policy base Reinforcement Learning
● Policy Gradient
● Advantage Actor Critic
● Natural Policy Gradient
● Trust Region Policy Gradient
● Proximal Policy Gradient
Policy Gradient
● Policy Gradient Methods for Reinforcement Learning with Function Approximation
○ Policy can be trained by iterable method
○ Policy can be trained by Policy Gradient
Method
● Policy Gradient
○ Optimal Policy can be obtained by Gradient Ascend
Method
● Iterable Method
○ Optimal Policy Can be obtained by iterable method
like deep learning method
Policy Gradient
● Why advantage is needed?
○ Reduce variance, without changing expectation
○ a
○ laksdfasdf
○
Advantage Actor Critic
Natural Policy Gradient
● The parameter update can not guarantee
the performance improvement of the actual function
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Trust Region Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Proximal Policy Optimization● Trust Region
Value base Reinforcement Learning
Value base Reinforcement Learning
● Playing Atari with Deep Reinforcement Learning(NIPS 2013)
● Human-level control through deep reinforcement learning(Nature 2015)
● Deep Reinforcement Learning with Double Q-Learning
● Dueling Network Architectures for Deep Reinforcement Learning
● Prioritized Experience Replay
● …..
Value base Reinforcement Learning
Dispatch Time : 5 min
Distance : 2 min
Value base Reinforcement Learning
Dispatch Time : 5 min
Distance : 2 min
Value base Reinforcement Learning
3 min 8 min
Value base Reinforcement Learning
● A Distributional perspective on Reinforcement Learning(2017)
● Distributional Reinforcement Learning with Quantile Regression(2017)
● Implicit Quantile Networks for Distributional Reinforcement Learning(2018)
Value base Reinforcement Learning
Value base Reinforcement Learning
Value base Reinforcement Learning
Distributional Reinforcement Learningwith Quantile Regrssion
● Histogram cannot meet condition
● Wasserstein Distance
Distributional Reinforcement Learningwith Quantile Regrssion
Implicit Quantile Networkfor Distributional Reinforcement Learning
Implicit Quantile Networkfor Distributional Reinforcement Learning
RandomSampling
Value base Reinforcement Learning
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
64
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
65
● Deepminds
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
66
● Deepminds
● Open AI
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
67
● Deepminds
● Open AI
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
68
● Deepminds
● Open AI
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
69
● Deepminds
● Open AI
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
70
● Deepminds
● Open AI
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
71
● Deepminds
● Open AI
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
72
● Deepminds
○ Dopamine
■ Value based Reinforcement Learning opensource baseline
■ https://github.com/google/dopamine
● Open AI
Famous open source for Reinforcement Learning
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
73
● Deepminds
○ Dopamine
■ Value based Reinforcement Learning open source baseline
■ https://github.com/google/dopamine
● Open AI
○ Baselines
■ Policy based Reinforcement Learning open source baseline
■ https://github.com/openai/baselines
Why?
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
74
Why?
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
75
● Hard to Install
Why?
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
76
● Hard to Install
○ baselines : mp4py, mujoco…
○ dopamine : easy
Why?
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
77
● Hard to Install
○ baselines : mp4py, mujoco…
○ dopamine : easy
Why?
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
78
● Hard to Install
○ baselines : mp4py, mujoco…
○ dopamine : easy
● Hard to use to your environment
Why?
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
79
● Hard to Install
○ baselines : mp4py, mujoco…
○ dopamine : easy
● Hard to use to your environment
○ manipulate the api
○ https://github.com/google/dopamine
○ https://github.com/openai/baselines
Why?
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
80
● Hard to Install
○ baselines : mp4py, mujoco…
○ dopamine : easy
● Hard to use to your environment
○ manipulate the api
○ https://github.com/google/dopamine
○ https://github.com/openai/baselines
● Korean
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
81
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
82
● Hard to Install
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
83
● Easy to Install
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
84
● Easy to Install
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
85
● Easy to Install
● Hard to use to your environment
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
86
● Easy to Install
● Easy to use to your environment
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
87
● Easy to Install
● Easy to use to your environment
○ Provide many tutorial code(discrete, continuous action)
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
88
● Easy to Install
● Easy to use to your environment
○ Provide many tutorial code(discrete, continuous action)
○ Keep the flow of other reinforcement learning code
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
89
● Easy to Install
● Easy to use to your environment
○ Provide many tutorial code(discrete, continuous action)
○ Keep the flow of other reinforcement learning code
Requirements
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
90
● Easy to Install
● Easy to use to your environment
● Korean
Supported algorithms
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
91
Supported algorithms
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
92
● Vanilla Policy Gradient
● Advantage Actor Critic
● Proximal Policy Optimization
● Deep Deterministic Policy Gradient
Supported algorithms
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
93
● Vanilla Policy Gradient(Discrete Action)
● Advantage Actor Critic(Discrete Action)
● Proximal Policy Optimization(Discrete, Continuous Action)
● Deep Deterministic Policy Gradient(Discrete, Continuous Action)
To do List
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
94
● Vanilla Policy Gradient(Discrete Action)
● Advantage Actor Critic(Discrete Action)
● Proximal Policy Optimization(Discrete, Continuous Action)
● Deep Deterministic Policy Gradient(Discrete, Continuous Action)
To do List
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
95
● Vanilla Policy Gradient(Discrete Action)
● Advantage Actor Critic(Discrete Action)
● Proximal Policy Optimization(Discrete, Continuous Action)
● Deep Deterministic Policy Gradient(Discrete, Continuous Action)
● Applicate LSTM to Vanilla Policy Gradient, Advantage Actor Critic, Proximal Policy Optimization
● Actor Critic Experience Replay
● Soft actor critic
● Value based Reinforcement Learning
○ DQN, Double DQN, Deuling DQN to Rainbow
○ Distributional RL
How to use
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
96
How to use
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
97
License
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
98
License
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
99
Free
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
100
We are hiring
Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering
101
Support us in any formhttps://github.com/RLOpensource/tensorflow_RL/blob/master/model.py