Reinforcement Learning Opensource to Baselines...Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 3 Reinforcement Learning Policy

Reinforcement LearningOpensource to Baselines

Member

Korea Advanced Institute of Science Technology (KAIST)Dept. of Civil and Environmental Engineering

2

● 한국생산기술연구원○ 로보틱스 및 재활로봇 연구

● KAIST 연구원○ IoT 시스템 개발 & 강화학습

연구

● 플랜아이 연구원○ 추천 시스템 & 강화학습 연구

● 한양대학교 석사과정○ 강화학습 시뮬레이션 구축○ 속도 예측 연구

● 호서대학교 로봇 자동화공학과 학부생○ 임베디드, ROS, 제어, 강화학습○ 강화학습을 로봇에 적용하는

연구

Index


3

● Reinforcement Learning

● Policy based Reinforcement Learning

● Value based Reinforcement Learning

● Famous open source for Reinforcement Learning

● Why?

● Requirements

● Supported algorithms

● To do List

● How to use

● License

Reinforcement Learning


4

Policy base Reinforcement Learning


5


Probability to select action(a) at state(s)



State Distribution


State Distribution

Policy


State Distribution

Policy

Reward











● Policy Gradient

● Advantage Actor Critic

● Natural Policy Gradient

● Trust Region Policy Gradient

● Proximal Policy Gradient

Policy Gradient

● Policy Gradient Methods for Reinforcement Learning with Function Approximation

○ Policy can be trained by iterable method

○ Policy can be trained by Policy Gradient

Method

● Policy Gradient

○ Optimal Policy can be obtained by Gradient Ascend

Method

● Iterable Method

○ Optimal Policy Can be obtained by iterable method

like deep learning method

Policy Gradient

● Why advantage is needed?

○ Reduce variance, without changing expectation

○ a

○ laksdfasdf

○

Advantage Actor Critic

Natural Policy Gradient

● The parameter update can not guarantee

the performance improvement of the actual function

Trust Region Policy Optimization● Trust Region











Proximal Policy Optimization● Trust Region














Value base Reinforcement Learning


● Playing Atari with Deep Reinforcement Learning(NIPS 2013)

● Human-level control through deep reinforcement learning(Nature 2015)

● Deep Reinforcement Learning with Double Q-Learning

● Dueling Network Architectures for Deep Reinforcement Learning

● Prioritized Experience Replay

● …..


Dispatch Time : 5 min

Distance : 2 min


Dispatch Time : 5 min

Distance : 2 min


3 min 8 min


● A Distributional perspective on Reinforcement Learning(2017)

● Distributional Reinforcement Learning with Quantile Regression(2017)

● Implicit Quantile Networks for Distributional Reinforcement Learning(2018)




Distributional Reinforcement Learningwith Quantile Regrssion

● Histogram cannot meet condition

● Wasserstein Distance

Distributional Reinforcement Learningwith Quantile Regrssion

Implicit Quantile Networkfor Distributional Reinforcement Learning

Implicit Quantile Networkfor Distributional Reinforcement Learning

RandomSampling


Famous open source for Reinforcement Learning


64



65

● Deepminds



66

● Deepminds

● Open AI



67

● Deepminds

● Open AI



68

● Deepminds

● Open AI



69

● Deepminds

● Open AI



70

● Deepminds

● Open AI



71

● Deepminds

● Open AI



72

● Deepminds

○ Dopamine

■ Value based Reinforcement Learning opensource baseline

■ https://github.com/google/dopamine

● Open AI



73

● Deepminds

○ Dopamine

■ Value based Reinforcement Learning open source baseline

■ https://github.com/google/dopamine

● Open AI

○ Baselines

■ Policy based Reinforcement Learning open source baseline

■ https://github.com/openai/baselines

Why?


74

Why?


75

● Hard to Install

Why?


76

● Hard to Install

○ baselines : mp4py, mujoco…

○ dopamine : easy

Why?


77

● Hard to Install


○ dopamine : easy

Why?


78

● Hard to Install


○ dopamine : easy

● Hard to use to your environment

Why?


79

● Hard to Install


○ dopamine : easy


○ manipulate the api

○ https://github.com/google/dopamine

○ https://github.com/openai/baselines

https://github.com/google/dopamine

https://github.com/openai/baselines

Why?


80

● Hard to Install


○ dopamine : easy


○ manipulate the api

○ https://github.com/google/dopamine

○ https://github.com/openai/baselines

● Korean

https://github.com/google/dopamine

https://github.com/openai/baselines

Requirements


81

Requirements


82

● Hard to Install

Requirements


83

● Easy to Install

Requirements


84

● Easy to Install

Requirements


85

● Easy to Install


Requirements


86

● Easy to Install

● Easy to use to your environment

Requirements


87

● Easy to Install


○ Provide many tutorial code(discrete, continuous action)

Requirements


88

● Easy to Install



○ Keep the flow of other reinforcement learning code

Requirements


89

● Easy to Install



○ Keep the flow of other reinforcement learning code

Requirements


90

● Easy to Install


● Korean

Supported algorithms


91



92

● Vanilla Policy Gradient

● Advantage Actor Critic

● Proximal Policy Optimization

● Deep Deterministic Policy Gradient



93

● Vanilla Policy Gradient(Discrete Action)

● Advantage Actor Critic(Discrete Action)

● Proximal Policy Optimization(Discrete, Continuous Action)

● Deep Deterministic Policy Gradient(Discrete, Continuous Action)

To do List


94





To do List


95





● Applicate LSTM to Vanilla Policy Gradient, Advantage Actor Critic, Proximal Policy Optimization

● Actor Critic Experience Replay

● Soft actor critic

● Value based Reinforcement Learning

○ DQN, Double DQN, Deuling DQN to Rainbow

○ Distributional RL

How to use


96

How to use


97

License


98

License


99

Free


100

We are hiring


101

Support us in any formhttps://github.com/RLOpensource/tensorflow_RL/blob/master/model.py

https://github.com/RLOpensource/tensorflow_RL/blob/master/model.py

Documents

Reinforcement Learning Opensource to Baselines...Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 3 Reinforcement Learning Policy