Workshop · AWS DeepRacer · AWS Summit · © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. · Amazon Web Services Japan


AWS DeepRacer

Amazon Web Services Japan K.K. *** Solutions Architect ****



About me

志村誠

Solutions Architect

• Covers data analytics and machine learning services

• Favorite services:

  • Amazon Athena

  • AWS Glue

  • And, of course, Amazon SageMaker


AWS Manga, Episode 10: Take on the challenge of the AWS DeepRacer League at AWS Summit!

https://aws.amazon.com/jp/campaigns/manga


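Agenda

• AWS DeepRacer overview

• Reinforcement learning

• Simulator

• AWS DeepRacer architecture in detail

• DeepRacer League

• Using the AWS DeepRacer console

This document describes the service as of May 30, 2019. For the latest information, please check the official AWS website (http://aws.amazon.com).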



AWS DeepRacer

A service that puts reinforcement learning in the hands of every developer


What is AWS DeepRacer?

1/18-scale autonomous race car

Simulator for training and evaluation

Global racing league


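What does it take to drive DeepRacer?

[Figure: camera images from the car, each mapped to a driving action such as "Straight" or "Left"]

• If we could register the driving action the car should take for every possible view in its camera images, the car could drive the track.

• In practice, there are countless possible views, so registering them all is itself impractical.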
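To make "countless views" concrete, the sketch below counts how many distinct camera images a lookup table would have to cover. The 160×120 resolution with 256 gray levels per pixel is an assumption for illustration only, not a statement about the actual camera or training input; the function names are hypothetical.

```python
import math


def count_images(width: int, height: int, levels: int) -> int:
    """Exact number of distinct images: levels ** (width * height)."""
    return levels ** (width * height)


def count_digits(width: int, height: int, levels: int) -> int:
    """Decimal digits in levels ** (width * height), computed with log10.

    Converting the exact integer with str() would be slow, and recent Python
    versions limit int-to-str conversion length, so use logarithms instead.
    """
    return math.floor(width * height * math.log10(levels)) + 1


# A toy 2x2 black-and-white image already has 2 ** 4 = 16 possible views.
print(count_images(2, 2, 2))        # 16
# An assumed 160x120 image with 256 gray levels: a 46,239-digit number of views.
print(count_digits(160, 120, 256))  # 46239
```

A table with that many rows cannot be written down, which is why the approach below learns a model that generalizes across views instead.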


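Introducing reinforcement learning

[Figure: agent, environment, action, goal, model, state]

• Train a model that decides actions from camera images

• The agent tries various actions (driving maneuvers) in the environment (the track) and learns to reach the goal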


Reinforcement learning


Where reinforcement learning fits

Reinforcement learning

Supervised learning

Unsupervised learning


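Machine learning overview

Supervised learning

Every training example needs a corresponding label

Unsupervised learning

Training data needs no labels

Reinforcement learning

Learns from a sequence of actions in a specific environment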


Reinforcement learning in the real world

Reward good behavior

No reward for bad behavior


Reinforcement learning terms

Agent · Environment · State

Action · Episode · Reward
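The six terms fit together in one short loop. This is a minimal sketch using a hypothetical 1-D "track" (not the DeepRacer simulator) whose `reset`/`step` methods loosely follow the classic OpenAI Gym convention: the agent picks an action from the current state, the environment returns the next state and a reward, and an episode runs until the goal is reached or a step limit is hit. `ToyTrack`, `run_episode`, and both policies are illustrative names.

```python
import random


class ToyTrack:
    """Hypothetical 1-D environment: cells 0..length, the goal is the last cell.

    Loosely follows the classic Gym-style convention: reset() returns the
    initial state and step(action) returns (next_state, reward, done).
    """

    def __init__(self, length: int = 5, max_steps: int = 20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position  # the state is simply the current cell

    def step(self, action: int):
        # action is -1 (back) or +1 (forward); the car cannot leave the track
        self.position = min(max(self.position + action, 0), self.length)
        self.steps += 1
        at_goal = self.position == self.length
        reward = 1.0 if at_goal else 0.0  # reward only for reaching the goal
        done = at_goal or self.steps >= self.max_steps  # end of the episode
        return self.position, reward, done


def run_episode(env, policy, rng):
    """Run one episode: the agent acts until the environment reports done."""
    state = env.reset()
    total_reward, done, trajectory = 0.0, False, []
    while not done:
        action = policy(state, rng)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
    return total_reward, trajectory


def random_policy(state, rng):
    return rng.choice([-1, 1])


def always_forward(state, rng):
    return 1


env = ToyTrack()
total, steps = run_episode(env, always_forward, random.Random(0))
print(total, len(steps))  # 1.0 5
```

Swapping `always_forward` for `random_policy` shows why learning is needed: a random agent often wanders until the step limit ends the episode without reaching the goal.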


The reward function

In reinforcement learning, the reward function, which incentivizes particular behaviors, is what matters most


A reward function for a race

[Figure: race grid with the agent at the start S and the goal G, which is worth a reward of 2]


Incentivizing the car to follow the centerline

Reward per cell:

0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

S 2 2 2 2 2 2 G = 2

0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

Discounted value of each cell:

8.6 9.5 8.5 7.5 6.3 5.0 3.5 1.9

S 10.4 9.4 8.2 6.9 5.4 3.8 G = 2

8.6 9.5 8.5 7.5 6.3 5.0 3.5 1.9

Discount per step: 0.9
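The value grid can be reproduced by backing up rewards from the goal with the per-step discount of 0.9. This sketch assumes, as the numbers suggest, that each center cell earns 2, each edge cell earns 0.1, the goal is worth 2, and a cell's value is its reward plus 0.9 times the value of the cell the agent moves to next. `discounted_values` is an illustrative name.

```python
def discounted_values(rewards, goal_value, gamma=0.9):
    """Back up values from the goal: V(cell) = r(cell) + gamma * V(next cell).

    rewards lists per-cell rewards from the start side toward the goal,
    excluding the goal itself, whose value is goal_value.
    """
    values = []
    next_value = goal_value
    for r in reversed(rewards):
        next_value = r + gamma * next_value
        values.append(next_value)
    return list(reversed(values))


# Center row: six cells worth 2 each between S and G (G = 2).
center = discounted_values([2] * 6, goal_value=2)
print([round(v, 1) for v in center])  # [10.4, 9.4, 8.2, 6.9, 5.4, 3.8]

# An edge cell worth 0.1 whose next move is onto the center row (or onto G).
edge = [0.1 + 0.9 * v for v in center + [2]]
print([round(v, 1) for v in edge])  # [9.5, 8.5, 7.5, 6.3, 5.0, 3.5, 1.9]

# The corner above S, whose next move is along the edge to its neighbor.
corner = 0.1 + 0.9 * edge[0]
print(round(corner, 1))  # 8.6
```

Because the center cells pay 2 while the edge cells pay 0.1, every center cell ends up with a higher value than the edge cell next to it, so an agent that prefers high-value cells stays on the centerline.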


How does learning happen?

Value function (value fn)

Policy function (policy fn)


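Reinforcement learning algorithm: Vanilla policy gradient

* Image Source: Landscape image is CC0 1.0 public domain

[Figure: the objective J(θ) drawn as a landscape; gradient ascent moves the weights (e.g. 0.4 ± δ, 0.3 ± δ) uphill to new weights]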

https://www.maxpixel.net/Mountains-Valleys-Landscape-Hills-Grass-Green-699369

https://creativecommons.org/publicdomain/zero/1.0/deed.en
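Vanilla policy gradient (REINFORCE) nudges the policy weights uphill on the expected return J(θ). A minimal sketch, assuming a one-step toy problem with two actions whose rewards (0.1 for drifting to the edge, 2 for staying on the centerline) echo the race grid. It illustrates the update rule only and is not the training code used by AWS DeepRacer; `reinforce` is an illustrative name.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences (the policy weights) into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(rewards, episodes=2000, lr=0.1, seed=0):
    """Vanilla policy gradient (REINFORCE) on a one-step, multi-action toy problem."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # one weight per action
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = rewards[action]  # return of this one-step episode
        # gradient of log pi(action) w.r.t. theta[k] is 1[k == action] - pi(k)
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * ret * grad_log_pi  # step uphill on J(theta)
    return softmax(theta)


# Action 0 drifts to the edge (reward 0.1); action 1 stays on the centerline (reward 2).
probs = reinforce([0.1, 2.0])
print(probs)
```

Each sampled action is used for exactly one update and then discarded, and the step size scales with the raw reward. Later methods such as PPO reuse each batch for several updates and limit how far a single update can move the policy.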


AWS DeepRacer neural network architecture

Input: state (image) · Output: action


Amazon SageMaker Reinforcement Learning

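• Reinforcement learning on SageMaker, integrated with game and robotics simulation environments

• Supports Coach and Ray RLlib as reinforcement learning toolkits

• Simulators such as AWS RoboMaker can be used through the open-source OpenAI Gym interface; training can be distributed and simulations run in parallel

[Figure: agent containers run the RL toolkits (Coach, RLlib) and learn the policy; environment containers run the environment simulator (OpenAI Gym, AWS RoboMaker, other simulators), act based on the policy, and return observations and rewards; Redis sits between them]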


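Supervised learning (behavioral cloning)

• An expert driver drives a real car fitted with a camera

• The camera images and the driver's actions are recorded, and a model is trained on them

Result: given a state (image) as input, the model decides the driving action

Reinforcement learning vs. other approaches in DeepRacer

Reinforcement learning

• A virtual agent repeatedly acts in a simulated environment and accumulates experience (input image, action, next state, reward)

• The experience is used for training, and the trained model then gathers more experience

Result: given a state (image) as input, the model decides the driving action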
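The collect, train, collect-again cycle can be sketched with a small experience store. The `Experience` fields mirror the tuple listed above (input image, action, next state, reward); `ExperienceBuffer`, its capacity, and the `clear` step are hypothetical illustrations, not the service's internals. Clearing after an update reflects on-policy methods, which gather fresh experience with each new model.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Experience:
    """One step of experience: input image, action, next state, reward."""
    image: Any        # state (a camera frame; any placeholder object here)
    action: int       # index of the chosen action
    next_image: Any   # next state
    reward: float


class ExperienceBuffer:
    """Hypothetical bounded store; the oldest experience is dropped when full."""

    def __init__(self, capacity: int):
        self.items: deque = deque(maxlen=capacity)

    def add(self, exp: Experience) -> None:
        self.items.append(exp)

    def sample(self, batch_size: int, rng: random.Random) -> List[Experience]:
        """Draw a training batch without replacement."""
        return rng.sample(list(self.items), batch_size)

    def clear(self) -> None:
        """Discard old experience, e.g. after the policy has been updated."""
        self.items.clear()

    def __len__(self) -> int:
        return len(self.items)


# Collect -> train -> collect again with the updated model.
buffer = ExperienceBuffer(capacity=3)
for step in range(5):
    buffer.add(Experience(image=f"frame{step}", action=step % 2,
                          next_image=f"frame{step + 1}", reward=1.0))
batch = buffer.sample(2, random.Random(0))
print(len(buffer), len(batch))  # 3 2
buffer.clear()
```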



AWS DeepRacer simulation architecture

[Figure: AWS Cloud containing AWS DeepRacer, a VPC, and a NAT gateway; outputs include models, simulation video, and metrics]


AWS DeepRacer console workflow


Configuring the action space

• Defined as combinations of speed and steering

• Granularity can be set for finer control
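The speed and steering combinations can be enumerated as a grid. This is a hypothetical sketch: it assumes steering angles spread evenly from -max to +max and speeds spaced evenly up to the maximum, which may differ from how the console actually computes its action list. `build_action_space` and its parameter names are illustrative.

```python
from itertools import product


def build_action_space(max_steering_deg, steering_granularity,
                       max_speed, speed_granularity):
    """Hypothetical sketch: enumerate every (steering angle, speed) pair.

    Steering angles are spread evenly from -max to +max; speeds are evenly
    spaced fractions of max_speed. This spacing is an assumption.
    """
    if steering_granularity < 2:
        steering = [0.0]  # a single option: drive straight
    else:
        step = 2 * max_steering_deg / (steering_granularity - 1)
        steering = [-max_steering_deg + i * step for i in range(steering_granularity)]
    speeds = [max_speed * (i + 1) / speed_granularity for i in range(speed_granularity)]
    return [{"steering_angle": s, "speed": v} for s, v in product(steering, speeds)]


actions = build_action_space(30, 5, 1.0, 2)
print(len(actions))  # 10
print(actions[0])    # {'steering_angle': -30.0, 'speed': 0.5}
```

Finer granularity gives the car more precise options, but it also multiplies the number of discrete actions the network has to choose between.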


Implementing the reward function
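A minimal reward function, modeled on the "follow the centerline" example in the AWS DeepRacer documentation. The service calls a Python function named `reward_function` with a `params` dictionary; `track_width` and `distance_from_center` are two of its documented keys. The band widths and reward values below are illustrative choices, not recommended settings.

```python
def reward_function(params):
    """Reward the car for staying close to the centerline."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Three bands around the centerline, as fractions of the track width
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track: give a tiny reward instead of zero

    return float(reward)


print(reward_function({"track_width": 1.0, "distance_from_center": 0.05}))  # 1.0
```

This mirrors the grid example: high reward near the centerline and a small reward farther out.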


Track components

Centerline

Track wall

Track surface (aka on-track)

Field (aka off-track)

Track boundaries


Coordinate system and track waypoints

Outer boundary waypoints

Track center waypoints

Inner boundary waypoints

X

Y

Track width

Car direction
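Waypoints and the car's direction can be combined to check whether the car is pointing along the track. This sketch uses `waypoints`, `closest_waypoints`, and `heading` (in degrees), which are documented keys of the DeepRacer reward-function `params`; the helper names and the waypoint coordinates in the example are made up for illustration.

```python
import math


def track_direction_deg(prev_point, next_point):
    """Direction of the track between two (x, y) waypoints, in degrees."""
    dx = next_point[0] - prev_point[0]
    dy = next_point[1] - prev_point[1]
    return math.degrees(math.atan2(dy, dx))


def heading_error_deg(track_direction, heading):
    """Smallest absolute angle between the track direction and the car heading."""
    diff = abs(track_direction - heading) % 360
    return 360 - diff if diff > 180 else diff


# Made-up params: the car sits between waypoints 0 and 1 and points at 30 degrees.
params = {"waypoints": [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)],
          "closest_waypoints": [0, 1], "heading": 30.0}
prev_i, next_i = params["closest_waypoints"]
direction = track_direction_deg(params["waypoints"][prev_i], params["waypoints"][next_i])
print(round(heading_error_deg(direction, params["heading"]), 1))  # 15.0
```

The wrap-around in `heading_error_deg` matters near ±180 degrees: a track direction of 170 and a heading of -170 differ by 20 degrees, not 340.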


Hyperparameters control the training algorithm
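One hyperparameter, the discount factor, ties back to the per-step discount of 0.9 in the race grid. A common rule of thumb is that a discount γ makes the agent look roughly 1/(1 − γ) steps ahead. The γ values below are examples for comparison, not recommended settings, and the helper names are illustrative.

```python
def effective_horizon(gamma: float) -> float:
    """Rule of thumb: a discount gamma looks roughly 1 / (1 - gamma) steps ahead."""
    return 1.0 / (1.0 - gamma)


def weight_after(gamma: float, steps: int) -> float:
    """Weight of a reward that arrives `steps` steps in the future: gamma ** steps."""
    return gamma ** steps


for gamma in (0.9, 0.99, 0.999):
    print(f"gamma={gamma}: horizon ~{effective_horizon(gamma):.0f} steps, "
          f"reward 10 steps ahead weighted {weight_after(gamma, 10):.3f}")
```

A larger discount lets rewards far along the track influence the current action, at the cost of noisier value estimates.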


AWS DeepRacer architecture in detail


AWS DeepRacer specifications

CAR: 18th scale 4WD with monster truck chassis

CPU: Intel Atom Processor

MEMORY: 4 GB RAM

STORAGE: 32 GB (expandable)

WI-FI: 802.11ac

CAMERA: 4 MP camera with MJPEG

DRIVE BATTERY: 1000 mAh lithium polymer

COMPUTE BATTERY: 13600 mAh USB-C

SENSORS: Integrated accelerometer and gyroscope

PORTS: 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI

SOFTWARE: Ubuntu OS 16.04.3 LTS, Intel OpenVINO toolkit, ROS Kinetic


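AWS DeepRacer software architecture

[Figure: ROS nodes on the car: camera, media engine (M-JPEG video to the web server), model optimizer (stored model file to optimized model), inference engine (inference results), navigation node (autonomous drive), control node (manual drive from the web server, via a publisher), and servo & motor]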


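Simulation-to-real domain transfer

Challenges of going from simulation to the real world

• The model is trained on simulated images, but the physical car uses real-world images

• Fully simulating the real environment is also difficult

Strategies

• Environment control: bring the simulated environment closer to the real world

• Domain randomization: add random elements to the environment

• Modularity and abstraction of the model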
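Domain randomization can be as simple as perturbing each simulated frame so the model does not overfit to one exact look. A minimal sketch on a grayscale image stored as nested lists; the brightness range and noise level are arbitrary assumptions, and a real simulator may randomize other factors such as lighting or textures. `randomize` is an illustrative name.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def randomize(image: Image, rng: random.Random,
              brightness: Tuple[float, float] = (0.7, 1.3),
              noise_std: float = 8.0) -> Image:
    """Return a perturbed copy: one random brightness scale plus per-pixel Gaussian noise."""
    scale = rng.uniform(*brightness)  # same lighting change for the whole frame
    out = []
    for row in image:
        new_row = []
        for pixel in row:
            value = pixel * scale + rng.gauss(0.0, noise_std)
            new_row.append(min(max(round(value), 0), 255))  # keep a valid pixel value
        out.append(new_row)
    return out


frame = [[128] * 4 for _ in range(3)]  # a flat gray 4x3 test frame
noisy = randomize(frame, random.Random(0))
print(noisy)
```

The input frame is left unchanged, so the same simulated frame can be randomized differently on every pass.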



AWS DeepRacer League

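The world's first global autonomous racing league

www.deepracerleague.com

Virtual Circuit

• Access the AWS DeepRacer service

• Train a model

• Submit your model to races held on the Virtual Circuit

Summit Circuit

• Cars and tracks are provided at AWS Summits

• Bring your own model, or train one at the workshop

• Race, get your name on the leaderboard, and make history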


How to join the Virtual Circuit


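Want to learn more about reinforcement learning?

• Knowledge of reinforcement learning is essential for reaching the top of the leaderboard

• Training content on reinforcement learning and AWS DeepRacer is available

• The content is free, takes 90 minutes, and consists of six self-paced parts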

https://www.aws.training/learningobject/wbc?id=32143


AWS DeepRacer: training and certification



Using the AWS DeepRacer console


https://github.com/aws-samples/aws-deepracer-workshops/blob/master/Workshops/2019-AWSSummits-AWSDeepRacerService/Lab1/Readme-Japanese.md

http://bit.ly/deepracer-wsjp

AWS DeepRacer workshop labs

Let's build a reinforcement learning model for AWS DeepRacer!




Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League

DeClercq Wentzel, Senior Product Manager, Amazon Web Services



Agenda

• AWS DeepRacer origin

• RL for the Sunday driver

• Virtual simulator

• Rubber meets the road

• Under the hood



How can we put machine learning in the hands of all developers? Literally.


1/18 scale autonomous race car

AWS DeepRacer: An exciting way for developers to get hands-on experience with machine learning

Global racing league

Virtual simulator, to train and evaluate


AWS DeepRacer League, race for prizes and glory

The world’s first global, autonomous racing league


Keen on setting up a race in your company? Please reach out.


AWS DeepRacer problem formulation

STATE



Reinforcement learning in the broader AI context

Reinforcement learning

Supervised learning

Unsupervised learning


Machine learning overview

SUPERVISED UNSUPERVISED REINFORCEMENT


Reinforcement learning in the real world

Reward positive behavior

Don’t reward negative behavior

The result!


Reinforcement learning terms

AGENT ENVIRONMENT STATE

ACTION EPISODE REWARD


The reward function

The reward function incentivizes particular behaviors and is at the core of reinforcement learning


The reward function in a race grid

S G = 2

GOAL AGENT


Incentivizing centerline behavior

0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

S 2 2 2 2 2 2 G = 2

0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

8.6 9.5 8.5 7.5 6.3 5.0 3.5 1.9

S 10.4 9.4 8.2 6.9 5.4 3.8 G = 2

8.6 9.5 8.5 7.5 6.3 5.0 3.5 1.9

Discount per step 0.9


How does learning happen?

VALUE FUNCTION

POLICY FUNCTION


RL algorithms: Vanilla policy gradient

* Image Source: Landscape image is CC0 1.0 public domain

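Limitations of vanilla policy gradient:

• Data is only used once

• High variance of rewards

• Magnitude of update could be too large

[Figure: the objective J(θ) drawn as a landscape; weights such as 0.4 ± δ and 0.3 ± δ are updated to new weights]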

https://www.maxpixel.net/Mountains-Valleys-Landscape-Hills-Grass-Green-699369

https://creativecommons.org/publicdomain/zero/1.0/deed.en


AWS DeepRacer Neural Network Architecture

Input - state (image)

Output - action


METHOD Supervised learning

HOW IT WORKS An expert driver controls a real-world car that has a camera. The camera images are saved as inputs and the corresponding driving actions (speed and steering angle) as outputs, and a model is trained on them.

RESULT Feed the state (image) into the model and receive a driving action

RL vs. other approaches for robotic racing

METHOD Reinforcement learning

HOW IT WORKS A virtual agent repeatedly interacts with a simulated environment and logs experience (image, action, new state, reward). The experience is used to train a model, and the new model is used to gather more experience.

RESULT Feed the state (image) into the model and receive a driving action



Lab 0 – AWS DeepRacer service resource creation

OBJECTIVE Set up your account resources to get you to the races!

TIME 5 min.

1. Find the lab content here:

https://github.com/aws-samples/aws-deepracer-workshops/

2. Navigate to:

Workshops/2019-AWSSummits-AWSDeepRacerService/Lab0_Create_resources


AWS DeepRacer simulator architecture

[Figure: AWS Cloud containing AWS DeepRacer, a VPC, and a NAT gateway; outputs include models, simulation video, and metrics]


AWS DeepRacer console diagram


Programming your own reward function


Track components

TRACK CENTER

TRACK WALL

TRACK SURFACE aka ON-TRACK

FIELD aka OFF-TRACK

TRACK BOUNDARIES


Coordinate system and track waypoints

OUTER BOUNDARY WAYPOINTS

TRACK CENTER WAYPOINTS

INNER BOUNDARY WAYPOINTS

X

Y

TRACK WIDTH

CAR DIRECTION


Action space


Hyperparameters control the training algorithm






Submit your model now to race in the Virtual Circuit!


Lab 1 – AWS DeepRacer service

OBJECTIVE Build your first AWS DeepRacer RL model

TIME 50 min.

1. Find the lab content here:

https://github.com/aws-samples/aws-deepracer-workshops/

2. Navigate to: Workshops/2019-AWSSummits-AWSDeepRacerService/Lab1


AWS DeepRacer: Driven by reinforcement learning

Want to learn more?

Learn how to build a reinforcement learning model, and find tips and tricks for tuning it to climb the League leaderboard, in a digital training course on reinforcement learning and AWS DeepRacer.

This 90-minute course is available at no cost, has 6 self-guided chapters, and will help you prepare to compete in the AWS DeepRacer League.




AWS DeepRacer car specifications

CAR 18th scale 4WD with monster truck chassis

CPU Intel Atom Processor

MEMORY 4 GB RAM

STORAGE 32 GB (expandable)

WI-FI 802.11ac

CAMERA 4 MP camera with MJPEG

DRIVE BATTERY 1000 mAh lithium polymer

COMPUTE BATTERY 13600 mAh USB-C

SENSORS Integrated accelerometer and gyroscope

PORTS 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI

SOFTWARE Ubuntu OS 16.04.3 LTS, Intel OpenVINO toolkit, ROS Kinetic


AWS DeepRacer software architecture

[Figure: ROS nodes on the car: camera, media engine (M-JPEG video to the web server), model optimizer (stored model file to optimized model), inference engine (inference results), navigation node (autonomous drive), control node (manual drive from the web server, via a publisher), and servo & motor]


Simulation-to-real domain transfer

SIM-to-REAL CHALLENGE

The model is trained on simulated images, but the race car drives on the images it sees in the real world

STRATEGIES

Environment control

Domain randomization

Modularity and abstraction

Thank you!



