LEAP Algorithm
Reinforcement Learning with Adaptive Partitioning
Tsufit Cohen, Eyal Radiano
Supervised by: Andrey Bernstein


Agenda

• Intro
• Q-learning
• LEAP Algorithm
• Simulation
• LEAP vs Q-Learn
• Conclusions

Intro

Reinforcement Learning
– Learn the optimal policy by trying
– Reward for "good" steps
– Performance improvement

[Figure: agent]

Q-learn

Definitions:

Key specification:
– Table representation
– (Note: write out the formula for Q and the definitions, from page 6 of the paper, so that we can explain what Q is)

Exploration policy: epsilon greedy
(Note: maybe split this into two slides)
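As a point of reference for the table representation and the epsilon-greedy exploration policy, here is a minimal Python sketch of standard tabular Q-learning. The function names (`epsilon_greedy`, `q_update`) and the values of `alpha` and `gamma` are illustrative assumptions and do not come from the slides.

```python
import random
from collections import defaultdict

ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

def epsilon_greedy(q, state, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (alpha and gamma are assumed values)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

q = defaultdict(float)           # the Q table: one entry per (state, action)
q_update(q, (0, 0), "RIGHT", -1.0, (1, 0))
```

The table grows with the number of distinct (state, action) pairs, which is the cost that the partitioning approach in the following slides tries to reduce.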

LEAP: Learning Entity (LE) Adaptive Partitioning

Key specifications:
– Macro states
– Multi partitioning (each partition is called an LE)
– Pruning and joining
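To make the macro-state idea concrete, the following Python sketch assumes a 2D state `(x, y)` and two basic LEs, one that partitions by `x` and one that partitions by `y`. The function name, the dictionary layout and the 20x20 size (borrowed from Simulation 1) are assumptions for illustration only.

```python
def macro_state(state, dims):
    """Project a full state onto the dimensions an LE looks at."""
    return tuple(state[d] for d in dims)

# Two basic LEs: one sees only x (dimension 0), the other only y (dimension 1).
BASIC_LES = {"X": (0,), "Y": (1,)}

state = (3, 7)
views = {name: macro_state(state, dims) for name, dims in BASIC_LES.items()}

# Table sizes on a 20x20 grid with 4 actions.
size, n_actions = 20, 4
basic_entries = len(BASIC_LES) * size * n_actions   # two coarse tables
flat_entries = size * size * n_actions              # one full Q table
```

Under these assumptions the two coarse tables hold 160 entries against 1600 for a flat table, which hints at where the memory saving can come from as long as only a few joint macro states have to be added.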

Algorithm

• Action Selection
• Incoherence Criterion
• JLE Generation
• Update
• Pruning Mechanism

Action Selection

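\[
Q(s,a) = \frac{\displaystyle\sum_{i \in L(s)} Q_i(s_i,a)\,\frac{1}{\hat{\sigma}_i^2(s_i,a)}}{\displaystyle\sum_{j \in L(s)} \frac{1}{\hat{\sigma}_j^2(s_j,a)}}
\]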
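The merge of per-LE estimates used for action selection can be sketched in Python as inverse-variance weighting. This assumes each active LE reports its estimate `Q_i(s_i, a)` together with a positive variance for it, which is one reading of the slide's formula; the function names and the data layout are hypothetical.

```python
def merge_q(estimates):
    """Combine per-LE estimates for one (state, action) pair.

    estimates: list of (q_i, sigma2_i) pairs, one per active LE,
    with sigma2_i > 0. Lower variance means a larger weight.
    """
    weights = [1.0 / sigma2 for _, sigma2 in estimates]
    weighted_sum = sum(w * q for w, (q, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)

def greedy_action(per_action_estimates):
    """Pick the action whose merged value is largest."""
    return max(per_action_estimates,
               key=lambda a: merge_q(per_action_estimates[a]))

merged = merge_q([(1.0, 1.0), (3.0, 1.0)])      # equal variances: plain mean
skewed = merge_q([(1.0, 0.25), (3.0, 1.0)])     # first LE weighs 4x more
```

With equal variances the merge reduces to a plain average; an LE with a lower variance pulls the merged value toward its own estimate.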

Action Selection ( Cont. )

\[
T(s,a,s') = R(s,a) + \gamma \max_{a' \in A} Q(s',a')
\]

Incoherence Criterion

JLE Generation

Update

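\[
\begin{aligned}
Q_i(s_i,a) &\leftarrow Q_i(s_i,a) + \alpha\left[R(s,a) + \gamma \max_{a' \in A} Q(s',a') - Q_i(s_i,a)\right] \\
\sigma_i(s_i,a) &\leftarrow \sigma_i(s_i,a) + \alpha\left[\left(R(s,a) + \gamma \max_{a' \in A} Q(s',a')\right)^2 - \sigma_i(s_i,a)\right] \\
v_i(s_i,a) &\leftarrow v_i(s_i,a) + 1
\end{aligned}
\]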
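A minimal Python sketch of a per-LE update step with three statistics per (macro state, action) pair: the action value, a running second statistic of the target, and a visit counter. The exact form of the second statistic is an assumption (a running mean of the squared target), as are the names `Entry`, `m2`, `update_entry` and the values of `alpha` and `gamma`.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Statistics one LE keeps for a (macro state, action) pair."""
    q: float = 0.0     # action-value estimate Q_i(s_i, a)
    m2: float = 0.0    # running mean of the squared target (assumed form)
    visits: int = 0    # visit counter v_i(s_i, a)

def update_entry(entry, reward, max_next_q, alpha=0.1, gamma=0.9):
    """Apply one step of the three updates to a single LE entry."""
    target = reward + gamma * max_next_q          # uses the merged next-state value
    entry.q += alpha * (target - entry.q)
    entry.m2 += alpha * (target ** 2 - entry.m2)
    entry.visits += 1
    return entry

e = update_entry(Entry(), reward=-1.0, max_next_q=0.0)
```

Every LE that is active in the current state would apply the same step to its own entry, so a single transition updates several coarse tables at once.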

Pruning Mechanism

Changes and Add-ons to the Algorithm

• Change the order of pruning and updating
• Epsilon-greedy policy starts from 0.9
• Boundary condition: Q = 0 at the end of the game
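Two of these changes can be sketched in a few lines of Python. The boundary condition means the target does not bootstrap past the end of the game; the exploration rate starts from 0.9. How epsilon decreases afterwards is not stated on the slide, so the `decay` and `floor` parameters below are assumptions, as are the function names.

```python
def td_target(reward, next_q_values, done, gamma=0.9):
    """Target value with the boundary condition Q = 0 at the end of the game."""
    if done:
        return reward                       # no bootstrapping past the terminal state
    return reward + gamma * max(next_q_values)

def epsilon_at(trial, start=0.9, decay=0.99, floor=0.05):
    """Exploration rate that starts from 0.9; decay and floor are assumed."""
    return max(floor, start * decay ** trial)
```

For example, `td_target(2.0, [5.0, 1.0], done=True)` ignores the next-state values entirely and returns the bare reward.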

Implementation

Key Operations:
– Finding the active LE list for a given state
– Finding a macro state within an LE
– Adding/removing a JLE and/or a macro state

Data Structures:
– Basic LE
– JLE

Inheritance: Basic LE and JLE both inherit from LE.

LE:
– CList<macrostate> Macro_list
– Int* ID_arr_
– Int order

Basic LE:
– CList<JLE>* Sons_lists_arr
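The class layout can be mirrored in Python as a rough sketch, with plain lists and a dictionary standing in for the `CList` containers. Two things here are assumptions rather than facts from the slides: `order` is derived as the number of covered dimensions instead of being stored, and the sons lists are keyed by that order. The method names `add_son` and `remove_son` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class LE:
    """Common part: macro states and the IDs of the covered dimensions."""
    id_arr: Tuple[int, ...]                                  # mirrors ID_arr_
    macro_list: List[tuple] = field(default_factory=list)    # mirrors Macro_list

    @property
    def order(self) -> int:
        return len(self.id_arr)        # assumed: order = number of joined dimensions

@dataclass
class JLE(LE):
    """Joint LE: covers a combination of two or more basic dimensions."""

@dataclass
class BasicLE(LE):
    """Basic LE: one dimension, plus its JLE lists keyed by order."""
    sons_lists_arr: Dict[int, List[JLE]] = field(default_factory=dict)

    def add_son(self, jle: JLE) -> None:
        self.sons_lists_arr.setdefault(jle.order, []).append(jle)

    def remove_son(self, jle: JLE) -> None:
        self.sons_lists_arr[jle.order].remove(jle)

x = BasicLE(id_arr=(0,))
xy = JLE(id_arr=(0, 1))
x.add_son(xy)
```

Keying the sons by order keeps the add/remove operations local to one list, which matches the key operations listed on the slide.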

General Data Structure Implementation

Basic LE array: Basic LE 1 | Basic LE 2 | Basic LE 3

Each Basic LE holds a sons list array:
– pointer to the JLEs list of order 1 (empty)
– pointer to the JLEs list of order 2
– pointer to the JLEs list of order 3

Basic LE 1, magnified: macro list, ID array, order, sons list array

3D Grid World Implementation Example

Basic LE array: Basic LE X | Basic LE Y | Basic LE Z

Sons list array of Basic LE X (entries 0, 1, 2): JLE XY, JLE XZ, JLE XYZ

Sons list array of Basic LE Y (entries 0, 1, 2): JLE YZ

Sons list array of Basic LE Z (entries 0, 1, 2): no JLEs
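The 3D layout can be reproduced with a short Python sketch that files every joint combination of dimensions under its first dimension, grouped by order. That ownership rule is an assumption inferred from the diagram, and the sketch enumerates all possible JLEs up front, whereas the algorithm generates them on demand. The function name is hypothetical.

```python
from itertools import combinations

def sons_by_owner(dims):
    """Group every joint combination under its first dimension, keyed by order."""
    layout = {d: {} for d in dims}
    for order in range(2, len(dims) + 1):
        for combo in combinations(dims, order):
            layout[combo[0]].setdefault(order, []).append("".join(combo))
    return layout

layout = sons_by_owner("XYZ")
```

For the three dimensions X, Y and Z this gives XY, XZ and XYZ under X, YZ under Y, and nothing under Z.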

Simulation 1 – 2D Grid World

Environment Properties:
– Size: 20x20
– Step cost: -1
– Reward: +2
– Available moves: Up, Down, Left, Right
– Wall bumping: no movement
– Taking the reward: starts a new episode
– Basic LEs: X, Y
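The environment properties above can be sketched as a small Python class. The start and prize positions are assumptions (the slides show them only in a figure), as are the choices that a wall bump still costs -1 and that reaching the prize yields +2 without the step cost. The class and method names are hypothetical.

```python
MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}

class GridWorld:
    """Grid with step cost -1 and prize +2; a wall bump leaves the agent in place."""

    def __init__(self, size=20, start=(0, 0), prize=(19, 19)):
        # start and prize positions are assumed values
        self.size, self.start, self.prize = size, start, prize
        self.pos = start

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        dx, dy = MOVES[action]
        x, y = self.pos[0] + dx, self.pos[1] + dy
        if 0 <= x < self.size and 0 <= y < self.size:
            self.pos = (x, y)               # legal move; otherwise no movement
        if self.pos == self.prize:
            return self.pos, 2.0, True      # prize taken: the episode ends
        return self.pos, -1.0, False

env = GridWorld(size=3, prize=(1, 0))
first = env.step("UP")        # bumps into the top wall
second = env.step("RIGHT")    # reaches the prize
```

The state returned by `step` is the `(x, y)` pair, so the basic LEs X and Y each see one of its two coordinates.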

[Figure: the grid with the start point and the prize marked, shown with the partition by x and the partition by y]

Simulation 1 Results - Policy

[Figure: learned policy on the 20x20 grid; each cell is labeled with its chosen action (UP, DOWN, LEFT or RIGHT), with the start and the prize marked]

Results – Average Reward & refined macrostates count

[Figure: Refined macrostates count vs Number of Trials; x-axis: Number of Trials (x 50), y-axis: Number of Macrostates]

[Figure: Average Reward vs Number of Trials; x-axis: Number of Trials (x 50), y-axis: Average Reward]

[Figure: Average Reward vs Number of Trials, zoomed in on trials 600 to 750 (x 50)]

Simulation 2 – Grid World with an obstacle

Environment Properties:
– Size: 5x5
– Step cost: -1
– Reward: +2
– Obstacle: -3

[Figure: the 5x5 grid with the start and the prize marked]

Simulation 2 Results

[Figure: learned policy on the 5x5 grid; each cell is labeled with its chosen action (DOWN or RIGHT)]

[Figure: Average Reward vs Number of Trials; x-axis: Number of Trials (x 50), y-axis: Average Reward]

• Note: the policy changes because of the epsilon-greedy exploration.

LEAP vs Q-Learn

[Figure: Average Reward vs Number of Trials, comparing LEAP with multi partition against regular Q-learning; x-axis: Number of Trials (x 50), y-axis: Average Reward]

Conclusions

• Memory reduction
• Complexity of implementation
• Deviation from optimal policy

Questions?