
Page 1: Learning objects

Learning objects

Vivian van der Burgt & Floris Heijne | B1.1

Page 2: Learning objects

Content

Front page………………………………………………………p. 1

Introduction……………………………………………………..p. 3

The problem…………………………………………………….p. 4

Program…………………………………………………………p. 8

Page 3: Learning objects

Introduction

Reinforcement learning tends to be pretty complex. Every time an action is performed, it is not immediately clear whether the result is positive or negative. Because this alone does not solve anything, two simple rules are added to make the system work. The first is that the last action performed before a negative result is reached should be avoided at all times. The second is that situations in which there is no chance of a positive result should be avoided. With these two rules, reinforcement learning becomes a lot easier.
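As a rough illustration, the two rules could be kept in a small bookkeeping structure like the Python sketch below. The class and method names are mine, purely for illustration, not part of any existing system.

```python
# A minimal sketch of the two rules; class and method names are illustrative.
class SimpleLearner:
    def __init__(self):
        self.blocked_actions = set()   # rule 1: (state, action) pairs to avoid
        self.dead_end_states = set()   # rule 2: states with no chance of reward

    def record_negative(self, state, action):
        # Rule 1: avoid the last action performed before a negative result.
        self.blocked_actions.add((state, action))

    def mark_dead_end(self, state):
        # Rule 2: avoid situations with no chance of a positive result.
        self.dead_end_states.add(state)

    def is_allowed(self, state, action, next_state):
        return ((state, action) not in self.blocked_actions
                and next_state not in self.dead_end_states)
```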

Page 4: Learning objects

The problem

The problem that I am going to implement with one of the given learning algorithms deals with the game 'Hide and Seek'. Everyone knows that game: there is one seeker and multiple players who are each hiding at a certain spot. As the seeker you do not know where everyone is hiding, so you do not know where to look. To solve this problem I chose a reinforcement learning algorithm, because with trial and error we can gather data and use it to set up a strategy. For that you have to define indicated states (locations where you can hide) and an indicated base:

Page 5: Learning objects

Hide and seek with indicated hidden spots.

Page 6: Learning objects

Hide and seek with indicated states and actions.

Page 7: Learning objects

The system observes its interactions with the environment and the rewards those interactions generate. A reward is obtained when the seeker finds someone at a state (hiding spot). When the seeker takes an action, the outcome can be rewarding or non-rewarding. The action also brings the seeker to a new state, from which there may again be a possibility to obtain a reward. If an action leads to a non-rewarding outcome, that action will be avoided in the next play round.
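A rough sketch of this loop in Python could look as follows; the function and variable names are my own assumptions, not part of the described system.

```python
# Sketch of one seeking round: the seeker acts, observes whether the state is
# rewarding, moves on, and remembers non-rewarding actions so they can be
# avoided in the next round.
def seek_round(states, is_rewarding, avoid):
    """Visit the states in order, skipping actions avoided from earlier rounds."""
    for state in states:
        if state in avoid:
            continue          # avoided because it was non-rewarding before
        if is_rewarding(state):
            return state      # reward: someone was found at this hiding spot
        avoid.add(state)      # non-rewarding: avoid it in the next play round
    return None               # no reward this round

# Example usage (illustrative): the hidden player sits at R3.
avoid = set()
found = seek_round(["R1", "R2", "R3"], lambda s: s == "R3", avoid)  # -> "R3"
```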

Page 8: Learning objects

Program

To make this algorithm you have to define the coordinates of the rewarding states (hiding spots). Not every action leads to a state; you first have to take several actions before you arrive at one. I take a grid of 6x6:

• Coordinate base: (1,4)
• Coordinate R1: (4,6)
• Coordinate R2: (6,4)
• Coordinate R3: (3,1)
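Written out as, for example, Python constants, this setup could look like the sketch below (the names are illustrative):

```python
# The 6x6 grid, the base and the three rewarding states (names are illustrative).
GRID_SIZE = (6, 6)
BASE = (1, 4)
REWARD_STATES = {
    "R1": (4, 6),
    "R2": (6, 4),
    "R3": (3, 1),
}
```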

At the beginning of the game there is no memory and no data, so there is no strategy. The simplest solution is to begin at the first state. If there is no reward, you go to the second state. You keep doing that until you have a rewarding outcome. If the reward was obtained at state 3, then in the next game you start looking at state 3, because according to the data that gives the optimal reward. After several games you have, for example, these reward outcomes:

R1 = 0.15
R2 = 0.25
R3 = 0.60
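As a sketch, this simple strategy of counting how often the reward is found at each state, and then starting at the most rewarding one, might be written like this (the names and structure are my own assumptions):

```python
# Sketch of the strategy: with no data, search the states in order; once reward
# frequencies are known, start at the state where they are highest.
games_played = 0
found_at_count = {"R1": 0, "R2": 0, "R3": 0}

def update(found_at):
    """Record the outcome of one game (found_at is "R1"/"R2"/"R3" or None)."""
    global games_played
    games_played += 1
    if found_at is not None:
        found_at_count[found_at] += 1

def first_state_to_check():
    """With no data, start at the first state; otherwise at the most rewarding one."""
    if games_played == 0:
        return "R1"
    # e.g. after many games: R1 = 0.15, R2 = 0.25, R3 = 0.60 -> start at R3
    return max(found_at_count, key=found_at_count.get)
```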


Page 9: Learning objects

The optimal reward is highest at state 3, so the chance of being rewarded is highest at state 3. Therefore you will look at state 3 first at the beginning of the next game.

But to make this work you need to define the steps of the actions. If you begin at the coordinate (1,4), you have to take (x + 2), (y + 2), (x + 1) to arrive at the coordinate of R1, (4,6).
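A small sketch of such a step sequence; the helper function is mine, purely for illustration:

```python
def apply_steps(start, steps):
    """Apply a list of (axis, delta) steps to a starting coordinate."""
    x, y = start
    for axis, delta in steps:
        if axis == "x":
            x += delta
        else:
            y += delta
    return (x, y)

steps_to_R1 = [("x", 2), ("y", 2), ("x", 1)]
assert apply_steps((1, 4), steps_to_R1) == (4, 6)   # base (1,4) -> R1 (4,6)
```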

What you also need to do is give the rewarding spots an optimal reward value. For example, take the numbers from 1 to 100: give R1 the numbers from 1 to 10, R2 from 11 to 40, and R3 from 41 to 100. A random function picks a number between 1 and 100 at random every game. After playing the game many times, the algorithm then learns that R3 has the optimal reward value.
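As a sketch, assuming the three ranges are meant to be non-overlapping (so R3 gets 41 to 100), the random draw could be written like this:

```python
import random

def draw_rewarding_spot():
    """Draw a number between 1 and 100 and map it to a rewarding spot."""
    n = random.randint(1, 100)
    if n <= 10:
        return "R1"     # numbers 1-10
    elif n <= 40:
        return "R2"     # numbers 11-40
    return "R3"         # numbers 41-100

counts = {"R1": 0, "R2": 0, "R3": 0}
for _ in range(1000):
    counts[draw_rewarding_spot()] += 1
# After many simulated games R3 is found most often, so it has the optimal reward value.
```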