UM Stratego
Colin Schepers, Daan Veltman, Enno Ruijters, Leon Gerritsen,
Niek den Teuling, Yannick Thimister
Content

Introduction (Yannick)
The game of Stratego (Daan)
Evaluation Function (Leon)
Monte Carlo (Colin)
Genetic Algorithm (Enno)
Opponent modeling and strategy (Niek)
Conclusion (Yannick)
The game of Stratego

Board of 10x10
Setup field of 4x10
The game of Stratego

B  Bombs
1  Marshal
2  General
3  Colonels
4  Majors
5  Captains
6  Lieutenants
7  Sergeants
8  Miners
9  Scouts
S  Spy
F  Flag
The game of Stratego

Win: capture the enemy flag, or the opponent has no movable pieces left
Draw: neither player has movable pieces left, or the maximum number of moves is reached
Starting Positions

Flag placed
Bombs placed
Remaining pieces placed randomly
Starting Positions

Distance to Freedom
Being bombed in
Partial obstruction
Adjacency
Flag defence
Startup Pieces
Starting Positions
Distance to Freedom
Starting Positions
Startup Pieces
Evaluation Function

Sub-functions of the evaluation function:
Material value
Information value
Near enemy piece value
Near flag value
Progressive bonus value
First-move penalty
Evaluation Function

How it works:
All the sub-functions return a value
These values are weighted and added together
The higher the total value, the better that move is for the player
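The weighted sum described above can be sketched as follows; the sub-function and helper names are illustrative placeholders, not the project's actual implementation:

```python
# Sketch of the weighted-sum evaluation: each sub-function scores the
# board, the scores are weighted and summed, and the move leading to
# the highest total is preferred. Names here are hypothetical.
def evaluate(board, weights, sub_functions):
    """Return the weighted sum of all sub-function values for a board."""
    return sum(w * f(board) for w, f in zip(weights, sub_functions))

def best_move(board, moves, apply_move, weights, sub_functions):
    """Pick the move whose resulting board scores highest."""
    return max(moves,
               key=lambda m: evaluate(apply_move(board, m),
                                      weights, sub_functions))
```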
Evaluation Function

Material value:
Used for comparing the two players' board strengths
Each piece type has a value
The total value of the opponent's board is subtracted from the player's board value
A positive value means a strong player board; a negative value means a weak one
Evaluation Function

Information value:
Stimulates collecting information about the opponent's pieces while keeping information about one's own pieces hidden
Each piece type has a certain information value
The values on each side are summed and then subtracted from each other
A marshal being discovered is worse than a scout being discovered
Evaluation Function

Near enemy piece value:
Checks whether a movable piece can defeat a piece next to it
If the adjacent piece can be defeated, return a positive score
If not, return a negative one
If the piece is unknown, return 0
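The three cases above can be sketched as a small function; the rank comparison is simplified (spy and bomb special rules omitted), and the bonus magnitude is a placeholder:

```python
# Sketch of the near-enemy-piece term: positive if the adjacent enemy
# can be defeated, negative if not, and 0 when its rank is unknown.
# Higher rank wins here; spy/bomb special cases are omitted.
def near_enemy_value(own_rank, enemy_rank_or_none, bonus=1):
    if enemy_rank_or_none is None:     # unknown enemy piece
        return 0
    if own_rank > enemy_rank_or_none:  # adjacent enemy can be defeated
        return bonus
    return -bonus                      # adjacent enemy would defeat us
```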
Evaluation Function

Near flag value:
Stimulates the defence of the own flag and the attacking of the enemy's flag
Constructs an array of possible enemy flag locations
If an enemy piece is near the own flag, return a negative number
If an own piece is near a possible enemy flag location, return a positive number
Evaluation Function

Progressive bonus value:
Stimulates the advancement of pieces towards enemy lines
Returns a positive value if a piece moves forward
Negative if it moves backward
Evaluation Function

First-move penalty:
Keeps pieces from giving away information
Keeps the number of unmoved pieces high
Monte Carlo

A subset of all possible moves is played
No strategy or weights used
An evaluation value is received after every move
At the end, a comparison of evaluation values determines the best move
A depth limit is used so the tree doesn't grow too big and the algorithm terminates
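The depth-limited sampling described above can be sketched as follows; the game interface (legal_moves, apply_move, evaluate) is a hypothetical stand-in:

```python
import random

# Sketch of depth-limited Monte Carlo: for each candidate move, play
# random continuations down to a fixed depth and keep the move with the
# best average evaluation value.
def monte_carlo_move(state, legal_moves, apply_move, evaluate,
                     depth=3, samples=10, rng=random):
    def playout(s, d):
        for _ in range(d):          # depth limit keeps the tree small
            moves = legal_moves(s)
            if not moves:
                break
            s = apply_move(s, rng.choice(moves))
        return evaluate(s)

    def score(move):
        after = apply_move(state, move)
        return sum(playout(after, depth - 1) for _ in range(samples)) / samples

    return max(legal_moves(state), key=score)
```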
Monte Carlo

Advantages:
Simple implementation
Can be changed quickly
Easy observation of behavior
Good documentation
Good for partial-information situations
Monte Carlo

Disadvantages:
Generally not smart
Dependent on the evaluation function
Computationally slow
Tree grows very fast
Monte Carlo Experiments

MC against lower-depth MC

Player   Wins   Losses   Draws
MC        28      59      49
MC-LD     59      28      49
Monte Carlo Experiments

MC against no-depth MC

Player   Wins   Losses   Draws
MC        15       2      12
MC-ND      2      15      12
Monte Carlo Experiments

MC against deeper-depth but narrower MC

Player   Wins   Losses   Draws
MC         5       2      11
MC-DDN     2       5      11
Monte Carlo Experiments

MC against narrower MC

Player   Wins   Losses   Draws
MC        62      18      85
MC-N      18      62      85
Genetic Algorithm

Evolve the weights of the terms in the evaluation function
The AI uses a standard expectiminimax search tree
Evolution strategies (the evolution parameters are themselves evolved)
Genetic Algorithm

Genome:
G = (σ, α, w₁, …, wₙ)

Mutation:
σ_n = σ_{n−1} · e^{N(0, τ)}
α_n = α_{n−1} + α_{n−1} · N(0, σ)
w_{i,n} = w_{i,n−1} + w_{i,n−1} · N(0, σ)
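A sketch of the self-adaptive mutation, assuming the update rules as reconstructed above (log-normal update of the step size σ, Gaussian perturbation of α and each weight scaled by its previous value; τ here is an illustrative value):

```python
import math
import random

# Sketch of the self-adaptive mutation: sigma is updated log-normally,
# then alpha and every weight are perturbed with Gaussian noise N(0, sigma)
# scaled by their previous value. The genome is (sigma, alpha, weights).
def mutate(genome, tau=0.1, rng=random):
    sigma, alpha, weights = genome
    sigma = sigma * math.exp(rng.gauss(0, tau))
    alpha = alpha + alpha * rng.gauss(0, sigma)
    weights = [w + w * rng.gauss(0, sigma) for w in weights]
    return (sigma, alpha, weights)
```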
Genetic Algorithm

Crossover:
σ and α of the parents are averaged
Weights: averaged if 1/α < ratio < α, else randomly chosen from the parents
Genetic Algorithm

Fitness function:
Win bonus
Number of own pieces left
Number of turns spent
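The three fitness components above can be sketched as one function; the coefficients are illustrative placeholders:

```python
# Sketch of the fitness function: a bonus for winning, plus credit for
# surviving pieces, minus a cost per turn spent. Coefficients are
# hypothetical, not the project's actual values.
def fitness(won, own_pieces_left, turns_spent,
            win_bonus=1000, piece_value=10, turn_cost=1):
    return ((win_bonus if won else 0)
            + piece_value * own_pieces_left
            - turn_cost * turns_spent)
```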
Genetic Algorithm

Reference AI: Monte Carlo AI
Self-selecting reference genome:
Select the average genome from each generation
Pick the winner between this genome and the previous reference
Hill climbing

The GA takes too long to train
Hill climbing is faster
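Hill climbing as a faster alternative to the GA can be sketched as follows; the scoring function (e.g. win rate against a reference AI) is a hypothetical stand-in supplied by the caller:

```python
import random

# Sketch of hill climbing over evaluation weights: repeatedly perturb
# one weight and keep the change only if the score improves.
def hill_climb(weights, score, steps=100, step_size=0.1, rng=random):
    best = list(weights)
    best_score = score(best)
    for _ in range(steps):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.uniform(-step_size, step_size)
        s = score(cand)
        if s > best_score:          # keep only improvements
            best, best_score = cand, s
    return best
```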
Opponent modeling

Observing moves
Ruling out pieces
Stronger pieces are moved towards you
Weaker pieces are moved away
Opponent modeling

No knowledge about enemy pieces at the start
Updating the probabilities:
Update the probability of the moving piece
Update the probabilities of all other pieces
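One concrete part of the update above can be sketched: when an enemy piece moves, bombs and the flag are ruled out for it and its distribution is renormalized. The representation (one type-to-probability dict per unknown piece) is an assumption:

```python
# Sketch of ruling out unmovable types: a moving piece cannot be a bomb
# or the flag, so those entries are dropped and the remaining
# probabilities are renormalized to sum to 1.
def rule_out_unmovable(dist):
    """dist maps piece type -> probability for one unknown enemy piece."""
    new = {t: p for t, p in dist.items() if t not in ("bomb", "flag")}
    total = sum(new.values())
    return {t: p / total for t, p in new.items()}
```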
Monte Carlo Experiments

MC against MC with opponent modeling, using a database of human-versus-human games

Player   Wins   Losses   Draws
MC        39      44      58
MC-OM     44      39      58
Monte Carlo Experiments

MC against MC with opponent modeling, using a database of MC-versus-MC games

Player   Wins   Losses   Draws
MC
MC-OM
Strategy

Split the game up into phases:
Exploration phase: until 25% of enemy pieces are identified
Elimination phase: until 70% of enemy pieces are killed
End-game phase
Alter the evaluation function
Conclusion

Both AIs are very slow
The genetic AI takes too long to train
In the case of Stratego, tweaking a few weights may not be an optimal way to create an intelligent player
Recommended