
Lesson 35: Game Theory and Linear Programming



This connects two topics of the last few weeks. The optimal strategies for a matrix game turn out to be solutions to linear programming problems. In fact, the strategies are the solutions to the primal and dual versions of the same problem!


Math 20

December 14, 2007

Announcements

- Pset 12 due December 17 (last day of class)
- Lecture notes and K&H on the website
- Next office hours: Monday 1–2 (SC 323)


Outline

- Recap
  - Definitions
  - Examples
  - Fundamental Theorem
  - Games we can solve so far
- GT problems as LP problems
  - From the continuous to the discrete
  - Standardization
  - Rock/Paper/Scissors again
- The row player's LP problem


Definition
A zero-sum game is defined by a payoff matrix A, where a_ij represents the payoff to the row player if R chooses option i and C chooses option j.

- The row player chooses from the rows of the matrix, and the column player from the columns.
- The payoff could be a negative number, representing a net gain for the column player.


Definition
A strategy for a player consists of a probability vector representing the portion of time each option is employed.

- We use a row vector p for the row player's strategy, and a column vector q for the column player's strategy.
- A pure strategy (select the same option every time) is represented by a standard basis vector e_j or e′_j. For instance, if R has three choices and C has five:

  $$e'_2 = \begin{pmatrix} 0 & 1 & 0 \end{pmatrix} \qquad e_4 = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 \end{pmatrix}^T$$

- A non-pure strategy is called mixed.


Definition
The expected value of row and column strategies p and q is the scalar

$$E(p, q) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i a_{ij} q_j = pAq$$

Probabilistically, this is the amount the row player receives on average (or the column player, if it is negative) when the players employ these strategies.


Rock/Paper/Scissors

Example
What is the payoff matrix for Rock/Paper/Scissors?

Solution
The payoff matrix is

$$A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}.$$
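
The expected-value formula from the previous definition is easy to check numerically. Here is a minimal Python sketch (not part of the original lecture); rows and columns are in Rock, Paper, Scissors order, and the particular strategies below are just illustrative choices.

```python
import numpy as np

# Payoff matrix for Rock/Paper/Scissors (entries are payoffs to R).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

p = np.array([1/3, 1/3, 1/3])   # row player: uniform mixed strategy
q = np.array([1/2, 1/4, 1/4])   # column player: favors rock (illustrative choice)

E = p @ A @ q                   # E(p, q) = p A q
print(E)                        # 0.0 -- the uniform strategy breaks even against anything
```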


Example
Consider a new game: players R and C each choose a number 1, 2, or 3. If they choose the same thing, C pays R that amount. If they choose differently, R pays C the amount that C has chosen. What is the payoff matrix?

Solution

$$A = \begin{pmatrix} 1 & -2 & -3 \\ -1 & 2 & -3 \\ -1 & -2 & 3 \end{pmatrix}$$
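
As a quick sanity check (not from the lecture), the verbal rule translates directly into code; the helper below is a hypothetical name, but it reproduces the matrix above.

```python
# Payoff to R: if both pick the same number i, R wins i;
# otherwise R pays C the number j that C picked.
def payoff(i, j):
    return i if i == j else -j

A = [[payoff(i, j) for j in (1, 2, 3)] for i in (1, 2, 3)]
print(A)   # [[1, -2, -3], [-1, 2, -3], [-1, -2, 3]]
```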


Theorem (Fundamental Theorem of Matrix Games)
There exist optimal strategies p∗ for R and q∗ for C such that for all strategies p and q:

E(p∗, q) ≥ E(p∗, q∗) ≥ E(p, q∗)

E(p∗, q∗) is called the value v of the game.


Reflect on the inequality

E(p∗, q) ≥ E(p∗, q∗) ≥ E(p, q∗)

In other words,

- E(p∗, q) ≥ E(p∗, q∗): R can guarantee a lower bound on his/her payoff
- E(p∗, q∗) ≥ E(p, q∗): C can guarantee an upper bound on how much he/she loses
- This value could be negative, in which case C has the advantage


Fundamental problem of zero-sum games

- Find the p∗ and q∗!
- Last time we did these:
  - Strictly-determined games
  - 2 × 2 non-strictly-determined games
- We'll look at the general case next.


Pure Strategies are optimal in Strictly-Determined Games

Theorem
Let A be a payoff matrix. If a_rs is a saddle point (an entry that is the minimum of its row and the maximum of its column), then e′_r is an optimal strategy for R and e_s is an optimal strategy for C. Also v = E(e′_r, e_s) = a_rs.
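
A minimal sketch (the matrix below is made up, not from the lecture) of how one can search for saddle points directly:

```python
# A saddle point a_rs is the minimum of row r and the maximum of column s.
def saddle_points(A):
    points = []
    for r, row in enumerate(A):
        for s, a in enumerate(row):
            col = [A[i][s] for i in range(len(A))]
            if a == min(row) and a == max(col):
                points.append((r, s, a))
    return points

A = [[3, 1, 4],
     [2, 0, 1],
     [5, 2, 6]]
print(saddle_points(A))   # [(2, 1, 2)]: row 3 / column 2 is a saddle point, so v = 2
```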


Optimal strategies in 2 × 2 non-Strictly-Determined Games

Let A be a 2 × 2 matrix with no saddle points. Then the optimal strategies are

$$p = \frac{1}{\Delta}\begin{pmatrix} a_{22} - a_{21} & a_{11} - a_{12} \end{pmatrix} \qquad q = \frac{1}{\Delta}\begin{pmatrix} a_{22} - a_{12} \\ a_{11} - a_{21} \end{pmatrix}$$

where ∆ = a_11 + a_22 − a_12 − a_21. Also

$$v = \frac{|A|}{\Delta}$$
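
These formulas are easy to apply mechanically. Here is a small sketch with a made-up 2 × 2 matrix that has no saddle point (its maximin is 0 and its minimax is 2):

```python
from fractions import Fraction as F

a11, a12, a21, a22 = F(3), F(-1), F(0), F(2)

delta = a11 + a22 - a12 - a21                      # = 6
p = ((a22 - a21) / delta, (a11 - a12) / delta)     # (1/3, 2/3)
q = ((a22 - a12) / delta, (a11 - a21) / delta)     # (1/2, 1/2)
v = (a11 * a22 - a12 * a21) / delta                # |A| / delta = 1

print(p, q, v)
```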


GT problems as LP problems


This could get a little weird

This derivation is not something that needs to be memorized, but it should be understood at least once.


Objectifying the problem

Let's think about the problem from the column player's perspective. If she chooses strategy q, and R knew it, he would choose p to maximize the payoff pAq. Thus the column player wants to minimize that quantity. That is, C's objective is realized when the payoff is

$$E = \min_q \left( \max_p \, pAq \right)$$

This seems hard! Luckily, linearity saves us.


From the continuous to the discrete

Lemma
Regardless of q, we have

$$\max_p \, pAq = \max_{1 \le i \le m} e'_i A q$$

Here e′_i is the probability vector representing the pure strategy of going only with choice i.

The idea is that a weighted average of things is no bigger than the largest of them. (Think about grades.)


Proof of the lemma

Proof.
We must have

$$\max_p \, pAq \ge \max_{1 \le i \le m} e'_i A q$$

(the maximum over a larger set must be at least as big). On the other hand, let q be C's strategy. Let the quantity on the right be maximized when i = i_0. Let p be any strategy for R. Notice that p = ∑_i p_i e′_i. So

$$E(p, q) = pAq = \sum_{i=1}^{m} p_i e'_i A q \le \sum_{i=1}^{m} p_i e'_{i_0} A q = \left( \sum_{i=1}^{m} p_i \right) e'_{i_0} A q = e'_{i_0} A q.$$

Since p was arbitrary,

$$\max_p \, pAq \le e'_{i_0} A q = \max_{1 \le i \le m} e'_i A q,$$

and the two inequalities together give equality.


The next step is to introduce a new variable v representing the value of this inner maximization. Our objective is to minimize it. Saying it's the maximum of all payoffs from pure strategies is the same as saying

$$v \ge e'_i A q$$

for all i. So we finally have something that looks like an LP problem! We want to choose q and v which minimize v subject to the constraints

$$v \ge e'_i A q, \quad i = 1, 2, \ldots, m$$
$$q_j \ge 0, \quad j = 1, 2, \ldots, n$$
$$\sum_{j=1}^{n} q_j = 1$$


Trouble with this formulation

- Simplex method with equalities?
- Not in standard form

Resolution:

- We may assume all a_ij > 0 (add a large enough constant to every entry if necessary; this shifts the value but not the optimal strategies), so v > 0
- Let x_j = q_j / v


Since we know v > 0, we still have x ≥ 0. Now

$$\sum_{j=1}^{n} x_j = \frac{1}{v} \sum_{j=1}^{n} q_j = \frac{1}{v}.$$

So our problem is now to choose x ≥ 0 which maximizes ∑_j x_j (minimizing v is the same as maximizing 1/v). The constraints now take the form

$$v \ge e'_i A q \iff 1 \ge e'_i A x$$

for all i. Another way to write this is

$$Ax \le \mathbf{1},$$

where 1 is the vector consisting of all ones.


Upshot

Theorem
Consider a game with payoff matrix A, where each entry of A is positive. The column player's optimal strategy q is

$$q = \frac{x}{x_1 + \cdots + x_n},$$

where x ≥ 0 solves the LP problem of maximizing x_1 + · · · + x_n subject to the constraints Ax ≤ 1. The value of the game is v = 1/(x_1 + · · · + x_n).


Rock/Paper/Scissors

The payoff matrix is

$$A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}.$$

We can add 2 to every entry to make

$$A = \begin{pmatrix} 2 & 1 & 3 \\ 3 & 2 & 1 \\ 1 & 3 & 2 \end{pmatrix}.$$

Every entry is now positive; this raises the value of the game by 2 but does not change the optimal strategies.


Convert to LP

The problem is to maximize x1 + x2 + x3 subject to the constraints

2x1 + x2 + 3x3 ≤ 1
3x1 + 2x2 + x3 ≤ 1
x1 + 3x2 + 2x3 ≤ 1

(together with x1, x2, x3 ≥ 0). We introduce slack variables y1, y2, and y3, so the constraints now become

2x1 + x2 + 3x3 + y1 = 1
3x1 + 2x2 + x3 + y2 = 1
x1 + 3x2 + 2x3 + y3 = 1.


An easy initial basic solution is to let x = 0 and y = 1. The initial tableau is therefore

       x1    x2    x3    y1    y2    y3    z   value
  y1    2     1     3     1     0     0    0     1
  y2    3     2     1     0     1     0    0     1
  y3    1     3     2     0     0     1    0     1
  z    -1    -1    -1     0     0     0    1     0


Which should be the entering variable? The coefficients in the bottom row are all the same, so let's just pick one, x1. To find the departing variable, we look at the ratios 1/2, 1/3, and 1/1; the smallest is 1/3, so y2 is the departing variable. We scale row 2 by 1/3:

       x1    x2     x3    y1    y2     y3    z   value
  y1    2     1      3     1     0      0    0     1
  y2    1    2/3    1/3    0    1/3     0    0    1/3
  y3    1     3      2     0     0      1    0     1
  z    -1    -1     -1     0     0      0    1     0


Then we use row operations to zero out the rest of column one:

       x1    x2     x3    y1    y2     y3    z   value
  y1    0   -1/3    7/3    1   -2/3     0    0    1/3
  x1    1    2/3    1/3    0    1/3     0    0    1/3
  y3    0    7/3    5/3    0   -1/3     1    0    2/3
  z     0   -1/3   -2/3    0    1/3     0    1    1/3


We can still improve this: x3 is the entering variable (its entry in the objective row, −2/3, is the most negative) and y1 is the departing variable. The new tableau is

       x1    x2     x3    y1     y2     y3    z   value
  x3    0   -1/7     1    3/7   -2/7     0    0    1/7
  x1    1    5/7     0   -1/7    3/7     0    0    2/7
  y3    0   18/7     0   -5/7    1/7     1    0    3/7
  z     0   -3/7     0    2/7    1/7     0    1    3/7


Finally, entering x2 and departing y3 gives

       x1    x2    x3    y1      y2      y3    z   value
  x3    0     0     1    7/18   -5/18   1/18   0    1/6
  x1    1     0     0    1/18    7/18  -5/18   0    1/6
  x2    0     1     0   -5/18    1/18   7/18   0    1/6
  z     0     0     0    1/6     1/6    1/6    1    1/2


So the x variables have values x1 = 1/6, x2 = 1/6, x3 = 1/6. Furthermore z = x1 + x2 + x3 = 1/2, so v = 1/z = 2 for the shifted game, i.e. a value of 0 for the original game. This also means that q1 = 1/3, q2 = 1/3, and q3 = 1/3: the column player's optimal strategy is to play each option equally often.
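
For anyone who wants to replay the pivots by machine, here is a small Python sketch (not part of the lecture) using exact rational arithmetic; it reproduces the tableaux above.

```python
from fractions import Fraction as F

# Columns: x1 x2 x3 y1 y2 y3 z | value.  Rows: y1, y2, y3, z (objective).
T = [[F(v) for v in row] for row in [
    [ 2,  1,  3, 1, 0, 0, 0, 1],
    [ 3,  2,  1, 0, 1, 0, 0, 1],
    [ 1,  3,  2, 0, 0, 1, 0, 1],
    [-1, -1, -1, 0, 0, 0, 1, 0],
]]

def pivot(T, r, c):
    """Scale row r so T[r][c] = 1, then clear column c from every other row."""
    T[r] = [v / T[r][c] for v in T[r]]
    for i, row in enumerate(T):
        if i != r and row[c] != 0:
            T[i] = [a - row[c] * b for a, b in zip(row, T[r])]

# The pivots chosen above: (entering, departing) = (x1, y2), (x3, y1), (x2, y3).
for r, c in [(1, 0), (0, 2), (2, 1)]:
    pivot(T, r, c)

for row in T:
    print([str(v) for v in row])
# The objective row ends ... 1/6 1/6 1/6 1 1/2, so z = 1/2 and v = 1/z = 2.
```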


The row player's LP problem


Now let's think about the problem from the row player's perspective. If he chooses strategy p, and C knew it, she would choose q to minimize the payoff pAq. Thus the row player wants to maximize that quantity. That is, R's objective is realized when the payoff is

$$E = \max_p \left( \min_q \, pAq \right)$$


Lemma
Regardless of p, we have

$$\min_q \, pAq = \min_{1 \le j \le n} pAe_j$$


The next step is to introduce a new variable v representing the value of this inner minimization. Our objective is to maximize it. Saying it's the minimum of all payoffs from pure strategies is the same as saying

$$v \le pAe_j$$

for all j. Again, we have something that looks like an LP problem! We want to choose p and v which maximize v subject to the constraints

$$v \le pAe_j, \quad j = 1, 2, \ldots, n$$
$$p_i \ge 0, \quad i = 1, 2, \ldots, m$$
$$\sum_{i=1}^{m} p_i = 1$$


As before, we can standardize this by renaming

$$y = \frac{1}{v} p'$$

(this makes y a column vector). Then

$$\sum_{i=1}^{m} y_i = \frac{1}{v},$$

so maximizing v is the same as minimizing 1′y. Likewise, the equations of constraint become v ≤ (vy′)Ae_j for all j, or y′A ≥ 1′, or (taking transposes) A′y ≥ 1. If all the entries of A are positive, we may assume that v is positive, so the constraints p ≥ 0 are satisfied if and only if y ≥ 0.


Upshot

Theorem
Consider a game with payoff matrix A, where each entry of A is positive. The row player's optimal strategy p is

$$p = \frac{y'}{y_1 + \cdots + y_m},$$

where y ≥ 0 solves the LP problem of minimizing y_1 + · · · + y_m = 1′y subject to the constraints A′y ≥ 1.
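
As a sketch of how this LP can be solved with off-the-shelf software (scipy is my choice of tool here, not something used in the lecture), here is the row player's problem for the shifted Rock/Paper/Scissors matrix:

```python
import numpy as np
from scipy.optimize import linprog

# Shifted Rock/Paper/Scissors payoff matrix (all entries positive).
A = np.array([[2, 1, 3],
              [3, 2, 1],
              [1, 3, 2]])

# Minimize 1'y subject to A'y >= 1 and y >= 0.
# linprog only takes <= constraints, so rewrite A'y >= 1 as -A'y <= -1.
res = linprog(c=np.ones(3),
              A_ub=-A.T, b_ub=-np.ones(3),
              bounds=[(0, None)] * 3)

y = res.x              # approximately [1/6, 1/6, 1/6]
v = 1 / y.sum()        # 2 for the shifted game, so 0 for the original game
p = y / y.sum()        # approximately [1/3, 1/3, 1/3]
print(p, v)
```

This agrees with the dual values we read off the final simplex tableau below.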


The big idea

The big observation is this:

Theorem
The row player's LP problem is the dual of the column player's LP problem.


The final tableau in the Rock/Paper/Scissors LP problem was this:

       x1    x2    x3    y1      y2      y3    z   value
  x3    0     0     1    7/18   -5/18   1/18   0    1/6
  x1    1     0     0    1/18    7/18  -5/18   0    1/6
  x2    0     1     0   -5/18    1/18   7/18   0    1/6
  z     0     0     0    1/6     1/6    1/6    1    1/2

The entries in the objective row below the slack variables are the solutions to the dual problem! In this case, we have the same values, which means R has the same strategy as C. This reflects the symmetry of the original game.


Example
Consider the game: players R and C each choose a number 1, 2, or 3. If they choose the same thing, C pays R that amount. If they choose differently, R pays C the amount that C has chosen. What should each do?

Answer.

  Choice    R               C
  1         5/22 ≈ 22.7%    6/11 ≈ 54.5%
  2         4/11 ≈ 36.4%    3/11 ≈ 27.3%
  3         9/22 ≈ 40.9%    2/11 ≈ 18.2%

The value of the game is −6/11 ≈ −0.55: the expected payoff is about 0.55 per play to the column player.
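
Here is a short scipy sketch (again, not part of the lecture) that checks these numbers; the shift by 4 is an arbitrary choice that makes every entry positive.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1, -2, -3],
              [-1,  2, -3],
              [-1, -2,  3]])
shift = 4
B = A + shift                      # all entries positive; value rises by 4

# Column player: maximize 1'x subject to Bx <= 1 (linprog minimizes, so use -1'x).
col = linprog(c=-np.ones(3), A_ub=B, b_ub=np.ones(3), bounds=[(0, None)] * 3)
q = col.x / col.x.sum()

# Row player: minimize 1'y subject to B'y >= 1, i.e. -B'y <= -1.
row = linprog(c=np.ones(3), A_ub=-B.T, b_ub=-np.ones(3), bounds=[(0, None)] * 3)
p = row.x / row.x.sum()

v = 1 / col.x.sum() - shift        # undo the shift
print(p)   # approximately [0.227, 0.364, 0.409]
print(q)   # approximately [0.545, 0.273, 0.182]
print(v)   # approximately -0.545, i.e. about 0.55 per play to the column player
```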
