EE807 Assignment solution 1

Source: netsys.kaist.ac.kr/EE807/assignment/solution1.pdf

Problem 1

(a)

(b)

$$x_4 = 5$$

$$x_3 + u_3 = x_4 = 5$$

Apply the DP algorithm (DPA).

k = N = 4

$$J_4(x_4) = J_4(5) = 0$$

k = 3

$$u_3 = \mu_3(x_3) = 5 - x_3$$

$$J_3(x_3) = \min_{-x_3 \le u_3 \le 5 - x_3} \left[ x_3^2 + u_3^2 + J_4(5) \right] = x_3^2 + (5 - x_3)^2$$

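k = 2

$$J_2(x_2) = \min_{-x_2 \le u_2 \le 5 - x_2} \left[ x_2^2 + u_2^2 + J_3(x_2 + u_2) \right]$$

Values of $x_2^2 + u_2^2 + J_3(x_2 + u_2)$ (a dash marks an infeasible control):

u_2 =     -5  -4  -3  -2  -1   0   1   2   3   4   5
x_2 = 0    -   -   -   -   -  25  18  17  22  33  50
      1    -   -   -   -  27  18  15  18  27  42   -
      2    -   -   -  33  22  17  18  25  38   -   -
      3    -   -  43  30  23  22  27  38   -   -   -
      4    -  57  42  33  30  33  42   -   -   -   -
      5   75  58  47  42  43  50   -   -   -   -   -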

Minimizing each row of the k = 2 table over $u_2$ gives:

x_2   J_2(x_2)   μ_2(x_2)
0     17          2
1     15          1
2     17          0
3     22          0
4     30         -1
5     42         -2

k = 1

$$J_1(x_1) = \min_{-x_1 \le u_1 \le 5 - x_1} \left[ x_1^2 + u_1^2 + J_2(x_1 + u_1) \right]$$

u_1 =     -5  -4  -3  -2  -1   0   1   2   3   4   5
x_1 = 0    -   -   -   -   -  17  16  21  31  46  67
      1    -   -   -   -  19  16  19  27  40  59   -
      2    -   -   -  25  20  21  27  38  55   -   -
      3    -   -  35  28  27  31  40  55   -   -   -
      4    -  49  40  37  39  46  59   -   -   -   -
      5   67  56  51  51  56  67   -   -   -   -   -

x_1   J_1(x_1)   μ_1(x_1)
0     16          1
1     16          0
2     20         -1
3     27         -1
4     37         -2
5     51         -2 or -3

k = 0

$$J_0(x_0) = \min_{-x_0 \le u_0 \le 5 - x_0} \left[ x_0^2 + u_0^2 + J_1(x_0 + u_0) \right] = \min_{-5 \le u_0 \le 0} \left[ 25 + u_0^2 + J_1(5 + u_0) \right]$$

u_0 =     -5  -4  -3  -2  -1   0
x_0 = 5   66  57  54  56  63  76

Optimal control: $\mu_0(x_0) = -3$

Optimal cost: $J_0(x_0) = J_0(5) = 54$

System evolution:

$x_0 = 5 \to u_0 = -3$

$x_1 = 2 \to u_1 = -1$

$x_2 = 1 \to u_2 = 1$

$x_3 = 2 \to u_3 = 3$

$x_4 = 5$
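The backward recursion in part (b) can be checked numerically. The following is a minimal sketch, assuming the dynamics $x_{k+1} = x_k + u_k$, the stage cost $x_k^2 + u_k^2$, and the state range $\{0, \dots, 5\}$ as read off from the equations and tables of this solution. The names `solve` and `rollout` are illustrative and do not come from the assignment.

```python
N = 4                 # horizon, so the terminal state is x_4
STATES = range(6)     # x_k in {0, ..., 5} (assumed from the tables)
TARGET = 5            # terminal constraint x_4 = 5
INF = float("inf")


def solve():
    """Return (J, mu): J[k][x] is the cost-to-go, mu[k][x] lists all minimizers."""
    J = [{} for _ in range(N + 1)]
    mu = [{} for _ in range(N)]
    for x in STATES:
        J[N][x] = 0 if x == TARGET else INF
    for k in range(N - 1, -1, -1):
        for x in STATES:
            # Feasible controls keep the next state inside {0, ..., 5}.
            costs = {u: x * x + u * u + J[k + 1][x + u] for u in range(-x, 6 - x)}
            best = min(costs.values())
            J[k][x] = best
            mu[k][x] = [u for u, cost in costs.items() if cost == best]
    return J, mu


def rollout(mu, x0):
    """Follow the policy from x0, breaking ties toward the smallest control."""
    path, x = [], x0
    for k in range(N):
        u = mu[k][x][0]
        path.append((x, u))
        x += u
    return path, x


if __name__ == "__main__":
    J, mu = solve()
    print("J_0(5) =", J[0][5], "with u_0 in", mu[0][5])
    print("trajectory (x_k, u_k):", rollout(mu, 5)[0])
```

Keeping every minimizer in `mu[k][x]` makes the tie at $x_1 = 5$ visible instead of silently picking one of the two controls.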

(c)

Problem 2

Problem 3

Let k denote the time at which the repairman is just about to fix the (k + 1)-th site.

The total number of stages is the number of sites that need to be repaired. In particular, we have N = n.

The state at stage k consists of the closed interval $[i, j]$ of sites that have already been repaired and the position of the repairman within that interval. In particular, we have $S_k = [i, j]$ with $i \le j$, $S_0 = [s, s]$, and $S_n = [1, n]$. The variable $y_k$ denotes the position of the repairman in the interval: $y_k = 0$ means the repairman is at $i$, and $y_k = 1$ means the repairman is at $j$. $y_0$ can be set to either 0 or 1.

The control $x_k$ is the decision made at time k: $x_k = 0$ means the repairman goes left, and $x_k = 1$ means the repairman goes right.

The parameters $\omega_k$ of this problem are deterministic: they are the waiting costs $c_i$ and the traveling costs $t_{ij}$. The transition function is $S_{k+1} = [i - (1 - x_k),\ j + x_k]$ and $y_{k+1} = x_k$. In particular, we assume that $S_1 = S_0$ and $y_1 = y_0$, which means that the repairman always first repairs the site where he starts. (This is just an assumption.)

The feasible set $U_k(S_k)$ is given by

$$U_k(S_k) = \begin{cases} \{1, 0\}, & S_k = [i, j],\ 1 < i \le j < n, \\ \{1\}, & S_k = [1, j],\ 1 < j < n, \\ \{0\}, & S_k = [i, n],\ 1 < i < n, \\ \emptyset, & S_k = [1, n]. \end{cases}$$

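The cost function is

$$\Lambda_k(S_k, y_k, x_k, \omega_k) = \sum_{l \notin [i-(1-x_k),\ j+x_k]} c_l + t_{\,i(1-y_k)+j y_k,\ (1-x_k)(i-(1-x_k))+x_k(j+x_k)}$$

where $l$ runs over the sites that are still unrepaired after the move. In particular, we have $\Lambda_0 = \sum_{l \ne s} c_l$ and $\Lambda_n = 0$.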

Having identified all the elements, the problem can be formulated as the following dynamic programming problem:

$$q_n(S_0, y_0) = \min_{\pi} \left[ \sum_{k=0}^{n-1} \Lambda_k(S_k, y_k, x_k, \omega_k) + \Lambda_n(S_n, y_n, x_n, \omega_n) \right] = \min_{x_0 \in U_0(S_0)} \left[ \Lambda_0(S_0, y_0, x_0, \omega_0) + q_{n-1}\big(\Gamma(S_0, y_0, x_0, \omega_0)\big) \right]$$

where $\Gamma$ denotes the transition function.
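The recursion over intervals can be sketched in code. This is only an illustration under stated assumptions: sites are indexed 1..n (index 0 of `c` and `t` is unused padding), the waiting cost is charged for every site outside the interval right after each repair, and the first repair at the start site incurs no travel. The function name `repair_cost` and the example data are hypothetical.

```python
from functools import lru_cache


def repair_cost(c, t, s):
    """Minimum total cost to repair sites 1..n starting at site s.

    c[l]    : waiting cost per stage of an unrepaired site l (c[0] unused)
    t[a][b] : travel cost from site a to site b (row/column 0 unused)
    """
    n = len(c) - 1

    def waiting(i, j):
        # Waiting cost of all sites outside the repaired interval [i, j].
        return sum(c[l] for l in range(1, n + 1) if l < i or l > j)

    @lru_cache(maxsize=None)
    def q(i, j, y):
        # Cost-to-go with [i, j] repaired, repairman at i (y = 0) or j (y = 1).
        if i == 1 and j == n:
            return 0
        pos = j if y else i
        options = []
        if i > 1:  # control x = 0: extend the interval to the left
            options.append(t[pos][i - 1] + waiting(i - 1, j) + q(i - 1, j, 0))
        if j < n:  # control x = 1: extend the interval to the right
            options.append(t[pos][j + 1] + waiting(i, j + 1) + q(i, j + 1, 1))
        return min(options)

    # Stage 0 repairs the start site itself, so only waiting costs are paid.
    return waiting(s, s) + q(s, s, 0)


if __name__ == "__main__":
    # Hypothetical instance: 3 sites on a line, unit spacing, start in the middle.
    c = [0, 5, 1, 1]
    t = [[abs(a - b) for b in range(4)] for a in range(4)]
    print(repair_cost(c, t, 2))
```

Memoizing on `(i, j, y)` keeps the number of evaluated states at most $2 \cdot n(n+1)/2$, which is the point of using the interval as the state rather than the full repair history.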

Problem 4

Let $\mu_A$ ($\mu_B$) be the stationary control applied when in town A (B), with $\mu \in \{\text{Stay}, \text{Change}\}$. We can obtain the optimal stationary control by solving Bellman's equation for each of the four possible policies. Let $\mu = (\mu_A, \mu_B)$.

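For $\mu^1 = (S, S)$:

$$J_A^1 = r_A + \alpha J_A^1$$

$$J_B^1 = r_B + \alpha J_B^1$$

So,

$$J_A^1 = \frac{r_A}{1-\alpha}; \quad J_B^1 = \frac{r_B}{1-\alpha}$$

For $\mu^2 = (S, C)$:

$$J_A^2 = \frac{r_A}{1-\alpha}$$

$$J_B^2 = r_A - c + \alpha J_A^2 = -c + \frac{r_A}{1-\alpha}$$

For $\mu^3 = (C, S)$:

$$J_A^3 = -c + \frac{r_B}{1-\alpha}; \quad J_B^3 = \frac{r_B}{1-\alpha}$$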

For $\mu^4 = (C, C)$:

$$J_A^4 = r_B - c + \alpha J_B^4$$

$$J_B^4 = r_A - c + \alpha J_A^4$$

Thus,

$$J_A^4 = \frac{r_B + \alpha r_A - (1+\alpha)c}{1-\alpha^2}; \quad J_B^4 = \frac{r_A + \alpha r_B - (1+\alpha)c}{1-\alpha^2}$$

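As $\alpha \to 0$, $\bar{J}^1 = \begin{pmatrix} J_A^1 \\ J_B^1 \end{pmatrix} = \begin{pmatrix} r_A \\ r_B \end{pmatrix}$ is clearly optimal. Thus, the optimal policy is for the salesman to stay in the town he starts in. As $\alpha \to 1$, we have:

$$(1-\alpha)\bar{J}^1 = \begin{pmatrix} r_A \\ r_B \end{pmatrix}, \quad (1-\alpha)\bar{J}^2 = \begin{pmatrix} r_A \\ r_A \end{pmatrix}$$

$$(1-\alpha)\bar{J}^3 = \begin{pmatrix} r_B \\ r_B \end{pmatrix}, \quad (1-\alpha)\bar{J}^4 = \begin{pmatrix} \tfrac{1}{2}(r_A + r_B) - c \\ \tfrac{1}{2}(r_A + r_B) - c \end{pmatrix}$$

Since $c > r_A > r_B$, $\mu^2$ is optimal. That is, the salesman should move to A and remain there.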

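(b) For $c = 3$, $r_A = 2$, $r_B = 1$, and $\alpha = 0.9$:

$$J^1 = \begin{pmatrix} 20 \\ 10 \end{pmatrix}, \quad J^2 = \begin{pmatrix} 20 \\ 17 \end{pmatrix}, \quad J^3 = \begin{pmatrix} 7 \\ 10 \end{pmatrix}, \quad J^4 = \begin{pmatrix} -15.26 \\ -14.74 \end{pmatrix}$$

Thus, the optimal policy is to move into A and remain there.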
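The four policy values in part (b) can be reproduced by solving the 2x2 linear system $J = g + \alpha P J$ for each policy. The sketch below assumes, as in the equation for $J_B^2$, that the reward collected in a period is that of the destination town and that $c$ is paid only when changing towns. The function name `evaluate` and the dictionary layout are illustrative.

```python
from itertools import product


def evaluate(policy, r, c, alpha):
    """Return (J_A, J_B) for a stationary policy such as {'A': 'S', 'B': 'C'}."""
    other = {"A": "B", "B": "A"}
    # Town reached from each town under the policy ('S' = stay, 'C' = change).
    dest = {town: (town if policy[town] == "S" else other[town]) for town in "AB"}
    # One-period reward: destination reward minus the moving cost if changing.
    g = {town: r[dest[town]] - (c if policy[town] == "C" else 0.0) for town in "AB"}
    # Coefficients of (I - alpha * P) J = g, solved with Cramer's rule.
    a11 = 1.0 - alpha * (dest["A"] == "A")
    a12 = -alpha * (dest["A"] == "B")
    a21 = -alpha * (dest["B"] == "A")
    a22 = 1.0 - alpha * (dest["B"] == "B")
    det = a11 * a22 - a12 * a21
    j_a = (g["A"] * a22 - a12 * g["B"]) / det
    j_b = (a11 * g["B"] - a21 * g["A"]) / det
    return j_a, j_b


if __name__ == "__main__":
    r = {"A": 2.0, "B": 1.0}
    for mu_a, mu_b in product("SC", repeat=2):
        j_a, j_b = evaluate({"A": mu_a, "B": mu_b}, r, c=3.0, alpha=0.9)
        print(f"({mu_a}, {mu_b}): J_A = {j_a:.2f}, J_B = {j_b:.2f}")
```

Evaluating all four policies directly is feasible here only because there are two states and two controls; for larger problems policy iteration or value iteration would be the usual choice.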

Problem 5

(a) State

U : a person has an umbrella

N : a person does not have an umbrella

Action

T : a person takes an umbrella to go to the other place

L : a person leaves an umbrella

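$$J(N) = p\,\big(W + \alpha J(U)\big) + \alpha(1-p)J(U) = pW + \alpha J(U)$$

$$\begin{aligned} J(U) &= p\alpha J(U) + (1-p)\min\big[V + \alpha J(U),\ \alpha J(N)\big] \\ &= \min\big[(1-p)V + \alpha J(U),\ p\alpha J(U) + (1-p)\alpha J(N)\big] \\ &= \min\big[(1-p)V + \alpha J(U),\ p\alpha J(U) + p\alpha(1-p)W + \alpha^2(1-p)J(U)\big] \end{aligned}$$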

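(b) For action T:

$$J(U) = \frac{(1-p)V}{1-\alpha}$$

For action L:

$$J(U) = \frac{\alpha p(1-p)W}{1 - p\alpha - \alpha^2(1-p)}$$

When

$$\frac{(1-p)V}{1-\alpha} > \frac{\alpha p(1-p)W}{1 - p\alpha - \alpha^2(1-p)},$$

which, using $1 - p\alpha - \alpha^2(1-p) = (1-\alpha)(1+\alpha-p\alpha)$, is equivalent to

$$p < \frac{(1+\alpha)V}{(V+W)\alpha},$$

the optimal policy is to leave the umbrella. Otherwise, the optimal policy is to take the umbrella.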
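The threshold on $p$ can be cross-checked with value iteration on the two Bellman equations of part (a). This is a minimal sketch, assuming $V$ is the cost of carrying the umbrella when it does not rain and $W$ the cost of getting wet, as the equations suggest. The function names and the numerical values are hypothetical.

```python
def umbrella_policy(p, V, W, alpha, iters=2000):
    """Value iteration on J(U), J(N); returns (action at U, J(U), J(N))."""
    j_u = j_n = 0.0
    for _ in range(iters):
        take = V + alpha * j_u    # no rain, carry the umbrella anyway
        leave = alpha * j_n       # no rain, leave the umbrella behind
        # Both updates use the previous iterate (right-hand side is evaluated first).
        j_u, j_n = p * alpha * j_u + (1 - p) * min(take, leave), p * W + alpha * j_u
    action = "L" if alpha * j_n < V + alpha * j_u else "T"
    return action, j_u, j_n


def threshold(V, W, alpha):
    """Leaving is optimal when p is below this value."""
    return (1 + alpha) * V / ((V + W) * alpha)


if __name__ == "__main__":
    V, W, alpha = 1.0, 5.0, 0.9
    print("threshold on p:", threshold(V, W, alpha))
    for p in (0.2, 0.6):
        print(p, umbrella_policy(p, V, W, alpha)[0])
```

With these example values the threshold is about 0.35, so the iteration is expected to select L for p = 0.2 and T for p = 0.6.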
