EE807 Assignment solution 1
Problem 1
(a)
(b)
Terminal constraint: $x_4 = 5$, so $x_3 + u_3 = x_4 = 5$.
Apply the DP algorithm (DPA).
k = N = 4
$J_4(x_4) = J_4(5) = 0$
k = 3
$u_3 = \mu_3(x_3) = 5 - x_3$
$J_3(x_3) = \min_{-x_3 \le u_3 \le 5 - x_3} [x_3^2 + u_3^2 + J_4(5)] = x_3^2 + (5 - x_3)^2$
k = 2
$J_2(x_2) = \min_{-x_2 \le u_2 \le 5 - x_2} [x_2^2 + u_2^2 + J_3(x_2 + u_2)]$
Values of $x_2^2 + u_2^2 + J_3(x_2 + u_2)$ (a dash marks an infeasible control):
u_2 =      -5  -4  -3  -2  -1   0   1   2   3   4   5
x_2 = 0     -   -   -   -   -  25  18  17  22  33  50
      1     -   -   -   -  27  18  15  18  27  42   -
      2     -   -   -  33  22  17  18  25  38   -   -
      3     -   -  43  30  23  22  27  38   -   -   -
      4     -  57  42  33  30  33  42   -   -   -   -
      5    75  58  47  42  43  50   -   -   -   -   -
Minimizing over $u_2$ for each $x_2$ gives:
$x_2$   $J_2(x_2)$   $\mu_2(x_2)$
0       17           2
1       15           1
2       17           0
3       22           0
4       30           -1
5       42           -2
k = 1
$J_1(x_1) = \min_{-x_1 \le u_1 \le 5 - x_1} [x_1^2 + u_1^2 + J_2(x_1 + u_1)]$
u_1 =      -5  -4  -3  -2  -1   0   1   2   3   4   5
x_1 = 0     -   -   -   -   -  17  16  21  31  46  67
      1     -   -   -   -  19  16  19  27  40  59   -
      2     -   -   -  25  20  21  27  38  55   -   -
      3     -   -  35  28  27  31  40  55   -   -   -
      4     -  49  40  37  39  46  59   -   -   -   -
      5    67  56  51  51  56  67   -   -   -   -   -
$x_1$   $J_1(x_1)$   $\mu_1(x_1)$
0       16           1
1       16           0
2       20           -1
3       27           -1
4       37           -2
5       51           -2 or -3
k = 0
$J_0(x_0) = \min_{-x_0 \le u_0 \le 5 - x_0} [x_0^2 + u_0^2 + J_1(x_0 + u_0)] = \min_{-5 \le u_0 \le 0} [25 + u_0^2 + J_1(5 + u_0)]$ for $x_0 = 5$
u_0 =      -5  -4  -3  -2  -1   0
x_0 = 5    66  57  54  56  63  76
Optimal control: $\mu_0(x_0) = -3$
Optimal cost: $J_0(x_0) = J_0(5) = 54$
System evolution:
$x_0 = 5 \to u_0 = -3$
$x_1 = 2 \to u_1 = -1$
$x_2 = 1 \to u_2 = 1$
$x_3 = 2 \to u_3 = 3$
$x_4 = 5$
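The backward recursion of part (b) can be checked numerically. The following is a minimal sketch, assuming the system $x_{k+1} = x_k + u_k$ with stage cost $x_k^2 + u_k^2$, integer states 0 to 5, and $x_0 = 5$, as implied by the tables; the variable names are illustrative only.

```python
# Backward dynamic programming for x_{k+1} = x_k + u_k with stage cost
# x_k^2 + u_k^2, states 0..5, horizon N = 4 and terminal constraint x_4 = 5.
N, X_MAX, X_TARGET = 4, 5, 5
INF = float("inf")

# J[k][x] is the optimal cost-to-go; only x_4 = 5 is an admissible end state.
J = [[INF] * (X_MAX + 1) for _ in range(N + 1)]
mu = [[None] * (X_MAX + 1) for _ in range(N)]
J[N][X_TARGET] = 0

for k in range(N - 1, -1, -1):
    for x in range(X_MAX + 1):
        # Admissible controls keep the next state inside 0..5.
        for u in range(-x, X_MAX - x + 1):
            cost = x * x + u * u + J[k + 1][x + u]
            if cost < J[k][x]:  # strict '<' keeps the smallest u on ties
                J[k][x], mu[k][x] = cost, u

# Roll the optimal policy forward from x_0 = 5.
x, path = 5, []
for k in range(N):
    path.append((x, mu[k][x]))
    x += mu[k][x]

print("optimal cost:", J[0][5])
print("(x_k, u_k) pairs:", path, "final state:", x)
```

The lists `J[2]` and `J[1]` correspond to the $J_2$ and $J_1$ columns tabulated above, so the tables can be compared entry by entry.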
(c)
Problem 2
Problem 3
Let $k$ denote the time at which the repairman is just about to fix the $(k+1)$-th site.
The total number of stages is the number of sites that need to be repaired, so $N = n$.
The state at stage $k$ consists of the closed interval $[i, j]$ of sites that have already been repaired and the repairman's current position within that interval. In particular, $S_k = [i, j]$ with $i \le j$, $S_0 = [s, s]$, and $S_n = [1, n]$. The variable $y_k$ denotes the repairman's position in the interval: $y_k = 0$ means he is at $i$, and $y_k = 1$ means he is at $j$. $y_0$ can be set to either 0 or 1.
The control $x_k$ is the decision made at time $k$: $x_k = 0$ means the repairman goes left, and $x_k = 1$ means he goes right.
The parameters $\omega_k$ of this problem are deterministic: they are the waiting costs $c_i$ and the traveling costs $t_{ij}$. The transition function is $S_{k+1} = [i - (1 - x_k),\ j + x_k]$ and $y_{k+1} = x_k$. In particular, we assume that $S_1 = S_0$ and $y_1 = y_0$, meaning that the repairman always repairs the site where he starts first. (This is just an assumption.)
The feasible set $U_k(S_k)$ is given by
$$U_k(S_k) = \begin{cases} \{1, 0\}, & S_k = [i, j],\ 1 < i \le j < n, \\ \{1\}, & S_k = [1, j],\ 1 < j < n, \\ \{0\}, & S_k = [i, n],\ 1 < i < n, \\ \emptyset, & S_k = [1, n]. \end{cases}$$
The cost function is
$$\Lambda_k(S_k, y_k, x_k, \omega_k) = \sum_{m \notin [i-(1-x_k),\ j+x_k]} c_m + t_{i(1-y_k)+j y_k,\ (1-x_k)(i-(1-x_k))+x_k(j+x_k)}$$
In particular, we have $\Lambda_0 = \sum_{m \ne s} c_m$ and $\Lambda_n = 0$.
With all the elements identified, the problem can be formulated as the following dynamic programming problem, where $\Gamma$ denotes the transition function:
$$q_n(S_0, y_0) = \min_{\pi} \left[ \sum_{k=0}^{n-1} \Lambda_k(S_k, y_k, x_k, \omega_k) + \Lambda_n(S_n, y_n, x_n, \omega_n) \right] = \min_{x_0 \in U_0(S_0)} \left[ \Lambda_0(S_0, y_0, x_0, \omega_0) + q_{n-1}(\Gamma(S_0, y_0, x_0, \omega_0)) \right]$$
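The formulation above can be sketched as a memoized recursion over the state $([i, j], y)$. This is only an illustration under stated assumptions: the function name `repair_cost`, the example waiting costs, and the travel cost $t_{ab} = |a - b|$ are hypothetical and not taken from the problem statement.

```python
from functools import lru_cache

def repair_cost(n, s, c, t):
    """Minimum total cost; c[m] and t[a][b] are indexed by sites 1..n."""
    @lru_cache(maxsize=None)
    def q(i, j, y):
        if (i, j) == (1, n):          # all sites repaired, terminal cost 0
            return 0
        here = j if y else i          # current position: i(1 - y) + j*y
        best = float("inf")
        for x in (0, 1):              # 0 = go left, 1 = go right
            if (x == 0 and i == 1) or (x == 1 and j == n):
                continue              # control not in the feasible set
            ni, nj = i - (1 - x), j + x
            there = nj if x else ni
            # Waiting cost of every site outside the new repaired interval.
            waiting = sum(c[m] for m in range(1, n + 1) if not ni <= m <= nj)
            best = min(best, waiting + t[here][there] + q(ni, nj, x))
        return best

    # Stage 0 repairs the starting site s; every other site waits.
    first = sum(c[m] for m in range(1, n + 1) if m != s)
    return first + q(s, s, 0)

# Hypothetical data: 3 sites, start at site 2, index 0 is unused padding.
c = [0, 4, 1, 1]
t = [[abs(a - b) for b in range(4)] for a in range(4)]
print(repair_cost(3, 2, c, t))
```

Because the repaired sites always form an interval, there are only on the order of $n^2$ states, which is what makes the memoized recursion cheap compared with enumerating all repair orders.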
Problem 4
Let $\mu_A$ ($\mu_B$) be the stationary control applied when in town A (B), where $\mu \in \{\text{Stay}, \text{Change}\}$.
We can obtain the optimal stationary control by solving Bellman's equation for each of the four possible policies. Let $\mu = (\mu_A, \mu_B)$.
For $\mu^1 = (S, S)$:
$J_A^1 = r_A + \alpha J_A^1$
$J_B^1 = r_B + \alpha J_B^1$
So,
$J_A^1 = \frac{r_A}{1-\alpha}$; $J_B^1 = \frac{r_B}{1-\alpha}$
For $\mu^2 = (S, C)$:
$J_A^2 = \frac{r_A}{1-\alpha}$
$J_B^2 = r_A - c + \alpha J_A^2 = -c + \frac{r_A}{1-\alpha}$
For $\mu^3 = (C, S)$:
$J_A^3 = -c + \frac{r_B}{1-\alpha}$; $J_B^3 = \frac{r_B}{1-\alpha}$
For $\mu^4 = (C, C)$:
$J_A^4 = r_B - c + \alpha J_B^4$
$J_B^4 = r_A - c + \alpha J_A^4$
Thus,
$J_A^4 = \frac{r_B + \alpha r_A - (1+\alpha)c}{1-\alpha^2}$; $J_B^4 = \frac{r_A + \alpha r_B - (1+\alpha)c}{1-\alpha^2}$
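As $\alpha \to 0$, $\bar{J}^1 = \begin{bmatrix} J_A^1 \\ J_B^1 \end{bmatrix} \to \begin{bmatrix} r_A \\ r_B \end{bmatrix}$ is clearly optimal. Thus, the optimal policy is for the salesman to stay in the town he starts in. As $\alpha \to 1$, we have:
$(1-\alpha)\bar{J}^1 = \begin{bmatrix} r_A \\ r_B \end{bmatrix}$, $(1-\alpha)\bar{J}^2 \to \begin{bmatrix} r_A \\ r_A \end{bmatrix}$,
$(1-\alpha)\bar{J}^3 \to \begin{bmatrix} r_B \\ r_B \end{bmatrix}$, $(1-\alpha)\bar{J}^4 \to \begin{bmatrix} \frac{r_A + r_B}{2} - c \\ \frac{r_A + r_B}{2} - c \end{bmatrix}$
Since $c > r_A > r_B$, $\mu^2$ is optimal. That is, the salesman should move to A and remain there.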
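(b) For $c = 3$, $r_A = 2$, $r_B = 1$, and $\alpha = 0.9$:
$\bar{J}^1 = \begin{bmatrix} 20 \\ 10 \end{bmatrix}$, $\bar{J}^2 = \begin{bmatrix} 20 \\ 17 \end{bmatrix}$, $\bar{J}^3 = \begin{bmatrix} 7 \\ 10 \end{bmatrix}$, $\bar{J}^4 = \begin{bmatrix} -15.26 \\ -14.74 \end{bmatrix}$
Thus, the optimal policy is to move to A and remain there.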
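The four policy values in part (b) can be reproduced by evaluating each stationary policy numerically. The sketch below assumes, as in the equations for $\mu^2$ and $\mu^3$, that a change of town costs $c$ and earns the reward of the destination town; the helper name `evaluate` is illustrative only.

```python
from itertools import product

def evaluate(policy, r, c, alpha):
    """Return (J_A, J_B) for a stationary policy such as ('S', 'C')."""
    towns = ("A", "B")
    other = {"A": "B", "B": "A"}
    # Next town and one-stage reward under the policy; a change pays c and
    # earns the reward of the destination town.
    nxt = {t: t if a == "S" else other[t] for t, a in zip(towns, policy)}
    g = {t: r[nxt[t]] - (c if nxt[t] != t else 0) for t in towns}
    # Policy evaluation by fixed-point iteration J <- g + alpha * J(next).
    J = {"A": 0.0, "B": 0.0}
    for _ in range(2000):
        J = {t: g[t] + alpha * J[nxt[t]] for t in towns}
    return J["A"], J["B"]

r, c, alpha = {"A": 2, "B": 1}, 3, 0.9
values = {p: evaluate(p, r, c, alpha) for p in product("SC", repeat=2)}
for p, (ja, jb) in values.items():
    print(p, round(ja, 2), round(jb, 2))
```

The policy `('S', 'C')` dominates the other three in both components, which matches the conclusion that the salesman should move to A and stay there.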
Problem 5
(a) State
U : a person has an umbrella
N : a person does not have an umbrella
Action
T : a person takes an umbrella to go to the other place
L : a person leaves an umbrella
$J(N) = p(W + \alpha J(U)) + \alpha(1-p)J(U) = pW + \alpha J(U)$
$J(U) = p\alpha J(U) + (1-p)\min[V + \alpha J(U),\ \alpha J(N)]$
$= \min[(1-p)V + \alpha J(U),\ p\alpha J(U) + (1-p)\alpha J(N)]$
$= \min[(1-p)V + \alpha J(U),\ p\alpha J(U) + p\alpha(1-p)W + \alpha^2(1-p)J(U)]$
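(b) For action T:
$J(U) = \frac{1-p}{1-\alpha} V$
For action L:
$J(U) = \frac{\alpha p(1-p)W}{1 - p\alpha - \alpha^2(1-p)}$
When
$\frac{1-p}{1-\alpha} V > \frac{\alpha p(1-p)W}{1 - p\alpha - \alpha^2(1-p)}$,
that is, using $1 - p\alpha - \alpha^2(1-p) = (1-\alpha)(1 + \alpha(1-p))$, when
$p < \frac{(1+\alpha)V}{(V+W)\alpha}$,
the optimal policy is to leave the umbrella. Otherwise, the optimal policy is to take the umbrella.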
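The threshold on $p$ can be sanity-checked by comparing the two closed-form costs and by iterating the Bellman equations of part (a) directly. This is a minimal sketch; the numerical values of $p$, $\alpha$, $V$, and $W$ are hypothetical and chosen only for illustration.

```python
def umbrella_costs(p, alpha, V, W):
    """Discounted costs J(U) of always taking (T) and always leaving (L)."""
    take = (1 - p) * V / (1 - alpha)
    leave = alpha * p * (1 - p) * W / (1 - p * alpha - alpha**2 * (1 - p))
    return take, leave

def value_iteration(p, alpha, V, W, sweeps=5000):
    """Iterate the Bellman equations for J(U) and J(N) from part (a)."""
    JU = JN = 0.0
    for _ in range(sweeps):
        # Both right-hand sides use the values from the previous sweep.
        JU, JN = (p * alpha * JU + (1 - p) * min(V + alpha * JU, alpha * JN),
                  p * W + alpha * JU)
    return JU, JN

# Hypothetical numbers, chosen only for illustration.
p, alpha, V, W = 0.3, 0.9, 1.0, 4.0
take, leave = umbrella_costs(p, alpha, V, W)
threshold = (1 + alpha) * V / ((V + W) * alpha)
JU, JN = value_iteration(p, alpha, V, W)
print("take:", take, "leave:", leave, "threshold:", threshold, "J(U):", JU)
```

With these numbers $p$ lies below the threshold, so leaving the umbrella is cheaper, and the value-iteration estimate of $J(U)$ agrees with the smaller of the two closed-form costs.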