Uday V. Shanbhag, Stochastic Optimization, Lecture 7
Introduction
Consider the following stochastic program:
  min_{x∈X} f(x), where f(x) := E[f(x, ξ)],
where X ⊆ Rn is a closed and convex set, ξ is a random vector in Rd with
distribution P. Furthermore ξ ∈ Ξ and f : X × Ξ → R. Unless stated
otherwise, the expectation is assumed to be well-defined and finite valued
for all x ∈ X.
Suppose we have access to N realizations of the random vector ξ, denoted
by ξ1, . . . , ξN. Then an estimator of the expected value f(x) can be
obtained by solving the following sample average approximation (SAA)
problem:
  min_{x∈X} fN(x), where fN(x) := (1/N) ∑_{j=1}^{N} f(x, ξj).
Note that fN(x) can be viewed as the expectation taken with respect to the
empirical measure that places probability mass 1/N on each of ξ1, . . . , ξN.
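As a quick numerical sanity check (a sketch only: the quadratic loss f(x, ξ) = (x − ξ)² and the sampling distribution ξ ∼ N(2, 1) are assumptions made for this illustration, not part of the lecture), the SAA problem can be formed and minimized directly:

```python
import numpy as np

def saa_minimizer(samples):
    """Minimize fN(x) = (1/N) * sum_j (x - xi_j)^2 over x in R.

    For this quadratic loss, setting the derivative of fN to zero shows
    that the SAA minimizer is the sample mean, so no solver is needed.
    """
    return float(np.mean(samples))

rng = np.random.default_rng(0)
xi = rng.normal(loc=2.0, scale=1.0, size=10_000)  # N iid realizations of xi

x_saa = saa_minimizer(xi)
# True problem: f(x) = E[(x - xi)^2] = (x - 2)^2 + 1, minimized at x* = 2.
print(x_saa)
```

Rerunning with larger N drives the SAA minimizer toward x∗ = 2, in line with the LLN-based consistency discussed next.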
By the law of large numbers (LLN), under suitable regularity conditions,
fN(x) converges to f(x) pointwise with probability one as N → ∞.
Moreover, by the classical LLN, this convergence holds if the sample is
independent and identically distributed (iid).
Furthermore, E[fN(x)] = f(x) for every x; that is, the estimator fN(x) is
unbiased.
It is also natural to expect that the optimal value and optimal solutions of
the SAA problem converge to their true counterparts as N → ∞.
Note that we may view fN(x) as being defined on a common probability
space (Ω,F ,P). When considering iid samples, one avenue is to set
Ω := Ξ∞, where Ξ∞ denotes the set of sequences

  Ξ∞ := { (ξ1, ξ2, . . .) : ξi ∈ Ξ, i ∈ N },

equipped with the product of the corresponding probability measures.
Finally, suppose f(x, ξ) is a Carathéodory function (continuous in x and
measurable in ξ). It follows that fN(x) is also a Carathéodory function.
Consistency of SAA estimators
Proposition 1 Let f : X → R be a function and fN : X → R a sequence of
deterministic real-valued functions. Then the following two properties are
equivalent: (i) for any x ∈ X and any sequence {xN} ⊂ X converging to
x, fN(xN) converges to f(x); and (ii) the function f(x) is continuous on
X and fN(•) converges to f(•) uniformly∗ on any compact subset of X.
Proof:
(a) ((i) ⇒ (ii)): Suppose that property (i) holds. Consider a point x ∈ X,
a sequence {xN} ⊂ X such that xN → x, and a scalar ε > 0.
By considering the constant sequence x1, x1, . . ., we have that fN(x1) → f(x1),
so there exists an N1 such that |fN1(x1) − f(x1)| < ε/2. Similarly,
∗Pointwise and uniform convergence: Let D be a subset of Rn and let fn be a sequence of functions defined on D. We say that fn converges pointwise on D if lim_{n→∞} fn(x) exists for each point x ∈ D. Furthermore, fn converges uniformly to f if, given any ε > 0, there exists an N = N(ε) such that |fn(x) − f(x)| < ε for n > N(ε) and for every x ∈ D.
there exists an N2 > N1 such that |fN2(x2) − f(x2)| < ε/2. We may
now construct a sequence {x′N} defined as follows:

  x′i := x1, i = 1, . . . , N1,
  x′i := x2, i = N1 + 1, . . . , N2,
  . . .

It follows that x′N → x and therefore |fN(x′N) − f(x)| < ε/2 for N large
enough. We also have that |fNk(x′Nk) − f(xk)| < ε/2, since x′Nk = xk, and
hence we have that for k large enough

  |f(xk) − f(x)| ≤ |f(xk) − fNk(x′Nk)| + |fNk(x′Nk) − f(x)| < ε/2 + ε/2 = ε.
This shows that f(xk)→ f(x) and hence f(•) is continuous at x.
Next, let C be a compact subset of X and proceed by contradiction.
Suppose that fN(•) does not converge to f(•) uniformly on C. It
follows that there exist a sequence {xN} ⊂ C and an ε > 0
such that |fN(xN) − f(xN)| ≥ ε for all N large enough. Since C is
a compact set, we can assume that xN converges to a point x ∈ C. It
follows that

  |fN(xN) − f(xN)| ≤ |fN(xN) − f(x)| + |f(x) − f(xN)|.

Of these, the first term tends to zero by hypothesis (i) and the
second term tends to zero by the continuity of f(•). Since both terms
are less than ε/2 for sufficiently large N, we obtain |fN(xN) − f(xN)| < ε,
a contradiction.
(b) ((ii) ⇒ (i)): Suppose that property (ii) holds. Consider a sequence
{xN} ⊂ X such that xN → x ∈ X. Assume that this sequence is
contained in a compact subset of X. Consider |fN(xN) − f(x)|. Then
we have the following:
|fN(xN)− f(x)| ≤ |fN(xN)− f(xN)|+ |f(xN)− f(x)|.
Of these the first term tends to zero given the uniform convergence of
fN to f while the second term tends to zero based on the continuity of
f . It follows that fN(xN) converges to f(x).
We now consider the consistency of the estimators θN and SN, where
θN := inf_{x∈X} fN(x) denotes the optimal value of the SAA problem and SN
its set of optimal solutions; θ∗ and S denote their true counterparts.

Definition 1 (Consistency of estimators) An estimator θN of a parame-
ter θ is said to be consistent if θN → θ w.p.1. as N → ∞.

We begin by considering the consistency of the optimal-value estimator
θN. For a fixed x ∈ X, we have that

  θN ≤ fN(x).
Furthermore, if the pointwise LLN holds, we have that

  lim sup_{N→∞} θN ≤ lim_{N→∞} fN(x) = f(x), w.p.1.
Proposition 2 Suppose that fN(x) converges to f(x) w.p.1. as N →∞,
uniformly on X. Then θN → θ∗ w.p.1. as N →∞.
Proof: First, we note that since θN ≤ fN(x) for all x and N, we have that
almost surely

  lim sup_{N→∞} θN ≤ lim_{N→∞} fN(x∗) = f(x∗) = θ∗,

where the limit follows from the convergence of fN to f uniformly on X,
and the last equality holds since x∗ is an optimal solution of the true
problem.
Furthermore, we may show that lim inf_{N→∞} θN ≥ θ∗. Uniform convergence
of fN to f on X implies that, for any ε > 0, for all x ∈ X and for
sufficiently large N we have fN(x) ≥ f(x) − ε. Taking the infimum over X,
for sufficiently large N, θN ≥ θ∗ − ε. Since ε is arbitrary, we have that
lim inf_{N→∞} θN ≥ θ∗ in an almost sure sense.
It follows that θN → θ∗ a.s. as N →∞.
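Continuing the hypothetical quadratic example f(x, ξ) = (x − ξ)² with ξ ∼ N(2, 1) (an assumption for illustration only), the minimizer of fN is the sample mean, so θN is the 1/N-normalized sample variance while θ∗ = var(ξ) = 1; the convergence θN → θ∗ can be observed directly:

```python
import numpy as np

rng = np.random.default_rng(1)

def theta_N(N):
    """SAA optimal value for f(x, xi) = (x - xi)^2 with xi ~ N(2, 1).

    The minimizer of fN is the sample mean, so theta_N equals the
    1/N-normalized sample variance; the true optimal value is theta* = 1.
    """
    xi = rng.normal(2.0, 1.0, size=N)
    return float(np.mean((xi - xi.mean()) ** 2))

for N in (10, 1_000, 100_000):
    print(N, theta_N(N))  # approaches theta* = 1 as N grows
```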
Consistency of the estimators of the solution set requires somewhat stronger
conditions. Here, D(A, B) denotes the deviation of set A from set B,
defined as

  D(A, B) := sup_{x∈A} dist(x, B), where dist(x, B) := inf_{x′∈B} ‖x − x′‖.
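For finite sets, the deviation D(A, B) can be computed directly; the small helper below is a sketch illustrating, in particular, that D is not symmetric in its arguments:

```python
import numpy as np

def dist(x, B):
    """dist(x, B) = inf over x' in B of ||x - x'||, for a finite set B."""
    return min(np.linalg.norm(np.asarray(x) - np.asarray(b)) for b in B)

def deviation(A, B):
    """D(A, B) = sup over x in A of dist(x, B), for finite sets A and B."""
    return max(dist(x, B) for x in A)

A = [(0.0, 0.0), (3.0, 0.0)]
B = [(0.0, 0.0)]
print(deviation(A, B))  # 3.0: the point (3, 0) lies at distance 3 from B
print(deviation(B, A))  # 0.0: the deviation is not symmetric
```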
Theorem 3 Suppose that there exists a compact set C ⊂ Rn such that
the following hold: (i) the set of the optimal solutions of the true problem
is nonempty and contained in C; (ii) the function f(x) is finite valued
and continuous on C; (iii) fN(x) converges to f(x) w.p.1. as N → ∞ uniformly in x ∈ C; and (iv) w.p.1, for N large enough, the set SN is
nonempty and SN ⊂ C. Then θN → θ∗ and D(SN , S) → 0 w.p.1. as
N →∞.
Proof: (i) and (ii) imply that both the true and the sample-average problem
can be restricted to the sets X ∩ C, implying that we can assume without
loss of generality that X is compact.
Proposition 2 implies that θN → θ∗ w.p.1. as N → ∞. It remains
to show that D(SN(ω), S) → 0 as N → ∞ for every ω ∈ Ω such that
θN → θ∗ and (ii) and (iii) hold.

We proceed by contradiction and drop the dependence on ω for notational
convenience (the argument applies to every such ω). Suppose D(SN, S) 6→ 0.
Since X is compact, we may pass to a convergent subsequence and assume
that there exists an xN ∈ SN such that dist(xN , S) ≥ ε for some ε > 0
and that xN → x∗ ∈ X. Consequently, x∗ 6∈ S and f(x∗) > θ∗.
Further,

  fN(xN) − f(x∗) = [fN(xN) − f(xN)] + [f(xN) − f(x∗)].

Of these, the first term tends to zero based on (iii) while the second term
tends to zero by the continuity of f. As a result, θN = fN(xN) → f(x∗) > θ∗,
which is a contradiction. The required result follows.

• By Prop. 1, the combination of (ii) and (iii) is equivalent to the condition
that for any sequence {xN} ⊂ C converging to a point x, it follows that
fN(xN) → f(x) w.p.1.

• Condition (iv) holds in the above theorem if the feasible set X is closed,
the functions fN(x) are lower semicontinuous, and, for some α > θ∗, the
level sets

  {x ∈ X : fN(x) ≤ α}
are uniformly bounded w.p.1. This is often called an inf-compactness
condition.
• Conditions guaranteeing the uniform convergence of fN(x) to f(x) w.p.1.
can be provided. For instance, this holds if f(x, ξ) is continuous at x for
almost every ξ ∈ Ξ, f(x, ξ) is dominated by an integrable function for
x ∈ X, and the sample is iid.
If the problem is convex, we can relax the required regularity conditions.
We may allow f(x, ξ) to be an extended real-valued function and define
the following:

  f̄(x, ξ) := f(x, ξ) + 1lX(x), f̄(x) := f(x) + 1lX(x), f̄N(x) := fN(x) + 1lX(x),

where 1lX(x) := 0 if x ∈ X and 1lX(x) := +∞ otherwise.
Theorem 4 Suppose that (i) The function f(x, ξ) is random lsc, (ii) for
almost every ξ ∈ Ξ, the function f(x, ξ) is convex in x, (iii) X is closed
and convex, (iv) the function f̄(x) is lsc and there exists an x̄ ∈ X such that
f(x) < +∞ for all x in a neighborhood of x̄, (v) the set S of optimal solutions
of the true problem is nonempty and bounded, and (vi) the LLN holds pointwise.
Then θN → θ∗ and D(SN, S) → 0 as N → ∞ w.p.1.
Some observations:
• lsc† of f(•) follows from lsc of f(•, ξ), provided that f(x, •) is bounded
from below by an integrable function.

• It was assumed that the LLN holds pointwise for all x ∈ Rn. It suffices
to assume that this holds for all x in a neighborhood of S.
†Recall that a function f(x) is lower semicontinuous at x0 if lim infx→x0 f(x) ≥ f(x0).
Randomness of feasible set X
Up to this point, we have assumed that the feasible set X of the SAA
problem is fixed and deterministic. Suppose instead that the feasible set XN
of the SAA problem is a random subset of Rn.
Theorem 5 Suppose that in addition to the assumptions of Theorem 3, the
following hold:
(a) If xN ∈ XN and xN → x w.p.1., then x ∈ X.
(b) For some point x ∈ S, there exists a sequence xN ∈ XN such that xN → x
w.p.1.
Then θN → θ∗ and D(SN , S)→ 0 w.p.1. as N →∞.
Proof: Consider an xN ∈ SN. By compactness, we may assume that the
sequence xN converges to some x∗ ∈ Rn. Since SN ⊂ XN, we have that
xN ∈ XN, and it follows from (a) that x∗ ∈ X.
First, note that θN = fN(xN) since xN ∈ SN. Furthermore, from Prop. 1,
θN = fN(xN) tends w.p.1. to f(x∗). But f(x∗) ≥ θ∗ since x∗ ∈ X, and it
follows that

  lim inf_{N→∞} θN ≥ θ∗, w.p.1.

On the other hand, we have from (b) that there exists a point x̄ ∈ S and
a sequence xN ∈ XN converging to x̄ w.p.1. From Prop. 1, we have that
fN(xN) → f(x̄) = θ∗ w.p.1. as N → ∞. But, since xN is not necessarily
a minimizer of fN over XN, fN(xN) ≥ θN, and it follows that

  θN ≤ fN(xN) → f(x̄) = θ∗, w.p.1.

Hence lim sup_{N→∞} θN ≤ θ∗. As a result, θN → θ∗ w.p.1.
The remainder of the proof follows in a fashion analogous to Theorem 3.
Asymptotics of the SAA optimal value
Consistency of the estimators is important in that it guarantees that the
estimation error tends to zero as the sample size grows to infinity. It does
not, however, provide much indication of the error for a given finite sample.
Suppose the sample average estimator fN(x) of f(x) is unbiased and
σ²(x) := var(f(x, ξ)) is finite, so that var(fN(x)) = σ²(x)/N. Then by the
central limit theorem (CLT), we have

  √N (fN(x) − f(x)) →_D Yx,

where Yx ∼ N(0, σ²(x)). In effect, for sufficiently large N, fN(x) has
an approximately normal distribution with mean f(x) and variance σ²(x)/N.
This allows for constructing an approximate 100(1 − α)% confidence
interval for f(x) given by

  [ fN(x) − z_{α/2} σ̂(x)/√N , fN(x) + z_{α/2} σ̂(x)/√N ],

where z_{α/2} := Φ⁻¹(1 − α/2) and the sample variance is given by

  σ̂²(x) := (1/(N − 1)) ∑_{j=1}^{N} [ f(x, ξj) − fN(x) ]².
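The interval above is straightforward to compute from a sample; the sketch below assumes the quadratic loss f(x, ξ) = (x − ξ)² with ξ ∼ N(2, 1) used in earlier illustrations, and hardcodes z_{α/2} = Φ⁻¹(0.975) ≈ 1.96 for a 95% interval:

```python
import numpy as np

def confidence_interval(values, z=1.96):
    """Approximate 95% CI for f(x) from the samples f(x, xi_j).

    values : array of f(x, xi_j), j = 1, ..., N
    z      : z_{alpha/2}; 1.96 corresponds to alpha = 0.05
    """
    N = len(values)
    f_N = np.mean(values)          # sample average fN(x)
    s = np.std(values, ddof=1)     # sample std with 1/(N - 1) normalization
    half = z * s / np.sqrt(N)
    return float(f_N - half), float(f_N + half)

rng = np.random.default_rng(2)
x = 0.0
xi = rng.normal(2.0, 1.0, size=2_000)
lo, hi = confidence_interval((x - xi) ** 2)
print(lo, hi)  # should (with ~95% probability) cover f(0) = E[xi^2] = 5
```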
Consider now the optimal value θN of the SAA problem. For any x′ ∈ X,
we have that

  fN(x′) ≥ inf_{x∈X} fN(x).
Taking expectations, it follows that

  E[fN(x′)] ≥ E[ inf_{x∈X} fN(x) ].

Taking the infimum over x′ ∈ X on the left-hand side, we obtain

  inf_{x′∈X} E[fN(x′)] ≥ E[ inf_{x∈X} fN(x) ].
Since E[fN(x)] = f(x), we have that

  θ∗ = inf_{x∈X} f(x) = inf_{x∈X} E[fN(x)] ≥ E[ inf_{x∈X} fN(x) ] = E[θN].
As a consequence, it can be said that θN is a downward biased estimator of
θ∗. The next proposition shows that this bias decreases monotonically with
sample size N .
Proposition 6 Let θN be the optimal value of the SAA problem and suppose
the sample is iid. Then we have the following:
(a) θ∗ ≥ E[θN] (downward biased)
(b) E[θN] ≤ E[θN+1] ≤ θ∗ for any N ∈ N.
Proof:
(a) See the discussion preceding this proposition. (b) Recall that fN+1(x)
can be written as the average of the N + 1 leave-one-out sample averages:

  fN+1(x) = (1/(N + 1)) ∑_{i=1}^{N+1} f(x, ξi)
          = (1/(N + 1)) ∑_{i=1}^{N+1} [ (1/N) ∑_{j≠i} f(x, ξj) ],

where the second equality holds because each ξj appears in exactly N of
the N + 1 inner sums.
Since the infimum of an average is no smaller than the average of the
infima, we see that

  E[θN+1] = E[ inf_{x∈X} fN+1(x) ]
          = E[ inf_{x∈X} (1/(N + 1)) ∑_{i=1}^{N+1} (1/N) ∑_{j≠i} f(x, ξj) ]
          ≥ E[ (1/(N + 1)) ∑_{i=1}^{N+1} inf_{x∈X} (1/N) ∑_{j≠i} f(x, ξj) ].

Since the samples are iid, each leave-one-out average (1/N) ∑_{j≠i} f(x, ξj)
has the same distribution as fN, and hence

  E[ (1/(N + 1)) ∑_{i=1}^{N+1} inf_{x∈X} (1/N) ∑_{j≠i} f(x, ξj) ]
          = (1/(N + 1)) ∑_{i=1}^{N+1} E[θN] = E[θN].

It follows that E[θN+1] ≥ E[θN].
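The downward bias and its monotone decrease can be seen numerically; the sketch below again assumes the quadratic loss f(x, ξ) = (x − ξ)² with ξ ∼ N(2, 1), for which θN is the 1/N-normalized sample variance and E[θN] = (N − 1)/N < 1 = θ∗ exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_theta_N(N, reps=20_000):
    """Monte Carlo estimate of E[theta_N] over `reps` independent SAA problems."""
    xi = rng.normal(2.0, 1.0, size=(reps, N))
    # theta_N = min_x fN(x) = 1/N-normalized sample variance, per replication.
    theta = np.mean((xi - xi.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return float(np.mean(theta))

for N in (2, 5, 10, 50):
    print(N, mean_theta_N(N))  # increases with N and stays below theta* = 1
```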
First order asymptotics on the SAA optimal value
We begin with the following assumptions on f(x, ξ).
Assumption 1
(A1) For some point x̄ ∈ X, the expectation E[f(x̄, ξ)²] is finite.

(A2) There exists a measurable function C : Ξ → R+ such that E[C(ξ)²] is
finite and

  |f(x, ξ) − f(x′, ξ)| ≤ C(ξ) ‖x − x′‖,

for all x, x′ ∈ X and a.e. ξ ∈ Ξ.
The above assumption allows one to claim that f(x) is Lipschitz continuous
with constant E[C(ξ)], using Jensen's inequality and the convexity of the
absolute-value function:

  |f(x) − f(x′)| = |E[f(x, ξ) − f(x′, ξ)]| ≤ E[|f(x, ξ) − f(x′, ξ)|] ≤ E[C(ξ)] ‖x − x′‖.

Consequently, if X is compact, then the set of minimizers of f over X is
nonempty.
Let Y(x) denote the random variable Yx defined earlier, now viewed as a
function of x. Then by the multivariate CLT, for any finite set
{x1, . . . , xm} ⊂ X, the random vector (Y(x1), . . . , Y(xm)) has a
multivariate normal distribution with mean zero and covariance matrix
identical to that of (f(x1, ξ), . . . , f(xm, ξ)).
Then by the functional central limit theorem, using (A1), (A2), and the
compactness of X, we have that

  √N (fN − f) →_D Y,

where Y is a random element of C(X)‡.
Theorem 7 Let θN be the optimal value of the SAA problem. Suppose
that the sample is iid and assumptions (A1) and (A2) are satisfied. Then

‡C(X) represents the space of continuous functions on X equipped with the sup-norm. A random element of C(X) is a map Y : Ω → C(X) from a probability space (Ω, F, P) into C(X) that is measurable with respect to the Borel sigma-algebra of C(X); Y(x) = Y(x, ω) can be viewed as a random function.
the following hold§:

  θN = inf_{x∈S} fN(x) + o_p(N^{−1/2}),

  √N (θN − θ∗) →_D inf_{x∈S} Y(x),

where Y(x) is the random function defined above; recall that

  √N [fN(x) − f(x)] →_D Yx.

Moreover, if S is a singleton {x̄}, then we have that

  √N (θN − θ∗) →_D N(0, σ²(x̄)).
§Recall that a sequence of random variables rN is said to be o_p(aN) if rN/aN → 0 in probability as N → ∞.
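In the singleton case of Theorem 7, the limit law can be checked by simulation; the sketch below assumes the quadratic loss f(x, ξ) = (x − ξ)² with ξ ∼ N(2, 1) used in earlier illustrations, for which S = {2}, θ∗ = 1, and σ²(x̄) = var((2 − ξ)²) = var(χ²₁) = 2:

```python
import numpy as np

rng = np.random.default_rng(4)

def scaled_errors(N=1_000, reps=5_000):
    """Draws of sqrt(N) * (theta_N - theta*) across independent SAA problems."""
    xi = rng.normal(2.0, 1.0, size=(reps, N))
    # theta_N = min_x fN(x) = 1/N-normalized sample variance, per replication.
    theta = np.mean((xi - xi.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return np.sqrt(N) * (theta - 1.0)

z = scaled_errors()
print(float(np.mean(z)), float(np.var(z)))  # mean near 0, variance near 2
```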