

Automatica 49 (2013) 1117–1125


Variance error, interpolation and experiment design✩

Kaushik Mahata¹

School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW 2308, Australia

Article info

Article history: Received 16 November 2011; received in revised form 2 October 2012; accepted 31 October 2012; available online 1 March 2013.

Keywords: System identification; Variance error; Experiment design; Nevanlinna–Pick interpolation

Abstract

We investigate how the variance error associated with prediction error identification is related to the power spectral densities of the input and the additive noise at the output. Let Φ(e^{iω}) be the ratio of the input power spectral density (PSD) to the output-noise PSD. We characterize the set of all functions Φ for which the variance error remains constant. This analysis results in a minimal, finite-dimensional, affine parameterization of the variance error. This parameterization connects our analysis with the theory of Nevanlinna–Pick interpolation. It is shown that the set of all Φ for which the variance error remains constant can be characterized by the solutions of a Nevanlinna–Pick interpolation problem. This insight has interesting consequences in optimal input design, where it is possible to use some recent tools in analytic interpolation theory to tune the shape of the input PSD to suit certain needs.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

This paper aims to provide some insight into the large sample asymptotic accuracy of the prediction error method (PEM). We consider the open-loop identification of a discrete-time, linear, time-invariant model

yt = G(z, θ)ut + H(z, ϑ)et (1)

from N samples of input {ut}_{t=1}^{N} and output {yt}_{t=1}^{N}. In (1), G(z, θ) is the transfer function from ut to yt, which is parameterized by an unknown parameter vector θ. The term H(z, ϑ)et in (1) represents an additive noise term in innovations form, where H(z, ϑ) is a monic transfer function parameterized by an unknown parameter ϑ, and et is the innovations sequence.

In the sequel we use θ0 and ϑ0 to denote the true values of θ and ϑ, respectively. The power spectral density (PSD) of the input is denoted by Φu(e^{iω}). The PSD of the additive noise yt − G(z, θ0)ut is denoted by Φv(e^{iω}), and is given by Φv(e^{iω}) = |H(e^{iω}, ϑ0)|² σ². The ratio

Φ(e^{iω}) = Φu(e^{iω}) / Φv(e^{iω})

will play a crucial role in the sequel. Let θ̂N be the PEM estimate of θ0 based on N input–output data samples.

✩ The material in this paper was partially presented at the 18th IFAC World Congress, August 28–September 2, 2011, Milan, Italy. This paper was recommended for publication in revised form by Associate Editor Wei Xing Zheng under the direction of Editor Torsten Söderström.

E-mail address: [email protected]. ¹ Tel.: +61 2 492 16422; fax: +61 2 492 16993.

0005-1098 © 2013 Elsevier Ltd. All rights reserved. doi:10.1016/j.automatica.2013.01.021

When the model order n associated with G is large, the asymptotic normalized variance error associated with θ̂N takes a simple form (Ljung, 1985):

lim_{n→∞} lim_{N→∞} (N/n) Var{G(e^{iω}, θ̂N)} = 1/Φ(e^{iω}).    (2)

Expression (2) holds for a large class of 'shift-invariant' model structures including FIR, ARX, ARMAX, output error and Box–Jenkins models. Although (2) is a nice expression, it may not be a good indicator of the variance error for a finite model order n, and several authors, e.g. Gevers, Ljung, and van den Hof (2001), Ninness and Hjalmarsson (2004, 2005) and Xie and Ljung (2001), have addressed this issue and given several different elegant expressions for the variance error. In this paper we examine the variance error expression from a slightly different perspective. We wish to characterize the class of functions Φ which lead to the same asymptotic variance error function E. We analyze the mapping from Φ to E, and derive some interesting properties thereof.

The large sample, normalized covariance matrix of θ̂N satisfies

lim_{N→∞} N E{(θ̂N − θ0)(θ̂N − θ0)ᵀ} = J^{-1},    (3)

where

J := (1/2π) ∫_{−π}^{π} dω G1(e^{iω}) Φ(e^{iω}) [G1(e^{iω})]*    (4)

is the information matrix, see Ljung (1999). Note that A* denotes the conjugate-transpose of a matrix A, and

G1(e^{iω}) := ∂G(e^{iω}, θ0)/∂θ.
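The integral (4) is straightforward to approximate numerically. The sketch below is our own illustration (the function name, the callables G1 and Phi, and the quadrature grid are assumptions, not from the paper): it evaluates J by a Riemann sum over a uniform frequency grid.

```python
import numpy as np

def information_matrix(G1, Phi, n_grid=4096):
    """Approximate J in (4) by a Riemann sum over [-pi, pi).

    G1  : callable; G1(w) returns the gradient vector of G at e^{iw}
    Phi : callable; Phi(w) returns the scalar PSD ratio at e^{iw}
    """
    omegas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    p = len(G1(omegas[0]))
    J = np.zeros((p, p))
    for w in omegas:
        g = np.asarray(G1(w)).reshape(-1, 1)
        # g Phi g^* is Hermitian; the imaginary parts cancel over the
        # symmetric grid, so only the real part contributes to J
        J += Phi(w) * np.real(g @ g.conj().T)
    return J / n_grid   # the 1/(2 pi) factor cancels against the grid spacing
```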


We show that the map M from Φ to J is neither injective nor surjective, and the kernel of M can be used to characterize the set of Φ which lead to the same J and the variance error E. This analysis yields a new, finite-dimensional parameterization of J, which is minimal. It is possible to use this parameterization in many experiment design problems where an affine parameterization of J is necessary (see e.g. Jansson and Hjalmarsson (2005), and references therein). Using this parameterization we show that the set of Φ resulting in the same variance error is essentially the solution set of a certain Nevanlinna–Pick interpolation problem. Subsequently, we investigate the consequences of our analysis in an open-loop experiment design problem. It turns out that the solution sets of the underlying optimization problems associated with several popular experiment design methods are in fact identical to those of certain Nevanlinna–Pick interpolation problems. In this regard, it is envisaged that several new approaches for Nevanlinna–Pick interpolation (Byrnes, Georgiou, & Lindquist, 2001; Ferrante, Pavon, & Ramponi, 2008; Ferrante, Ramponi, & Ticozzi, 2011; Georgiou & Lindquist, 2003; Pavon & Ferrante, 2005, 2006; Ramponi, Ferrante, & Pavon, 2009) may be quite useful in experiment design.

The paper is organized as follows. In Section 2 we explain the parameterization of G used in the sequel, and derive an expression for G1. This expression plays an important role in the subsequent analysis. In Section 3 we explore the properties of the map M, and derive a minimal affine parameterization of J. In Section 4 we derive the properties of the variance error, and establish the relevance of Nevanlinna–Pick interpolation in the current context. In Section 5 we review some basic results of Nevanlinna–Pick interpolation theory, and use them to characterize the set of all Φ which result in the same variance error. In Section 6 we show how the theory developed in the previous sections helps in understanding the underlying structure of many optimization problems appearing in the experiment design literature. Throughout the paper we borrow heavily from the recent literature on analytic interpolation theory (Byrnes et al., 2001; Ferrante et al., 2008, 2011; Georgiou, 1999, 2001, 2002a,b, 2006; Georgiou & Lindquist, 2003, 2008; Pavon & Ferrante, 2005, 2006; Ramponi et al., 2009). Since the paper is targeted at the System Identification research community, some well-known results in interpolation theory are presented in some detail.

2. Rational parameterization

A rational parameterization of G is widely used in the system identification literature. A rational G admits a state space representation

G(z, θ) = d1 + c1ᵀ(zI − A1)^{-1}b1,    (5)

where (A1, b1) is in controllable canonical form, i.e.

A1 = [ −a1  ···  −a_{n−1}  −an ;  1  ···  0  0 ;  ⋱ ;  0  ···  1  0 ],    b1 = [ 1 ; 0 ; ⋮ ; 0 ],    (6)

and

θ = [a1 ··· an  c1ᵀ  d1]ᵀ.    (7)

Note that we can impose the structures (5) and (6) without any loss of generality. Throughout the paper we assume that the model structure (5)–(6) is rich enough to encompass any true underlying dynamics. Next, we compute the derivative of G with respect to θ. It is straightforward that

∂G(z, θ)/∂d1 = 1, (8)

∂G(z, θ)/∂c1 = (zI − A1)^{-1}b1.    (9)

Let ek be the k-th row of the identity matrix. Then it can be verified that

∂G(z, θ)/∂ak = −c1ᵀ(zI − A1)^{-1}b1 ek(zI − A1)^{-1}b1
             = −ek(zI − A1)^{-1}b1 c1ᵀ(zI − A1)^{-1}b1.

Consequently, by denoting a = [a1 · · · an]ᵀ, we have

∂G(z, θ)/∂a = −(zI − A1)^{-1}b1 c1ᵀ(zI − A1)^{-1}b1.    (10)

Combining (7)–(10) we get

∂G(z, θ)/∂θ = [ −(zI − A1)^{-1}b1 c1ᵀ(zI − A1)^{-1}b1 ;  (zI − A1)^{-1}b1 ;  1 ].    (11)

Now define for θ = θ0 that

A = [ A1  −b1c1ᵀ  0_{n×1} ;  0_{n×n}  A1  b1 ;  0_{1×n}  0_{1×n}  0 ],    b = [ 0_{n×1} ; 0_{n×1} ; 1 ].    (12)

Then after a few steps of algebra we can verify

G1(z) = (I − Az^{-1})^{-1}b.    (13)
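Equation (13) is easy to check numerically against the blocks (8)–(10). The sketch below is our own; it builds A and b from (12) for a second-order example (the numbers are chosen to match the example (38) used later in Section 5) and compares both expressions at one test frequency.

```python
import numpy as np

n = 2
a  = np.array([-1.5, 0.7])        # G has denominator z^2 - 1.5 z + 0.7
c1 = np.array([1.0, 0.5])         # numerator z + 0.5, and d1 = 0
A1 = np.zeros((n, n))
A1[0, :] = -a                     # controllable canonical form (6)
A1[1:, :-1] = np.eye(n - 1)
b1 = np.zeros(n); b1[0] = 1.0

m = 2*n + 1                       # A and b as defined in (12)
A = np.zeros((m, m))
A[:n, :n] = A1;        A[:n, n:2*n] = -np.outer(b1, c1)
A[n:2*n, n:2*n] = A1;  A[n:2*n, 2*n] = b1
b = np.zeros(m); b[2*n] = 1.0

z = np.exp(0.3j)                  # an arbitrary test frequency
x = np.linalg.solve(z*np.eye(n) - A1, b1)        # (zI - A1)^{-1} b1
lhs = np.concatenate([-x*(c1 @ x), x, [1.0]])    # stack (10), (9), (8)
rhs = np.linalg.solve(np.eye(m) - A/z, b)        # G1(z) from (13)
assert np.allclose(lhs, rhs)
```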

We note the following fact in passing, which is needed repeatedly in the sequel.

Proposition 1. If (A1, c1) is observable for θ = θ0, then (A, b) iscontrollable. Consequently, A is a circular matrix (a circular matrix isa matrix whose characteristic polynomial is the same as its minimalpolynomial).

Proof. Throughout the proof we assume θ = θ0. Let c = [c1ᵀ  0_{1×(n+1)}]ᵀ. Then from (12) and (13) it is readily verified that

cᵀ(zI − A)^{-1}b = −z^{-1}[c1ᵀ(zI − A1)^{-1}b1]².    (14)

Note that (A1, b1) is controllable. Now if (A1, c1) is observable, then c1ᵀ(zI − A1)^{-1}b1 must be of order n, and thus cᵀ(zI − A)^{-1}b is of order 2n + 1. Since we can find a c for which cᵀ(zI − A)^{-1}b is of order 2n + 1, we conclude that (A, b) is controllable, i.e.

C = [b  Ab  ···  A^{2n}b]    (15)

is a full rank matrix. This implies that A is circular. That is because if A were not a circular matrix, then we could find scalars µ0, …, µl, for some l ≤ 2n, such that µ0I + µ1A + ··· + µlA^l = 0, implying a singular C. □

From here on we assume that (A1, c1) is observable for θ = θ0, and thus (A, b) is controllable.

3. Set of admissible information matrices

The content of this section follows directly from Georgiou's work in Georgiou (2001), once we recognize J as the covariance matrix of the output of the filter (I − A/z)^{-1}b fed by some stationary stochastic process with power spectral density Φ. However, instead of just stating the results, we provide some details in order to emphasize the underlying mathematical structure. This insight will be helpful later in the paper.

The space of square integrable, real-valued, even functions over [−π, π] is denoted by G. In particular, the PSD of any real-valued, stationary stochastic process resides in G. Every Φ ∈ G admits a Fourier series expansion

Φ(e^{iω}) = φ0 + 2 Σ_{k=1}^{∞} φk cos(ωk),    (16)


and the set of functions ηk(ω) := cos(ωk), k = 0, 1, 2, …, constitute a complete basis of G.

The linear map M : G → R^{(2n+1)×(2n+1)} is defined such that

M(Φ) = (1/2π) ∫_{−π}^{π} dω (I − Ae^{-iω})^{-1}b bᵀ(I − Aᵀe^{iω})^{-1} Φ(e^{iω}).    (17)

Every element in the range of M is a real-valued symmetric matrix of size (2n + 1) × (2n + 1). From (4), (13) and (17) we have

J = M(Φ). (18)

Thus, J ∈ range(M). The domain of M is of infinite dimension, but the range of M is of finite dimension. Hence M is not an injective (one-to-one) map. The following result shows M is not surjective (onto) either.

Lemma 1. The rank of M is 2n + 1. In particular, the matrices

Rk := A^kS + S[Aᵀ]^k,  k = 0, 1, …, 2n,    (19)

form a basis of range(M), where S is the solution to the Lyapunov equation

S = ASAᵀ + bbᵀ.    (20)

Proof. This result is implicit in Georgiou's work in Georgiou (2001). Here we provide a proof for better exposition. We start by studying the effect of M on the basis ηk(ω) = cos(ωk), k = 0, 1, 2, … of G. It is well-known that

(1/2π) ∫_{−π}^{π} dω (I − Ae^{-iω})^{-1}bbᵀ(I − Aᵀe^{iω})^{-1} e^{iωk} = { A^kS,  k ≥ 0;  S(Aᵀ)^{-k},  k < 0 }.

Thus, using 2cos(ωk) = e^{iωk} + e^{-iωk} we have

M(ηk) = (A^kS + S(Aᵀ)^k)/2.    (21)

However, A is a (2n + 1) × (2n + 1) matrix. Hence for every k ≥ 2n + 1 we can express A^k as a linear combination of I, A, A², …, A^{2n}. Hence we conclude that every element in range(M) can be expressed as a linear combination of the matrices Rk, k = 0, 1, 2, …, 2n. It remains to show that these matrices form a linearly independent set. Suppose

Σ_{k=0}^{2n} wk Rk = 0

for some w = [w0 w1 ··· w2n]ᵀ. Then using (19) and (20) it follows that

0 = Σ_{k=0}^{2n} wk(Rk − ARkAᵀ)b
  = Σ_{k=0}^{2n} wk{A^kbbᵀ + bbᵀ[Aᵀ]^k}b
  = Cw bᵀb + bwᵀCᵀb
  = [(bᵀb)I + bbᵀ]Cw.    (22)

Now, (bᵀb)I + bbᵀ is positive definite, and C is nonsingular. Hence (22) implies w = 0, and this proves that the matrices Rk, k = 0, 1, 2, …, 2n, form a linearly independent set. □
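Lemma 1 is also easy to verify numerically. Continuing the sketch above (again our own illustration), S comes from scipy's discrete Lyapunov solver, and linear independence of the 2n + 1 matrices Rk is checked by stacking their vectorizations:

```python
from scipy.linalg import solve_discrete_lyapunov

S = solve_discrete_lyapunov(A, np.outer(b, b))   # S = A S A^T + b b^T, (20)
Rk, Ak = [], np.eye(m)
for k in range(m):
    Rk.append(Ak @ S + S @ Ak.T)                 # Rk = A^k S + S (A^T)^k, (19)
    Ak = Ak @ A
basis = np.stack([R.ravel() for R in Rk])
assert np.linalg.matrix_rank(basis) == m         # the rank of M is 2n + 1
```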

Lemma 1 has a few interesting implications. The set of all real-valued (2n + 1) × (2n + 1) symmetric matrices forms a vector space of dimension (n + 1)(2n + 1). Since the rank of M is 2n + 1, it is not a surjective map. Hence, given an arbitrary (2n + 1) × (2n + 1) symmetric matrix J, there may not exist a Φ satisfying J = M(Φ). Even if a solution exists, the solution is non-unique because M is not injective. On the positive side, the range of M can be parameterized using only 2n + 1 parameters. If J = M(Φ) for some Φ ∈ G, then there exist unique scalars w0, w1, …, w2n such that

J = w0R0 + w1R1 + · · · + w2nR2n. (23)

The next lemma summarizes the connection between the parameters w0, w1, …, w2n and Φ. This involves the function Φ+ defined via the Fourier series (16) of Φ as

Φ+(z) = φ0/2 + Σ_{k=1}^{∞} φk z^k.    (24)

There is a one-to-one correspondence between Φ and Φ+. If we know Φ, then we can find Φ+ uniquely, and vice versa. Note that Φ+ ∈ H2, which is the Hardy space of square integrable functions that are analytic inside the open unit disc D in the complex plane. Hence we can evaluate Φ+ at A:

Φ+(A) = φ0I/2 + Σ_{k=1}^{∞} φk A^k.    (25)

It is possible to evaluate Φ+ at A because A has all its eigenvalues inside the open unit disc. This requires the realization of G in (5) to be asymptotically stable, see (12).

Lemma 2. For any Φ ∈ G it holds that

M(Φ) = Φ+(A)S + S[Φ+(A)]ᵀ = w0R0 + w1R1 + ··· + w2nR2n,    (26)

where

w = [w0 w1 ··· w2n]ᵀ = C^{-1}Φ+(A)b.    (27)

Proof. The identity (26) is a direct consequence of (16), (21) and (25). Indeed,

M(Φ) = φ0M(η0) + 2 Σ_{k=1}^{∞} φk M(ηk)
     = φ0S + Σ_{k=1}^{∞} φk(A^kS + S[A^k]ᵀ)
     = Φ+(A)S + S[Φ+(A)]ᵀ.

Since A is circular, for every k > 0 we can write A^k as a unique linear combination of I, A, …, A^{2n}. Consequently, by the definition in (25), Φ+(A) can be written uniquely as

Φ+(A) = Σ_{k=0}^{2n} wk A^k.    (28)

Combining (28) with the identity M(Φ) = Φ+(A)S + S[Φ+(A)]ᵀ, we get the second equality in (26). Now post-multiply (28) by b to get Φ+(A)b = Cw, which is the same as (27). □

Remark 1. The second equality in (26) and (28) are equivalent. See the Appendix for a proof. □


Given Φ, we can use (26) and (27) to calculate M(Φ) and w. These expressions also show how Φ affects M(Φ). We shall use these expressions further in the next section to analyze the variance error.
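In particular, (27) gives a practical recipe for computing w from Φ: obtain the Fourier coefficients φk numerically, truncate the series (25) (which converges since all eigenvalues of A lie inside D), and solve a linear system with C. A sketch (the function name and truncation parameters are our own choices):

```python
def w_from_phi(Phi, A, b, n_fft=4096, n_terms=400):
    """Compute w = C^{-1} Phi_+(A) b of (27) by truncating the series (25)."""
    omegas = 2*np.pi*np.arange(n_fft)/n_fft
    phi = np.fft.ifft(Phi(omegas)).real    # phi_k; Phi is real and even in w
    m = A.shape[0]
    PhiA = 0.5*phi[0]*np.eye(m)            # phi_0 I / 2
    Ak = np.eye(m)
    for k in range(1, n_terms):
        Ak = Ak @ A
        PhiA += phi[k]*Ak
    C = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(m)])
    return np.linalg.solve(C, PhiA @ b)
```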

Let G+ be the set of all positive functions in G. Both Φu and Φv belong to G+, because they are power spectral densities. Consequently, Φ ∈ G+. We wish to parameterize the set of admissible information matrices using (23). For that we must find a condition on {wk}_{k=0}^{2n} ensuring the existence of a Φ ∈ G+ such that M(Φ) = Σ_{k=0}^{2n} wk Rk. Clearly, Φ ∈ G+ implies M(Φ) ≽ 0 (we write Q ≽ 0 to convey that Q is a non-negative definite matrix, and Q ≻ 0 means Q is a positive definite matrix). This turns out to be the sufficient condition as well.

Theorem 1 (Georgiou, 2001, Theorems 1, 2). There exists Φ ∈ G+ such that

M(Φ) = w0R0 + w1R1 + ··· + w2nR2n

if and only if

w0R0 + w1R1 + ··· + w2nR2n ≽ 0. □    (29)

Theorem 1 allows us to parameterize the set of all admissible matrices via 2n + 1 scalar parameters and a linear matrix inequality. Such a parameterization can be used in experiment design methods where an affine parameterization of J is needed (Barenthin, Bombois, Hjalmarsson, & Scorletti, 2008; Barenthin & Hjalmarsson, 2008; Bombois, Scorletti, Gevers, Van den Hof, & Hildebrand, 2006; Hildebrand & Gevers, 2003; Hjalmarsson & Jansson, 2008; Jansson & Hjalmarsson, 2005).
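Numerically, the admissibility test of Theorem 1 is a single eigenvalue computation. A minimal sketch (our own), reusing the list Rk built earlier:

```python
def admissible(w, Rk, tol=1e-9):
    """Check (29): does some Phi in G+ realize this w (Theorem 1)?"""
    Jw = sum(wk*R for wk, R in zip(w, Rk))
    return np.linalg.eigvalsh(Jw).min() >= -tol
```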

Remark 2. The fact that J admits a minimal affine parameterization in terms of 2n + 1 scalar-valued parameters was first observed in Stoica and Söderström (1982). In fact, it can be shown that there is a bijective correspondence between w and the parameterization used in Stoica and Söderström (1982). The parameterization proposed in Stoica and Söderström (1982) has recently been used in Wahlberg, Annergren, and Rojas (2011) for experiment design. □

Remark 3. Note that Φ(e^{iω}) = 2Re{Φ+(e^{iω})}. Thus, Φ ∈ G+ if and only if Φ+ is a positive real function. Theorem 1 gives the necessary and sufficient condition on w such that there exists a positive real Φ+ satisfying (27) (or equivalently, (28)). □

4. Properties of the variance error

Suppose A has m distinct eigenvalues {pk}_{k=1}^{m}, where pk is an eigenvalue of multiplicity nk, i.e., Σ_{k=1}^{m} nk = 2n + 1. Since (A, b) is controllable, there is a nonsingular matrix V such that the pair (VAV^{-1}, Vb) is in the Jordan canonical form, see Kailath (1980, p. 127) for details. Furthermore, we know VAV^{-1} has m Jordan blocks, one for each distinct eigenvalue, i.e.,

K := VAV^{-1} = [ K1  0  ···  0 ;  0  K2  ⋱  ⋮ ;  ⋮  ⋱  ⋱  0 ;  0  ···  0  Km ],    d := Vb = [ d1 ; d2 ; ⋮ ; dm ],

where Kk ∈ C^{nk×nk} and dk ∈ R^{nk} are given by

Kk = [ pk  1  ···  0 ;  0  pk  ⋱  ⋮ ;  ⋮  ⋱  ⋱  1 ;  0  ···  0  pk ],    dk = [ 0 ; ⋮ ; 0 ; 1 ].

Note that if K is complex valued, then so is V. Now consider the normalized variance error E(e^{iω}), which is defined as

E(e^{iω}) = lim_{N→∞} N Var{G(e^{iω}, θ̂N)}.

It is well-known that, see e.g. Ljung (1999),

E(e^{iω}) = G1*(e^{iω}) J^{-1} G1(e^{iω}).
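Given J, this expression is cheap to evaluate on a frequency grid using the state-space form (13) of G1. A small sketch (names are our own):

```python
def variance_error(J, A, b, omegas):
    """E(e^{iw}) = G1^*(e^{iw}) J^{-1} G1(e^{iw}), with G1 from (13)."""
    m = A.shape[0]
    E = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        x = np.linalg.solve(np.eye(m) - np.exp(-1j*w)*A, b)   # G1(e^{iw})
        E[i] = np.real(x.conj() @ np.linalg.solve(J, x))
    return E
```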

Next, we manipulate this expression further using (13) to get

E(e^{iω}) = bᵀ(I − e^{iω}Aᵀ)^{-1} J^{-1} (I − e^{-iω}A)^{-1}b
         = bᵀ(I − e^{iω}Aᵀ)^{-1} {Φ+(A)S + S[Φ+(A)]ᵀ}^{-1} (I − e^{-iω}A)^{-1}b
         = d*(I − e^{iω}K*)^{-1} {Φ+(K)S̄ + S̄[Φ+(K)]*}^{-1} (I − e^{-iω}K)^{-1}d,    (30)

where S̄ = VSV*. Eq. (30) reveals that Φ influences E(e^{iω}) via the term Φ+(K), and otherwise E(e^{iω}) depends only on {pk}_{k=1}^{m} and the multiplicities {nk}_{k=1}^{m}. It is well-known, see Golub and van Loan (1996, p. 557) (and straightforward to verify by direct calculations), that

Φ+(K) = [ Φ+(K1)  0  ···  0 ;  0  Φ+(K2)  ⋱  ⋮ ;  ⋮  ⋱  ⋱  0 ;  0  ···  0  Φ+(Km) ],    (31)

where

Φ+(Kk) = [ Φ+(pk)  Φ+^{(1)}(pk)  ···  Φ+^{(nk−1)}(pk)/(nk − 1)! ;  0  Φ+(pk)  ⋱  ⋮ ;  ⋮  ⋱  ⋱  Φ+^{(1)}(pk) ;  0  ···  0  Φ+(pk) ].    (32)

Note that Φ+^{(j)} denotes the jth derivative of Φ+. By inspection of (32), we note that for any k, the matrix Φ+(Kk) is known completely if we know the vector

Φ+(Kk)dk = [ Φ+^{(nk−1)}(pk)/(nk − 1)!  ···  Φ+^{(1)}(pk)  Φ+(pk) ]ᵀ.

Also the structure of Φ+(K) in (31) reveals that we can reconstruct the entire matrix Φ+(K) if we know the vector

Φ+(K)d = [ {Φ+(K1)d1}ᵀ  ···  {Φ+(Km)dm}ᵀ ]ᵀ,

consisting of the values of Φ+ and its derivatives evaluated at the eigenvalues of A. Hence, regardless of the shape of Φ, the variance error E depends only on the values of Φ+ and its derivatives evaluated at the eigenvalues of A. While this observation is being made, one should also note the following facts from (12):

• If G(z, θ0) has a pole of multiplicity m1 at p ≠ 0, then p is an eigenvalue of A of multiplicity 2m1.
• If G(z, θ0) has a pole of multiplicity m1 at 0, then 0 is an eigenvalue of A of multiplicity 2m1 + 1.


• If G(z, θ0) has no pole at 0, then 0 is an eigenvalue of A of multiplicity 1.

Remark 4. Suppose (26) holds. Then using (27) in Lemma 2, we note that

Φ+(K)d = VΦ+(A)b = VCw = [d  Kd  ···  K^{2n}d]w.    (33)

Since the controllability matrix [d Kd ··· K^{2n}d] depends only on the eigenvalues of A and their multiplicities, so does the map from w to Φ+(K)d. As a result, w implicitly specifies Φ+(K)d, consisting of the values of Φ+ and its derivatives at the eigenvalues of A. As long as these values remain fixed, the variance error E does not change. The set of Φ for which E remains the same can be characterized by the solutions of a Nevanlinna–Pick (NP) interpolation problem where one seeks a positive real interpolant Φ+ matching the interpolation data consisting of the values of Φ+ and its derivatives at the eigenvalues of A. Theorem 1 then gives the conditions under which the interpolation problem has a solution. In that sense, Theorem 1 generalizes the classical Pick matrix condition (Rosenblum & Rovnyak, 1985, chapter 2) for solvability of the classical NP interpolation problem. □

5. On the set of Φ yielding the same variance error

Suppose that {wk}_{k=0}^{2n} are given, and we wish to find a Φ+ satisfying the linear equation (28). Note that by setting Φ+(z) = W(z) := Σ_{k=0}^{2n} wk z^k, we satisfy (28). However, from linear algebra we know this is not the only solution. Take any Q(z) such that Q(A) = 0, set Φ+(z) = W(z) + Q(z), and (28) holds. It is well-known that Q(z) can be factorized as (Georgiou, 2001)

Q(z) = B(z)Γ(z),

where Γ ∈ H2, and B(z) is the all-pass transfer function (note that B(A) = 0)

B(z) = det(zI − A)/det(I − Aᵀz).    (34)

Hence, a general solution to (28) has the form (Georgiou, 2001)

Φ+(z) = Γ(z)B(z) + Σ_{k=0}^{2n} wk z^k.    (35)
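The key property B(A) = 0 holds because the numerator of (34) is the characteristic polynomial of A (Cayley–Hamilton), while det(I − Aᵀz) does not vanish at A since all eigenvalues of A lie inside D. A quick numerical confirmation (our own check, continuing the running example):

```python
p = np.poly(A)                    # coefficients of det(zI - A)
PA = np.zeros_like(A)
for c in p:                       # Horner scheme: evaluate the polynomial at A
    PA = PA @ A + c*np.eye(m)
assert np.linalg.norm(PA) < 1e-8  # Cayley-Hamilton, hence B(A) = 0
```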

However, Φ+ in (35) may not be positive real. Only if {wk}_{k=0}^{2n} satisfy (29) is there a Γ ∈ H2 such that Φ+ in (35) is positive real. Such a Γ is unique only when the matrix Σ_{k=0}^{2n} wk Rk is positive semidefinite but singular. Such cases are of little interest in our context, because then we have non-persistent excitation, resulting in a singular information matrix.

The representation (35) motivates a canonical decomposition of Φ+(z) into two mutually orthogonal parts:

Φ+(z) = Γ1(z)B(z) + bᵀ(I − Aᵀz)^{-1}S^{-1}Φ+(A)b
      = Γ1(z)B(z) + bᵀ(I − Aᵀz)^{-1}S^{-1}Cw.    (36)

The first part on the right hand side of (36) is the orthogonal projection of Φ+ onto the subspace B(z)H2; this part has no effect on the variance error. It represents the ''non-persistent'' part of the excitation. The second term on the right hand side of (36) is the orthogonal projection of Φ+ onto the orthogonal complement

K = H2 ⊖ B(z)H2

of B(z)H2 in H2. From the analysis presented in the previous section, it is easy to see that this orthogonal projection of Φ+ on K determines E, and represents the ''persistent part'' of the excitation. Even if Φ+ is positive real, its orthogonal projection on K need not be positive real. Only if {wk}_{k=0}^{2n} satisfy (29) is there a Γ1 ∈ H2 such that Φ+ in (36) is positive real.

The function Φ+ need not be rational, although a rational Φ+ is often preferred in practical applications. Given some w satisfying (29), there is no guarantee that there is a positive real Φ+ of degree less than 2n such that (28) holds. The smallest degree for which we can always find a positive real Φ+ is 2n. Thus, it is quite natural to focus on the degree 2n case. The interesting point is: the set of all degree 2n Φ+ of the form (36) that lead to the same asymptotic variance error can be freely parameterized via the spectral zeros of Φ. In other words, we can place 2n minimum phase spectral zeros of Φ wherever we like without affecting the asymptotic variance error. This remarkable result is formally given by the following theorem due to Byrnes et al. (2001).

Theorem 2. Given any degree 2n monic polynomial L(z) with all its roots inside D, and w satisfying (29), there exists a unique solution to the optimization problem

min_{Φ∈G+} (1/2π) ∫_{−π}^{π} dω [|L(e^{iω})|²/|det(e^{iω}I − A)|²] log(1/Φ(e^{iω})),
subject to w = C^{-1}Φ+(A)b.    (37)

Furthermore, the minimum phase spectral factor of the solution Φ is of degree 2n, and its zeros coincide with those of L(z). □

Theorem 2 is constructive. It tells how to compute a positive real interpolant Φ+ satisfying (27), such that the zeros of the stable minimum phase spectral factor of the solution Φ are given by the zeros of L(z). The optimization problem in Theorem 2 is convex, and can be solved efficiently using standard techniques of convex optimization, see Byrnes et al. (2001) for details. In the present context, Theorem 2 has two interesting consequences:

(1) By restricting the minimum phase spectral factor of Φ(e^{iω}) to be an order 2n rational function in G+, we can cover the set of all possible E(e^{iω}). This fact could be used to restrict the 'search space' in input excitation design problems without any loss of optimality.

(2) By keeping w fixed and changing the positions of the spectral zeros, one can change the shape of Φ(e^{iω}) to a considerable extent (almost freely), while E(e^{iω}) remains unaffected.

The extent to which the shape of Φ can change without changing E is illustrated in Fig. 1 with an example. Here

G(z, θ0) = (z + 0.5)/(z² − 1.5z + 0.7).    (38)

The plots have been generated as follows. First we take

Φ(e^{iω}) = 1/|1 + α1e^{-iω} + α2e^{-2iω} + α3e^{-3iω} + α4e^{-4iω}|²,

where α1 = −1.7464, α2 = 1.2602, α3 = −0.4366, and α4 = 0.625. We calculate the value of w corresponding to the above choice of Φ using Lemma 2, which yields

w = [4.9780 9.8845 2.7698 7.6819 0.5967]ᵀ.
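This w can be reproduced (up to truncation and quadrature error) with the w_from_phi sketch from Section 3, using the A and b built earlier for this same G; the comparison below is our own check, not from the paper.

```python
alpha = np.array([-1.7464, 1.2602, -0.4366, 0.625])

def Phi(w):
    z = np.exp(-1j*np.outer(w, np.arange(1, 5)))
    return 1.0/np.abs(1.0 + z @ alpha)**2

w_vec = w_from_phi(Phi, A, b)   # should approximate the printed w above
```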

By keeping w fixed, the variance error remains invariant. We use this w to solve the optimization problem (37), where we take L(z) of the form

L(z) = (z − re^{0.35i})(z − re^{-0.35i})(z − re^{0.55i})(z − re^{-0.55i}).

Note that the parameter r controls the locations of the zeros of L, which are identical to the locations of the minimum phase zeros of the PSD obtained by solving (37). Here we consider 5 different values of r, denoted by

rk = 0.9k/5,  k = 1, …, 5.


Fig. 1. Illustration of different Φ functions leading to the same variance error. (a) Plots of Φk, k = 1, …, 5: five different choices of Φ which give the same variance error. (b) Comparison between the analytical variance error and the empirical variance errors for the above choices of Φ.

The corresponding solutions to (37), denoted by Φk(e^{iω}), k = 1, …, 5, are plotted as functions of ω in Fig. 1(a). In Fig. 1(b) we compare the associated analytical normalized variance error with empirical normalized variance errors calculated from simulations. For each k, the empirical normalized variance error is based on 100 independent Monte-Carlo simulations, where we set Φu = Φk, and Φv(e^{iω}) = 1, ∀ω. In each simulation 500 input–output data samples are used in a prediction error method to compute an estimate.

6. Notes on experiment design

In this section we briefly discuss how our findings so far can be used in the open-loop experiment design problem. Here we consider a typical example. We assume that the true system is given by²

yt = G(z, θ0)ut + et.    (39)

We wish to identify a model of the plant from N samples of the input–output data. The ultimate modeling objective is to design a controller such that the resulting closed loop system is stable, with a complementary sensitivity function close to T(e^{iω}). Suppose that the model G(e^{iω}, θ̂N) identified by an identification experiment is used for control design and implementation. Then the H-infinity norm of the weighted relative error

∆(e^{iω}, θ̂N) = T(e^{iω}) [G(e^{iω}, θ0) − G(e^{iω}, θ̂N)] / G(e^{iω}, θ0)

is a measure of robust stability and performance. One popular objective in experiment design is to ensure that the normalized variance satisfies

N Var{∆(e^{iω}, θ̂N)} < U(e^{iω}),  ∀ω,    (40)

where U ∈ G+ depends on N, the desired confidence interval, and possibly some ω-dependent function. The constraint (40) is approximately equivalent to (Jansson & Hjalmarsson, 2005)

Ξ(e^{iω}) := |G(e^{iω}, θ0)|² U(e^{iω}) / |T(e^{iω})|² ≥ E(e^{iω}),  ∀ω.    (41)

² Note that we are assuming H(e^{iω}, ϑ) ≡ 1 for all ω. Later we will comment on the case when this assumption does not hold.

We wish to find the ''least costly'' input power spectral density by solving, see Bombois et al. (2006), Hildebrand and Gevers (2003) and Jansson and Hjalmarsson (2005),

min_{Φu∈G+} (1/2π) ∫_{−π}^{π} dω Φu(e^{iω}),
subject to Ξ(e^{iω}) ≥ E(e^{iω}).    (42)

Note that E(e^{iω}) is an implicit function of Φu. Also, both E(e^{iω}) and Ξ(e^{iω}) depend on θ0 and ϑ0, which are unknown. This problem is often tackled by computing some initial estimates of θ0 and ϑ0, which are then used to replace θ0 and ϑ0 in the relevant expressions (Jansson & Hjalmarsson, 2005). The utility of experiment design frameworks like (42) is well established, e.g., Barenthin et al. (2008), Barenthin and Hjalmarsson (2008), Bombois et al. (2006), Hildebrand and Gevers (2003), Hjalmarsson and Jansson (2008) and Jansson and Hjalmarsson (2005). Here our main objective is to gain some insight into the solution set of optimization problems like (42). These findings are also useful in understanding other optimization problems encountered in the experiment design literature, where an affine parameterization of the information matrix and input power spectral density is used (Barenthin et al., 2008; Barenthin & Hjalmarsson, 2008; Bombois et al., 2006; Hildebrand & Gevers, 2003; Hjalmarsson & Jansson, 2008; Jansson & Hjalmarsson, 2005).

In the following we show that the infinite dimensional problem in (42) can be solved via a semidefinite program in the w parameters. The following lemma is the first step towards that goal.

Lemma 3. If Φv(e^{iω}) = σ² then

(1/2π) ∫_{−π}^{π} dω Φu(e^{iω}) = 2w0σ²,

where w and w0 are as in (27).

Proof. Since w satisfies (27), (35) holds. However, from (12) it can be verified that det(A) = 0. Consequently, B(0) = 0, see (34). Thus by setting z = 0 in (35) we conclude that Φ+(0) = w0. However, from (24) we know

2Φ+(0) = φ0 = (1/2π) ∫_{−π}^{π} dω Φu(e^{iω})/σ²,

which leads to the result. □
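For the Φ and w computed in the Section 5 example (where Φv = 1, i.e. σ = 1), Lemma 3 is easy to confirm numerically (our own check):

```python
omegas = 2*np.pi*np.arange(4096)/4096
avg_power = Phi(omegas).mean()   # (1/2 pi) * integral of Phi_u over [-pi, pi]
assert np.isclose(avg_power, 2*w_vec[0], rtol=1e-3)
```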


The next lemma shows that the constraint Ξ(e^{iω}) ≥ E(e^{iω}) is equivalent to a linear matrix inequality in a finite number of variables including w. This result is similar in spirit and utility to an analogous result derived in Jansson and Hjalmarsson (2005), but is somewhat simpler to prove and use.

Lemma 4. Suppose w satisfies (27). Let the positive real half spectrum Ξ+ be given in state space form as

Ξ+(z) = d2/2 + b2ᵀ(I/z − A2ᵀ)^{-1}c2.

Then Ξ(e^{iω}) ≥ E(e^{iω}), ∀ω, if and only if there exists a symmetric matrix P satisfying

K(w, P) := [ FPFᵀ − P,  G + FPHᵀ ;  Gᵀ + HPFᵀ,  D(w) + HPHᵀ ] ≽ 0,    (43)

where the matrices F, G, H and D are defined as

F = [ A2  0_{n2×(2n+1)} ;  0_{(2n+1)×n2}  A ],
G = [ b2  0_{n2×(2n+1)} ;  b  0_{(2n+1)×(2n+1)} ],
H = [ c2ᵀ  0_{1×(2n+1)} ;  0_{(2n+1)×n2}  I ],
D(w) = [ d2  0_{1×(2n+1)} ;  0_{(2n+1)×1}  Σ_{k=0}^{2n} wk Rk ],

and n2 is the number of rows in A2.

Proof. Since (27) holds,

Ξ(e^{iω}) − E(e^{iω}) = Ξ(e^{iω}) − G1*(e^{iω}) J^{-1} G1(e^{iω})
  = Ξ(e^{iω}) − bᵀ(I − e^{iω}Aᵀ)^{-1} [Σ_{k=0}^{2n} wk Rk]^{-1} (I − e^{-iω}A)^{-1}b.

Hence, using the Schur complement, Ξ(e^{iω}) − E(e^{iω}) ≥ 0, ∀ω, if and only if

[ Ξ(e^{iω})  bᵀ(e^{-iω}I − Aᵀ)^{-1} ;  (e^{iω}I − A)^{-1}b  Σ_{k=0}^{2n} wk Rk ] ≽ 0,  ∀ω ∈ [−π, π].    (44)

However, (44) holds if and only if

Λ(z) := [ Ξ+(z)  bᵀ(I/z − Aᵀ)^{-1} ;  0_{(2n+1)×1}  Σ_{k=0}^{2n} wk Rk/2 ]

is positive real. It can be verified via direct calculations that

Λ(z) = Gᵀ(z^{-1}I − Fᵀ)^{-1}Hᵀ + D(w)/2.

Hence, by the positive real lemma, Ξ(e^{iω}) − E(e^{iω}) ≥ 0, ∀ω, if and only if there exists a symmetric matrix P such that (43) holds. □

Note that the LMI (44) implies (29). Since (44) and (43) are equivalent, the following result holds.

Corollary 1. The LMI (43) implies (29).

Thus, whether a Φ ∈ G belongs to the feasible set of the optimization problem (42) is determined completely by the associated value of w. Furthermore, by Lemma 3, the cost function in (42) is proportional to the first component of w. Consequently, the orthogonal projection of Φ on K is all that matters as far as the problem (42) is concerned. This is stated formally in the form of the following lemma, which is the main result of this section.

Lemma 5. Consider the optimization problem

min_w  w0,
subject to K(w, P) ≽ 0.    (45)

Let w* be the solution to (45). Then Φ is a solution to (42) if and only if

w* = C^{-1}Φ+(A)b. □    (46)
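The semidefinite program (45) can be written directly in a modeling tool such as cvxpy. The sketch below is our own illustration, assuming A, b and the basis matrices Rk from Section 3 and a state-space realization (A2, b2, c2, d2) of Ξ+ are available; it is not code from the paper.

```python
import numpy as np
import cvxpy as cp

m, n2 = A.shape[0], A2.shape[0]
F = np.block([[A2, np.zeros((n2, m))], [np.zeros((m, n2)), A]])
G = np.block([[b2.reshape(-1, 1), np.zeros((n2, m))],
              [b.reshape(-1, 1),  np.zeros((m, m))]])
H = np.block([[c2.reshape(1, -1), np.zeros((1, m))],
              [np.zeros((m, n2)), np.eye(m)]])

w = cp.Variable(m)
P = cp.Variable((n2 + m, n2 + m), symmetric=True)
Jw = sum(w[k]*Rk[k] for k in range(m))             # affine in w
D = cp.bmat([[np.array([[d2]]), np.zeros((1, m))],
             [np.zeros((m, 1)), Jw]])
K = cp.bmat([[F @ P @ F.T - P,   G + F @ P @ H.T],
             [G.T + H @ P @ F.T, D + H @ P @ H.T]])
# K is symmetric by construction; symmetrizing just helps the parser
prob = cp.Problem(cp.Minimize(w[0]), [(K + K.T)/2 >> 0])
prob.solve()
w_star = w.value                                   # approximates w* of (46)
```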

Clearly, the solution to (42) is non-unique, and based on our discussion in Section 5, the solution set coincides with that of a Nevanlinna–Pick interpolation problem determined by w* and the eigenvalues of A. While choosing a particular interpolant Φ from this set, one enjoys a significant degree of freedom. For instance, one can use Theorem 2 to 'place' the zeros in desired locations.

It should be noted that recent results in Nevanlinna–Pick interpolation theory offer some solutions which might be attractive in experiment design problems. For instance, given a suitable Ψ ∈ G+ we can choose an interpolant Φ such that the distance between Φ and Ψ is minimized. The distance measure is chosen such that the resulting optimization problem is convex and has a finite dimensional dual, so that it can be solved efficiently. Two popular choices of distance measure have been advocated in the literature: the Kullback–Leibler distance (Ferrante et al., 2011; Georgiou & Lindquist, 2003; Pavon & Ferrante, 2005, 2006), and the Hellinger distance (Ferrante et al., 2008; Ramponi et al., 2009).

Next, we illustrate the above interpolation approach to experiment design with a numerical example. We use the same set-up as in Jansson and Hjalmarsson (2005). The true transfer function in the model (39) is given by

G(z, θ0) = 0.36z/(z − 0.7).

We take σ = 1, and the target complementary sensitivity function as

T(z) = (1 − 0.1353)²z/(z − 0.1353)².    (47)

For simplicity, we set U(e^{iω}) = 1, ∀ω. These choices give

Ξ+(z) = (0.1604z² + 0.05137z + 0.004244)/(z² + 0.7z).

The corresponding solution to the optimization problem (45) is

w* = [5.2003  0.3467  −1.2323]ᵀ.    (48)

In Fig. 2(a) we plot the function Ξ and the corresponding optimal variance error E (obtained from w*). Note that the inequality (41) holds at all frequencies.

In Fig. 2(b)–(d) we plot three possible choices of Φ which lead to the variance error E plotted in Fig. 2(a). For each case in Fig. 2(b)–(d), we have chosen the interpolant Φ ∈ G+ such that: (i) the interpolation condition (46) holds, and (ii) the Kullback–Leibler distance (Georgiou & Lindquist, 2003)

dis(Φ, Ψ) := (1/2π) ∫_{−π}^{π} dω Ψ(e^{iω}) log[Ψ(e^{iω})/Φ(e^{iω})]


between Φ and a given Ψ ∈ G+ is minimized. We have used the fixed point iteration algorithm proposed in Pavon and Ferrante (2006) (see also Ferrante et al., 2011 and Pavon & Ferrante, 2005) to compute the solutions. In each of Fig. 2(b)–(d) we plot³ Ψ and the associated Φ which minimizes dis(Φ, Ψ). In Fig. 2(b) we use Ψ(e^{iω}) = |(e^{2iω} + 0.5e^{iω} + 0.25)/(e^{2iω} − 0.5e^{iω} + 0.25)|². In Fig. 2(c) we use Ψ(e^{iω}) = |(e^{2iω} − 0.81)/(e^{2iω} + 0.81)|². In Fig. 2(d) we use Ψ(e^{iω}) = |(e^{2iω} + 0.81)/(e^{2iω} − 0.81)|². Note that the shape of Φ can be controlled to a significant extent by choosing Ψ accordingly, while leaving the variance error invariant.

³ In order to make the plots in Fig. 2(b)–(d) more legible, we have scaled up Ψ by a factor λ such that ∫_{−π}^{π} dω Φ(ω) = λ ∫_{−π}^{π} dω Ψ(ω). Minimizing dis(Φ, λΨ) gives the same solution as minimizing dis(Φ, Ψ).

Fig. 2. (a) Solution to the equivalent optimization (45) in w; the optimal E is compared with Ξ. (b)–(d) Interpolants obtained by interpolating w* in (48) by minimizing dis(Φ, Ψ), where Ψ(e^{iω}) = |(e^{2iω} + 0.5e^{iω} + 0.25)/(e^{2iω} − 0.5e^{iω} + 0.25)|² in (b), Ψ(e^{iω}) = |(e^{2iω} − 0.81)/(e^{2iω} + 0.81)|² in (c), and Ψ(e^{iω}) = |(e^{2iω} + 0.81)/(e^{2iω} − 0.81)|² in (d).

7. Conclusions

In this paper we have explored the properties of the asymptotic variance error in system identification. We have found that the variance error E is controlled by the projection of Φ onto a finite dimensional subspace K of G. As a consequence, there are infinitely many choices of Φ for which E does not change. The set of all Φ which lead to a given admissible variance error is essentially the solution set of a Nevanlinna–Pick interpolation problem, where one specifies the values and derivatives of Φ+ at the origin and at the poles of the system. Next we have examined the impact of these facts on experiment design. We have considered a common experiment design approach, and have shown that the solution set of the underlying optimization problem is identical to that of a Nevanlinna–Pick interpolation problem. We have also demonstrated how some recent tools in Nevanlinna–Pick interpolation theory can be used to tune the shape of the interpolant.

It should be noted that our treatment of experiment design is rather restricted to output-error models like (39). For instance, the results in Section 6 do not apply to Box–Jenkins models. This is because the assumptions of Lemma 3 are quite strong. Nevertheless, it can be shown that the underlying geometric structures of the solution sets for output error models and Box–Jenkins models are very similar. In fact, the solution set of the experiment design problem involving a Box–Jenkins model structure can be characterized by a generalized interpolation problem (Sarason, 1967). In the output error case we have found that the orthogonal projection of Φ on K is all that matters as far as problem (42) is concerned. For a Box–Jenkins model we can find a similar finite dimensional subspace K1 of H2, and only the orthogonal projection of Φ on K1 determines whether Φ is an optimal solution.


Appendix. Proof of Remark 1

We wish to show that the following equalities are equivalent:

(i) Φ+(A)S + S[Φ+(A)]ᵀ = Σ_{k=0}^{2n} wk Rk;
(ii) Φ+(A) = Σ_{k=0}^{2n} wk A^k;
(iii) Φ+(A)b = Cw.

It is straightforward to see that (ii) implies (i). Conversely, suppose (i) holds. Hence,

Φ+(A)S + S[Φ+(A)]ᵀ − Σ_{k=0}^{2n} wk Rk = 0.    (A.1)

Since A is circular, there exist scalars w̄k, k = 0, 1, …, 2n, such that

Φ+(A) = Σ_{k=0}^{2n} w̄k A^k.    (A.2)

Combining (A.1) and (A.2) gives

Σ_{k=0}^{2n} (w̄k − wk)Rk = 0,

which by linear independence of {Rk}_{k=0}^{2n} implies w̄k = wk, k = 0, 1, …, 2n. Hence (i) implies (ii).

It is straightforward to show that (ii) implies (iii) by post-multiplying (ii) by b. Conversely, suppose (iii) holds. Then for any j ≥ 0,

A^jΦ+(A)b = Φ+(A)A^jb = Σ_{k=0}^{2n} wk A^{j+k}b.

Hence,

Φ+(A)[b  Ab  ···  A^{2n}b] = Σ_{k=0}^{2n} wk A^k [b  Ab  ···  A^{2n}b],

from which (ii) follows, because C = [b Ab ··· A^{2n}b] is nonsingular.

References

Barenthin, M., Bombois, X., Hjalmarsson, H., & Scorletti, G. (2008). Identification for control of multivariable systems: controller validation and experiment design via LMIs. Automatica, 44, 3070–3078.
Barenthin, M., & Hjalmarsson, H. (2008). Identification and control: joint input design and h-infinity state feedback with ellipsoidal parametric uncertainty via LMIs. Automatica, 44(2), 543–551.
Bombois, X., Scorletti, G., Gevers, M., Van den Hof, P. M. J., & Hildebrand, R. (2006). Least costly identification experiment for control. Automatica, 42(10), 1651–1662.
Byrnes, C. I., Georgiou, T. T., & Lindquist, A. (2001). A generalized entropy criterion for Nevanlinna–Pick interpolation with degree constraint. IEEE Transactions on Automatic Control, 46(6), 822–839.
Ferrante, A., Pavon, M., & Ramponi, F. (2008). Hellinger vs. Kullback–Leibler multivariable spectrum approximation. IEEE Transactions on Automatic Control, 53(5), 954–967.
Ferrante, A., Ramponi, F., & Ticozzi, F. (2011). On the convergence of an efficient algorithm for Kullback–Leibler approximation of spectral densities. IEEE Transactions on Automatic Control, 56(3), 506–515.
Georgiou, T. T. (1999). The interpolation problem with a degree constraint. IEEE Transactions on Automatic Control, 44(3), 631–635.
Georgiou, T. T. (2001). Spectral estimation via selective harmonic amplification. IEEE Transactions on Automatic Control, 46(1), 29–42.
Georgiou, T. T. (2002a). Spectral analysis based on the state covariance: the maximum entropy spectrum and linear fractional parametrization. IEEE Transactions on Automatic Control, 47(11), 1811–1823.
Georgiou, T. T. (2002b). The structure of state covariances and its relation to the power spectrum of the input. IEEE Transactions on Automatic Control, 47(7), 1056–1066.
Georgiou, T. (2006). Relative entropy and the multivariable multidimensional moment problem. IEEE Transactions on Information Theory, 52(3), 1052–1066.
Georgiou, T. T., & Lindquist, A. (2003). Kullback–Leibler approximation of spectral density functions. IEEE Transactions on Information Theory, 49(11), 2910–2917.
Georgiou, T., & Lindquist, A. (2008). A convex optimization approach to ARMA modeling. IEEE Transactions on Automatic Control, 53(5), 1108–1119.
Gevers, M., Ljung, L., & van den Hof, P. (2001). Asymptotic variance expressions for closed loop identification. Automatica, 37, 781–786.
Golub, G. H., & van Loan, C. F. (1996). Matrix computations. Johns Hopkins University Press.
Hildebrand, R., & Gevers, M. (2003). Identification for control: optimal input design with respect to a worst-case ν-gap cost function. SIAM Journal on Control and Optimization, 41(5), 1586–1608.
Hjalmarsson, H., & Jansson, H. (2008). Closed loop experiment design for linear time invariant dynamical systems via LMIs. Automatica, 44(3), 623–636.
Jansson, H., & Hjalmarsson, H. (2005). Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Transactions on Automatic Control, 50(10), 1534–1549.
Kailath, T. (1980). Linear systems. Prentice Hall.
Ljung, L. (1985). Asymptotic variance expressions for identified black-box transfer function models. IEEE Transactions on Automatic Control, AC-30, 834–844.
Ljung, L. (1999). System identification—theory for the user (2nd ed.). Upper Saddle River, NJ, USA: Prentice Hall.
Ninness, B., & Hjalmarsson, H. (2004). Variance error quantifications that are exact for finite model order. IEEE Transactions on Automatic Control, 49, 1275–1291.
Ninness, B., & Hjalmarsson, H. (2005). On the frequency domain accuracy of closed loop estimates. Automatica, 41, 1109–1122.
Pavon, M., & Ferrante, A. (2005). A new algorithm for Kullback–Leibler approximation of spectral densities. In 44th IEEE conference on decision and control. Seville, Spain, December.
Pavon, M., & Ferrante, A. (2006). On the Georgiou–Lindquist approach to constrained Kullback–Leibler approximation of spectral densities. IEEE Transactions on Automatic Control, 51(4), 639–644.
Ramponi, F., Ferrante, A., & Pavon, M. (2009). A globally convergent matricial algorithm for multivariate spectral estimation. IEEE Transactions on Automatic Control, 54(10), 2376–2388.
Rosenblum, M., & Rovnyak, J. (1985). Hardy classes and operator theory. Dover.
Sarason, D. (1967). Generalized interpolation in H∞. Transactions of the American Mathematical Society, 127(2), 179–203.
Stoica, P., & Söderström, T. (1982). A useful input parameterization for optimal input design. IEEE Transactions on Automatic Control, AC-27(4), 986–989.
Wahlberg, B., Annergren, M., & Rojas, C. R. (2011). On optimal input signal design for identification of output error models. In IEEE conference on decision and control. Orlando, FL, December.
Xie, L. L., & Ljung, L. (2001). Asymptotic variance expressions for estimated frequency functions. IEEE Transactions on Automatic Control, 46, 1887–1899.

Kaushik Mahata received his Ph.D. in signal processing from Uppsala University, Sweden, in 2003. Since then he has been with the University of Newcastle, Australia. His research interests include system identification, spectral analysis, compressive sensing and telecommunication networks.