Overview

Linear Adaptive Systems

  Criterion   Algorithm        Topology
  MSE         LMS / RLS        FIR, IIR
  AEC         AEC Algorithms   FIR, IIR
Why another criterion?

MSE gives biased parameter estimates with noisy data.

[Block diagram: the input x(n) is corrupted by additive noise v(n) before it reaches the adaptive filter w; the desired signal d(n) is corrupted by additive noise u(n); the error e(n) is the noisy desired signal minus the filter output.]
T. Söderström, P. Stoica. “System Identification.” Prentice-Hall, London, United Kingdom, 1989.
Is the Wiener-MSE solution optimal?

Assumptions:
1. v(n) and u(n) are uncorrelated with the input and the desired signal
2. v(n) and u(n) are uncorrelated with each other

White input noise:   W = (R + σ²I)⁻¹ P,  with σ² unknown
Colored input noise: W = (R + V)⁻¹ P,   with V unknown

The solution changes with the noise statistics, so the Wiener solution is biased whenever the input is noisy.
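To make the bias concrete, here is a minimal numpy sketch. The 2-tap plant, signal lengths, and 0 dB noise level are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
w_true = np.array([1.0, -0.5])          # hypothetical 2-tap plant

# Clean white input; desired signal is the noise-free plant output.
x = rng.standard_normal(n)
X = np.stack([x[1:], x[:-1]], axis=1)   # tap vectors [x(n), x(n-1)]
d = X @ w_true

# Corrupt the input with white noise of unit variance (0 dB input SNR).
Xn = X + rng.standard_normal(X.shape)

R = Xn.T @ Xn / len(Xn)                 # ~ Rx + sigma^2 I (diagonal inflated)
P = Xn.T @ d / len(Xn)                  # cross-correlation is not corrupted
w_mse = np.linalg.solve(R, P)           # Wiener solution from noisy data
```

With Rx = I and σ² = 1 the estimate shrinks to roughly w_true/2: the noise inflates the diagonal of R, and correcting for it requires the unknown σ².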
An example

[Plot: RLS weight estimates vs. the true weights across 50 taps, input SNR = 0 dB.]
Existing solutions…

Total Least Squares
- Gives an exact unbiased estimate iff v(n) and u(n) are iid with equal variances!!
- Input is noisy and the desired is noise-free
Y.N. Rao, J.C. Principe. “Efficient Total Least Squares Method for System Modeling using Minor Component Analysis.” IEEE Workshop on Neural Networks for Signal Processing XII, 2002.
Existing solutions…

Extended Total Least Squares
- Gives an exact unbiased estimate with colored v(n) and u(n) iff the noise statistics are known!!
J. Mathews, A. Cichocki. “Total Least Squares Estimation.” Technical Report, University of Utah, USA and Brain Science Institute Riken, 2000.
Going beyond MSE - Motivation

Assumption:
1. v(n) and u(n) are white

The input covariance matrix is R = Rx + σ²I.

Only the diagonal terms are corrupted! We will exploit this fact.
Going beyond MSE - Motivation

w  = estimated weights (length L)
wT = true weights (length M)

e(n) = (wT - w)^T x(n) + u(n) - v^T(n) w

ρe(Δ) = E[e(n)e(n-Δ)]
      = (wT - w)^T E[x(n)x^T(n-Δ)] (wT - w) + w^T E[v(n)v^T(n-Δ)] w + E[u(n)u(n-Δ)]

For white v(n) and u(n), the noise terms vanish whenever Δ ≥ L:

If Δ ≥ L, w = wT ⇒ ρe(Δ) = 0
J.C. Principe, Y.N. Rao, D. Erdogmus. “Error Whitening Wiener Filters: Theory and Algorithms.” Chapter-10, Least-Mean-Square Adaptive Filters, S. Haykin, B. Widrow, (eds.), John Wiley, New York, 2003.
Augmented Error Criterion (AEC)

Define ė(n) = e(n) - e(n-Δ)

E[e(n)e(n-Δ)] = E[e²(n)] - 0.5 E[ė²(n)]

J(w) = E[e²(n)] + β E[ė²(n)]   (MSE + error penalty)
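The identity behind the criterion can be checked numerically. Per sample it is just ab = ½a² + ½b² - ½(a-b)², so the symmetrized sample version holds exactly; the correlated error sequence below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
e = np.convolve(rng.standard_normal(5001), [1.0, 0.6])[:5000]  # correlated errors
D = 1                                    # lag Delta
a, b = e[D:], e[:-D]                     # e(n) and e(n-Delta)
edot = a - b                             # "error derivative" e(n) - e(n-Delta)

mse     = 0.5 * (np.mean(a**2) + np.mean(b**2))   # E[e^2], symmetrized sample form
penalty = np.mean(edot**2)                        # E[edot^2]
lag_corr = np.mean(a * b)                         # E[e(n) e(n-Delta)]

beta = -0.5
J = mse + beta * penalty                 # AEC cost; equals lag_corr exactly here
```

With β = -0.5 the AEC cost is exactly the lag-Δ error correlation, which is the quantity the error-whitening criterion drives to zero.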
AEC can be interpreted as…

J(w) = E[e²(n)] + β E[ė²(n)]

β > 0: error-constrained (penalty) MSE
- Error smoothness constraint
- Joint MSE and error entropy
From AEC to Error Whitening

With β = -0.5, the AEC cost function reduces to

J(w) = E[e(n)e(n-Δ)]

β < 0: simultaneous minimization of MSE and maximization of error entropy.

When J(w) = 0, the resulting w partially whitens the error signal, and is unbiased (Δ > L) even with white noise.
Optimal AEC solution w*

w* = (R + βS)⁻¹ (P + βQ)

R = E[x(n)x^T(n)],                          P = E[x(n)d(n)]
S = E[(x(n) - x(n-L))(x(n) - x(n-L))^T],    Q = E[(x(n) - x(n-L))(d(n) - d(n-L))]

Irrespective of β, the stationary point of the AEC cost function is w*.
Choose a suitable lag L.
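A sample-based sketch of this stationary point for β = -0.5. The 2-tap plant, the AR(2) input (chosen so the symmetrized lag-L moments are well conditioned), the noise levels, and lag L equal to the filter length are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, L = 400000, 2
w_true = np.array([1.0, -0.5])

# AR(2) input: gives a well-conditioned lag-L correlation structure.
x = np.zeros(n)
for k in range(2, n):
    x[k] = 0.9 * x[k-1] - 0.5 * x[k-2] + rng.standard_normal()

d  = np.convolve(x, w_true)[:n] + 0.3 * rng.standard_normal(n)  # white u(n)
xn = x + 0.7 * rng.standard_normal(n)                           # white v(n)

X  = np.stack([xn[L+1:], xn[L:-1]], axis=1)      # [x(k), x(k-1)]
Xl = np.stack([xn[1:n-L], xn[:n-L-1]], axis=1)   # [x(k-L), x(k-L-1)]
dk, dl = d[L+1:], d[1:n-L]
m = len(X)

# For beta = -0.5, R - 0.5 S and P - 0.5 Q reduce to symmetrized lag-L moments,
# in which the white-noise contributions vanish (lag >= filter length).
A = (X.T @ Xl + Xl.T @ X) / (2 * m)              # R - 0.5 S
b = (X.T @ dl + Xl.T @ dk) / (2 * m)             # P - 0.5 Q
w_ewc = np.linalg.solve(A, b)                    # ~ w_true despite both noises

w_mse = np.linalg.solve(X.T @ X / m, X.T @ dk / m)  # biased, for comparison
```

The MSE solution shrinks noticeably toward zero, while the β = -0.5 solution recovers the true weights from the same noisy records.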
In summary AEC…

J(w) = E[e²(n)] + β E[ė²(n)]

              β = 0          β = -0.5        β > 0
Criterion     MSE            EWC             AEC
Search        Minimization   Root finding!   Minimization

The shape of the performance surface changes with β.
Searching for AEC-optimal w

[Contour plot over (w1, w2): for β > 0 the performance surface has a single minimum.]
Searching for AEC-optimal w

[Contour plot over (w1, w2): for β < 0 the stationary point need not be a minimum.]
Searching for AEC-optimal w

[Contour plot over (w1, w2), β < 0: the cost decreases along some directions and increases along others — a saddle point.]
Stochastic search – AEC-LMS

Problem
- The stationary point for AEC with β < 0 can be a global minimum, a global maximum, or a saddle point.
- A saddle point is unstable: a gradient update with a single fixed-sign step size can never converge to it.

Solution
- Use sign information.
AEC-LMS: β = -0.5

w(n+1) = w(n) + η sgn(e²(n) - 0.5 ė²(n)) [e(n)x(n) - 0.5 ė(n)ẋ(n)]

with ė(n) = e(n) - e(n-L) and ẋ(n) = x(n) - x(n-L), converging to

w* = (R - 0.5S)⁻¹ (P - 0.5Q)

Convergence in the MS sense iff

0 < η < 2|E[e²(n) - 0.5 ė²(n)]| / E[‖e(n)x(n) - 0.5 ė(n)ẋ(n)‖²]

Steady-state excess error:

(E[ê²(n)])_ave ≤ η L Tr(R + σv²I)(σu² + σv²‖w‖²) / 2
Y.N. Rao, D. Erdogmus, G.Y. Rao, J.C. Principe. “Stochastic Error Whitening Algorithm for Linear Filter Estimation with Noisy Data.” Neural Networks, June 2003.
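The sign-based update can be sketched as follows. The 2-tap plant, AR(2) input, noise levels, and step size are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
n, L, eta = 150000, 2, 0.002
w_true = np.array([1.0, -0.5])

x = np.zeros(n)
for k in range(2, n):                    # AR(2) input
    x[k] = 0.9 * x[k-1] - 0.5 * x[k-2] + rng.standard_normal()
d  = np.convolve(x, w_true)[:n] + 0.3 * rng.standard_normal(n)  # white u(n)
xn = x + 0.5 * rng.standard_normal(n)                           # white v(n)

w = np.zeros(2)
for k in range(L + 1, n):
    xk = np.array([xn[k], xn[k-1]])      # x(n)
    xl = np.array([xn[k-L], xn[k-L-1]])  # x(n-L)
    e  = d[k]   - w @ xk                 # e(n)
    el = d[k-L] - w @ xl                 # e(n-L)
    edot, xdot = e - el, xk - xl
    # The sign factor lets one update seek a minimum, maximum, or saddle.
    s = np.sign(e * e - 0.5 * edot * edot)
    w += eta * s * (e * xk - 0.5 * edot * xdot)
```

The sign term is what allows convergence toward the saddle-type stationary point that a plain fixed-sign gradient update cannot reach.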
Quasi-Newton AEC

w* = (R + βS)⁻¹ (P + βQ)

Problem
- The optimal solution requires a matrix inversion.

Solution
- R and S are positive-definite, symmetric, and allow rank-1 recursions.
- Overall, T = R + βS has a rank-2 update.
Quasi-Newton AEC

T(n) = R(n) + βS(n)

T(n) = T(n-1) + x(n)x^T(n) + β(2x(n) - x(n-L))(x(n) - x(n-L))^T

Use the Sherman-Morrison-Woodbury identity:

(A + BCD^T)⁻¹ = A⁻¹ - A⁻¹B(C⁻¹ + D^T A⁻¹ B)⁻¹ D^T A⁻¹
Y.N. Rao, D. Erdogmus, G.Y. Rao, J.C. Principe. “Fast Error Whitening Algorithms for System Identification and Control.” IEEE Workshop on Neural Networks for Signal Processing XIII, September 2003.
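The identity is easy to sanity-check numerically; the random matrices below are arbitrary, with a rank-2 correction shaped like the one in the T(n) recursion:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
A = A @ A.T + 10 * np.eye(4)            # well-conditioned symmetric A
B = 0.5 * rng.standard_normal((4, 2))   # rank-2 correction, as in T(n)
D = 0.5 * rng.standard_normal((4, 2))
C = np.eye(2)

Ai  = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ C @ D.T)    # direct inversion
# Sherman-Morrison-Woodbury: only a 2x2 system must be inverted.
rhs = Ai - Ai @ B @ np.linalg.inv(np.linalg.inv(C) + D.T @ Ai @ B) @ D.T @ Ai
```

This is why each quasi-Newton step can maintain T⁻¹(n) at O(M²) cost instead of the O(M³) of a full inversion.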
Quasi-Newton AEC

Initialize Z(0) = cI (c a large positive constant) and w(0) = 0, where Z(n) = T⁻¹(n).

At every iteration, compute

B(n) = [x(n)   β(2x(n) - x(n-L))]
D(n) = [x(n)   (x(n) - x(n-L))]
κ(n) = Z(n-1)B(n) [I + D^T(n)Z(n-1)B(n)]⁻¹     (I is 2×2)
y(n) = x^T(n)w(n-1),   y(n-L) = x^T(n-L)w(n-1)
e(n) = [d(n) - y(n);  (d(n) - y(n)) - (d(n-L) - y(n-L))]
w(n) = w(n-1) + κ(n)e(n)
Z(n) = Z(n-1) - κ(n)D^T(n)Z(n-1)
Quasi-Newton AEC analysis

Fact 1: Convergence is achieved in a finite number of steps.
Fact 2: The estimation-error covariance is bounded from above.
Fact 3: The trace of the error covariance depends mainly on the smallest eigenvalue of R + βS:

Tr(E[ε_n ε_n^T]) is bounded above by a term that grows with σu², the filter length L, and Tr(R), and is inversely proportional to the square of the smallest eigenvalue of R + βS.
Y.N. Rao, D. Erdogmus, G.Y. Rao, J.C. Principe. “Fast Error Whitening Algorithms for System Identification and Control with Noisy Data.” NeuroComputing, to appear in 2004.
Minor Components based EWC

R_L = E[x(n)x^T(n-L) + x(n-L)x^T(n)],    P_L = E[x(n)d(n-L) + x(n-L)d(n)]

Optimal EWC solution: w* = R_L⁻¹ P_L   (equivalent to w* = (R - 0.5S)⁻¹(P - 0.5Q))

Augmented data matrix, motivated from TLS:

G = [ R_L     P_L      ]
    [ P_L^T   2ρ_d(L)  ],    ρ_d(L) = E[d(n)d(n-L)]

G is a symmetric, indefinite matrix, and [w*; -1] is its eigenvector with zero eigenvalue.
Minor Components based EWC

Problem
- Computing the eigenvector corresponding to the zero eigenvalue of an indefinite matrix.

Inverse iteration (EWC-TLS):

w(n+1) = G⁻¹ w(n)
w(n+1) ← w(n+1) / ‖w(n+1)‖
Y.N. Rao, D. Erdogmus, J.C. Principe. “Error Whitening Criterion for Adaptive Filtering: Theory and Algorithms.” IEEE Transactions on Signal Processing, to appear.
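A sketch of the whole EWC-TLS pipeline: estimate the augmented matrix G from noisy data, run inverse iteration to find the eigenvector with eigenvalue closest to zero, then rescale its last component to -1. The 2-tap plant, AR(2) input, and noise levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, L = 400000, 2
w_true = np.array([1.0, -0.5])

x = np.zeros(n)
for k in range(2, n):                    # AR(2) input
    x[k] = 0.9 * x[k-1] - 0.5 * x[k-2] + rng.standard_normal()
d  = np.convolve(x, w_true)[:n] + 0.3 * rng.standard_normal(n)  # white u(n)
xn = x + 0.5 * rng.standard_normal(n)                           # white v(n)

X  = np.stack([xn[L+1:], xn[L:-1]], axis=1)      # x_k taps
Xl = np.stack([xn[1:n-L], xn[:n-L-1]], axis=1)   # x_{k-L} taps
dk, dl = d[L+1:], d[1:n-L]
m = len(X)

RL = (X.T @ Xl + Xl.T @ X) / m                   # E[x x_L^T + x_L x^T]
PL = (X.T @ dl + Xl.T @ dk) / m                  # E[x d_L + x_L d]
rd = 2.0 * np.mean(dk * dl)                      # 2 E[d d_L]
G  = np.block([[RL, PL[:, None]], [PL[None, :], np.array([[rd]])]])

v = rng.standard_normal(3)                       # inverse iteration toward the
for _ in range(20):                              # eigenvalue nearest zero
    v = np.linalg.solve(G, v)
    v /= np.linalg.norm(v)
w_tls = -v[:2] / v[2]                            # rescale last component to -1
```

Because the sample G is nearly singular in exactly the null direction [w*; -1], inverse iteration converges in very few steps.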
Inverse control using EWC

[Block diagram: an adaptive controller in cascade with the plant (model), trained so that the controller-plant pair tracks a reference model. Experiment: AR plant, FIR model, additive noise.]
[Histogram of the errors for the EWC and MSE controllers: number of samples vs. error value, positive and negative errors.]

[Plot: output and desired signal for the EWC controller-plant pair, 1000 samples.]

[Plot: output and desired signal for the MSE controller-plant pair, 1000 samples.]
Going beyond white noise…

EWC can be extended to handle colored noise if
- the noise correlation depth is known, or
- the noise covariance structure is known.

Otherwise the results will be biased by the noise terms.

Instead, exploit the fact that the output and desired signals have independent noise terms.
Modified cost function

J(w) = Σ_{Δ=1}^{N} |E[e_k d_{k-Δ} + e_{k-Δ} d_k]|

N – filter length (assume sufficient order)
e – error signal computed from noisy data
d – noisy desired signal
Δ – lags chosen (need many!)
Y.N. Rao, D. Erdogmus, J.C. Principe. “Accurate Linear Parameter Estimation in Colored Noise.” International Conference on Acoustics, Speech and Signal Processing, May 2004.
Cost function…

E[e_k d_{k-Δ} + e_{k-Δ} d_k]
  = w_T^T E[x_k x_{k-Δ}^T + x_{k-Δ} x_k^T] w_T
  - w^T E[x_k x_{k-Δ}^T + x_{k-Δ} x_k^T] w_T + 2E[u_k u_{k-Δ}]

If the noise in the desired signal is white, E[u_k u_{k-Δ}] = 0 for Δ ≥ 1.
The input noise drops out completely!

J(w) = Σ_{Δ=1}^{N} |w_T^T R_Δ w_T - w^T R_Δ w_T|,    R_Δ = E[x_k x_{k-Δ}^T + x_{k-Δ} x_k^T]
Optimal solution by root-finding

There is a single unique solution w* = w_T, satisfying J(w*) = 0 and the system of equations

E[d_{k-Δ} x_k + d_k x_{k-Δ}]^T w* = 2E[d_k d_{k-Δ}],    Δ = 1, …, N

(note that E[d_{k-Δ} x_k + d_k x_{k-Δ}] = E[x_k x_{k-Δ}^T + x_{k-Δ} x_k^T] w_T, so the system is consistent exactly at w* = w_T).

Stacking the N lag equations gives a linear system that can be solved in the least-squares sense.
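The stacked system can be formed and solved directly from data. The sketch below assumes a hypothetical 2-tap plant, N = 4 lags, white noise on the desired signal, and deliberately colored input noise (which, per the derivation, still drops out of the lag moments):

```python
import numpy as np

rng = np.random.default_rng(6)
n, N = 400000, 4                          # N lag equations for a 2-tap filter
w_true = np.array([1.0, -0.5])

x = np.zeros(n)
for k in range(2, n):                     # AR(2) input
    x[k] = 0.9 * x[k-1] - 0.5 * x[k-2] + rng.standard_normal()
d = np.convolve(x, w_true)[:n] + 0.3 * rng.standard_normal(n)   # white u(k)

# Colored input noise: it is independent of d, so it drops out of the moments.
v  = np.convolve(rng.standard_normal(n + 2), [1.0, 0.8, 0.6])[2:n+2] * 0.5
xn = x + v

rows, rhs = [], []
for lag in range(1, N + 1):
    Xk = np.stack([xn[lag+1:], xn[lag:-1]], axis=1)      # x_k taps
    Xl = np.stack([xn[1:n-lag], xn[:n-lag-1]], axis=1)   # x_{k-lag} taps
    dk, dl = d[lag+1:], d[1:n-lag]
    rows.append((Xk.T @ dl + Xl.T @ dk) / len(dk))       # E[d_{k-lag}x_k + d_k x_{k-lag}]
    rhs.append(2.0 * np.mean(dk * dl))                   # 2 E[d_k d_{k-lag}]

# Over-determined N x 2 system, solved in the least-squares sense.
w_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
```

Despite the colored input noise, the least-squares solution of the lag equations recovers the true weights.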
Stochastic algorithm

w_{k+1} = w_k + μ Σ_{Δ=1}^{N} sgn(e_k d_{k-Δ} + e_{k-Δ} d_k)(d_{k-Δ} x_k + d_k x_{k-Δ})

Asymptotically converges to the optimal solution iff the step size satisfies

0 < μ < 2|E[Σ_{Δ=1}^{N} (e_k d_{k-Δ} + e_{k-Δ} d_k)]| / E[‖∇J(w_k)‖²]
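An online sketch of the same criterion, under the same illustrative assumptions as before (hypothetical 2-tap plant, colored input noise, and a small illustrative step size):

```python
import numpy as np

rng = np.random.default_rng(7)
n, N, mu = 150000, 4, 0.001
w_true = np.array([1.0, -0.5])

x = np.zeros(n)
for k in range(2, n):                     # AR(2) input
    x[k] = 0.9 * x[k-1] - 0.5 * x[k-2] + rng.standard_normal()
d = np.convolve(x, w_true)[:n] + 0.3 * rng.standard_normal(n)   # white u(k)
v = np.convolve(rng.standard_normal(n + 2), [1.0, 0.8, 0.6])[2:n+2] * 0.5
xn = x + v                                # colored input noise

w = np.zeros(2)
for k in range(N + 2, n):
    xk = np.array([xn[k], xn[k-1]])
    ek = d[k] - w @ xk
    g = np.zeros(2)
    for lag in range(1, N + 1):
        xl = np.array([xn[k-lag], xn[k-lag-1]])
        el = d[k-lag] - w @ xl
        z  = ek * d[k-lag] + el * d[k]    # instantaneous lag statistic
        g += np.sign(z) * (d[k-lag] * xk + d[k] * xl)
    w += mu * g                           # sign-based stochastic update
```

The instantaneous sign of each lag statistic stands in for the sign of its expectation, so the update needs no knowledge of the noise statistics.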
Extensions to colored noise in the desired signal

If the noise in the desired signal is colored, then

E[e_k d_{k-Δ} + e_{k-Δ} d_k] = w_T^T R_Δ w_T - w^T R_Δ w_T + 2E[u_k u_{k-Δ}]

and the noise term 2E[u_k u_{k-Δ}] no longer vanishes.

Introduce a penalty term in the cost function such that the overall cost converges to 2E[u_k u_{k-Δ}].
But we do not know 2E[u_k u_{k-Δ}]

Introduce estimators of 2E[u_k u_{k-Δ}] in the cost!

Define z_Δ(k) = e_k d_{k-Δ} + e_{k-Δ} d_k

J(w, λ, θ) = Σ_{Δ=1}^{N} (z_Δ(k) - λ_Δ)² + … (coupling terms in the estimators λ_Δ, θ_Δ)

The constants α and β are positive real numbers that control the stability.
Gradients…

∂J(w, λ, θ)/∂w = -2 Σ_{Δ=1}^{N} (z_Δ(k) - λ_Δ)(d_{k-Δ} x_k + d_k x_{k-Δ})

The gradients with respect to λ_Δ and θ_Δ are built from the residuals (z_Δ(k) - λ_Δ) and (z_Δ(k) - θ_Δ), scaled by α and β.
Parameter updates

Each parameter set is updated by a stochastic gradient step on J:

w_{k+1}   = w_k   - μ_w ∂J(w, λ, θ)/∂w
λ_{Δ,k+1} = λ_{Δ,k} - μ_λ ∂J(w, λ, θ)/∂λ_Δ
θ_{Δ,k+1} = θ_{Δ,k} - μ_θ ∂J(w, λ, θ)/∂θ_Δ
Summary

- Noise is everywhere
- MSE is not optimal even for linear systems
- The proposed AEC and its extensions handle noisy data
- Simple online algorithms optimize AEC
Future Thoughts

- Complete analysis of the modified algorithm
- Extensions to non-linear systems
  - Difficult with global non-linear models
  - Using multiple models?
- Unsupervised learning
  - Robust subspace estimation
  - Clustering?
- Other applications