
Page 1: Independent Component Analysis

Independent Component Analysis

Lecturer: 虞台文

Page 2: Independent Component Analysis

Content

What is ICA?
Nongaussianity Measurement — Kurtosis
ICA By Maximization of Nongaussianity
Gradient and FastICA Algorithms Using Kurtosis
Measuring Nongaussianity by Negentropy
FastICA Using Negentropy

Page 3: Independent Component Analysis

Independent Component Analysis

What is ICA?

Page 4: Independent Component Analysis

Motivation

Example: three people are speaking simultaneously in a room that has three microphones.

Denote the microphone signals by x1(t), x2(t), and x3(t).

They are mixtures of sources s1(t), s2(t), and s3(t).

The goal is to estimate the original speech signals using only the recorded signals.

This is called the cocktail-party problem.

$$
\begin{aligned}
x_1(t) &= a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t) \\
x_2(t) &= a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t) \\
x_3(t) &= a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)
\end{aligned}
$$

Page 5: Independent Component Analysis

The Cocktail-Party Problem

[Figures: the original speech signals and the mixed speech signals.]

Page 6: Independent Component Analysis

The Cocktail-Party Problem

[Figures: the original speech signals and the estimated sources.]

Page 7: Independent Component Analysis

The Problem

$$
\begin{aligned}
x_1(t) &= a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t) \\
x_2(t) &= a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t) \\
x_3(t) &= a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)
\end{aligned}
\qquad \mathbf{x} = \mathbf{A}\mathbf{s}
$$

Find the sources s1(t), s2(t), and s3(t), and the coefficients aij, from the observed signals x1(t), x2(t), and x3(t).

It turns out that the problem can be solved just by assuming that the sources si(t) are nongaussian and statistically independent.

$$
\begin{aligned}
s_1(t) &= b_{11} x_1(t) + b_{12} x_2(t) + b_{13} x_3(t) \\
s_2(t) &= b_{21} x_1(t) + b_{22} x_2(t) + b_{23} x_3(t) \\
s_3(t) &= b_{31} x_1(t) + b_{32} x_2(t) + b_{33} x_3(t)
\end{aligned}
\qquad \mathbf{s} = \mathbf{A}^{-1}\mathbf{x} = \mathbf{B}\mathbf{x}
$$
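To make the mixing model concrete, here is a minimal NumPy sketch (the signals and the mixing matrix are hypothetical choices of mine, not the lecture's data) that synthesizes three nongaussian sources and mixes them with a square matrix A:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 4000)

# Three hypothetical nongaussian sources.
s1 = np.sin(2 * t)                    # sinusoid
s2 = np.sign(np.sin(3 * t))           # square wave
s3 = rng.laplace(size=t.size)         # Laplacian (supergaussian) noise
S = np.vstack([s1, s2, s3])           # shape (3, T): one source per row

# An arbitrary square mixing matrix A (unknown in the real problem).
A = np.array([[1.0, 0.5, 0.2],
              [0.6, 1.0, 0.4],
              [0.3, 0.7, 1.0]])

X = A @ S                             # observed "microphone" signals x = A s
```

ICA's task is to recover S, up to the order and scaling of its rows, from X alone.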

Page 8: Independent Component Analysis

Applications

Cocktail-party problem: separation of voices, music, or sounds
Sensor array processing, e.g. radar
Biomedical signal processing with multiple sensors: EEG, ECG, MEG, fMRI
Telecommunications: e.g. multiuser detection in CDMA
Financial and other time series
Noise removal from signals and images
Feature extraction for images and signals
Brain modelling

Page 9: Independent Component Analysis

Basic ICA Model

$$
x_i(t) = a_{i1} s_1(t) + a_{i2} s_2(t) + \cdots + a_{in} s_n(t), \qquad i = 1, 2, \ldots, n
$$

The xi(t) are the mixed signals (observable); the si(t) are latent variables.

[Figure: joint density p(x1, x2) of two mixtures and the marginal densities p(x1), p(x2).]

x = As

Page 10: Independent Component Analysis

The Basic Assumptions

The independent components are assumed statistically independent.

The independent components must have nongaussian distributions.

For simplicity, we assume that the unknown mixing matrix A is square.

x = As

Page 11: Independent Component Analysis

Assumption I: Statistical Independence

Basically, random variables y1, y2, …, yn are said to be independent if information on the value of yi does not give any information on the value of yj for i ≠ j.

Mathematically, the joint pdf is factorizable in the following way:

$$
p(y_1, y_2, \ldots, y_n) = p_1(y_1)\, p_2(y_2) \cdots p_n(y_n)
$$

Note that uncorrelatedness does not necessarily imply independence.

x = As
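As a quick numerical illustration of that last point (a toy example of mine, not from the slides), take X uniform on [-1, 1] and Y = X²: the two are uncorrelated but clearly dependent.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100_000)
y = x ** 2                      # deterministic function of x: fully dependent

# Correlation is (numerically) zero because E[xy] = E[x^3] = 0 and E[x] = 0,
# yet the joint density p(x, y) does not factor into p(x) p(y).
print(np.corrcoef(x, y)[0, 1])  # ~0
```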

Page 12: Independent Component Analysis

Assumption II: Nongaussian Distributions

Note that in the basic model we do not have to know what the nongaussian distributions of the ICs look like.

x = As

Page 13: Independent Component Analysis

Assumption III: Mixing Matrix Is Square

In other words, the number of independent components is equal to the number of observed mixtures.
– This simplifies our discussion in the first stage.

However, in the basic ICA model, this is not a restriction, as long as the number of observations xi is originally at least as large as the number of sources sj.

x = As

Page 14: Independent Component Analysis

Ambiguities of ICA

We cannot determine the variances (energies) of IC’s.

– This also implies E[x] = 0 (centering of x), and that the sign of si is unimportant.

We cannot determine the order of ICs.

x = As

$$
\mathbf{x} = \sum_{i=1}^{n} \mathbf{a}_i s_i
$$

(Any scalar multiplying si can be canceled by dividing the corresponding column ai.) Therefore, we assume

$$
E[s_i^2] = 1, \qquad E[s_i] = 0.
$$

$$
\mathbf{x} = \mathbf{A}\mathbf{P}^{-1}(\mathbf{P}\mathbf{s}), \quad \text{where } \mathbf{P} \text{ is any permutation matrix.}
$$

Page 15: Independent Component Analysis

Illustration of ICA

$$
p(s_i) = \begin{cases} \dfrac{1}{2\sqrt{3}}, & |s_i| \le \sqrt{3} \\ 0, & \text{otherwise} \end{cases}
$$

Mixing: x = As, with a fixed 2×2 mixing matrix A.

[Figure: scatter plots of the joint distribution of the sources s and of the mixtures x = As.]

Page 16: Independent Component Analysis

Whitening Is Only Half of ICA

Whitening: z = Vx, where V is the whitening matrix, applied to the mixtures x = As.

Page 17: Independent Component Analysis

Whitening Is Only Half of ICA

z = Vx

[Figure: marginal densities p(zi) of the whitened data.]

By whitening, we have E[zzᵀ] = I.

This, however, doesn't imply zi's are independent, i.e., we may have

$$
p(z_1, z_2, \ldots, z_n) \neq \prod_{i=1}^{n} p_i(z_i)
$$

Uncorrelatedness is related to independence, but is weaker than independence.
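One common way to construct the whitening matrix V is through the eigendecomposition of the covariance matrix; the sketch below is my own minimal NumPy version (the slides do not prescribe a particular construction).

```python
import numpy as np

def whiten(X):
    """Whiten data X (shape: n_signals x n_samples); returns z = Vx with E[zz^T] = I."""
    Xc = X - X.mean(axis=1, keepdims=True)       # center: make the mean zero
    C = np.cov(Xc)                               # covariance matrix E[x x^T]
    d, E = np.linalg.eigh(C)                     # C = E diag(d) E^T
    V = E @ np.diag(d ** -0.5) @ E.T             # whitening matrix V = E D^(-1/2) E^T
    return V @ Xc, V
```

After whitening, np.cov(Z) is numerically the identity, but the components zi still need not be independent.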

Page 18: Independent Component Analysis

Independent Component Analysis

z = Vx. The central limit theorem implicitly tells us that summing components makes the distribution 'more' Gaussian. Therefore, nongaussianity is an important criterion for ICA.

Degaussianization (making the estimated components as nongaussian as possible) is hence the central theme in ICA.

Page 19: Independent Component Analysis

Independent Component Analysis

Nongaussianity Measurement — Kurtosis

Page 20: Independent Component Analysis

Moments

The jth moment:
$$
\alpha_j = E[x^j] = \int x^j p(x)\,dx
$$

Mean:
$$
m_x = \alpha_1 = E[x]
$$

The jth central moment:
$$
\mu_j = E[(x - m_x)^j] = \int (x - m_x)^j p(x)\,dx
$$

Variance:
$$
\sigma_x^2 = \mu_2 = E[(x - m_x)^2]
$$

Skewness:
$$
\mathrm{skew}(x) = \mu_3 = E[(x - m_x)^3]
$$

Page 21: Independent Component Analysis

Moment Generating Function

The moment generating function MX(t) of a random variable X is defined by:

$$
M_X(t) = E[e^{tX}] = \int e^{tx} p(x)\,dx
$$

For X ~ N(μ, σ²):
$$
M_X(t) = e^{\mu t}\, e^{\sigma^2 t^2/2}
$$

For Z ~ N(0, 1):
$$
M_Z(t) = e^{t^2/2}
$$

Expanding the exponential,
$$
M_X(t) = E[e^{tX}] = 1 + E[X]\frac{t}{1!} + E[X^2]\frac{t^2}{2!} + E[X^3]\frac{t^3}{3!} + \cdots
$$

Page 22: Independent Component Analysis

Standard Normal Distribution N(0, 1)

$$
M_Z(t) = e^{t^2/2} = 1 + \frac{t^2}{2\cdot 1!} + \frac{t^4}{2^2\cdot 2!} + \frac{t^6}{2^3\cdot 3!} + \cdots
$$

Comparing this with the general expansion of the moment generating function term by term gives
$$
E[Z^{2k+1}] = 0, \qquad E[Z^{2k}] = \frac{(2k)!}{2^k\,k!}.
$$

All odd moments are zero; in particular E[Z²] = 1 and E[Z⁴] = 3.

Page 23: Independent Component Analysis

Kurtosis

Kurtosis of a zero-mean random variable X is defined by

$$
\mathrm{kurt}(X) = E[X^4] - 3\,(E[X^2])^2
$$

Normalized kurtosis:
$$
\tilde{\kappa}(X) = \frac{E[X^4]}{(E[X^2])^2} - 3
$$

For a Gaussian variable, kurt(Z) = 0.
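The sample versions of these quantities are easy to compute; below is a small NumPy helper of my own (the variable names are not from the slides).

```python
import numpy as np

def kurt(x):
    """Sample kurtosis of a zero-mean signal: E[x^4] - 3 (E[x^2])^2."""
    x = x - x.mean()                        # enforce zero mean
    return np.mean(x**4) - 3 * np.mean(x**2)**2

def normalized_kurt(x):
    """Normalized kurtosis: E[x^4] / (E[x^2])^2 - 3."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3

rng = np.random.default_rng(0)
print(kurt(rng.standard_normal(1_000_000)))   # ~0 for Gaussian data
```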

Page 24: Independent Component Analysis

Gaussianity

Gaussian, e.g.
$$
p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
$$

Supergaussian, e.g.
$$
p(x) = \tfrac{1}{2}\, e^{-|x|}
$$

Subgaussian, e.g.
$$
p(x) = \frac{1}{2a}, \quad x \in [-a, a]
$$

Page 25: Independent Component Analysis

Kurtosis for Supergaussian

Consider the Laplacian distribution:
$$
p(x) = \tfrac{1}{2}\, e^{-|x|}
$$

$$
E[X] = 0, \qquad
E[X^2] = \frac{1}{2}\int_{-\infty}^{\infty} x^2 e^{-|x|}\,dx = \int_0^{\infty} x^2 e^{-x}\,dx = 2, \qquad
E[X^4] = \frac{1}{2}\int_{-\infty}^{\infty} x^4 e^{-|x|}\,dx = \int_0^{\infty} x^4 e^{-x}\,dx = 24
$$

$$
\mathrm{kurt}(X) = E[X^4] - 3\,(E[X^2])^2 = 24 - 3\cdot 2^2 = 12, \qquad
\tilde{\kappa}(X) = \frac{24}{2^2} - 3 = 3 > 0
$$


Page 27: Independent Component Analysis

Kurtosis for Subgaussian

Consider Uniform Distribution:

$$
p(x) = \frac{1}{2a}, \quad x \in [-a, a]
$$

$$
E[X] = 0, \qquad
E[X^2] = \frac{1}{2a}\int_{-a}^{a} x^2\,dx = \frac{a^2}{3}, \qquad
E[X^4] = \frac{1}{2a}\int_{-a}^{a} x^4\,dx = \frac{a^4}{5}
$$

$$
\mathrm{kurt}(X) = \frac{a^4}{5} - 3\left(\frac{a^2}{3}\right)^2 = -\frac{2a^4}{15}, \qquad
\tilde{\kappa}(X) = \frac{a^4/5}{(a^2/3)^2} - 3 = -\frac{6}{5} < 0
$$
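A quick numerical sanity check of these signs (my own sketch; the kurt helper from the earlier snippet is repeated so the block is self-contained):

```python
import numpy as np

def kurt(x):
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2)**2

rng = np.random.default_rng(0)
n = 1_000_000
print(kurt(rng.laplace(size=n)))          # > 0: supergaussian (Laplacian), ~12
print(kurt(rng.uniform(-1, 1, size=n)))   # < 0: subgaussian (uniform), ~ -2/15
print(kurt(rng.standard_normal(n)))       # ~ 0: Gaussian
```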

Page 28: Independent Component Analysis

Nongaussianity Measurement By Kurtosis

Kurtosis, or rather its absolute value, has been widely used as a measure of nongaussianity in ICA and related fields.

Computationally, kurtosis can be estimated simply by using the 4th moment of the sample data (if the variance is kept constant).

$$
\mathrm{kurt}(X) = E[X^4] - 3\,(E[X^2])^2
$$

Page 29: Independent Component Analysis

Properties of Kurtosis

Let X1 and X2 be two independent random variables, both with zero mean. Then

$$
\mathrm{kurt}(X_1 + X_2) = \mathrm{kurt}(X_1) + \mathrm{kurt}(X_2), \qquad
\mathrm{kurt}(\alpha X_1) = \alpha^4\,\mathrm{kurt}(X_1),
$$

where
$$
\mathrm{kurt}(X) = E[X^4] - 3\,(E[X^2])^2 .
$$

Page 30: Independent Component Analysis

Independent Component Analysis

ICA By Maximization of Nongaussianity

Page 31: Independent Component Analysis

Restate the Problem

x = As, where x is zero mean (observable), A is the mixing matrix (unknown), and s contains zero-mean, unit-variance ICs (latent; unknown).

Ultimate goal: s = A⁻¹x

How?

Page 32: Independent Component Analysis

Simplification

x = As; ultimate goal: s = A⁻¹x (via whitening).

For simplicity, we assume the sources are i.i.d.

To estimate an independent component, consider

y = bᵀx = bᵀAs = qᵀs

If b is properly identified, qᵀ = bᵀA contains only one nonzero entry, with value one. This implies that b will be one row of A⁻¹.

Page 33: Independent Component Analysis

Nongaussian Is Independent

x = As; ultimate goal: s = A⁻¹x (via whitening).

For simplicity, we assume the sources are i.i.d.

To estimate an independent component, consider

y = bᵀx = bᵀAs = qᵀs

We will take the b that maximizes the nongaussianity of bᵀx.

Page 34: Independent Component Analysis

Nongaussian Is Independent

y = bᵀx = bᵀAs = qᵀs

[Figure: scatter plot of the sources (s1, s2), which are mixed by a fixed 2×2 matrix A.]

Mixing: x = As

Page 35: Independent Component Analysis

Nongaussian Is Independent

y = bᵀx = bᵀAs = qᵀs

Whitening: z = Vx, applied to the mixtures x = As.

Page 36: Independent Component Analysis

Nongaussian Is Independent

y = bᵀx = bᵀAs = qᵀs

z = Vx

[Figure: marginal densities p(zi) of the whitened mixtures.]

The sum of components becomes more Gaussian.

Page 37: Independent Component Analysis

Nongaussian Is Independent

y = bᵀx = bᵀAs = qᵀs

Rotation: y = Wz, where z = Vx and W = (w1, w2)ᵀ.

[Figure: scatter plot of (y1, y2) after the rotation.]

Page 38: Independent Component Analysis

Nongaussian Is Independent

y = bᵀx = bᵀAs = qᵀs

y = Wz

[Figure: estimated density p(yi) of each component, and scatter plot of (y1, y2).]

Page 39: Independent Component Analysis

Nongaussian Is Independent

y = Wz

[Figure: scatter plot of (y1, y2).]

y = bᵀx = bᵀAs = qᵀs

Consider estimating one independent component:

y = wᵀz = wᵀVAs = qᵀs,  so bᵀ = wᵀV.

Since E[zzᵀ] = (VA)(VA)ᵀ = I, the matrix VA is orthogonal, and
$$
\|\mathbf{q}\|^2 = \|(\mathbf{VA})^T\mathbf{w}\|^2 = \mathbf{w}^T(\mathbf{VA})(\mathbf{VA})^T\mathbf{w} = \|\mathbf{w}\|^2 = 1 .
$$

Page 40: Independent Component Analysis

Nongaussian Is Independent

y = bᵀx = bᵀAs = qᵀs

Consider estimating one independent component:

y = wᵀz,  with ||q||² = wᵀ(VA)(VA)ᵀw = ||w||² = 1.

Project the whitened data onto a unit vector w to get an independent component.

Page 41: Independent Component Analysis

Nongaussian Is Independent

y = bᵀx = bᵀAs = qᵀs

In the two-source case, y = q1 s1 + q2 s2, and

$$
\mathrm{kurt}(y) = \mathrm{kurt}(q_1 s_1) + \mathrm{kurt}(q_2 s_2) = q_1^4\,\mathrm{kurt}(s_1) + q_2^4\,\mathrm{kurt}(s_2).
$$

We require that
$$
E[y^2] = q_1^2\,\mathrm{Var}(s_1) + q_2^2\,\mathrm{Var}(s_2) = 1,
$$
so the search space is the unit circle q1² + q2² = 1 in the (q1, q2) plane, on which kurt(y) = c (q1⁴ + q2⁴) when the two sources have the same kurtosis c.

Using kurtosis as the nongaussianity measurement.

Page 42: Independent Component Analysis

Independent Component Analysis

Gradient Algorithm Using Kurtosis

Page 43: Independent Component Analysis

Criterion for ICA Using Kurtosis

maximize |kurt(wᵀz)| subject to ||w||² = 1, where kurt(X) = E[X⁴] − 3(E[X²])².

$$
|\mathrm{kurt}(\mathbf{w}^T\mathbf{z})| = \left| E[(\mathbf{w}^T\mathbf{z})^4] - 3\,(E[(\mathbf{w}^T\mathbf{z})^2])^2 \right| = \left| E[(\mathbf{w}^T\mathbf{z})^4] - 3\,\|\mathbf{w}\|^4 \right|
$$

$$
\frac{\partial\,|\mathrm{kurt}(\mathbf{w}^T\mathbf{z})|}{\partial \mathbf{w}}
= 4\,\mathrm{sign}\big(\mathrm{kurt}(\mathbf{w}^T\mathbf{z})\big)\left\{ E[\mathbf{z}(\mathbf{w}^T\mathbf{z})^3] - 3\,\mathbf{w}\,\|\mathbf{w}\|^2 \right\}
$$

Page 44: Independent Component Analysis

Gradient Algorithm

maximize |kurt(wᵀz)| subject to ||w||² = 1

$$
\frac{\partial\,|\mathrm{kurt}(\mathbf{w}^T\mathbf{z})|}{\partial \mathbf{w}}
= 4\,\mathrm{sign}\big(\mathrm{kurt}(\mathbf{w}^T\mathbf{z})\big)\left\{ E[\mathbf{z}(\mathbf{w}^T\mathbf{z})^3] - 3\,\mathbf{w}\,\|\mathbf{w}\|^2 \right\}
$$

The term 3w||w||² is parallel to w and therefore unrelated to the search direction on the unit sphere, so the gradient algorithm is

$$
\Delta\mathbf{w} \propto \mathrm{sign}\big(\mathrm{kurt}(\mathbf{w}^T\mathbf{z})\big)\, E[\mathbf{z}(\mathbf{w}^T\mathbf{z})^3], \qquad \mathbf{w} \leftarrow \mathbf{w}/\|\mathbf{w}\| .
$$
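A minimal NumPy sketch of this gradient update (the learning rate, iteration count, and stopping rule are my own choices; the slide does not specify them):

```python
import numpy as np

def ica_gradient_kurtosis(Z, lr=0.1, n_iter=200, seed=0):
    """One-unit ICA by gradient ascent on |kurt(w^T z)|, for whitened data Z (n x T)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                                        # projections w^T z
        k = np.mean(y**4) - 3 * np.mean(y**2)**2         # current kurtosis
        grad = np.sign(k) * (Z * y**3).mean(axis=1)      # sign(kurt) E[z (w^T z)^3]
        w = w + lr * grad
        w /= np.linalg.norm(w)                           # project back onto the unit sphere
    return w
```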

Page 45: Independent Component Analysis

FastICA Algorithm

maximize |kurt(wᵀz)| subject to ||w||² = 1

At a stable point, the gradient must point in the direction of w. Using fixed-point iteration,

$$
\mathbf{w} \propto E[\mathbf{z}(\mathbf{w}^T\mathbf{z})^3] - 3\,\mathbf{w}\,\|\mathbf{w}\|^2 \qquad \text{(the sign is not important)},
$$

which gives the FastICA update

$$
\mathbf{w} \leftarrow E[\mathbf{z}(\mathbf{w}^T\mathbf{z})^3] - 3\mathbf{w}, \qquad \mathbf{w} \leftarrow \mathbf{w}/\|\mathbf{w}\| .
$$
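The same update as a NumPy sketch (the convergence test and tolerance are my own additions):

```python
import numpy as np

def fastica_kurtosis(Z, n_iter=100, tol=1e-8, seed=0):
    """One-unit FastICA with the kurtosis contrast, for whitened data Z (n x T)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z
        w_new = (Z * y**3).mean(axis=1) - 3 * w     # w <- E[z (w^T z)^3] - 3 w
        w_new /= np.linalg.norm(w_new)
        if 1 - abs(w_new @ w) < tol:                # converged (up to sign)
            return w_new
        w = w_new
    return w
```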

Page 46: Independent Component Analysis

Independent Component Analysis

Measuring Nongaussianity by Negentropy

Page 47: Independent Component Analysis

Critique of Kurtosis

Kurtosis can be very sensitive to outliers.
– Kurtosis may depend on only a few observations in the tails of the distribution.

Not a robust measure of nongaussianity.

$$
\mathrm{kurt}(X) = E[X^4] - 3\,(E[X^2])^2
$$

Page 48: Independent Component Analysis

Negentropy

Differential entropy:
$$
H(\mathbf{X}) = -\int p_{\mathbf{X}}(\mathbf{x})\,\log p_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}
$$

Negentropy:
$$
J(\mathbf{X}) = H(\mathbf{X}_{\mathrm{gauss}}) - H(\mathbf{X}) \;\ge\; 0
$$

Negentropy is zero only when the random variable is Gaussian distributed. For a Gaussian vector with the same covariance matrix Σ,
$$
H(\mathbf{X}_{\mathrm{gauss}}) = \tfrac{1}{2}\log\det\Sigma + \tfrac{n}{2}\,[1 + \log 2\pi].
$$

Negentropy is invariant under invertible linear transformations.

Page 49: Independent Component Analysis

Approximation of Negentropy (I)

For a zero-mean, unit-variance random variable,
$$
J(X) \approx \frac{\kappa_3(x)^2}{12} + \frac{\kappa_4(x)^2}{48},
$$
where κ3(x) = E[X³] is the skewness and κ4(x) = E[X⁴] − 3 is the kurtosis.

Using this approximation does not help much, because it is again sensitive to outliers.

Page 50: Independent Component Analysis

Approximation of Negentropy (II)

Choose two nonpolynomial functions, G1(x) odd and G2(x) even, such that
$$
J(X) \approx k_1\,(E[G_1(X)])^2 + k_2\,\big(E[G_2(X)] - E[G_2(Z)]\big)^2 .
$$

The first (odd) term measures asymmetry; it is zero if the underlying density is symmetric. The second (even) term measures the dimension of bimodality vs. a peak at zero. Here
$$
E[G_2(Z)] = \frac{1}{\sqrt{2\pi}}\int G_2(z)\, e^{-z^2/2}\, dz .
$$

Usually, only the second term is used.

Page 51: Independent Component Analysis

Approximation of Negentropy (II)

If only an even nonpolynomial function, say, G is used, we have
$$
J(X) \propto \big(E[G(X)] - E[G(Z)]\big)^2 .
$$

The following two functions are useful:
$$
G_1(x) = \frac{1}{a_1}\log\cosh(a_1 x), \quad 1 \le a_1 \le 2, \qquad
G_2(x) = -\exp[-x^2/2].
$$

[Figure: plots of G1, G2, and G3(x) = x⁴ for comparison.]
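A small sketch of this one-function approximation (dropping the proportionality constant and estimating E[G(Z)] from a Gaussian sample are my own choices):

```python
import numpy as np

a1 = 1.0
G1 = lambda x: np.log(np.cosh(a1 * x)) / a1     # log cosh contrast
G2 = lambda x: -np.exp(-x**2 / 2)               # Gaussian contrast

rng = np.random.default_rng(0)
z_ref = rng.standard_normal(1_000_000)          # reference standard normal sample

def negentropy_approx(y, G=G1):
    """J(y) up to a positive constant: (E[G(y)] - E[G(Z)])^2, for zero-mean, unit-variance y."""
    return (G(y).mean() - G(z_ref).mean()) ** 2

print(negentropy_approx(rng.laplace(size=1_000_000) / np.sqrt(2)))  # nongaussian: clearly > 0
print(negentropy_approx(rng.standard_normal(1_000_000)))            # Gaussian: ~0
```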

Page 52: Independent Component Analysis

Degaussian

$$
J(X) \propto \big(E[G(X)] - E[G(Z)]\big)^2
$$

For ICA, we want to maximize this quantity.

Specifically, let z = Vx be the whitened data.

For one-unit ICA, we want to find a rotation, say, w to

maximize  J(wᵀz) = k (E[G(wᵀz)] − E[G(Z)])²
subject to  ||w||² = 1

Page 53: Independent Component Analysis

Gradient Algorithm

maximize  J(wᵀz) = k (E[G(wᵀz)] − E[G(Z)])²

Fact: E[(wᵀz)²] = 1.

$$
\frac{\partial J(\mathbf{w}^T\mathbf{z})}{\partial \mathbf{w}}
= 2k\,\big\{E[G(\mathbf{w}^T\mathbf{z})] - E[G(Z)]\big\}\; E[\mathbf{z}\, g(\mathbf{w}^T\mathbf{z})], \qquad g = G' .
$$

Algorithm:

Batch mode:  Δw ∝ {E[G(wᵀz)] − E[G(Z)]} E[z g(wᵀz)],  w ← w/||w||

On-line mode:  Δw ∝ {E[G(wᵀz)] − E[G(Z)]} z g(wᵀz),  w ← w/||w||

The braced factor may be treated as a constant.

Page 54: Independent Component Analysis

Analysis

maximize  J(wᵀz) = k (E[G(wᵀz)] − E[G(Z)])²

Consider the term inside the braces:
$$
f(X) = E[G(X)] - E[G(Z)],
$$
with
$$
G_1(x) = \frac{1}{a_1}\log\cosh(a_1 x), \qquad G_2(x) = -\exp[-x^2/2], \qquad G_3(x) = x^4 .
$$

The functions G we used have the following property:

f(X) < 0 if X is supergaussian,  f(X) > 0 if X is subgaussian.

Page 55: Independent Component Analysis

Analysis

maximize  J(wᵀz) = k (E[G(wᵀz)] − E[G(Z)])²

For the term inside the braces, f(X) = E[G(X)] − E[G(Z)], the functions above give f(X) < 0 if X is supergaussian and f(X) > 0 if X is subgaussian. Therefore:

Minimize E[G(wᵀz)] if the IC is supergaussian.

Maximize E[G(wᵀz)] if the IC is subgaussian.

Page 56: Independent Component Analysis

Analysis

The derivatives g = G' of these functions are
$$
g_1(x) = \tanh(a_1 x), \qquad g_2(x) = x\exp[-x^2/2], \qquad g_3(x) = 4x^3 .
$$

[Figure: plots of G1, G2, G3 and of g1, g2, g3.]

Both g1 and g2 are less sensitive to outliers than g3.

Page 57: Independent Component Analysis

Analysis

Algorithm:

Batch mode:  Δw ∝ {E[G(wᵀz)] − E[G(Z)]} E[z g(wᵀz)],  w ← w/||w||

On-line mode:  Δw ∝ {E[G(wᵀz)] − E[G(Z)]} z g(wᵀz),  w ← w/||w||

The factor {E[G(wᵀz)] − E[G(Z)]} controls the search direction; its sign depends on the super/subgaussianity of the samples. The nonlinearity g(wᵀz) weights the samples.

Page 58: Independent Component Analysis

Stability Analysis

Assume that the input data follow the ICA model with whitened data z = VAs, and that G is a sufficiently smooth even function. Then the local maxima (resp. minima) of E[G(wᵀz)] under the constraint ||w|| = 1 include those rows of the inverse of the mixing matrix VA such that the corresponding independent components si satisfy

$$
E[s_i\, g(s_i) - g'(s_i)] > 0 \quad (\text{resp. } < 0).
$$

Page 59: Independent Component Analysis

Stability Analysis

Under the same assumptions (whitened data z = VAs, G a sufficiently smooth even function), the independent components that can be estimated as local maxima or minima of E[G(wᵀz)] with ||w|| = 1 are those satisfying

$$
E[s_i\, g(s_i) - g'(s_i)]\;\big\{E[G(s_i)] - E[G(Z)]\big\} > 0 .
$$

This condition is, in general, true for reasonable choices of G.

Page 60: Independent Component Analysis

Independent Component Analysis

FastICA Using Negentropy

Page 61: Independent Component Analysis

Clue From Gradient Algorithm

Gradient Algorithm:

Batch mode:  Δw ∝ E[z g(wᵀz)],  w ← w/||w||

On-line mode:  Δw ∝ z g(wᵀz),  w ← w/||w||

The fixed-point iteration this suggests is

w ← E[z g(wᵀz)],  w ← w/||w||

Nonpolynomial moments do not have the same nice algebraic properties as kurtosis, so such an iteration scheme is poor.

Page 62: Independent Component Analysis

Newton’s Method

Maximize or minimize E[G(wᵀz)] subject to ||w||² = 1.

Construct the Lagrangian as follows:
$$
L(\mathbf{w}) = E[G(\mathbf{w}^T\mathbf{z})] + \beta\,(\mathbf{w}^T\mathbf{w} - 1).
$$

Newton's method finds an extreme point by letting:
$$
\mathbf{w} \leftarrow \mathbf{w} - \left[\frac{\partial^2 L(\mathbf{w})}{\partial \mathbf{w}\,\partial \mathbf{w}^T}\right]^{-1} \frac{\partial L(\mathbf{w})}{\partial \mathbf{w}} .
$$

Page 63: Independent Component Analysis

Newton's Method

$$
L(\mathbf{w}) = E[G(\mathbf{w}^T\mathbf{z})] + \beta\,(\mathbf{w}^T\mathbf{w} - 1)
$$

Newton's method finds an extreme point of the Lagrangian by letting
$$
\mathbf{w} \leftarrow \mathbf{w} - \left[\frac{\partial^2 L(\mathbf{w})}{\partial \mathbf{w}\,\partial \mathbf{w}^T}\right]^{-1} \frac{\partial L(\mathbf{w})}{\partial \mathbf{w}},
$$
where
$$
\frac{\partial L(\mathbf{w})}{\partial \mathbf{w}} = E[\mathbf{z}\, g(\mathbf{w}^T\mathbf{z})] + 2\beta\mathbf{w},
\qquad
\frac{\partial^2 L(\mathbf{w})}{\partial \mathbf{w}\,\partial \mathbf{w}^T} = E[\mathbf{z}\mathbf{z}^T g'(\mathbf{w}^T\mathbf{z})] + 2\beta\mathbf{I},
$$
so that
$$
\mathbf{w} \leftarrow \mathbf{w} - \left[E[\mathbf{z}\mathbf{z}^T g'(\mathbf{w}^T\mathbf{z})] + 2\beta\mathbf{I}\right]^{-1}\left[E[\mathbf{z}\, g(\mathbf{w}^T\mathbf{z})] + 2\beta\mathbf{w}\right].
$$

Evaluating the Hessian matrix and its inverse is time-consuming; we want to approximate it.

Page 64: Independent Component Analysis

Newton's Method

To avoid the full Hessian, approximate
$$
E[\mathbf{z}\mathbf{z}^T g'(\mathbf{w}^T\mathbf{z})] \approx E[\mathbf{z}\mathbf{z}^T]\,E[g'(\mathbf{w}^T\mathbf{z})] = E[g'(\mathbf{w}^T\mathbf{z})]\,\mathbf{I},
$$
so the Hessian becomes
$$
\frac{\partial^2 L(\mathbf{w})}{\partial \mathbf{w}\,\partial \mathbf{w}^T} \approx \big(E[g'(\mathbf{w}^T\mathbf{z})] + 2\beta\big)\,\mathbf{I},
$$
a diagonal matrix.

Page 65: Independent Component Analysis

Newton's Method

With this approximation, the Newton update reduces to a division by a scalar:
$$
\mathbf{w} \leftarrow \mathbf{w} - \frac{E[\mathbf{z}\, g(\mathbf{w}^T\mathbf{z})] + 2\beta\mathbf{w}}{E[g'(\mathbf{w}^T\mathbf{z})] + 2\beta}.
$$

Page 66: Independent Component Analysis

FastICA

Multiplying the update by the scalar E[g′(wᵀz)] + 2β (the scaling is irrelevant because w is renormalized) gives
$$
\mathbf{w} \leftarrow E[g'(\mathbf{w}^T\mathbf{z})]\,\mathbf{w} - E[\mathbf{z}\, g(\mathbf{w}^T\mathbf{z})],
$$
which, up to an immaterial sign, is the FastICA update. The algorithm:
$$
\mathbf{w} \leftarrow E[\mathbf{z}\, g(\mathbf{w}^T\mathbf{z})] - E[g'(\mathbf{w}^T\mathbf{z})]\,\mathbf{w}, \qquad \mathbf{w} \leftarrow \mathbf{w}/\|\mathbf{w}\| .
$$

Page 67: Independent Component Analysis

FastICA

$$
\mathbf{w} \leftarrow E[\mathbf{z}\, g(\mathbf{w}^T\mathbf{z})] - E[g'(\mathbf{w}^T\mathbf{z})]\,\mathbf{w}, \qquad \mathbf{w} \leftarrow \mathbf{w}/\|\mathbf{w}\|
$$

1. Center the data to make its mean zero.

2. Whiten the data to give z.

3. Choose the initial vector w of unit norm.

4. Update w ← E[z g(wᵀz)] − E[g′(wᵀz)]w.

5. Normalize w ← w/||w||.

6. If not converged, go back to step 4.

Page 68: Independent Component Analysis

FastICA

The same steps are used with one of the following contrast functions G, their derivatives g, and second derivatives g′:

$$
\begin{aligned}
G_1(x) &= \tfrac{1}{a_1}\log\cosh(a_1 x), & g_1(x) &= \tanh(a_1 x), & g_1'(x) &= a_1\big(1 - \tanh^2(a_1 x)\big),\\
G_2(x) &= -\exp[-x^2/2], & g_2(x) &= x\exp[-x^2/2], & g_2'(x) &= (1 - x^2)\exp[-x^2/2],\\
G_3(x) &= x^4, & g_3(x) &= 4x^3, & g_3'(x) &= 12x^2 .
\end{aligned}
$$
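Putting Pages 66 to 68 together, here is a compact NumPy sketch of one-unit FastICA with the log cosh contrast (a1 = 1). The function assumes already whitened data; names and the convergence test are my own.

```python
import numpy as np

def fastica_one_unit(Z, n_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA on whitened data Z (n_signals x n_samples), log cosh contrast."""
    g  = np.tanh                                    # g1 with a1 = 1
    dg = lambda y: 1.0 - np.tanh(y) ** 2            # g1'
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)                          # initial vector of unit norm
    for _ in range(n_iter):
        y = w @ Z
        w_new = (Z * g(y)).mean(axis=1) - dg(y).mean() * w   # E[z g(w'z)] - E[g'(w'z)] w
        w_new /= np.linalg.norm(w_new)
        if 1.0 - abs(w_new @ w) < tol:              # converged (up to sign)
            return w_new
        w = w_new
    return w
```

Applying w @ Z then yields one estimated independent component, up to sign and ordering.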

Page 69: Independent Component Analysis

FastICA

[Figure: plots of the contrast functions G1, G2, G3, of their derivatives g1, g2, g3, and of the second derivatives g1′, g2′, g3′.]

Page 70: Independent Component Analysis

Estimating Several IC’s

Deflation Orthogonalization
– Based on the Gram-Schmidt method

Symmetric Orthogonalization
– Adjust the vectors in parallel

Page 71: Independent Component Analysis

Deflation Orthogonalization

1. Center the data to make its mean zero.

2. Whiten the data to give z.

3. Choose m, the number of ICs to estimate; set the counter p ← 1.

4. Choose an initial vector wp of unit norm, randomly.

5. Update wp ← E[z g(wpᵀz)] − E[g′(wpᵀz)]wp.

6. Orthogonalize against the vectors found so far: $\mathbf{w}_p \leftarrow \mathbf{w}_p - \sum_{j=1}^{p-1} (\mathbf{w}_p^T \mathbf{w}_j)\,\mathbf{w}_j$.

7. Normalize wp ← wp/||wp||.

8. If wp has not converged, go back to step 5.

9. If p < m, set p ← p + 1 and go back to step 4.
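A sketch of this deflation scheme in NumPy, reusing the one-unit log cosh update (function and variable names are mine):

```python
import numpy as np

def fastica_deflation(Z, m, n_iter=200, tol=1e-8, seed=0):
    """Estimate m independent components from whitened data Z (n x T) by deflation."""
    rng = np.random.default_rng(seed)
    W = np.zeros((m, Z.shape[0]))
    for p in range(m):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ Z
            w_new = (Z * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y)**2).mean() * w
            w_new -= W[:p].T @ (W[:p] @ w_new)     # Gram-Schmidt against earlier vectors
            w_new /= np.linalg.norm(w_new)
            if 1.0 - abs(w_new @ w) < tol:
                w = w_new
                break
            w = w_new
        W[p] = w
    return W        # estimated independent components: Y = W @ Z
```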

Page 72: Independent Component Analysis

Symmetric Orthogonalization

1. Choose the number of independent components to estimate, say, m.

2. Initialize the wi, i=1,…,m.

3. Do an iteration of the one-unit algorithm on every wi in parallel.

4. Do a symmetric orthogonalization of the matrix W = (w1, …, wm).

5. If not converged, go back to step 3.

Page 73: Independent Component Analysis

Symmetric Orthogonalization

Method 1 (Classic Method):
$$
\mathbf{W} \leftarrow (\mathbf{W}\mathbf{W}^T)^{-1/2}\,\mathbf{W}
$$

Method 2 (Iteration Method):

1. Let W ← W / ||W||.

2. Let W ← (3/2) W − (1/2) W Wᵀ W.

3. If WWᵀ is not close enough to identity, go back to step 2.
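Both methods are a few lines of NumPy; the sketch below is my own (the norm used in step 1 of the iterative method is not specified on the slide, so I scale by a norm that bounds the largest singular value):

```python
import numpy as np

def sym_orthogonalize(W):
    """Classic symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)                    # eigendecomposition of W W^T
    return E @ np.diag(d ** -0.5) @ E.T @ W

def sym_orthogonalize_iter(W, tol=1e-10, max_iter=100):
    """Iterative variant: scale W, then repeat W <- 1.5 W - 0.5 W W^T W until W W^T ~ I."""
    W = W / np.linalg.norm(W)                         # Frobenius norm bounds the spectral norm
    for _ in range(max_iter):
        W = 1.5 * W - 0.5 * (W @ W.T @ W)
        if np.allclose(W @ W.T, np.eye(W.shape[0]), atol=tol):
            break
    return W
```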