Linear Prediction Analysis of Speech Sounds

References:
1. X. Huang et al., Spoken Language Processing, Chapters 5, 6
2. J. R. Deller et al., Discrete-Time Processing of Speech Signals, Chapters 4-6
3. J. W. Picone, “Signal modeling techniques in speech recognition,” Proceedings of the IEEE, September 1993, pp. 1215-1247

Berlin Chen 2003

SP2003F Lecture06 Linear Prediction Analysis of Speech ...berlin.csie.ntnu.edu.tw/PastCourses/2003F-SpeechSignalProcessing... · Linear Prediction Analysis of Speech Sounds References:




Linear Predictive Coefficients (LPC)

• An all-pole filter with a sufficient number of poles is a good approximation to model the vocal tract (filter) for speech signals

– It predicts the current sample as a linear combination of several past samples

• Also known as linear predictive coding, LPC analysis, or auto-regressive (AR) modeling

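Vocal tract parameters: $a_1, a_2, \ldots, a_p$

$$H(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)}$$

$$\therefore\; x[n] = \sum_{k=1}^{p} a_k\, x[n-k] + e[n]$$

$$\tilde{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k]$$

[Figure: block diagram with the signals $x[n]$ and $e[n]$]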
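The prediction and its error can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the coefficients `a` and the toy signal `x` are made-up values, and samples before $n = 0$ are assumed to be zero.

```python
def predict(x, a, n):
    """x~[n] = sum_{k=1..p} a_k * x[n-k]; samples before n = 0 are taken as zero."""
    return sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)


def residual(x, a):
    """Prediction error e[n] = x[n] - x~[n] for every sample of x."""
    return [x[n] - predict(x, a, n) for n in range(len(x))]


# Hypothetical 2nd-order predictor and a short toy signal (not speech data).
a = [0.5, -0.25]
x = [1.0, 0.5, 0.0, -0.125]
e = residual(x, a)
print(e)
```

Here `x` was chosen to be the response of $1/A(z)$ to a unit impulse, so the residual is non-zero only at the first sample.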


Short-Term Analysis: Algebra Approach

• Estimate the corresponding LPC coefficients as those that minimize the total short-term prediction error (minimum mean squared error)

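Framing/windowing: the total short-term prediction error for a specific frame $m$ is

$$E_m = \sum_n e_m^2[n] = \sum_n \left(x_m[n] - \tilde{x}_m[n]\right)^2 = \sum_n \Bigl(x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j]\Bigr)^2, \qquad 0 \le n \le N-1$$

Take the derivative:

$$\frac{\partial E_m}{\partial a_i} = 0, \quad \forall\, 1 \le i \le p$$

$$\Rightarrow\; \frac{\partial}{\partial a_i} \sum_n \Bigl(x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j]\Bigr)^2 = 0, \quad \forall\, 1 \le i \le p$$

$$\Rightarrow\; \sum_n \Bigl\{\Bigl(x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j]\Bigr)\, x_m[n-i]\Bigr\} = 0, \quad \forall\, 1 \le i \le p$$

$$\Rightarrow\; \sum_n e_m[n]\, x_m[n-i] = 0, \quad \forall\, 1 \le i \le p$$

The error vector $e_m[n]$ is orthogonal to the past vectors. This property will be used later on!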


Short-Term Analysis: Algebra Approach

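From $\partial E_m / \partial a_i = 0$:

$$\sum_n \Bigl\{\Bigl(x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j]\Bigr)\, x_m[n-i]\Bigr\} = 0, \quad \forall\, 1 \le i \le p$$

$$\Rightarrow\; \sum_n \sum_{j=1}^{p} a_j\, x_m[n-i]\, x_m[n-j] = \sum_n x_m[n-i]\, x_m[n], \quad \forall\, 1 \le i \le p$$

$$\Rightarrow\; \sum_{j=1}^{p} a_j \sum_n x_m[n-i]\, x_m[n-j] = \sum_n x_m[n-i]\, x_m[n], \quad \forall\, 1 \le i \le p$$

Define the correlation coefficients:

$$\phi_m[i,j] = \sum_n x_m[n-i]\, x_m[n-j]$$

$$\Rightarrow\; \sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0], \quad \forall\, 1 \le i \le p \qquad \Rightarrow\; \mathbf{\Phi}\mathbf{a} = \mathbf{\Psi}$$

To be used in next page!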


Short-Term Analysis: Algebra Approach

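• The minimum error for the optimal $a_j$, $1 \le j \le p$

$$E_m = \sum_n e_m^2[n] = \sum_n \left(x_m[n] - \tilde{x}_m[n]\right)^2 = \sum_n \Bigl(x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j]\Bigr)^2$$

$$= \sum_n x_m^2[n] - 2 \sum_n x_m[n] \sum_{j=1}^{p} a_j\, x_m[n-j] + \sum_n \Bigl(\sum_{j=1}^{p} a_j\, x_m[n-j]\Bigr) \Bigl(\sum_{k=1}^{p} a_k\, x_m[n-k]\Bigr)$$

Use the property derived in the previous page! The last term equals half of the cross term:

$$\sum_n \Bigl(\sum_{j=1}^{p} a_j\, x_m[n-j]\Bigr) \Bigl(\sum_{k=1}^{p} a_k\, x_m[n-k]\Bigr) = \sum_{j=1}^{p} a_j \sum_{k=1}^{p} a_k \Bigl(\sum_n x_m[n-j]\, x_m[n-k]\Bigr) = \sum_{j=1}^{p} a_j \sum_n x_m[n-j]\, x_m[n]$$

Total prediction error:

$$E_m = \sum_n x_m^2[n] - \sum_{j=1}^{p} a_j \sum_n x_m[n]\, x_m[n-j] = \phi_m[0,0] - \sum_{j=1}^{p} a_j\, \phi_m[0,j]$$

The error can be monitored to help establish $p$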


Short-Term Analysis: Geometric Approach

• Vector Representations of Error and Speech Signals

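$$x_m[n] = \sum_{k=1}^{p} a_k\, x_m[n-k] + e_m[n], \qquad 0 \le n \le N-1$$

$$\begin{bmatrix} x_m[-1] & x_m[-2] & \cdots & x_m[-p] \\ x_m[0] & x_m[-1] & \cdots & x_m[1-p] \\ \vdots & \vdots & & \vdots \\ x_m[N-2] & x_m[N-3] & \cdots & x_m[N-1-p] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} + \begin{bmatrix} e_m[0] \\ e_m[1] \\ \vdots \\ e_m[N-1] \end{bmatrix} = \begin{bmatrix} x_m[0] \\ x_m[1] \\ \vdots \\ x_m[N-1] \end{bmatrix}$$

$$\mathbf{X}\mathbf{a} + \mathbf{e}_m = \mathbf{x}_m, \qquad \mathbf{X} = \left(\mathbf{x}_m^1\; \mathbf{x}_m^2\; \cdots\; \mathbf{x}_m^p\right)$$

The past vectors are the column vectors of $\mathbf{X}$:

$$\mathbf{x}_m^i = \left(x_m[-i],\, x_m[1-i],\, \ldots,\, x_m[N-1-i]\right)^T, \qquad \mathbf{e}_m = \left(e_m[0],\, e_m[1],\, \ldots,\, e_m[N-1]\right)^T$$

The prediction error vector must be orthogonal to the past vectors (this property has been shown previously):

$$\langle \mathbf{e}_m, \mathbf{x}_m^i \rangle = 0, \quad \forall\, 1 \le i \le p$$

$\mathbf{e}_m$ is minimal if

$$\mathbf{X}^T \mathbf{e}_m = \mathbf{0} \;\Rightarrow\; \mathbf{X}^T\left(\mathbf{x}_m - \mathbf{X}\mathbf{a}\right) = \mathbf{0} \;\Rightarrow\; \mathbf{X}^T\mathbf{X}\mathbf{a} = \mathbf{X}^T\mathbf{x}_m \;\Rightarrow\; \mathbf{a} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{x}_m$$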


Short-Term Analysis: Autocorrelation Method

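• $x_m[n]$ is identically zero outside $0 \le n \le N-1$

• The mean-squared error is calculated within $n = 0 \sim N-1+p$

$$x_m[n] = \begin{cases} x[mL+n]\, w[n], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

L: frame period, the length of time between successive frames

[Figure: framing/windowing. The segment of $x[n]$ from $mL$ to $mL+N-1$ is shifted to $0 \ldots N-1$, $\tilde{x}_m[n] = x[n+mL]$, and then windowed, $x_m[n] = \tilde{x}_m[n]\, w[n]$]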
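The framing and windowing step can be sketched as follows. This is an illustrative Python sketch, not the lecture's code: the function names are hypothetical, the Hamming window uses the common $0.54 - 0.46\cos(2\pi n/(N-1))$ definition, and samples past the end of the signal are assumed to be zero.

```python
import math


def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1)), for N >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def frame(x, m, L, N, window=None):
    """x_m[n] = x[mL + n] * w[n] for 0 <= n <= N-1 (rectangular window if None)."""
    w = window if window is not None else [1.0] * N
    # Samples beyond the end of x are treated as zero (an assumption).
    return [(x[m * L + n] if m * L + n < len(x) else 0.0) * w[n] for n in range(N)]


x = [float(n) for n in range(20)]            # toy ramp standing in for speech samples
rect = frame(x, m=2, L=4, N=8)               # rectangular window
hamm = frame(x, m=2, L=4, N=8, window=hamming(8))
print(rect)
print(hamm)
```

With a frame period `L` smaller than the frame length `N`, successive frames overlap, which is the usual choice for short-term analysis.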


Short-Term Analysis: Autocorrelation Method

• The mean-squared error will be:

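$$E_m = \sum_{n=0}^{N-1+p} e_m^2[n] = \sum_{n=0}^{N-1+p} \left(x_m[n] - \tilde{x}_m[n]\right)^2$$

Why?

[Figure: $x_m[n]$ and $e_m[n]$ plotted against $n$, with axis marks at $0$, $N-1$ and $N+p-1$]

Take the derivative $\partial E_m / \partial a_i$:

$$\Rightarrow\; \sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0], \quad \forall\, 1 \le i \le p$$

$$\phi_m[i,j] = \sum_{n=0}^{N-1+p} x_m[n-i]\, x_m[n-j] = \sum_{n=i}^{N-1+j} x_m[n-i]\, x_m[n-j] = \sum_{n=0}^{N-1-(i-j)} x_m[n]\, x_m[n+(i-j)]$$

[Figure: on an axis from $0$ to $N-1+p$, $x_m[n-i]$ occupies $i \ldots N-1+i$ and $x_m[n-j]$ occupies $j \ldots N-1+j$]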


Short-Term Analysis: Autocorrelation Method

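• Alternatively, $\phi_m[i,j] = R_m[i-j]$

– Where $R_m[k] = \sum_{n=0}^{N-1-k} x_m[n]\, x_m[n+k]$ is the autocorrelation function of $x_m[n]$

– And $R_m[k] = R_m[-k]$

• Therefore:

$$\sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0], \quad \forall\, 1 \le i \le p \;\;\Rightarrow\;\; \sum_{j=1}^{p} a_j\, R_m[|i-j|] = R_m[i], \quad \forall\, 1 \le i \le p$$

$$\begin{bmatrix} R_m[0] & R_m[1] & \cdots & R_m[p-1] \\ R_m[1] & R_m[0] & \cdots & R_m[p-2] \\ \vdots & \vdots & & \vdots \\ R_m[p-1] & R_m[p-2] & \cdots & R_m[0] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} R_m[1] \\ R_m[2] \\ \vdots \\ R_m[p] \end{bmatrix}$$

A Toeplitz matrix: symmetric, and all elements of each diagonal are equal

Why?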
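To make the Toeplitz structure concrete, the following Python sketch computes $R_m[k]$ for a windowed frame and assembles the matrix and right-hand side of the system above. The function names and the four-sample frame are illustrative assumptions only.

```python
def autocorr(xm, k):
    """R_m[k] = sum_{n=0}^{N-1-|k|} x_m[n] * x_m[n+|k|]; even in k by construction."""
    k = abs(k)
    return sum(xm[n] * xm[n + k] for n in range(len(xm) - k))


def normal_equations(xm, p):
    """Return the p x p matrix with entries R_m[|i-j|] and the vector R_m[1..p]."""
    R = [autocorr(xm, k) for k in range(p + 1)]
    matrix = [[R[abs(i - j)] for j in range(1, p + 1)] for i in range(1, p + 1)]
    rhs = R[1:]
    return matrix, rhs


xm = [1.0, 2.0, 3.0, 4.0]          # toy frame, already windowed
matrix, rhs = normal_equations(xm, 2)
print(matrix)
print(rhs)
```

Every entry depends only on $|i-j|$, so each diagonal holds a single value; that is the property the Levinson-Durbin recursion exploits.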


Short-Term Analysis: Autocorrelation Method

• Levinson-Durbin Recursion

1. Initialization

$$E(0) = R_m[0]$$

2. Iteration. For $i = 1, \ldots, p$ do the following recursion

$$k(i) = \frac{R_m[i] - \sum_{j=1}^{i-1} a_j(i-1)\, R_m[i-j]}{E(i-1)}$$

$$a_i(i) = k(i)$$

$$a_j(i) = a_j(i-1) - k(i)\, a_{i-j}(i-1), \quad \text{for } 1 \le j \le i-1$$

$$E(i) = \left[1 - k(i)^2\right] E(i-1), \quad \text{where } -1 \le k(i) \le 1$$

3. Final solution:

$$a_j = a_j(p), \quad \text{for } 1 \le j \le p$$

A new, higher-order coefficient is produced at each iteration $i$
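The recursion above translates almost line by line into code. The Python sketch below is one possible transcription, not the lecturer's implementation: the function name and return convention are assumptions, and it does not guard against $E(i-1) = 0$, which happens for an all-zero frame.

```python
def levinson_durbin(R, p):
    """Solve sum_j a_j R[|i-j|] = R[i], 1 <= i <= p, given R[0..p].

    Returns (a, E, ks): a[j-1] holds a_j, E is the final prediction error E(p),
    and ks lists the coefficients k(1..p).
    """
    a = [0.0] * (p + 1)            # a[j] stores a_j(i-1); index 0 is unused
    E = R[0]                       # step 1: E(0) = R[0]
    ks = []
    for i in range(1, p + 1):      # step 2: build order i from order i-1
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E
        new_a = a[:]
        new_a[i] = k                               # a_i(i) = k(i)
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]         # a_j(i) = a_j(i-1) - k(i) a_{i-j}(i-1)
        a = new_a
        E = (1.0 - k * k) * E                      # E(i) = (1 - k(i)^2) E(i-1)
        ks.append(k)
    return a[1:], E, ks            # step 3: a_j = a_j(p)


R = [30.0, 20.0, 11.0]             # toy autocorrelation values R[0], R[1], R[2]
a, E, ks = levinson_durbin(R, 2)
print(a, E, ks)
```

The returned `E` is the prediction error after the final order, so it can be tracked across orders to help choose $p$.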


Short-Term Analysis: Covariance Method

• $x_m[n]$ is not identically zero outside $0 \le n \le N-1$

– Window function is not applied

• The mean-squared error is calculated within $n = 0 \sim N-1$

• The mean-squared error will be:

$$E_m = \sum_{n=0}^{N-1} e_m^2[n] = \sum_{n=0}^{N-1} \left(x_m[n] - \tilde{x}_m[n]\right)^2$$

[Figure: the segment of $x[n]$ from $mL$ to $mL+N-1$ is shifted to $0 \ldots N-1$, $x_m[n] = x[n+mL]$]


Short-Term Analysis: Covariance Method

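Take the derivative $\partial E_m / \partial a_i$:

$$\Rightarrow\; \sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0], \quad \forall\, 1 \le i \le p$$

$$\phi_m[i,j] = \sum_{n=0}^{N-1} x_m[n-i]\, x_m[n-j] = \sum_{n=-i}^{N-1-i} x_m[n]\, x_m[n+(i-j)]$$

[Figure: on an axis from $0$ to $N-1$, $x_m[n-i]$ is marked at $i$ and $N-1+i$, and $x_m[n-j]$ at $j$ and $N-1+j$]

$$\begin{bmatrix} \phi_m[1,1] & \phi_m[1,2] & \cdots & \phi_m[1,p] \\ \phi_m[2,1] & \phi_m[2,2] & \cdots & \phi_m[2,p] \\ \vdots & \vdots & & \vdots \\ \phi_m[p,1] & \phi_m[p,2] & \cdots & \phi_m[p,p] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \phi_m[1,0] \\ \phi_m[2,0] \\ \vdots \\ \phi_m[p,0] \end{bmatrix}$$

Not a Toeplitz matrix: symmetric, but not all elements of the diagonal are equal

$$\phi_m[1,1] \ne \phi_m[2,2] \ne \cdots \ne \phi_m[p,p]$$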
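A small Python sketch of the covariance method follows. It is illustrative only: the slides do not say how the non-Toeplitz system is solved, so plain Gaussian elimination with partial pivoting is used here as a generic stand-in, and the function names, the `start` argument (playing the role of $mL$) and the toy signal are all assumptions.

```python
def phi(x, start, N, i, j):
    """phi_m[i,j] = sum_{n=0}^{N-1} x_m[n-i] x_m[n-j], with x_m[n] = x[start + n]."""
    return sum(x[start + n - i] * x[start + n - j] for n in range(N))


def solve(A, b):
    """Generic dense solver: Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][cc] * sol[cc] for cc in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol


def lpc_covariance(x, start, N, p):
    """Solve sum_j a_j phi[i,j] = phi[i,0]; needs p samples before the frame."""
    if start < p:
        raise ValueError("the frame must be preceded by at least p samples")
    A = [[phi(x, start, N, i, j) for j in range(1, p + 1)] for i in range(1, p + 1)]
    b = [phi(x, start, N, i, 0) for i in range(1, p + 1)]
    return solve(A, b)


# Toy signal obeying x[n] = 0.5 x[n-1] - 0.25 x[n-2] exactly for n >= 2.
x = [1.0, 0.5]
for n in range(2, 10):
    x.append(0.5 * x[n - 1] - 0.25 * x[n - 2])
a = lpc_covariance(x, start=2, N=6, p=2)
print(a)
```

Because no window is applied and the frame reaches back into real past samples, a signal that follows the model exactly is predicted with zero error inside the frame.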


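LPC Spectra

• The LPC spectrum matches the peaks more closely than the valleys

By Parseval’s theorem, with the gain-scaled model spectrum $H'(e^{j\omega}) = G \cdot H(e^{j\omega})$:

$$E_m = \sum_{n=0}^{N-1+p} e_m^2[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left|E_m(e^{j\omega})\right|^2 d\omega = \frac{G^2}{2\pi} \int_{-\pi}^{\pi} \frac{\left|X_m(e^{j\omega})\right|^2}{\left|H'(e^{j\omega})\right|^2}\, d\omega$$

– Because the regions where $\left|X_m(e^{j\omega})\right| > \left|H'(e^{j\omega})\right|$ contribute more to the error than those where $\left|H'(e^{j\omega})\right| > \left|X_m(e^{j\omega})\right|$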
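The model spectrum itself can be evaluated directly from the coefficients instead of through an FFT of the impulse response. The Python sketch below is a minimal illustration under stated assumptions: `gain` plays the role of $G^2$ (the project's MATLAB example uses the prediction error $E$ for it), the function name is hypothetical, and no guard is included for a pole exactly on the unit circle.

```python
import cmath
import math


def lpc_spectrum_db(a, gain, num_points=256):
    """10*log10(gain / |A(e^{jw})|^2) at w = pi * k / num_points, k = 0..num_points-1,

    where A(e^{jw}) = 1 - sum_{j=1..p} a_j e^{-jwj}.
    """
    out = []
    for k in range(num_points):
        w = math.pi * k / num_points
        A = 1.0 - sum(a[j - 1] * cmath.exp(-1j * w * j) for j in range(1, len(a) + 1))
        out.append(10.0 * math.log10(gain / abs(A) ** 2))
    return out


# Hypothetical first-order model: a single real pole at z = 0.5.
spec = lpc_spectrum_db([0.5], gain=1.0, num_points=4)
print(spec)
```

Plotting this curve over the FFT power spectrum of the frame shows the envelope following the spectral peaks more closely than the valleys.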


LPC Spectra

• LPC provides an estimate of the gross shape of the short-term spectrum

[Figure: LPC spectra in four panels: Order = 6, Order = 14, Order = 24, Order = 128]


LPC Prediction Errors


MFCC vs. LPC Cepstrum Coefficients

• MFCC outperforms LPC cepstrum coefficients

– The perceptually motivated mel-scale representation indeed helps recognition

• Higher-order MFCC does not further reduce the error rate in comparison with the 13th-order MFCC

• Other perceptually motivated features, such as first- and second-order delta features, can significantly reduce the recognition errors


Description of Project-2 Fall 2003

• Implement short-term linear predictive coding (LPC) analysis for speech signals

• Follow these instructions:

1. Use the autocorrelation method with the Levinson-Durbin recursion and rectangular/Hamming windowing

2. Analyze the vowel (or FINAL) portions of the speech signal with different model orders (different P, e.g. P=6, 14, 24 and 128)

3. Plot the LPC spectra as well as the original speech spectrum

4. Use the speech wave file bk6_1.wav (no header, PCM 16 kHz raw data) as the exemplar


Description of Project-2 Fall 2003

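• Hints:

1. When the LPC coefficients $a_j$ are derived, you can construct the impulse response signal $h[n]$, $0 \le n \le N-1$ (N: frame size), of $H(z) = 1/A(z)$ by:

$$h[n] = \delta[n] + \sum_{j=1}^{p} a_j\, h[n-j]$$

2. The prediction error $E$ can be expressed by the autocorrelation function, using $E_m = \phi_m[0,0] - \sum_{j=1}^{p} a_j\, \phi_m[0,j]$ with $\phi_m[i,j] = R_m[i-j]$:

$$E = R_m[0] - \sum_{j=1}^{p} a_j\, R_m[j]$$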


Description of Project-2 Fall 2003

3. The MATLAB example code:

x=[184.6400 184.1251 . . . . . . . 197.7890 -26.8000 ]; % original signal, dimension: frame size
y=[1.0000 2.0105 . . . . . . . 0.0738 0.0565 ]; % filter's impulse response h[n], dimension: frame size
gain=valG; % valG: the prediction error E
X=fft(x,512); % fast Fourier transform, so the frame size < 512
Y=fft(y,512); % fast Fourier transform
X(1)=[]; % remove X(1), the DC term
Y(1)=[]; % remove Y(1), the DC term
M=512;
powerX=abs(X(1:M/2)).^2; % the power spectrum of X
logPX=10*log10(powerX); % the power spectrum of X in dB (log10, not the natural log)
powerY=abs(Y(1:M/2)).^2; % the power spectrum of Y
logPY=10*log10(powerY)+10*log10(gain); % the power spectrum of Y in dB plus the gain (error) in dB
nyquist=8000; % maximal frequency index
freq=(1:M/2)/(M/2)*nyquist; % an array storing the frequency indices
figure(1);
plot(freq,logPX,'b',freq,logPY,'r'); % plot the result
% b: blue line for the power spectrum of the original signal
% r: red line for the power spectrum of the filter
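The vector `y` and the scalar `gain` in the MATLAB example come from Hints 1 and 2. A minimal Python sketch of both is given below, assuming $h[n]$ is the impulse response of $1/A(z)$, i.e. $h[n] = \delta[n] + \sum_j a_j h[n-j]$, and $E = R_m[0] - \sum_j a_j R_m[j]$, which follows from the minimum-error expression derived earlier. The function names and numeric values are illustrative, not taken from the project.

```python
def impulse_response(a, N):
    """h[n] = delta[n] + sum_{j=1..p} a_j h[n-j], for 0 <= n <= N-1."""
    h = []
    for n in range(N):
        acc = 1.0 if n == 0 else 0.0
        acc += sum(a[j - 1] * h[n - j] for j in range(1, len(a) + 1) if n - j >= 0)
        h.append(acc)
    return h


def prediction_error(R, a):
    """E = R[0] - sum_{j=1..p} a_j R[j], with R the frame's autocorrelation."""
    return R[0] - sum(a[j - 1] * R[j] for j in range(1, len(a) + 1))


# Hypothetical values: a 2nd-order model and toy autocorrelation lags.
h = impulse_response([0.5, -0.25], 4)
E = prediction_error([30.0, 20.0, 11.0], [0.76, -0.14])
print(h)
print(E)
```

The first sample of `h` is always 1, matching the leading `1.0000` of `y` in the MATLAB listing.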


Description of Project-2 Fall 2003

• Example Figures of LPC Spectra

[Figure panels, plotted by Roger Kuo, Fall 2002:]

– Order = 6, rectangular window, no pre-emphasis

– Order = 14, rectangular window, no pre-emphasis

– Order = 24, rectangular window, no pre-emphasis

– Order = 128, rectangular window, no pre-emphasis

– Order = 128, rectangular window, pre-emphasis

– Order = 128, Hamming window, pre-emphasis

– Order = 128, Hamming window, no pre-emphasis