Upload
buihanh
View
233
Download
0
Embed Size (px)
Citation preview
Linear Prediction Analysis of Speech Sounds
References:1. X. Huang et. al., Spoken Language Processing, Chapters 5, 62. J. R. Deller et. al., Discrete-Time Processing of Speech Signals, Chapters 4-63. J. W. Picone, “Signal modeling techniques in speech recognition,”
proceedings of the IEEE, September 1993, pp. 1215-1247
Berlin Chen 2003
2
Linear Predictive Coefficients (LPC)
• An all-pole filter with a sufficient number of poles is a good approximation to model the vocal tract (filter) for speech signals
– It predicts the current sample as a linear combination of its several past samples
• Linear predictive coding, LPC analysis, auto-regressive modeling
Vocal Tract Parameterspaaa ,...,, 21
( ) ( )( ) ( )
[ ] [ ] [ ]
[ ] [ ]∑
∑
∑
=
=
=
−
−=
+−=∴
=−
==
p
kk
p
kk
p
k
kk
knxanx
neknxanx
zAzazEzXzH
1
1
1
~
1
1
1
[ ]nx [ ]ne
3
Short-Term Analysis: Algebra Approach
• Estimate the corresponding LPC coefficients as those that minimize the total short-term prediction error (minimum mean squared error)
[ ] [ ] [ ]( )
[ ] [ ]∑ ∑
∑∑
−−=
−==
=n
p
jmjm
nmm
nmm
jnxanx
nxnxneE
2
1
22
~
Framing/Windowing, The total short-term
prediction errorfor a specific frame m
[ ] [ ]
[ ] [ ] [ ]
[ ] [ ]{ } 1 ,0
1 ,0
1 ,0
1
2
1
piinxne
piinxjnxanx
pia
jnxanx
aE
nmm
nm
p
jmjm
i
n
p
jmjm
i
m
≤≤∀=−
≤≤∀=
−
−−
≤≤∀=∂
−−∂
=∂∂
∑
∑ ∑
∑ ∑
=
=
10 , −≤≤ Nn
Take the derivative
[ ]ne m The error vector is orthogonalto the past vectors
This property will be used later on!
4
Short-Term Analysis: Algebra Approach
[ ] [ ] [ ]
[ ] [ ] [ ] [ ][ ]
[ ] [ ][ ] [ ] [ ][ ]∑∑ ∑
∑∑ ∑
∑ ∑
≤≤∀−=−−⇒
≤≤∀−=
−−⇒
≤≤∀=
−
−−
=
=
=
nmm
p
j nmmj
nmm
n
p
jmmj
nm
p
jmjm
pinxinxjnxinxa
pinxinxjnxinxa
piinxjnxanx
1 ,
1 ,
1 0,
1
1
1
ΨΦa =⇒
[ ] [ ] [ ][ ]∑ −−=n
mmm jnxinxji, :tscoefficienn correlatio Define
φ
[ ] [ ] 1 , 0, , 1
piijia m
p
jmj ≤≤∀=⇒ ∑
=φφ
To be used in next page !
i
m
aE∂∂
5
Short-Term Analysis: Algebra Approach
• The minimum error for the optimal, pja j ≤≤1 ,
[ ] [ ] [ ]( ) [ ] [ ]
[ ] [ ] [ ] [ ] [ ]∑ ∑ ∑ ∑∑∑
∑ ∑∑∑
−−+
−−=
−−=−==
= ==
=
n n
p
j
p
kmkmj
p
jmjm
nm
n
p
jmjm
nmm
nmm
knxajnxajnxanxnx
jnxanxnxnxneE
1 11
2
2
1
22
2
~
[ ] [ ]
[ ] [ ]( )
[ ] [ ]∑ ∑
∑ ∑ ∑
∑ ∑ ∑
=
= =
= =
−=
−−=
−−
p
j nmmj
p
j
p
k nmmkj
n
p
j
p
kmkmj
nxjnxa
knxjnxaa
knxajnxa
1
1 1
1 1
[ ] [ ] [ ]( )
[ ] [ ]∑
∑ ∑∑
=
=
−=
−−=
p
jmjm
p
j nmmj
nmm
ja
jnxnxanxE
1
1
2
,00,0
φφ
equal
Total Prediction ErrorThe error can be monitored tohelp establish p
Use the property derived in the previous page !
6
Short-Term Analysis: Geometric Approach
• Vector Representations of Error and Speech Signals
pi
,
≤≤∀
=
1 ,
0im mxe
The prediction error vector must be orthogonal to the past vectors
[ ] [ ] [ ] 10 , 1
N-n neknxanx m
p
kmkm ≤≤+−= ∑
=
[ ] [ ] [ ][ ] [ ] [ ]
[ ] [ ] [ ]
[ ][ ]
[ ]
[ ][ ]
[ ]
−
=
−
+
−−−−−−
−−−−−−
1...10
1...10
.
.
.
1...2111..................
1...2111...21
2
1
Nx
xx
Ne
ee
a
aa
pNxNxNx
pxxxpxxx
m
m
m
m
m
m
pmmm
mmm
mmm
[ ] [ ] [ ]( )[ ] [ ] [ ]( )iNxixix
Neee
mmmT
mmmT
-1-,...,-1,-
1-,...,1,0
=
=im
mxe
mea[ ]( )P
mmm xxxX .... 21=mx
( )
( ) mTT
mTT
mT
mT
m
mm
xXXXa
xXXaX0XaxX
0eXexeXa
1
if minimal is
−=⇒
=⇒
=−⇒
=
=+
This property has been shown previously
the past vectors
are as column vectors
7
Short-Term Analysis: Autocorrelation Method
• is identically zero outside 0 ≤ n ≤ N-1• The mean-squared error is calculated within n=0~N-1+p
[ ]nxm
[ ] [ ] [ ] −≤≤+
=otherwise ,0
1n0 ,
NnwmLnxnxm
L: Frame Period , the lengthof time between successiveframes
[ ]nx
mL+N-1
[ ] [ ]mLnxnx m +=~
mL0shift
N-1
N-10
0
Framing/Windowing
[ ] [ ] [ ]nwnxnx mm~=
8
Short-Term Analysis: Autocorrelation Method
• The mean-squared error will be:
[ ] [ ] [ ]( )∑∑+−
=
+−
=
−==pN
nmm
pN
nmm nxnxneE
1
0
21
0
2 ~
Why?
N+P-10 N-1 N+P-1 0 N-1
[ ]nx m
[ ]nem
[ ] [ ]
[ ] [ ] [ ]
[ ] [ ]
[ ] ( )[ ]( )∑
∑
∑
∑
−−−
=
+−
=
−+
=
=
−+=
−−=
−−=
≤≤∀=⇒
jiN
nmm
jN
inmm
pN
nmmm
m
p
jmj
jinxnx
jnxinx
jnxinxji
piijia
1
0
1
1
0
1
,
1 , 0, ,
φ
φφ
j N-1+j
i N-1+i
[ ]jnx m −
[ ]inx m − N-1+p0
0
N-1
[ ]nx m
0
Take the derivative: i
m
aE∂∂
9
Short-Term Analysis: Autocorrelation Method
• Alternatively, – Where is the autocorrelation function of
– And
• Therefore:
[ ] [ ]jiRjim −=,φ
[ ] [ ] [ ]∑−−
=+=
kN
nmmm knxnxkR
1
0
[ ]nx m
[ ] [ ]kRkR mm −=
[ ] [ ]
[ ] [ ] 1 ,
1 , 0, ,
1
1
pikRkRa
piijia
m
p
jmj
m
p
jmj
≤≤∀=⇒
≤≤∀=
∑
∑
=
=φφ
[ ] [ ] [ ][ ] [ ] [ ]
[ ] [ ] [ ]
[ ][ ]
[ ]
=
−−
−−
pR
RR
a
aa
RPxPR
pRRRpRRR
m
m
m
pmmm
mmm
mmm
.
.
.21
.
.
.
0...21..................
2...011...10
2
1A Toeplitz Matrix:symmetric and all elementsof the diagonal are equal
Why?
10
Short-Term Analysis: Autocorrelation Method
• Levinson-Durbin Recursion1.Initialization
2. Iteration. For i=1…,p do the following recursion
3. Final Solution:
( ) [ ]00 mRE =
( )[ ] ( ) [ ]
( )1
11
1
−
−−−=
∑−
=
iE
jiRiaiRik
mi
jjm
( ) ( )[ ]( ) ( ) ( ) 11- where,11 2 ≤≤−−= ikiEikiE
( ) ( ) ( ) ( ) 11for ,11 i-jiaikiaia jijj ≤≤−−−= −
( ) ( )ikia i =
( ) pjpaa jj ≤≤= 1for
A new, higher order coefficient is produced at each iteration i
11
Short-Term Analysis: Covariance Method
• is not identically zero outside 0 ≤ n ≤ N-1– Window function is not applied
• The mean-squared error is calculated within n=0~N-1
• The mean-squared error will be:
[ ]nxm
[ ]nx
mL+N-1
[ ] [ ]mLnxnx m +=
mL0shift
N-10
[ ] [ ] [ ]( )∑∑−
=
−
=
−==1
0
21
0
2 ~ N
nmm
N
nmm nxnxneE
12
Short-Term Analysis: Covariance Method
[ ] [ ]
[ ] [ ] [ ]
[ ] [ ]
[ ] ( )[ ]∑
∑
∑
∑
−−
−=
−
=
−
=
=
−+=
−−=
−−=
≤≤∀=⇒
iN
inmm
N
nmm
N
nmmm
m
p
jmj
jinxnx
jnxinx
jnxinxji
piijia
1
1
0
1
0
1
,
1 , 0, ,
φ
φφ
j N-1+j
i N-1+i
[ ]jnxm −
[ ]inxm −
N-1
N-1
0
0
[ ] [ ] Piijia mP
jmj ≤≤∀=∑
=1 , 0, ,
1φφ
[ ] [ ] [ ][ ] [ ] [ ]
[ ] [ ] [ ]
[ ][ ]
[ ]
=
0,...
0,20,1
.
.
.
,...2,1,..................,2...2,21,2,1...2,11,1
2
1
pa
aa
pppp
pp
m
m
m
Pmmm
mmm
mmm
φ
φφ
φφφ
φφφφφφ Not A Toeplitz Matrix:
symmetric and but not all elementsof the diagonal are equal
[ ] [ ] [ ]ppmmm ,...2,21,1 φφφ ≠≠
Take the derivative: i
m
aE∂∂
13
( ) ( )jwjw eHGeH ⋅=′
LPC Spectra
• LPC spectrum matches more closely the peaks than the valleys
– Because the regions where contribute more to the error than those where
[ ] ( ) ( )( )
ωπ
ωπ
π
π ω
ωπ
π
ω deH
eXGdeEneE
j
jm
pN
n
jmmm ∫∑ ∫ −
+−
=−
==== 2
21
0
222
21
21
( ) ( )ωω jjm eHeX >
( ) ( )ωω jm
j eXeH >
Parseval’s theorem
14
LPC Spectra
• LPC provides estimate of a gross shape of theshort-term spectrum
Order = 6 Order = 14
Order = 24 Order = 128
15
LPC Prediction Errors
16
MFCC vs. LPC Cepstrum Coefficients
• MFCC outperforms LPC Cepstrum coefficients– Perceptually motivated mel-scale representation indeed helps
recognition
• Higher-order MFCC does not further reduce the error rate in comparison with the 13-order MFCC
• Another perceptually motivated features such as first- and second-order delta features can significantly reduce the recognition errors
17
Description of Project-2 Fall 2003
• Try to implement the short-term linear prediction coding (LPC) for speech signals
• You should follow the following instructions:
1. Using the autocorrelation method with Levinson-Durbin Recursion and Rectangular/Hamming windowing
2. Analyzing the vowel (or FINAL) portions of speech signal withdifferent model orders (different P, e.g. P=6, 14, 24 and 128)
3. Plotting the LPC spectra as well as the original speech spectrum
4. Using the speech wave file, bk6_1.wav (no header, PCM 16KHz raw data), as the exemplar
18
Description of Project-2 Fall 2003
• Hints:1. When the LPC coefficients aj are derived, you can construct
impulse response signal h[n], 0≤n ≤ N-1 (N: frame size) by:
2. The prediction Error E can be expressed by the autocorrelation function:
19
Description of Project-2 Fall 2003
3. The MATLab example code:
x=[184.6400 184.1251 . . . . . . . 197.7890 -26.8000 ]; % original signal, dimension: frame sizey=[1.0000 2.0105 . . . . . . . 0.0738 0.0565 ]; % filter's impulse response h[n], dimension: frame sizegain=valG; % valG: the prediction Error EX=fft(x,512); % fast Fourier Transform, so the frame size < 512Y=fft(y,512); % fast Fourier TransformX(1)=[]; % remove the X(1), the DC Y(1)=[]; % remove the Y(1), the DC M=512; powerX=abs(X(1:M/2)).^2; % the power spectrum of XlogPX=10*log(powerX); % the power spectrum of X in dBpowerY=abs(Y(1:M/2)).^2; % the power spectrum of YlogPY=10*log(powerY)+10*log(gain); % the power spectrum of Y in dB
% plus the gain (Error) in dBnyquist=8000; % maximal frequency indexfreq=(1:M/2)/(M/2)*nyquist; % an array store the frequency indicesfigure(1);
plot(freq,logPX,'b',freq,logPY,'r'); % plot the result,% b: blue line for the power spectrum of the original signal% r: red line for the power spectrum of the filter
20
Description of Project-2 Fall 2003
• Example Figures of LPC Spectra
Order = 6 Rectangle windowNo pre-emphasis
Order = 14 Rectangle windowNo pre-emphasis
Order = 24 Rectangle windowNo pre-emphasis
Order = 128 Rectangle windowNo pre-emphasis
Order = 128Rectangle windowPre-emphasis
Order = 128 Hamming windowPre-emphasis
Order = 128 Hamming windowNo pre-emphasis Plotted by Roger Kuo, Fall 2002