22
MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Under standing, 2007. ASRU. IEEE Workshop on Author : Liang-che Sun , Chang-wen Hsu, and Lin-shan L ee Professor : 陳陳陳 Reporter : 陳陳陳

MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

  • View
    230

  • Download
    3

Embed Size (px)

Citation preview

Page 1: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION

Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop onAuthor : Liang-che Sun , Chang-wen Hsu, and Lin-shan LeeProfessor : 陳嘉平Reporter : 楊治鏞

Page 2: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Introduction

The performance of speech recognition systems is very often degraded due to the mismatch between the acoustic conditions of the training and testing environments.

In this paper, we propose a new approach for modulation spectrum equalization in which the modulation spectra of noisy speech utterances are equalized to those of clean speech.

Page 3: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Introduction

The first is to equalize the cumulative density functions (CDFs) of the modulation spectra of clean and noisy speech, such that the differences between them are reduced.

The second is to equalize the magnitude ratio of lower to higher components in the modulation spectrum.

Page 4: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Modulation spectrum (1/2)

Given a sequence of feature vectors for an utterance, each including D feature parameters,

where n is the time index, and is the parameter index.

{ ( ), 1, 2,..., }x n n N

( ) [ ( ,1), ( , 2),..., ( , )] , 1,...,Tx n x n x n x n D n N

1,...,d D

Page 5: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 6: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Modulation spectrum (2/2)

The modulation spectrum of the -th time trajectory can be obtained by applying discrete Fourier transform:

( )dY k d

1

0

( ) ( ) exp( 2 / )N

d dn

Y k y n j nk N

1,2,...,d D0,1,2,..., 1;k N

Page 7: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Spectral Histogram Equalization

We first calculate the cumulative distribution function (CDF) of the magnitudes of the modulation spectra, , for all utterances in the clean training data of AURORA 2 to be used as the reference CDF, .

For any test utterance, the CDF for its modulation spectrum magnitude, , can be similarly obtained as .

( )dY k

refCDF [ ]

, ( )d testY k

testCDF [ ]

Page 8: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 9: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Spectral Histogram Equalization

Hence the equalized magnitude of modulation spectrum is

1, ,

ˆ ( ) ( [ ( ) ])d test ref test d testY k CDF CDF Y k

,ˆ ( )d testY k

Page 10: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Magnitude Ratio Equalization

We first define a magnitude ratio (MR) for lower to higher frequency components for each parameter index as follows:

where is the cut-off frequency used here, is the order of the discrete Fourier transform.

d

0

[ ] 12

0

( )

( )

ck

dk

d N

dk

Y kMR

Y k

ck N

Page 11: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 12: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Magnitude Ratio

We can observe from this figure that the mean value of is degraded when SNR is degraded, and thus is highly correlated with SNR.

It is therefore reasonable to equalize the value of for a noisy utterance to a reference

value obtained from clean training data.

dMR

dMR

dMR

dMR

Page 13: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Magnitude Ratio Equalization

We first calculate the average of for all utterances in the clean training data of AURORA 2 as the reference value .

We then calculate the value of for each test utterance as .

dMR

,d refMR

dMR,d testMR

Page 14: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Magnitude Ratio Equalization

We equalize the magnitude of the modulation spectrum for the test utterance as

where is the weighted-power for the scaling factor.

, ( )d testY k

0 1p

,, c

,

,, c

, (1 )

,

( ) ( ) ,k k

ˆ ( ) 1( ) ,k>k

( )

d ref pd test

d test

d testd test

d ref p

d test

MRY k

MR

Y kY k

MR

MR

Page 15: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

EXPERIMENTAL SETUP

AURORA 2 The speech features were extracted by the

AURORA WI007 front-end. Figure 4

Page 16: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 17: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 18: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 19: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 20: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author
Page 21: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author

Integration of MRE with Other FeatureNormalization Techniques

We only consider MRE here because the additional improvements obtainable with SHE+MRE as shown in Table 1 were found to be limited, and indeed involved much higher computational costs.

Page 22: MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on Author