Bayesian Predictive Classification With Incremental Learning for Noisy Speech Recognition 朱國華 89/12/06


Page 1: Bayesian Predictive Classification With Incremental Learning for Noisy Speech Recognition

Bayesian Predictive Classification With Incremental Learning for Noisy Speech Recognition

朱國華

89/12/06

Page 2:

References

Jen-Tzung Chien and Kuo-Hung Liao, "Application of a Bayesian Predictive Rule with Incremental Learning Capability to In-Car Speech Recognition" (in Chinese), ROCLING XIII, pp. 179-197, 2000.

H. Jiang, K. Hirose and Q. Huo, "Robust Speech Recognition Based on a Bayesian Prediction Approach", IEEE Transactions on Speech and Audio Processing, vol. 7, no. 4, pp. 426-440, July 1999.

J.-T. Chien, "Online Hierarchical Transformation of Hidden Markov Models for Speech Recognition", IEEE Transactions on Speech and Audio Processing, vol. 7, no. 6, pp. 656-667, November 1999.

Page 3:

Contents

Introduction
Problem Formulation
Some Decision Rules for ASR
Transform-Based Bayesian Predictive Classification (TBPC)
Derivation of the Bayesian Predictive Likelihood Measurement (BPLM)
Online Prior Evolution (OPE)
Experiments and Discussions

Page 4:

Introduction

Transform-based Bayesian predictive classification: a robust decision rule for noisy speech recognition.

Online prior evolution to cope with the nonstationary testing environment (both environmental and speaker variation).

Page 5:

Problem Formulation

Approximate MAP (Quasi-Bayes, QB) estimation for ASR:

n : index of the input test utterance
W : word content (syllable string) of the input utterance
η : acoustic transformation parameter (function)
X(n) = {X1, X2, …, Xn} : the i.i.d., successively observed block samples
φ(n-1) : the environmental statistics estimated from the previous input utterances X1, X2, …, Xn-1

Page 6:

Problem Formulation(cont.)

Assuming W and η are independent, the previous QB estimation can be rewritten as follows (p(W) : language model):
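The slide's formula is an image that did not survive extraction. From the symbol definitions on the previous slide, the factorized QB rule plausibly reads (a reconstruction, not the slide's exact typography):

```latex
(\hat{W}_n,\hat{\eta}_n)
  = \arg\max_{W,\eta}\; p(X_n \mid W,\eta)\,
    p(\eta \mid \varphi^{(n-1)})\, p(W)
```

Here the joint prior p(W, η | φ(n-1)) has been split into p(η | φ(n-1)) p(W) using the independence assumption.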

Page 7:

Some Decision Rules for ASR

Plug-In MAP Rule: The performance of the plug-in MAP decision rule depends on the choice of estimation approach (ML, MAP, discriminative training, etc.), the nature and size of the training data, and the degree of mismatch between training and testing conditions.

Point estimation.
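The rule's formula is an image in the original; in standard notation the plug-in MAP rule simply substitutes point estimates of the model and transformation into the MAP decision (a reconstruction, with Λ̂ denoting the pretrained acoustic model):

```latex
\hat{W} = \arg\max_{W}\; p\big(X_n \mid W, \hat{\Lambda}, \hat{\eta}\big)\, p(W)
```

All parameter uncertainty is ignored, which is exactly what makes the rule sensitive to training/testing mismatch.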

Page 8:

Some Decision Rules for ASR(cont.)

Minimax Rule: Nonparametric compensation. Minimizes an upper bound on the worst-case probability of classification error.

Assumes the unknown true parameter is a random variable uniformly distributed in a neighborhood region.

Page 9:

TBPC

Transform-based Bayesian Predictive Classification (TBPC) Rule:

where the likelihood is obtained by:
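Both equations are images in the original. Given the QB formulation above, the TBPC rule presumably replaces the point estimate of η with an integral over its prior (a reconstruction; G_η(Λ) denotes the transformed model):

```latex
\hat{W} = \arg\max_{W}\; p(W)
  \int p\big(X_n \mid W, G_\eta(\Lambda)\big)\,
       p\big(\eta \mid \varphi^{(n-1)}\big)\, d\eta
```

The integral is the Bayesian predictive likelihood: the data likelihood averaged over the transformation uncertainty.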

Page 10:

TBPC (cont.)

TBPC treats the transformation parameters as random variables (not point estimates).

The average is taken both with respect to the sampling variation in the expected testing data and with respect to the parameter uncertainty described by the prior pdf p(η|φ(n-1)).

TBPC can be applied in both supervised and unsupervised learning environments.

Page 11:

TBPC (cont.)

Transformation-based Adaptation: For a given HMM model with L states and K mixtures, Λ = {λik} = {ωik, μik, rik}, i = 1…L, k = 1…K, the estimated transformation function G(n)(Λ) for the given testing utterance n is defined as:

where c is the index of the transformation cluster (hierarchical transformation).
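The transformation equation itself is an image in the original. In Chien's online hierarchical transformation, the function is typically a cluster-dependent shift of the Gaussian mean vectors (a hedged sketch, with b_c denoting a hypothetical bias vector shared by all Gaussians in cluster c):

```latex
G_c^{(n)}(\mu_{ik}) = \mu_{ik} + b_c^{(n)},
\qquad \lambda_{ik} \in \text{cluster } c
```

Tying one bias per cluster keeps the number of free parameters small enough to estimate from a single test utterance.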

Page 12:

Implementation (Approach I): Considering the missing-data problem, we use the Viterbi TBPC for the likelihood:

A frame-synchronous Viterbi Bayesian search algorithm can be utilized to reduce the memory space and computational load (Jiang, IEEE SAP 1999).

TBPC (cont.)

Page 13:

Implementation (Approach I, cont.): In Jiang (IEEE SAP 1999), only the uncertainty of the mean vectors of CDHMMs with diagonal covariance matrices is considered, and the means are assumed to be uniformly distributed in a neighborhood of the pretrained means (no online adaptation).

TBPC (cont.)

Page 14:

Implementation (Approach II): Bayesian Predictive density based Model Compensation (BP-MC). The K-mixture state observation pdf is:

where f(xt(n)|λik) is the Bayesian predictive density, defined below:

TBPC (cont.)
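The compensated observation pdf is an image in the original; for a K-mixture state it presumably has the standard mixture form (a reconstruction using the model notation Λ = {ωik, μik, rik} introduced earlier):

```latex
\tilde{b}_i\big(x_t^{(n)}\big)
  = \sum_{k=1}^{K} \omega_{ik}\, f\big(x_t^{(n)} \mid \lambda_{ik}\big)
```

That is, each Gaussian component is replaced by its Bayesian predictive density while the mixture weights are kept.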

Page 15:

Implementation (Approach II, cont.): The choice of prior pdf:

In Chien (ROCLING 2000), a multivariate Gaussian pdf is adopted, since it is the conjugate prior for the Gaussian statistics.

TBPC (cont.)

Page 16:

Derivation of the BPLM

Since p(xt(n)|λik, ηc) and p(ηc|φc(n-1)) are both Gaussian, we can derive f(xt(n)|λik) in closed form (assuming both the prior precision and rik are diagonal precision matrices):
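The closed form exists because a Gaussian likelihood integrated against a Gaussian prior on its mean is again Gaussian, with the variance inflated by the prior uncertainty. A minimal numeric sketch of that marginalization (the function name and diagonal parameterization are illustrative, not from the slides):

```python
import math

def predictive_density(x, prior_mean, prior_var, obs_var):
    """Bayesian predictive density for a diagonal-covariance Gaussian
    observation model with a Gaussian prior on the mean:
        f(x) = integral N(x | mu, obs_var) N(mu | m, v) dmu
             = N(x | m, obs_var + v),
    evaluated per dimension and multiplied."""
    density = 1.0
    for xi, m, v, s in zip(x, prior_mean, prior_var, obs_var):
        var = s + v  # predictive variance = observation variance + prior variance
        density *= math.exp(-0.5 * (xi - m) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    return density
```

In one dimension with m = 0, v = 1, s = 1 the peak value is 1/sqrt(4π), and a broader prior always flattens the predictive density, reflecting greater parameter uncertainty.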

Page 17:

Viterbi Approach:

where (sn*, ln*) are the most likely state and mixture sequences corresponding to Xn.

Online Prior Evolution

Page 18:

The parameter statistics of the c-th cluster are:

Online Prior Evolution (cont.)

Page 19:

Where

From the above derivation, we can adapt (learn) φc(n) online from φc(n-1).

Online Prior Evolution (cont.)
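The update formulas themselves are images in the original. As a hedged sketch of the general shape such a quasi-Bayes recursion takes (γt(n) denoting the posterior occupancy of cluster c at frame t and τc an accumulated count, both my notation, not the slides'):

```latex
\mu_c^{(n)} =
  \frac{\tau_c^{(n-1)}\,\mu_c^{(n-1)} + \sum_t \gamma_t^{(n)} x_t^{(n)}}
       {\tau_c^{(n-1)} + \sum_t \gamma_t^{(n)}},
\qquad
\tau_c^{(n)} = \tau_c^{(n-1)} + \sum_t \gamma_t^{(n)}
```

The key property is that each utterance only updates running sufficient statistics, so the prior accumulates evidence incrementally rather than being re-estimated from scratch.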

Page 20:

We can estimate the initial hyperparameters φc(0) from the given training data.

Online Prior Evolution (cont.)

Page 21:

Experiments

Training and testing data set I (Mic1, clean): 70 males and 70 females; each person records 10 continuous Mandarin digit sentences. Utterances from 50 males and 50 females are used for training; those from the remaining 20 males and 20 females are used for testing.

Training and testing data set II (Mic2, noisy): 2 males + 2 females in a Toyota Corolla 1.8; 3 males + 3 females in a Nissan Sentra 1.6. Each speaker individually records 10 sentences at idle, 20 sentences at 50 km/h, and 30 sentences at 90 km/h. Five sentences are chosen arbitrarily for training; the rest are used for testing.

Page 22:

Experiments (cont.)

Signal-to-Noise Ratio, SNR (dB):

            Sentra    Corolla   Average
0 km/h        5.63     10.3       7.96
50 km/h      -6.53      0.34     -3.1
90 km/h     -10.14     -3.77     -6.96
clean        25.1

Page 23:

Experiments (cont.) Recognizer Structure

Features: 12th-order LPC-derived cepstrum and Δ-cepstrum, plus Δ log energy.

HMM Model: 7 states and 4 mixtures for each digit model, plus 3 different single-state background noise models.

Page 24:

Experiments (cont.)

Baseline results:

Test sentence   Digit error rate (DER, %)
Clean           10.6
0 km/h          25.61
50 km/h         54.97
90 km/h         62.33

Page 25:

Experiments (cont.)

Supervised DER as a function of the amount of training data.

Page 26:

Experiments (cont.) Unsupervised TBPC-OPE DER (parentheses give the % improvement).

In the 2-cluster case, the 10 digit models form one cluster and the 3 background noise models form the other.

            1 cluster       2 clusters
0 km/h     14.32 (44.0)    12.53 (51.0)
50 km/h    39.94 (27.3)    36.24 (34.0)
90 km/h    51.65 (17.1)    46.32 (25.6)

Page 27:

Experiments (cont.) Unsupervised performance comparison of different BPC approaches:

            Baseline   Jiang ('99 IEEE SAP)   Surendran (1998 ICASSP)   TBPC-OPE (ROCLING 2000)
Clean       10.6       8.51 (19.2)            8.4 (20.7)                7.47 (29.5)
0 km/h      25.6       18.6 (27.3)            15.43 (39.7)              12.53 (51.0)
50 km/h     55         49.83 (9.4)            38.38 (30.2)              36.24 (34.1)
90 km/h     62.3       60.25 (3.3)            50.91 (18.3)              46.32 (25.6)

Page 28:

Discussions Jiang’s results are the worst among all because of the fixed prior distribution. Surendran’s results are worse than TBPC-OPE because the adaptation of prior pdf is just count on the current input utterance but not the accumulated ones. We can also adjust the weight (Dirichlet dist.) and variance (Wishart dist.) with the mean at the same time of the HMM model of the BP-

MC approach.