Advisor: Dr. 陳柏琳
Graduate student: 陳鴻彬

On the Study of Energy-Based Approaches for Speech Feature Normalization and Their Application to Voice Activity Detection

FES; SLEN I; SLEN II; LEDRN I; LEDRN II

(FES )

(FLES )K

I (SLEN I) (VAD)

II (SLEN II)III

I (LEDRN I)

II (LEDRN II)

I(LER I) I(Quantile)

I(LER I)

M

I(LER I)(M)100

I(LER I)

II(LER II)I

II()(Least Squares Regression)

SPW: vowel, sonorant consonant, fricative, glide, nasal

(SPW)(autocorrelation function)

(SPW) (Non-Causal Autoregressive Moving Average)

Aurora-2: Set A, Set B, Set C
Front-end processing: 12 (...) and log energy
Back-end recognizer: HTK

I I (=100) I

                          Set A   Set B   Set C   Average
Clean training
  Train Set + Test Set    74.36   76.72   63.83   73.20
  Test Set Only           72.20   75.45   60.58   71.18
Multi training
  Train Set + Test Set    86.31   86.27   81.22   85.28
  Test Set Only           77.76   81.98   69.86   77.87
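The Average column above is not a plain mean of the three sets. It appears consistent with the usual Aurora-2 weighting, in which Set A and Set B each cover four noise types and Set C covers two. The sketch below checks this against the reported rows; the function name and the weights dictionary are illustrative and not taken from the slides.

```python
# Assumed weights: Sets A and B each cover four noise types, Set C covers two.
WEIGHTS = {"A": 0.4, "B": 0.4, "C": 0.2}


def aurora2_average(set_a, set_b, set_c):
    """Weighted average over the three test sets, rounded to two decimals."""
    total = WEIGHTS["A"] * set_a + WEIGHTS["B"] * set_b + WEIGHTS["C"] * set_c
    return round(total, 2)


# (Set A, Set B, Set C, reported Average) copied from the table above.
rows = {
    "Clean, Train Set + Test Set": (74.36, 76.72, 63.83, 73.20),
    "Clean, Test Set Only": (72.20, 75.45, 60.58, 71.18),
    "Multi, Train Set + Test Set": (86.31, 86.27, 81.22, 85.28),
    "Multi, Test Set Only": (77.76, 81.98, 69.86, 77.87),
}

for name, (a, b, c, reported) in rows.items():
    print(name, aurora2_average(a, b, c), reported)
```

Under this assumption every recomputed value matches the reported Average to two decimals.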

I50705001000 I

              Clean training                       Multi training
Scale Size    Set A   Set B   Set C   Average      Set A   Set B   Set C   Average
Baseline      58.94   58.48   59.97   58.96        85.22   83.99   80.67   83.82
M = 50        74.10   76.71   63.07   72.94        86.33   86.25   81.04   85.24
M = 70        74.32   76.79   63.46   73.13        86.34   86.28   81.05   85.26
M = 90        74.35   76.70   63.67   73.15        86.33   86.25   81.22   85.27
M = 110       74.34   76.65   63.83   73.17        86.37   86.28   81.25   85.31
M = 130       74.28   76.56   63.90   73.11        86.38   86.20   81.31   85.29
M = 150       74.23   76.43   63.90   73.05        86.39   86.19   81.38   85.31
M = 500       73.47   75.21   63.33   72.14        86.75   85.85   81.62   85.36
M = 1000      73.24   74.90   63.74   72.00        86.67   85.89   81.68   85.36

II 5 dB, 10 dB, 15 dB, 20 dB (Multi) II

Train Condition   Parameters     Set A   Set B   Set C   Average
05 dB             0.90 / 0.71    64.23   67.69   50.27   62.82
10 dB             0.95 / 0.43    70.11   73.17   56.23   68.56
15 dB             0.98 / 0.32    72.19   75.22   58.70   70.70
20 dB             0.98 / 0.25    73.26   75.85   60.11   71.66
Multi             0.98 / 0.34    71.24   74.36   57.91   69.82

Clean training    Set A   Set B   Set C   Average
Baseline          58.94   58.48   59.97   58.96
SPW               72.87   65.10   67.94   68.77

Multi training    Set A   Set B   Set C   Average
Baseline          85.22   83.99   80.67   83.82
SPW               86.43   84.20   80.95   84.45

(Energy)
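As a rough illustration of an energy-based detector, the sketch below computes per-frame log energy and marks a frame as speech when it exceeds a noise estimate by a fixed margin. The frame length, hop, margin, and the choice to estimate noise from the first frames are all assumptions made for this example, not settings from the slides.

```python
import math


def frame_log_energy(samples, frame_len=200, hop=80, floor=1e-10):
    """Natural-log energy of each frame; `floor` avoids log(0) on silent frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log(max(energy, floor)))
    return energies


def energy_vad(log_energies, noise_frames=10, margin=3.0):
    """Return 1 (speech) or 0 (non-speech) per frame.

    The noise level is the mean log energy of the first `noise_frames` frames,
    which are assumed to contain no speech.
    """
    if not log_energies:
        return []
    n = min(noise_frames, len(log_energies))
    threshold = sum(log_energies[:n]) / n + margin
    return [1 if e > threshold else 0 for e in log_energies]


# Synthetic signal: 2000 low-level samples followed by a 2000-sample sine burst.
quiet = [0.001 * (-1) ** n for n in range(2000)]
loud = [math.sin(2 * math.pi * 0.05 * n) for n in range(2000)]
decisions = energy_vad(frame_log_energy(quiet + loud))
print(decisions)
```

A fixed margin over an initial noise estimate is the simplest possible decision rule; it is shown only to make the role of the log-energy feature concrete.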

(Energy Entropy)

(LTSD) N

(LTSD)
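Assuming LTSD here denotes the long-term spectral divergence commonly used in VAD and N is its order (the number of neighbouring frames on each side), the measure can be sketched as follows. The long-term spectral envelope takes the per-bin maximum magnitude over frames l-N to l+N, and the divergence compares that envelope with an average noise spectrum in dB. The function names, the list-of-lists spectrum layout, and the clipping of the window at the signal edges are choices made for this example.

```python
import math


def ltse(spectra, frame_idx, order):
    """Long-term spectral envelope: per-bin maximum over frames idx-order..idx+order."""
    lo = max(0, frame_idx - order)                  # clip the window at the edges
    hi = min(len(spectra), frame_idx + order + 1)
    n_bins = len(spectra[frame_idx])
    return [max(spectra[j][k] for j in range(lo, hi)) for k in range(n_bins)]


def ltsd(spectra, noise_spectrum, frame_idx, order):
    """Divergence in dB between the envelope and the average noise magnitude spectrum."""
    envelope = ltse(spectra, frame_idx, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(envelope, noise_spectrum))
    return 10.0 * math.log10(ratio / len(envelope))


# Toy magnitude spectra: 8 frames x 4 bins at the noise level, one louder frame.
noise = [1.0, 1.0, 1.0, 1.0]
spectra = [[1.0] * 4 for _ in range(8)]
spectra[5] = [10.0] * 4

print(ltsd(spectra, noise, 0, 3))   # window covers frames 0..3, noise only
print(ltsd(spectra, noise, 3, 3))   # window covers frames 0..6, includes frame 5
```

Because the envelope looks N frames ahead and behind, the loud frame raises the divergence for its neighbours too, which is what lets a threshold on LTSD bridge weak speech segments.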

AURORA 2.0 (hand-labeled, HL)
AURORA 2.0 Sets A and B
HR0 (non-speech hit rate), HR1 (speech hit rate)

Average over Sets A and B, 20~0 dB

                non-speech error   speech error   total error
LogEn           53.40              10.02          31.71
LogEn_LER       55.39              9.67           32.53
Entropy         76.91              10.54          43.73
Entropy_LER     65.63              15.48          40.56
LTSD_ws7        66.38              6.95           36.66
LTSD_ws7_LER    61.81              7.11           34.46
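The hit rates HR0 and HR1 can be computed from frame-level reference labels and VAD decisions as sketched below. The 0/1 label encoding and the function name are assumptions made for this example; treating each error rate as 100 minus the matching hit rate is likewise an assumption about how the error columns were derived.

```python
def hit_rates(reference, decisions):
    """Return (HR0, HR1) in percent.

    reference, decisions: equal-length sequences of 0 (non-speech) and 1 (speech).
    HR0 is the share of reference non-speech frames detected as non-speech;
    HR1 is the share of reference speech frames detected as speech.
    """
    if len(reference) != len(decisions):
        raise ValueError("reference and decisions must have the same length")
    nonspeech_total = sum(1 for r in reference if r == 0)
    speech_total = len(reference) - nonspeech_total
    nonspeech_hits = sum(1 for r, d in zip(reference, decisions) if r == 0 and d == 0)
    speech_hits = sum(1 for r, d in zip(reference, decisions) if r == 1 and d == 1)
    hr0 = 100.0 * nonspeech_hits / nonspeech_total if nonspeech_total else float("nan")
    hr1 = 100.0 * speech_hits / speech_total if speech_total else float("nan")
    return hr0, hr1


reference = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
decisions = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
hr0, hr1 = hit_rates(reference, decisions)
print(hr0, hr1)                    # 3 of 4 non-speech and 5 of 6 speech frames hit
print(100.0 - hr0, 100.0 - hr1)    # assumed non-speech and speech error rates
```

In the table, the total error appears to equal the mean of the non-speech and speech errors up to rounding, for example (53.40 + 10.02) / 2 = 31.71 for LogEn.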

over-estimation factor, flooring factor
