On the Study of Energy-Based Approaches for Speech Feature Normalization and Apply to Voice Active Detection

Advisor: Dr. 陳柏琳    Graduate student: 陳鴻彬

Outline: Motivation, References, Research Content, Experimental Setup, Experimental Results, Application to Voice Activity Detection, Conclusions and Future Work.

  • On the Study of Energy-Based Approaches for Speech Feature Normalization and Apply to Voice Active Detection

  • (FES), I (SLEN I), II (SLEN II), I (LEDRN I), II (LEDRN II)

  • (FES)

  • (FLES) K

  • I (SLEN I) (VAD)

  • II (SLEN II)III

  • I (LEDRN I)

  • II (LEDRN II)

  • I (LER I) I (Quantile)

  • I (LER I)

    M

  • I (LER I) (M) 100

  • I (LER I)

  • II (LER II) I

    II (Least Squares Regression)

  • (SPW) vowel, sonorant consonant, fricative, glide, nasal

  • (SPW) autocorrelation function
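The slide names the autocorrelation function without giving a formula. The sketch below shows one common way an autocorrelation can separate periodic (voiced) frames from aperiodic ones: take the largest normalized autocorrelation value over a range of candidate pitch lags. The function names, the lag range, and the toy signal are illustrative assumptions, and this is not necessarily how SPW uses the autocorrelation.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Autocorrelation of `frame` at `lag`, divided by the lag-0 value (energy)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: define the score as zero
    acf = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
    return acf / energy


def voicing_score(frame, min_lag, max_lag):
    """Largest normalized autocorrelation over the candidate pitch lags."""
    return max(
        normalized_autocorrelation(frame, lag)
        for lag in range(min_lag, max_lag + 1)
    )


# Toy example (assumed values): a 100 Hz sinusoid sampled at 8 kHz
# has a period of 80 samples, so the score peaks near lag 80.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
periodic_score = voicing_score(frame, min_lag=20, max_lag=160)
silent_score = voicing_score([0.0] * 400, min_lag=20, max_lag=160)
```

A strongly periodic frame gives a score well above zero, while a silent frame gives zero; a threshold between the two would have to be tuned on real data.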

  • (SPW)

  • (SPW) Non-Causal Auto-Regressive Moving Average

  • Aurora-2: Set A, Set B, Set C. Front-end processing: 12 coefficients plus log energy (Log Energy). Back-end recognizer: HTK.

  • I I (=100) I

                                             Set A   Set B   Set C   Average
    Clean training   Train Set + Test Set    74.36   76.72   63.83   73.20
                     Test Set Only           72.20   75.45   60.58   71.18
    Multi training   Train Set + Test Set    86.31   86.27   81.22   85.28
                     Test Set Only           77.76   81.98   69.86   77.87

  • I: scale size M from 50 to 1000

                 Clean training                      Multi training
    Scale Size   Set A   Set B   Set C   Average     Set A   Set B   Set C   Average
    BaseLine     58.94   58.48   59.97   58.96       85.22   83.99   80.67   83.82
    M = 50       74.10   76.71   63.07   72.94       86.33   86.25   81.04   85.24
    M = 70       74.32   76.79   63.46   73.13       86.34   86.28   81.05   85.26
    M = 90       74.35   76.70   63.67   73.15       86.33   86.25   81.22   85.27
    M = 110      74.34   76.65   63.83   73.17       86.37   86.28   81.25   85.31
    M = 130      74.28   76.56   63.90   73.11       86.38   86.20   81.31   85.29
    M = 150      74.23   76.43   63.90   73.05       86.39   86.19   81.38   85.31
    M = 500      73.47   75.21   63.33   72.14       86.75   85.85   81.62   85.36
    M = 1000     73.24   74.90   63.74   72.00       86.67   85.89   81.68   85.36
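The Average column is not a plain mean of the three sets. The figures in this table are consistent with weighting Set A and Set B twice as heavily as Set C; that reading is inferred from the numbers, not stated on the slide. A minimal check, with a hypothetical function name:

```python
def weighted_average(set_a, set_b, set_c):
    """Average with Set A and Set B counted twice and Set C once (assumed weighting)."""
    return (2 * set_a + 2 * set_b + set_c) / 5


# Rows copied from the table: (Set A, Set B, Set C, reported Average).
rows = {
    "BaseLine (clean)": (58.94, 58.48, 59.97, 58.96),
    "M = 50 (clean)": (74.10, 76.71, 63.07, 72.94),
    "BaseLine (multi)": (85.22, 83.99, 80.67, 83.82),
    "M = 1000 (clean)": (73.24, 74.90, 63.74, 72.00),
}

# Largest gap between the recomputed and the reported average.
max_gap = max(
    abs(weighted_average(a, b, c) - reported)
    for a, b, c, reported in rows.values()
)
```

For the rows checked here the gap stays within rounding of the two reported decimals.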

  • II: 5 dB, 10 dB, 15 dB, 20 dB, (Multi)

    Train Condition   Parameters         Set A   Set B   Set C   Average
    05 dB             : 0.90  : 0.71     64.23   67.69   50.27   62.82
    10 dB             : 0.95  : 0.43     70.11   73.17   56.23   68.56
    15 dB             : 0.98  : 0.32     72.19   75.22   58.70   70.70
    20 dB             : 0.98  : 0.25     73.26   75.85   60.11   71.66
    Multi             : 0.98  : 0.34     71.24   74.36   57.91   69.82

  • Clean training   Set A   Set B   Set C   Average
    BaseLine         58.94   58.48   59.97   58.96
    SPW              72.87   65.10   67.94   68.77

    Multi training   Set A   Set B   Set C   Average
    BaseLine         85.22   83.99   80.67   83.82
    SPW              86.43   84.20   80.95   84.45

  • (Energy)
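The slide gives only the feature name. As an illustration of an energy-based detector in general, the sketch below computes a log energy per frame and marks frames above a fixed threshold as speech. The frame length, hop, floor, threshold, and function names are all assumptions chosen for the toy example, not values from the presentation.

```python
import math


def frame_log_energy(samples, frame_len, hop, floor=1e-10):
    """Natural-log energy of each frame; `floor` avoids log(0) on silence."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log(max(energy, floor)))
    return energies


def energy_vad(log_energies, threshold):
    """Label a frame 1 (speech) when its log energy exceeds the threshold."""
    return [1 if value > threshold else 0 for value in log_energies]


# Toy signal (assumed): 200 low-amplitude samples followed by 200 loud ones.
signal = [0.001] * 200 + [0.5] * 200
log_energies = frame_log_energy(signal, frame_len=100, hop=100)
decisions = energy_vad(log_energies, threshold=0.0)
```

A fixed threshold like this is fragile when the noise level changes, which is the kind of problem energy normalization is meant to address.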

  • (Energy Entropy)

  • (LTSD) N

    (LTSD)
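For reference, the long-term spectral divergence is usually defined from a long-term spectral envelope: the per-bin maximum of the magnitude spectrum over frames l-N to l+N, compared against an average noise spectrum and expressed in dB. The sketch below follows that commonly published definition. It takes precomputed magnitude spectra as input; clamping the window at the utterance edges and the function names are assumptions of this sketch.

```python
import math


def long_term_envelope(spectra, frame_index, order):
    """Per-bin maximum magnitude over frames frame_index-order .. frame_index+order."""
    first = max(0, frame_index - order)                 # clamp at the start (assumption)
    last = min(len(spectra) - 1, frame_index + order)   # clamp at the end (assumption)
    num_bins = len(spectra[frame_index])
    return [
        max(spectra[j][k] for j in range(first, last + 1))
        for k in range(num_bins)
    ]


def ltsd(spectra, noise_spectrum, frame_index, order):
    """Long-term spectral divergence in dB between the envelope and the noise spectrum."""
    envelope = long_term_envelope(spectra, frame_index, order)
    ratio = sum(
        (e * e) / (n * n) for e, n in zip(envelope, noise_spectrum)
    ) / len(noise_spectrum)
    return 10.0 * math.log10(ratio)


# Toy magnitude spectra with two bins per frame (assumed values).
spectra = [[1.0, 1.0], [1.0, 1.0], [4.0, 2.0], [1.0, 1.0]]
noise_spectrum = [1.0, 1.0]
quiet_value = ltsd(spectra, noise_spectrum, frame_index=0, order=1)
near_speech_value = ltsd(spectra, noise_spectrum, frame_index=1, order=1)
```

Because the envelope looks N frames ahead and behind, frame 1 already scores high here even though the louder spectrum only appears in frame 2.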

  • I

  • AURORA 2.0 hand labels (Hand-Label, HL); AURORA 2.0 Set A and Set B; HR0 (non-speech hit-rate), HR1 (speech hit-rate)

  • Ave Set (A&B), Ave 20~0dB

                     non-speech error   speech error   total error
    LogEn            53.40              10.02          31.71
    LogEn_LER        55.39               9.67          32.53
    Entropy          76.91              10.54          43.73
    Entropy_LER      65.63              15.48          40.56
    LTSD_ws7         66.38               6.95          36.66
    LTSD_ws7_LER     61.81               7.11          34.46
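The error figures can be related to the hit rates HR0 and HR1 with a short sketch. It assumes that each error is 100 minus the corresponding hit rate, and that the total error is the mean of the two errors; the second assumption matches the rows above (for LogEn, (53.40 + 10.02) / 2 = 31.71), but neither is stated on the slide. The label encoding and function names are hypothetical.

```python
def hit_rates(reference, predicted):
    """Return (HR0, HR1) in percent for frame labels 0 = non-speech, 1 = speech."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    totals = {0: 0, 1: 0}
    hits = {0: 0, 1: 0}
    for ref, hyp in zip(reference, predicted):
        totals[ref] += 1
        if ref == hyp:
            hits[ref] += 1
    if totals[0] == 0 or totals[1] == 0:
        raise ValueError("reference must contain both speech and non-speech frames")
    return 100.0 * hits[0] / totals[0], 100.0 * hits[1] / totals[1]


def error_summary(reference, predicted):
    """Return (non-speech error, speech error, total error) in percent."""
    hr0, hr1 = hit_rates(reference, predicted)
    non_speech_error = 100.0 - hr0   # assumed: error is the complement of HR0
    speech_error = 100.0 - hr1       # assumed: error is the complement of HR1
    total_error = (non_speech_error + speech_error) / 2.0  # assumed: mean of the two
    return non_speech_error, speech_error, total_error


# Toy frame labels (assumed): hand labels versus a detector's output.
reference = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
non_speech_error, speech_error, total_error = error_summary(reference, predicted)
```

Averaging the two errors weights speech and non-speech equally, regardless of how many frames of each the test data contains.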

  • over-estimation factor, flooring factor