Speech Enhancement for ASR
by Hans Hwang, 8/23/2000

References
1. Alan V. Oppenheim et al., "Multi-Channel Signal Separation by Decorrelation", IEEE Trans. on ASSP, pp. 405-413, 1993.
2. Yunxin Zhao et al., "Adaptive Co-channel Speech Separation and Recognition", IEEE Trans. on SAP, pp. 138-151, 1999.
3. Ing Yang Soon et al., "Noisy Speech Enhancement Using Discrete Cosine Transform", Speech Communication, pp. 249-257, 1998.


Outline
- Signal Separation by S-ADF/LMS
- Speech Enhancement by DCT
- Residual Signal Reduction
- Experimental Results

Speech Signal Separation (SSS)
Introduction:
- Recover the desired signal and identify the unknown system from the observed signal.
- A speech signal recovered by SSS has a higher SNR, which improves speech recognition accuracy.
- We consider the two-channel case specifically.

SSS (cont'd)
Two-channel model description

A and B model the cross-coupling between the channels; the transfer function of each channel itself is ignored. x_i(t) is the source signal and y_i(t) is the acquired signal.

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = H \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & A \\ B & 1 \end{bmatrix}$$

SSS (cont'd)
Source separation system (separates the source signals from the acquired signals)

$$H^{-1} = \frac{1}{1 - AB} \begin{bmatrix} 1 & -A \\ -B & 1 \end{bmatrix}$$

$\hat{A}$ and $\hat{B}$ are called the decoupling filters and are modeled as FIR filters, with $C = 1 - \hat{A}\hat{B}$.

SSS by ADF
Calculate the FIR coefficients with the adaptive decorrelation filter (ADF) proposed by A. V. Oppenheim in 1993.
- The objective is to design the decoupling filters such that the estimated signals are uncorrelated.
- The decoupling filter coefficients are estimated iteratively from the previously estimated coefficients and the current observations.
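The iterative idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the slides' closed form: the update rule (each filter moves in proportion to the product of the two current output estimates), the fixed `gain`, and the names `adf_separate`, `a_hat`, and `b_hat` are all hypothetical choices made for the example.

```python
import numpy as np

def adf_separate(y1, y2, n_taps=1, gain=0.005):
    """Decorrelate two channels with adaptive FIR decoupling filters (sketch).

    Assumed update: a_hat += gain * v1(t) * [v2(t), v2(t-1), ...], and
    symmetrically for b_hat. The fixed gain is a simplification.
    """
    a_hat = np.zeros(n_taps)  # estimate of A (removes source 2 from channel 1)
    b_hat = np.zeros(n_taps)  # estimate of B (removes source 1 from channel 2)
    pad = np.zeros(n_taps - 1)
    y1p = np.concatenate([pad, y1])
    y2p = np.concatenate([pad, y2])
    v1 = np.zeros(len(y1p))   # output estimates, padded like the inputs
    v2 = np.zeros(len(y2p))
    for t in range(len(y1)):
        i = t + n_taps - 1                    # position in the padded arrays
        y1_win = y1p[t:t + n_taps][::-1]      # y1(t), y1(t-1), ...
        y2_win = y2p[t:t + n_taps][::-1]
        v1[i] = y1[t] - a_hat @ y2_win
        v2[i] = y2[t] - b_hat @ y1_win
        v1_win = v1[t:t + n_taps][::-1]       # v1(t), v1(t-1), ...
        v2_win = v2[t:t + n_taps][::-1]
        # Push the cross-correlation of the two outputs toward zero.
        a_hat += gain * v1[i] * v2_win
        b_hat += gain * v2[i] * v1_win
    return v1[n_taps - 1:], v2[n_taps - 1:], a_hat, b_hat
```

Decorrelated outputs do not by themselves guarantee that the true A and B have been found, so this sketch only checks the stated design objective.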


SSS by ADF (cont’d) The closed form of decoupling filters

where

SSS by ADF (cont'd)
Choice of adaptation gain
- As time goes to infinity, the adaptation gain goes to zero, which keeps the system stable.
- The optimal choice of adaptation gain balances system stability and convergence.

SSS by ADF (cont'd)
Experiment on the adaptation gain as a function of t, with gains of the form c/t, including 5/t and 0.2/t.

Source Signal Detection (SSD)

Introduction
- If one of the two sources is inactive, the signals estimated by ADF will be poor and cause recognition errors.
- So ASR and ADF are performed only within the active region of each target signal.


SSD (cont’d)

SSD (cont'd)
SSD by coherence function

The coherence function at frequency index k takes one of two limiting values, depending on how k relates to the sets $K_{1,E}$ and $K_{2,E}$: under the first condition it is 0, and under the second condition (two alternatives joined by "or") it is 1.
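As a rough illustration of how a coherence function can separate the two situations, the sketch below estimates the magnitude-squared coherence between the two acquired channels with frame-averaged DFT spectra. The estimator, the Hann window, the frame length, and the name `msc` are assumptions made for this example, not the slides' definition.

```python
import numpy as np

def msc(y1, y2, frame_len=256):
    """Magnitude-squared coherence per DFT bin, averaged over frames (sketch)."""
    n_frames = len(y1) // frame_len
    win = np.hanning(frame_len)
    # Cut both channels into non-overlapping windowed frames.
    f1 = np.asarray(y1, dtype=float)[:n_frames * frame_len].reshape(n_frames, frame_len)
    f2 = np.asarray(y2, dtype=float)[:n_frames * frame_len].reshape(n_frames, frame_len)
    s1 = np.fft.rfft(win * f1, axis=1)
    s2 = np.fft.rfft(win * f2, axis=1)
    p11 = np.mean(np.abs(s1) ** 2, axis=0)      # auto-spectrum of channel 1
    p22 = np.mean(np.abs(s2) ** 2, axis=0)      # auto-spectrum of channel 2
    p12 = np.mean(s1 * np.conj(s2), axis=0)     # cross-spectrum
    return np.abs(p12) ** 2 / (p11 * p22 + 1e-12)
```

When channel 2 is just a scaled copy of channel 1 (only one source active, coupling reduced to a constant), the estimate is close to 1 in every bin; for two independent noise signals it stays near 0.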


SSD (cont’d) - decision variable

-Decision Rule:

SSD (cont'd): Implementation using DFT, and result


SSD (cont’d)

Improved Filter Estimation
Widrow's LMS algorithm, proposed in 1975
- If A or B cannot be observed (i.e., one of the source signals is inactive), the estimated filters will deviate considerably from the actual filters.
- If we know that source signal 2 is inactive (using SSD), we estimate only filter B and leave filter A unchanged.
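A minimal LMS sketch for the case described above, assuming source 2 is inactive so that channel 1 carries x1 and channel 2 carries only x1 filtered by B. The function name `lms_identify`, the step size, and the tap count are illustrative assumptions, not values from the slides.

```python
import numpy as np

def lms_identify(ref, target, n_taps=3, step=0.01):
    """Estimate an FIR filter w such that w applied to ref approximates target."""
    w = np.zeros(n_taps)
    padded = np.concatenate([np.zeros(n_taps - 1), ref])
    for t in range(len(ref)):
        u = padded[t:t + n_taps][::-1]   # ref(t), ref(t-1), ...
        err = target[t] - w @ u          # prediction error for this sample
        w += step * err * u              # LMS update: move along err * input
    return w
```

With y1 as `ref` and y2 as `target`, the returned taps play the role of the B estimate; the A estimate would simply be left untouched during such a segment.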


Improved Filter Estimation LMS algorithm and result

Experimental Results: evaluation in terms of word recognition accuracy (WRA) and signal-to-interference ratio (SIR)

Experimental Result (cont'd)
* 717 TIMIT sentences are used to train 62 phone units. The front-end features are PLP and its dynamic features. The grammar perplexity is 105.

After acoustic normalization

Speech Enhancement Using Discrete Cosine Transform

Motivation
- DCT provides significantly higher energy compaction than the DFT.

SE Using DCT (cont'd)
- DCT provides higher spectral resolution than the DFT.
- DCT is a real transform, so it has only binary phases. Its phase does not change unless the added noise is strong.

$$X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left(\frac{(2n+1)k\pi}{2N}\right)$$

$$x(n) = \sum_{k=0}^{N-1} \alpha(k)\, X(k) \cos\left(\frac{(2n+1)k\pi}{2N}\right)$$

$$\alpha(0) = \sqrt{\frac{1}{N}} \quad \& \quad \alpha(k) = \sqrt{\frac{2}{N}}, \qquad 0 \le n, k \le N-1$$
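The transform pair can be checked numerically with a small matrix-based sketch. The helper names `dct_matrix`, `dct`, and `idct` are hypothetical; the scaling follows the orthonormal form with alpha(0) = sqrt(1/N) and alpha(k) = sqrt(2/N) otherwise.

```python
import numpy as np

def dct_matrix(n):
    """N x N matrix C with C[k, m] = alpha(k) * cos((2m + 1) k pi / (2N))."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.cos((2 * m + 1) * k * np.pi / (2 * n))
    alpha = np.full((n, 1), np.sqrt(2.0 / n))
    alpha[0, 0] = np.sqrt(1.0 / n)
    return alpha * c

def dct(x):
    """Forward transform: X = C x."""
    return dct_matrix(len(x)) @ x

def idct(X):
    """Inverse transform: x = C^T X, since C is orthogonal."""
    return dct_matrix(len(X)).T @ X
```

Because the matrix is real and orthogonal, every coefficient is a real number whose only "phase" is its sign, which is the binary-phase property mentioned above.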

Estimating the signal by MMSE
Introduction
- y(t) = x(t) + n(t) and Y(k) = X(k) + N(k)
- Assume that the DCT coefficients are statistically independent and that the estimated signal should differ as little as possible from the original signal.
- The MMSE estimate is

$$\hat{X}(k) = E[X(k) \mid Y(k)]$$

By Bayes' rule and the signal model,

$$\hat{X}(k) = \frac{\xi(k)}{1 + \xi(k)}\, Y(k), \qquad \xi(k) = \frac{\lambda_x(k)}{\lambda_n(k)} = \frac{E[X^2(k)]}{E[N^2(k)]}$$

MMSE (cont'd)
Estimating the signal source by Decision Directed Estimation (DDE), proposed by Ephraim & Malah in '84:

$$\hat{\lambda}_x(k) = \alpha\, \hat{\lambda}_{xp}(k) + (1 - \alpha) \max\{Y^2(k) - \lambda_n(k),\ 0\}$$

$\alpha$ = 0.98 in computer simulation
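A possible frame-by-frame reading of the decision-directed estimate, combined with a Wiener-type gain xi / (1 + xi), is sketched below. Two things are assumptions here rather than statements from the slides: that the "previous" term is the squared enhanced coefficient of the preceding frame, and that the first frame is initialised from its own instantaneous estimate. The name `dd_wiener` is hypothetical.

```python
import numpy as np

def dd_wiener(Y, noise_var, alpha=0.98):
    """Enhance DCT frames Y (n_frames x n_coeffs) given per-coefficient noise variance."""
    Y = np.asarray(Y, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)   # must be strictly positive
    X_hat = np.zeros_like(Y)
    # Assumed initialisation: instantaneous estimate from the first frame.
    prev = np.maximum(Y[0] ** 2 - noise_var, 0.0)
    for t in range(len(Y)):
        inst = np.maximum(Y[t] ** 2 - noise_var, 0.0)   # max{Y^2 - noise, 0}
        lam_x = alpha * prev + (1.0 - alpha) * inst      # decision-directed mix
        xi = lam_x / noise_var                           # signal-to-noise ratio
        X_hat[t] = xi / (1.0 + xi) * Y[t]                # gain applied to Y
        prev = X_hat[t] ** 2                             # feeds the next frame
    return X_hat
```

With alpha close to 1 the estimate leans heavily on the previous frame, which smooths the gain over time at the cost of reacting more slowly to speech onsets.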

Reduction of Residual Signal

Introduction
- The more likely it is that the source signal is present, the more reliable the estimate.
- Two states of the input: H0: speech absent; H1: speech present.

$$A(k) = P(H_1 \mid Y(k))\, \hat{X}(k)$$

A(k): modified filter output

Reduction of Residual Signal (cont'd)

$$P(H_1 \mid Y(k)) = \frac{p(H_1)\, p(Y(k) \mid H_1)}{p(H_1)\, p(Y(k) \mid H_1) + p(H_0)\, p(Y(k) \mid H_0)}$$

where

$$p(Y(k) \mid H_1) = N\big(Y(k);\ 0,\ \lambda_n(k) + \lambda_x(k)\big)$$

$$p(Y(k) \mid H_0) = N\big(Y(k);\ 0,\ \lambda_n(k)\big)$$

Taking $p(H_0) = p(H_1)$ and writing $\xi(k) = \lambda_x(k) / \lambda_n(k)$, this gives

$$P(H_1 \mid Y(k)) = \frac{1}{1 + \sqrt{1 + \xi(k)}\, \exp\left(-\dfrac{\xi(k)\, Y^2(k)}{2\,(\lambda_n(k) + \lambda_x(k))}\right)}$$
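The speech-presence weighting can be sketched as follows, assuming equal priors for the two hypotheses and zero-mean Gaussian models with variance lam_n (speech absent) and lam_n + lam_x (speech present). The names `speech_presence_prob` and `soft_output` are hypothetical.

```python
import numpy as np

def speech_presence_prob(Y, lam_x, lam_n):
    """P(H1 | Y) for zero-mean Gaussian models, assuming equal priors."""
    xi = lam_x / lam_n
    # Likelihood ratio p(Y | H0) / p(Y | H1) in closed form.
    ratio = np.sqrt(1.0 + xi) * np.exp(-xi * Y ** 2 / (2.0 * (lam_n + lam_x)))
    return 1.0 / (1.0 + ratio)

def soft_output(Y, X_hat, lam_x, lam_n):
    """Modified filter output: the estimate weighted by the presence probability."""
    return speech_presence_prob(Y, lam_x, lam_n) * X_hat
```

Small coefficients, which are most likely residual noise, receive a low weight, while large coefficients pass almost unchanged.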

Experimental Results
Measured in segmental SNR

    *        EMF      DETF     DETF2
   6.27    11.93    11.82    11.27
 -10.17    -0.07     1.93     2.09
  -1.05    11.34    13.69    13.32
 -21.99    -6.99    -0.04     0.95

White noise added
Fan noise added

Segmental SNR for frame f:

$$10 \log_{10} \frac{\sum_n x_f^2(n)}{\sum_n \big(x_f(n) - \hat{x}_f(n)\big)^2}$$
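For reference, a common way to compute segmental SNR is sketched below: the per-frame ratio of clean-signal energy to error energy, in dB, averaged over frames. The averaging over frames, the frame length, and the small guard constant are assumptions of this sketch; the function name `segmental_snr` is hypothetical.

```python
import numpy as np

def segmental_snr(x, x_hat, frame_len=256, eps=1e-12):
    """Mean over frames of 10*log10(clean energy / error energy), in dB."""
    n_frames = len(x) // frame_len
    x = np.asarray(x, dtype=float)[:n_frames * frame_len].reshape(n_frames, frame_len)
    x_hat = np.asarray(x_hat, dtype=float)[:n_frames * frame_len].reshape(n_frames, frame_len)
    sig = np.sum(x ** 2, axis=1)             # clean energy per frame
    err = np.sum((x - x_hat) ** 2, axis=1)   # error energy per frame
    # eps only guards against division by zero and log of zero.
    return float(np.mean(10.0 * np.log10((sig + eps) / (err + eps))))
```

Practical implementations often also clamp each frame's value to a fixed range so that silent frames do not dominate the average; that refinement is left out here.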


Experimental Results