Regularized Superresolution-Based Binaural Signal Separation with Nonnegative Matrix Factorization Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano (Nara Institute of Science and Technology, Nara, Japan) Kazunobu Kondo, Yu Takahashi (Yamaha Corporation Research & Development Center, Shizuoka, Japan)

Page 1: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized Superresolution-Based Binaural Signal Separation with Nonnegative Matrix Factorization

Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano (Nara Institute of Science and Technology, Nara, Japan)

Kazunobu Kondo, Yu Takahashi (Yamaha Corporation Research & Development Center, Shizuoka, Japan)

Page 2: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Outline

• 1. Research background
• 2. Conventional method
– Nonnegative matrix factorization
– Penalized supervised nonnegative matrix factorization
– Directional clustering
– Hybrid method
• 3. Proposed method
– Regularized superresolution-based nonnegative matrix factorization
• 4. Experiments
• 5. Conclusions

Page 4: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Background

• Music signal separation technologies have received much attention.
• Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF markedly degrades when the mixture contains many sources.

We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and spatial cues included in mixtures of multiple instruments.

Page 6: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

NMF

• NMF is a type of sparse representation algorithm that decomposes a nonnegative matrix into two nonnegative matrices. [D. D. Lee, et al., 2001]

[Figure: the observed matrix 𝒀 (spectrogram, frequency × time) is approximated by the product of the basis matrix 𝑭 (spectral bases) and the activation matrix 𝑮 (time-varying gain).]

Ω: Number of frequency bins
𝑇: Number of frames
𝐾: Number of bases
𝒀: Observed matrix
𝑭: Basis matrix
𝑮: Activation matrix
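The decomposition on this slide can be written as 𝒀 ≈ 𝑭𝑮, with 𝒀 of size Ω × 𝑇, 𝑭 of size Ω × 𝐾, and 𝑮 of size 𝐾 × 𝑇. The following is a minimal sketch in Python with NumPy, assuming the Euclidean-distance multiplicative updates of Lee and Seung; the slides do not state which divergence is used, and the function name `nmf` and its parameters are illustrative only.

```python
import numpy as np

def nmf(Y, n_bases, n_iter=200, seed=0):
    """Factorize a nonnegative matrix Y (freq x frames) as F @ G.

    Assumption: squared Euclidean cost with multiplicative updates.
    """
    eps = 1e-12  # guards against division by zero
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps    # basis matrix (spectral bases)
    G = rng.random((n_bases, n_frames)) + eps  # activation matrix (gains)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
    return F, G
```

Calling `F, G = nmf(Y, n_bases=K)` on a magnitude spectrogram `Y` returns the two factors; `F @ G` is the low-rank approximation of `Y`.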

Page 8: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Penalized Supervised NMF (PSNMF)

• In PSNMF, the following decomposition is addressed under the condition that the supervised basis matrix 𝑭 is known in advance. [Yagi, et al., 2012]

[Figure: two-stage process.
Training process: the supervised bases 𝑭 of the target sound are trained from the supervision sound.
Separation process: the trained bases 𝑭 are fixed and the remaining matrices are updated; the other bases are forced to become uncorrelated with 𝑭.]

Problem of PSNMF: when the signal includes many sources, the extraction performance markedly degrades.
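The equation on this slide did not survive extraction. As a hedged sketch, a supervised decomposition of this kind is commonly written as below, where 𝑭 holds the fixed supervised bases and 𝑮 their activations. The symbols 𝑯 (bases for the other sounds), 𝑼 (their activations), and μ (penalty weight) are assumed notation, and the squared Frobenius norm of 𝑭ᵀ𝑯 is one common way to discourage correlation between the two sets of bases; the slides themselves do not confirm this exact form.

```latex
\boldsymbol{Y} \approx \boldsymbol{F}\boldsymbol{G} + \boldsymbol{H}\boldsymbol{U},
\qquad
\text{penalty: } \mu \,\bigl\lVert \boldsymbol{F}^{\mathsf{T}} \boldsymbol{H} \bigr\rVert_{\mathrm{F}}^{2}
```

Under this reading, the target sound is estimated as 𝑭𝑮 once 𝑮, 𝑯, and 𝑼 have been updated with 𝑭 held fixed.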

Page 10: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Directional Clustering

• Directional clustering can estimate sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]
• This method can separate sources using the spatial information in an observed signal.

[Figure: scatter plot of source components, L-ch input signal versus R-ch input signal, with a centroid vector for each cluster (labeled L and R).]

Problem of directional clustering: this method cannot separate sources located in the same direction.
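The clustering idea can be sketched as follows, assuming a simple k-means on unit-normalized (L-ch, R-ch) magnitude vectors with cosine similarity. This is an illustrative stand-in, not necessarily the algorithm of the cited works; the function name `directional_clustering` and the evenly spaced centroid initialization are assumptions.

```python
import numpy as np

def directional_clustering(mag_l, mag_r, n_clusters=3, n_iter=20):
    """Assign each time-frequency bin to a direction cluster.

    mag_l, mag_r: nonnegative magnitude spectrograms of the L and R channels.
    Returns integer labels (same shape as the inputs) and the centroid vectors.
    """
    eps = 1e-12
    # One 2-D vector per bin, normalized so only the direction matters.
    x = np.stack([mag_l.ravel(), mag_r.ravel()], axis=1)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    # Initial centroids spread over the quadrant: angle 0 is pure L, pi/2 is pure R.
    angles = (np.arange(n_clusters) + 0.5) * (np.pi / 2) / n_clusters
    centroids = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    for _ in range(n_iter):
        labels = np.argmax(x @ centroids.T, axis=1)  # nearest centroid by cosine
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members) > 0:
                mean = members.sum(axis=0)
                centroids[c] = mean / (np.linalg.norm(mean) + eps)
    labels = np.argmax(x @ centroids.T, axis=1)
    return labels.reshape(mag_l.shape), centroids
```

With three clusters, label 0 collects left-dominant bins, label 1 the center, and label 2 right-dominant bins; `labels == 1` then gives the binary mask of the center cluster.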

Page 11: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Hybrid method

• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques.
– Directional clustering
– PSNMF

[Figure: conventional hybrid method. The L and R channels pass through directional clustering (spatial separation) and then PSNMF (source separation).]

Page 12: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Problem of hybrid method

• The signal extracted by the hybrid method suffers from considerable distortion due to the binary masking in directional clustering.
• The signal in the target direction, which is obtained by directional clustering, has many spectral chasms.
• The resolution of the spectrogram is degraded.

[Figure: directional clustering as binary masking on time-frequency grids. Input spectrogram ∘ binary mask = separated cluster, where ∘ is the Hadamard product (product of each element); bins are marked as target direction (1) or other direction (0). Binary mask as extracted from the slide:]

1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
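The masking step can be illustrated with a short NumPy sketch. The mask below is copied from the slide as extracted; the input spectrogram is a hypothetical random matrix, used only to show where the chasms appear.

```python
import numpy as np

# Binary mask: 1 = bin assigned to the target direction, 0 = other direction.
mask = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0],
])

# Hypothetical strictly positive magnitude spectrogram (frequency x time).
rng = np.random.default_rng(0)
spectrogram = rng.random(mask.shape) + 0.1

# Hadamard (element-wise) product: bins outside the target direction become zero.
separated = mask * spectrogram

# Every zero in the mask leaves a spectral chasm in the separated cluster.
n_chasms = int(np.sum(mask == 0))
```

Any target energy that fell in a masked-out bin is lost, which is the source of the distortion described above.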

Page 14: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Proposed hybrid method

[Figure: signal flows of the two hybrid methods.
Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) → PSNMF on each channel → ISTFT → mixing → extracted signal.
Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) and index of the center cluster → superresolution-based SNMF on each channel → ISTFT → mixing → extracted signal.]

Employ a new supervised NMF algorithm as an alternative to the conventional PSNMF in the hybrid method.

Page 15: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• In the proposed supervised NMF, the spectral chasms are treated as unseen observations using an index matrix.

[Figure: separated cluster on a time-frequency grid with its chasms marked; the chasms are treated as unseen observations. Index matrix as extracted from the slide:]

1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0

Page 16: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• The spectrogram of the target sound is reconstructed using better-matched bases because the chasms are treated as unseen.
• The components of the target sound lost after directional clustering can be extrapolated using the supervised bases.

[Figure: separated cluster with chasms (time-frequency grid) → superresolution using the supervised bases → reconstructed spectrogram (time-frequency grid).]
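The extrapolation idea can be sketched as a masked (weighted) NMF in which only the observed bins contribute to the fit. This is a simplified illustration under stated assumptions: squared Euclidean cost, fixed supervised bases `F`, no bases for other sources, and no penalty or regularization terms. The function names are hypothetical.

```python
import numpy as np

def superresolution_activations(Y, F, index, n_iter=500, seed=0):
    """Estimate activations G with F fixed, fitting only bins where index == 1."""
    eps = 1e-12
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):
        # Chasms (index == 0) are excluded from both numerator and denominator,
        # so they are treated as unseen rather than as observed zeros.
        numer = F.T @ (index * Y)
        denom = F.T @ (index * (F @ G)) + eps
        G *= numer / denom
    return G

def masked_cost(Y, F, G, index):
    """Squared error measured on the observed bins only."""
    return float(np.sum(index * (Y - F @ G) ** 2))
```

After the update, `F @ G` is defined on every bin, including the chasms, which is how the lost components are extrapolated from the supervised bases.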

Page 17: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• Signal flow of the proposed hybrid method

[Figure (a), observed spectra: frequency of source components versus direction (left, center, right); the target source lies in the center direction.]

Page 18: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• Signal flow of the proposed hybrid method

[Figure: (a) observed spectra → directional clustering → (b) after directional clustering. Both panels plot frequency of source components versus direction (left, center, right) and mark the target source. In (b), only the target direction is kept, and the center sources lose some of their components.]

Page 20: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• Signal flow of the proposed hybrid method

[Figure: (b) after directional clustering, where the center sources lose some of their components → superresolution-based NMF → (c) after superresolution-based SNMF, where the target source is extrapolated. Both panels plot frequency of source components versus direction (left, center, right).]

Page 21: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• The basis extrapolation includes an underlying problem.
• If the time-frequency spectra are almost unseen in the spectrogram, which means that almost all of the indexes are zero, a large extrapolation error may occur.
• It is necessary to regularize the extrapolation.

[Figure: separated cluster (time-frequency grid) containing an almost unseen frame, and a spectrogram (frequency 0 to 4 kHz, time 0 to 4 s) showing the resulting extrapolation error (incorrectly modifying the activation).]

Page 22: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• We propose two types of regularization.
– Regularization of the temporal continuity (with respect to the previous frame)
– Regularization of the norm minimization

𝑰: Index matrix
Overbar (∙̄): Binary complement
𝑖𝜔,𝑡: Entry of index matrix 𝑰
𝑔𝑘,𝑡: Entry of matrix 𝑮
𝑓𝜔,𝑘: Entry of matrix 𝑭

The intensity of these regularizations is proportional to the number of chasms in each frame.
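The regularization equations on this slide were lost in extraction. The sketch below shows one plausible form that is consistent with the listed symbols and with the statement that the intensity scales with the number of chasms per frame. It is an assumption for illustration, not the authors' exact definition, and all function names are hypothetical.

```python
import numpy as np

def chasm_counts(index):
    """Number of chasms (zeros of the index matrix) in each frame."""
    return (1 - index).sum(axis=0)

def temporal_continuity_penalty(index, G):
    """Assumed form: squared change of the activations from the previous frame,
    weighted per frame by the number of chasms in that frame."""
    weights = chasm_counts(index)
    diff = G[:, 1:] - G[:, :-1]
    return float(np.sum(weights[1:] * np.sum(diff ** 2, axis=0)))

def norm_minimization_penalty(index, F, G):
    """Assumed form: squared magnitude of the reconstruction F @ G, counted
    only on the chasm bins (binary complement of the index matrix)."""
    return float(np.sum((1 - index) * (F @ G) ** 2))
```

Both penalties vanish for a frame with no chasms and grow as more of the frame is unseen, so heavily masked frames are constrained the most.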

Page 23: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• The cost function in regularized superresolution-based NMF is defined using the index matrix and includes the following terms:
– Regularization term
– Penalty term that forces the supervised bases and the other bases to become uncorrelated with each other
– Weighting parameter

Page 24: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Regularized superresolution-based NMF

• The update rules that minimize the cost function are obtained as follows:

Page 26: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Evaluation experiment

• We compared four methods.
– Conventional hybrid method using PSNMF (Conventional method)
– Proposed hybrid method using superresolution-based NMF without regularization (Proposed method 1)
– Proposed hybrid method using superresolution-based NMF with regularization of the temporal continuity (Proposed method 2)
– Proposed hybrid method using superresolution-based NMF with regularization of the norm minimization (Proposed method 3)

[Figure: signal flows of the conventional hybrid method (STFT → directional clustering → center component → PSNMF → ISTFT → mixing → extracted signal) and the proposed hybrid method (STFT → directional clustering → center component and index of the center cluster → superresolution-based SNMF → ISTFT → mixing → extracted signal).]

Page 27: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Evaluation experiment

• We used stereo-panning signals and binaural-recorded signals containing four instruments, oboe (Ob.), flute (Fl.), trombone (Tb.), and piano (Pf.), generated by a MIDI synthesizer.
• The sources are mixed with equal power.
• The target source is always located in the center direction (no. 1).
• We used the same type of MIDI sounds of the target instruments as supervision for the training process.

[Figure: source layout with the target source at no. 1 (center) and nos. 2 and 3 on the left and right sides.]

Supervision sound: notes spanning two octaves that cover all notes of the target signal.

Page 28: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Experimental results (panning signal)

• Average SDR, SIR, and SAR scores for each method, where the 4 instruments are shuffled with 12 combinations.

[Figure: bar charts of SDR (axis 0 to 12 dB), SIR (axis 0 to 24 dB), and SAR (axis 0 to 10 dB) for each method; higher is better.]

SDR: quality of the separated target sound
SIR: degree of separation between the target and other sounds
SAR: absence of artificial distortion

Proposed method 1: no regularization
Proposed method 2: regularization of temporal continuity
Proposed method 3: regularization of norm minimization
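For reference, the three scores can be computed as below, assuming they follow the standard BSS Eval definitions, in which the estimated signal is decomposed into a target component, an interference component, and an artifact component. The slides do not state which toolkit was used, and the function name `bss_eval_scores` is hypothetical; the decomposition itself is taken as given here.

```python
import numpy as np

def bss_eval_scores(s_target, e_interf, e_artif):
    """Return (SDR, SIR, SAR) in dB from the three components of an estimate.

    Assumption: estimate = s_target + e_interf + e_artif (BSS Eval style).
    """
    eps = 1e-12  # avoids division by zero for perfect estimates

    def energy(x):
        return float(np.sum(np.square(x)))

    sdr = 10 * np.log10(energy(s_target) / (energy(e_interf + e_artif) + eps))
    sir = 10 * np.log10(energy(s_target) / (energy(e_interf) + eps))
    sar = 10 * np.log10(energy(s_target + e_interf) / (energy(e_artif) + eps))
    return sdr, sir, sar
```

SDR accounts for both interference and artifacts, so a method can raise SIR while lowering SAR; this is why the three scores are reported together.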

Page 29: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Experimental results (binaural signal)

• Average SDR, SIR, and SAR scores for each method, where the 4 instruments are shuffled with 12 combinations.

[Figure: bar charts of SDR (axis 0 to 10 dB), SIR (axis 0 to 20 dB), and SAR (axis 0 to 6 dB) for each method; higher is better.]

SDR: quality of the separated target sound
SIR: degree of separation between the target and other sounds
SAR: absence of artificial distortion

Proposed method 1: no regularization
Proposed method 2: regularization of temporal continuity
Proposed method 3: regularization of norm minimization

Page 30: Regularized superresolution-based binaural signal separation with nonnegative matrix factorization

Conclusions

• We proposed a new superresolution-based supervised NMF algorithm for the hybrid method to separate stereo or binaural signals.
• The proposed hybrid method can separate the target signal with higher performance than the conventional method.
• The regularization of norm minimization is effective for the proposed supervised NMF algorithm.

Thank you for your attention!