Blind Source Separation with Distributed Microphone Pairs ...iwaenc.org/proceedings/2010/HTML/Uploads/1048.pdf · Blind Source Separation with Distributed Microphone Pairs Using Permutation

Blind Source Separation with DistributedMicrophone Pairs Using Permutation Correction

by Intra-Pair TDOA ClusteringTakuma Ono, Shigeki Miyabe, Nobutaka Ono and Shigeki Sagayama

Graduate School of Information Science and Technology, the University of Tokyo7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

�tono, miyabe, onono, sagayama�@hil.t.u-tokyo.ac.jp

Abstract—In this paper, we present a novel framework ofdistributed microphone array for blind source separation (BSS),where stereo microphones or proximately-placed microphonepairs are distributed. Unlike distributing all microphones indi-vidually, the time difference of arrival (TDOA) in the pairedchannels can be robustly estimated without suffering spatialaliasing. Based on it, sound sources are separated by thefrequency-domain independent component analysis (ICA) withthe permutation correction by clustering the intra-pair TDOAs.The experimental results in real reverberant environment arealso shown.

I. INTRODUCTION

Blind source separation (BSS) is a technique to recoveroriginal source signals from mixtures without any mixingconditions, which has been actively investigated in arraysignal processing field. In particular, the frequency-domainindependent component analysis (ICA) is one of the standardapproach of BSS in convolutive mixing [1], [2] and closely-arrayed microphones are preferably used to avoid spatialaliasing [3]–[5]. However, setting closely-spaced microphonearray in one position has limitation because we cannot locatethe microphone array near to all sources to earn high signal-to-noise ratio, and some of sources can stand in the samedirection.

In contrast, using spatially-distributed microphones wouldbe more suitable with separating wide-spread sources. Improv-ing separation performance with combining distributed multi-ple recording devices is also attractive scenario. Recently, theframework of distributed microphones has been investigatedfor speech recognition system [6] and teleconferencing tech-nology [7], and it will facilitate speech-based man-machineinteraction in prospective system such as “smart room” [8].Several methods for localization or separation using distributedmicrophones has been also proposed [7], [9]–[11].

Since frequency-domain ICA finds independent componentsat every frequency bands, permutation inconsistency generallyremains. A robust approach to solving permutation is clus-tering of the relative positions of sources and microphones,especially by exploiting time differences of arrival (TDOAs)[4]. However, in the distributed microphone array scenario,distances between microphones are often too large to estimatereliable TDOAs because of the spatial aliasing. Although thereare several reports about utilizing magnitude ratios for theclustering [7], [10], the level difference is not as robust asTDOA for arbitrary arrangement of sources and microphones.Another approach is to use temporal structure of signals [12].

It is known that this approach alone is less robust than TDOA-based one, but integration with TDOA is effective [4]. Theintegration with temporal structure is out of focus in this paper.

In this paper, we present a novel framework of distributedmicrophone array for BSS, where stereo microphones orproximately-placed microphone pairs are distributed. The con-cept of distributing subarrays, which consist of closely spacedmicrophones for avoiding spatial aliasing, has been also usedin [13], [14]. We focus on specifically distributing pairedmicrophones because many recording devices have stereochannels and it would be very suitable with the context of thedistributed microphone array. Based on it, sound sources areseparated by the frequency-domain ICA with the permutationcorrection by clustering the intra-pair TDOAs since the TDOAin the paired channels can be robustly estimated. In theproposed framework, we can use multiple independent stereorecording devices where inter-pair channels are asynchronized.Finally, we compare the separation performance of the pro-posed method and several existing methods by experimentalresults in real environment.

II. BSS WITH DISTRIBUTED MICROPHONE PAIRS

A. Problem formulation

Let’s consider that stereo microphones, which are pairsof closely-spaced two microphones, are distributed and �sources are observed by them. Let � � �� denote thenumber of microphones and the �� th and the ��thmicrophone be paired (� � �� ). We assume theoverdetermined case (� � � ) and the number of sources(� ) is known a priori. First, suppose that all channels aresynchronized. The problem here is to recover � sources from� observations.

B. Frequency-domain ICA

First, we summarize a typical frequency-domain ICA inthe overdetermined case. In time-frequency representation, theobserved signal �� , where� �� denotes vector transpose, is denoted as

��

��

�� ...

. . ....

��

�� (1)

where and � are source and microphone indices, re-spectively, � is a frame index, �� is the frequency

characteristic from the th source to the �th microphone and��

� is the source signal.In the overdetermined case, dimension reduction is first

applied using principle component analysis (PCA) [15], whichis performed by transforming the observed signal into thesubspace signal as

��

� �� (2)

where �� is a � � � matrix and obtained by the �eigenvectors corresponding to the � largest eigenvalues ofthe covariance matrix of �� .

Then, source signals are estimated as

�� (3)

The demixing matrix �� is iteratively estimated by theupdate rule with natural gradient [16]:

�� (4)

where � is a step size, � is a nonlinear function, and ��denotes expectation operation, which is practically replacedby time average over all frames.

Generally in frequency-domain ICA, scale ambiguity andpermutation inconsistency remain. The scale is usually deter-mined by projection back [12] and we also follow it. In thenext section, we discuss the permutation correction as the mainissue of this paper.

III. PERMUTATION CORRECTION BY CLUSTERINGINTRA-PAIR TDOA VECTORS

A. Estimation of Intra-pair TDOA

TDOA-based permutation correction is one of the popularapproach in overdetermined BSS, and it has been also wellused in underdetermined BSS [17]. However, utilizing TDOAis not straightforward in the context of distributed microphonearray. When microphones are distantly located, the correlationbetween their channels degrades and estimating TDOA be-comes difficult. The spatial aliasing is also significant.

Our approach is to distribute paired microphones and usingtheir intra-pair TDOAs. After frequency-domain ICA, withminimizing �� , the mixing matrixcan be estimated as

�� (5)

where �� is a �� matrix and the th row vector gives asteering vector for the th source. Using it, the TDOA for theth source within the �th pair of microphone pairs is obtainedas

�� (6)

where �� is the ��th element of ��.Here we define an intra-pair TDOA vector in the �th

frequency:

��

� � (7)

This vector gives us definite clue for the permutation correc-tion even in distributed microphone array since it should beobtained without spatial aliasing due to the close distance ofthe paired microphones.

Permutation

Source 2

Source 1

intra-pair TDOA 1

intra-pair

TDOA 2

Fig. 1. Clustering the intra-pair TDOA vectors

B. Clustering Intra-pair TDOA vectors

Since the intra-pair TDOA vectors ideally do not dependon frequency but location of sources and microphones, thepermutation could be corrected by grouping them, and forthat, k-means algorithm can be applied in the ��-dimensionaleuclidean space. In this clustering, considering that the TDOAat the frequency band with more significant energy is morereliable, we introduce weighting with reliability of each esti-mation. Let �� be the estimation of the th source atthe �th microphone, which is obtained by Projection Back.Based on them, we define the weight coefficient as

��

�� (8)

�� (9)

where �� is the time average over all frames of the thsource amplitude at the �th frequency. Then the weightedsquare sum of the distances from the mean of all entries isminimized by the following iterative updates for all the sourcesand microphone pairs:

��

��

��

��

��

��

��(10)

��

��

��

��

� ��

�� (11)

where � of ��

� ��

is the �th row of a vector, ��is a permutation pattern and ��

� is the centroid of theth source within the �th microphone pair, which representsthe estimation of the intra-pair TDOA. Note that ��

�s takedifferent values for different � depending on the location andthe orientation of each pair. Grouping the intra-pair TDOAsin (10) and determination of the centroids in (11) correspondsto the permutation correction and intra-pair TDOA estimation,respectively.

IV. BSS WITH MULTIPLE STEREO RECORDING DEVICES

In the previous section, we suppose that all microphonesare synchronized. Next, in this section let’s discuss BSS withmultiple stereo recording devices, which means we assumethat intra-pair channels are synchronized while inter-pair chan-nels are asynchronized. In this paper, we also assume thatthe sampling frequencies of all devices are identical and theirmismatch is negligible as a simple case.

In this case, observed signals have different time offsets.However, the demixing filter could be obtained by frequency-domain ICA if each channel is aligned approximatively and

TABLE IEXPERIMENTAL CONDITION

number of sources 3number of microphones 4

room size 6.3 m �7.7 m �2.7 mreverberation time 300 ms

sampling rate 16 kHzframe shift 75% overlap

ICA algorithm Infomax

Source 1

Stereo

microphone

1 [m]

20°30°

2 1

1 [m]1 [m]

0.6 [m]4

3

80°

Source 3

Source 2

Fig. 2. Arrangement of sources and microphones in the experiment

Fig. 3. A photo of real acoustic environment

the apparent TDOAs between channels for all sources areadequately small compared with the frame length. The roughalignment can be performed by shifting � �� by �� (� �� ) where

��

�� (12)

and �� denotes the cross correlation between �� and��. Note that the permutation correction based on the intra-pair TDOA vector is not affected at all even when the intra-pairchannels are exactly synchronized.

V. EXPERIMENTAL RESULTS

In this section, we evaluate the separation performance ofthe proposed method in real acoustic environment. Intendingto simulate multiple recording in a meeting, two microphonepairs were placed inside three loudspeakers in a room with�� ms as shown in Fig. 2 and Fig. 3, where theintra-pair and the inter-pair distances were � cm and �� cm,respectively. The male and the female speech was used assource signals. Other experimental conditions were shown inTable I.

256 512 1024 2048 4096 8192 16384-2

0

2

4

6

8

10

12

frame length [points]

output SIR [dB]

MMIVAproposedideal

Fig. 4. Output SIRs for different frame lengths for 6s-length signal

2 6 123

4

5

6

7

8

9

10

11

12

13

signal length [s]

output SIR [dB]

MMIVAproposedideal

Fig. 5. Output SIRs for different signal lengths. Note that an appropriateframe length is chosen for each condition

-0.2 -0.1 0 0.1 0.2-0.2

-0.1

0

0.1

0.2

intra-pair TDOA (microphone 1- 2) [ms]

intra-pair TDOA (microphone 3- 4) [ms]

source 1source 2source 3

Fig. 6. Example of the clustered intra-pair TDOA vectors

In the first experiment, instead of using really asynchronousrecordings, we used synchronous recordings (i.e., which meansall microphones were connected to an audio board withsynchronous inputs), and gave random time offsets betweeninter-pair channels for quantitative evaluation. We roughlyaligned the simulatedly-asynchronous signals as discussed insection IV, applied the infomax-type frequency-domain ICA,and then, corrected the permutation by our method presentedin section III.

Confined to usable approaches with distributed micro-phones, our method was compared with

� MM: the infomax-type frequency-domain ICA witha magnitude-based permutation correction (maximum-magnitude) in [7],

0 4 8 12-1

0

1

time [s]

amplitude

(a) Observed signal (microphone 1)

0 4 8 12-1

0

1

time [s]

amplitude

(b) Observed signal (microphone 3)

0 4 8 12

1

0

-1time [s]

amplitude

(c) Original signal (source 1)

0 4 8 12-1

0

1

time [s]

amplitude

(d) Estimated signal (source 1)

0 4 8 12-1

0

1

time [s]

amplitude

(e) Original signal (source 2)

0 4 8 12-1

0

1

time [s]

amplitude

(f) Estimated signal (source 2)

0 4 8 12-1

0

1

time [s]

amplitude

(g) Original signal (source 3)

0 4 8 12-1

0

1

time [s]

amplitude

(h) Estimated signal (source 3)

Fig. 7. Results by using multiple stereo recording devices

� IVA: a permutation-free blind source separation (indepen-dent vector analysis) in [18], and

� ideal: the infomax-type frequency-domain ICA with theideal permutation correction to maximize SIR (Signal-to-Interference Ratio) where the original signals are givenas reference.

The separation performance was evaluated by the gain of SIR.Fig. 4 and Fig. 5 show the separation performance in the

different frame length and the different signal length, respec-tively. In all conditions, the proposed method showed superiorperformance to other two methods. The difference betweenIVA and our method is small when the frame length is shortor the signal length is long, but our method is especially strongfor the long frame length or the short signal length. This naturewould be very desirable in block-wise processing, which isnecessary for separating moving sources or compensatingdrifted time offset between channels. We also show an exampleof clustered intra-pair TDOAs in the frequency band lowerthan �� Hz for ��s-length real recorded signals in Fig. 6.We can see that the intra-pair TDOA vectors are appropriatelygrouped in the 2-dimensional euclidean space.

In the second experiment, we used two asynchronous stereo

recording devices (digital voice recorders, SANYO ICR-PS603RM), and other experimental conditions were the sameas the first experiment. Fig. 7 shows the recorded signals bythe different devices (microphone 1 and 3), the pre-recordedsources at microphone 1 and the estimated sources fromthe mixtures by the proposed method. We can see that theproposed method worked well and the sources were separatedeven when the recordings were really asynchronous and haddifferent time offsets.

VI. CONCLUSION

In this paper, the framework of distributing microphonepairs for BSS based on the frequency-domain ICA and thepermutation correction suitable with it were presented. Inthe proposed method, the permutation is corrected basedon clustering TDOAs within the paired channels. The BSSusing multiple stereo recording devices was also discussed.In the experiments, the proposed method showed the betterseparation performance than other existing methods and it alsoworked well in multiple stereo recordings.

REFERENCES

[1] A. Hyvarinen, J. Karhumen, and E. Oja, Independent ComponentAnalysis, John Wiley & Sons, 2001.

[2] S. Ikeda and N. Murata, “A method of ICA in time-frequency domain,”Proc. ICA, pp. 365–371, 1999.

[3] R. Mukai, H. Sawada, S. Araki, and S. Makino, “Near-Field FrequencyDomain Blind Source Separation for Convolutive Mixtures,” Proc.ICASSP, pp.49–52, 2004.

[4] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, “Evalu-ation of blind signal separation method using directivity pattern underreverberant conditions,” Proc. ICASSP, pp. 3140–3143, 2000．

[5] H. Sawada, R. Mukai, S. Araki, and S. Makino, “A robust approach tothe permutation problem of frequency-domain blind source separation,”Proc. ICASSP, pp. 381–384, 2003.

[6] Y. Zhao, S. Shin, E. Robledo-Arnuncio, and B.H. Juang, “A study onrecognizing distorted speech over local distributed transducer networks,”Proc. ICASSP, pp. 4181–4184, 2009.

[7] J.P. Dmochowski, Z. Liu, and P.A. Chou, “Blind source separation ina distributed microphone meeting environment for improved teleconfer-encing,” Proc. ICASSP, pp. 89–92, 2009.

[8] C. Busso, S. Hernanz, C. Chu, S. Kwon, S. Lee, P. G. Georgiou,I. Cohen, and S. Narayanan, “Smart room: Participant and speakerlocalization and identification,” Proc. ICASSP, pp. 1117–1120, 2005.

[9] N. Ono, H. Kohno, N. Ito, and S. Sagayama, “Blind alignment ofasynchronously recorded signals for distributed microphone array,” Proc.WASPAA, pp.161–164, 2009.

[10] E. Robledo-Arnuncio and B.H. Juang, “Blind source separation ofacoustic mixtures with distributed microphones,” Proc. ICASSP, pp.949–952, 2007.

[11] Z. Liu, “Sound source separation with distributed microphone arrays inthe presence of clock synchronization errors,” Proc. IWAENC, 2008.

[12] N. Murata, S. Ikeda, and A. Ziehe, “An approach to blind source sepa-ration based on temporal structure of speech signals,” Neurocomputing,vol. 41, no. 1–4, pp. 1–24, 2001.

[13] K. Niwa, T. Nishino, and K. Takeda, “Encoding large array signals into3D sound field representation for selective listening point audio basedon blind source separation,” Proc. ICASSP, pp.181–184, 2008.

[14] F. Nesta and M. Omologo, “Cooperative Wiener-ICA for source local-ization and separation by distributed microphone arrays,” Proc. ICASSP,pp.181–184, 2010.

[15] S. Winter, H. Sawada, and S. Makino, “Geometrical interpretation of thePCA subspace approach for overdetermined blind source separation,”EURASIP Journal on Applied Signal Processing, vol. 2006, pp. 1–11,2006.

[16] S. Amari, “Natural gradient works efficiently in learning,” NeuralCompututation, vol. 10, pp. 251–276, 1998.

[17] S. Araki, H. Sawada, R. Mukai, and S. Makino, “Underdetermined blindsparse source separation for arbitrarily arranged multiple sensors,” SignalProcessing, vol. 87, pp. 1833–1847, 2007.

[18] T. Kim, H.T. Attias, S. Lee, and T. Lee, “Blind source separationexploiting higher-order frequency dependencies,” IEEE Trans. Audio,Speech and Language Processing, vol. 15, pp. 70–79, 2007.

Documents

Blind Source Separation with Distributed Microphone Pairs ...iwaenc.org/proceedings/2010/HTML/Uploads/1048.pdf · Blind Source Separation with Distributed Microphone Pairs Using Permutation