Far-field source
Source localization differs in far-field and near-field conditions. Fig. 2.3 illustrates a near-field condition where the distance from the source to the array is comparable to the array size. The curvature of the wavefronts is significant compared to the array size. In this condition, the distance to the target source can be estimated from the signals received at the microphones, if there are three or more microphones.
A source is considered to be in the far-field when the distance to the array center is significantly larger than the array size. In that case, the received signals at each microphone can be approximated as a plane wave and they have the same AoA. This is depicted in Fig. 2.4. Therefore, we can only estimate the AoA but not the distance to the source. In this thesis, we assume that the target sources are in the far-field. In that condition, the AoA can be computed as
$$\alpha = \arcsin\frac{\Delta r}{d} = \arcsin\frac{\Delta t \, c}{d}, \tag{2.1}$$
where c is the speed of sound which equals 343 m/s, Δr is the difference of distances between the source and two microphones, Δt is the corresponding TDOA, and d is the distance between the two microphones. From the above equation, we can obtain the value of the AoA α by computing the time delay Δt.
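As a minimal sketch of equation (2.1), the following Python snippet converts a TDOA into an AoA. The function name `aoa_from_tdoa` and the example values (a 0.1 m microphone spacing and a 0.1 ms delay) are illustrative assumptions, not values from the thesis. The clipping step is also an added safeguard: a noisy TDOA estimate can push the argument of the arcsine slightly outside [-1, 1].

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value quoted in the text


def aoa_from_tdoa(delta_t, mic_distance, c=SPEED_OF_SOUND):
    """Return the far-field AoA in radians for a TDOA given in seconds."""
    ratio = delta_t * c / mic_distance
    # Assumed safeguard: clip so that a noisy TDOA cannot break asin().
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


# Hypothetical example: 0.1 ms delay across a 10 cm microphone pair.
alpha = aoa_from_tdoa(delta_t=1.0e-4, mic_distance=0.1)
print(math.degrees(alpha))  # roughly 20 degrees
```

A zero TDOA gives α = 0, a source on the perpendicular bisector of the microphone pair, and the largest physically possible TDOA, d/c, gives α = 90°.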
Source signal model
In the following, we introduce the source signal model that will be used for all the source localization techniques presented below.
Let us assume that there are $N$ active sources and we have an array of $M$ microphones. The signal received at each microphone can be modeled as
$$x_i(t) = \sum_{j=1}^{N} a_{i,j} * s_j(t) + n_i(t), \tag{2.2}$$
where $i = 1, \dots, M$ and $j = 1, \dots, N$ are respectively the indices of microphones and sound sources, $*$ is the convolution operator, $x_i(t)$ is the received signal at the $i$th microphone, $s_j(t)$ is the $j$th emitted signal, $n_i(t)$ is the background noise which includes reverberation, and $a_{i,j}$ is the acoustic impulse response which models the direct path from source $j$ to microphone $i$.
The frequency contents of the source signals are changing over time. Therefore, to efficiently analyze the signals we use the short-time Fourier transform (STFT) to transform the received signals into the time-frequency domain.
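The signal model (2.2) and the subsequent STFT step can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the helper names `mix_sources` and `stft`, the Hann window, the frame length, the hop size, and the toy impulse responses (pure delays with gains) are all hypothetical choices and are not taken from the thesis.

```python
import numpy as np


def mix_sources(sources, impulse_responses, noise_std=0.0, rng=None):
    """Simulate x_i(t) = sum_j a_ij * s_j(t) + n_i(t).

    sources: array (N, T); impulse_responses: array (M, N, L).
    Returns the microphone signals as an array (M, T).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_mics, n_src, _ = impulse_responses.shape
    n_samples = sources.shape[1]
    x = np.zeros((n_mics, n_samples))
    for i in range(n_mics):
        for j in range(n_src):
            # Convolve source j with the response to microphone i, keep T samples.
            x[i] += np.convolve(sources[j], impulse_responses[i, j])[:n_samples]
        x[i] += noise_std * rng.standard_normal(n_samples)  # additive noise n_i(t)
    return x


def stft(x, frame_len=512, hop=256):
    """Basic STFT; returns an array (n_frames, frame_len // 2 + 1) indexed by (k, f)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


# Toy example: N = 2 sources, M = 2 microphones, responses are delayed impulses.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 4096))
impulse_responses = np.zeros((2, 2, 8))
impulse_responses[0, 0, 0] = 1.0
impulse_responses[1, 0, 3] = 0.8
impulse_responses[0, 1, 2] = 0.7
impulse_responses[1, 1, 0] = 1.0

x = mix_sources(sources, impulse_responses, noise_std=0.01, rng=rng)
X = np.stack([stft(x_i) for x_i in x])  # shape (M, n_frames, n_bins)
print(X.shape)
```

Real room impulse responses are far longer than the 8-tap responses used here; the short responses only keep the example readable.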
Overview of source localization methods
Existing source localization techniques can be classified into three main classes [DiBiase et al., 2001]. The first class includes the techniques that exploit TDOA information [Knapp and Carter, 1976]. In the second class are the techniques which are based upon maximizing the steered response power (SRP) of a beamformer [Hahn and Tretter, 1973, Van Veen and Buckley, 1988, Johnson and Dudgeon, 1992]. The localization methods that are adapted from the field of high-resolution spectral analysis are in the third class [Schmidt, 1986, Wang and Kaveh, 1985]. Experimental comparisons of these algorithms are detailed in the literature [DiBiase et al., 2001, Badali et al., 2009, Blandin et al., 2012].
In this section, we provide an overview of prominent source localization methods in each of the three above classes.
Generalized cross-correlation with phase transform
The generalized cross-correlation (GCC) [Knapp and Carter, 1976] method is the most popular method for estimating the TDOA information for a microphone pair. The type of filtering, or weighting, used with GCC is crucial to the performance of TDOA estimation. Maximum likelihood weighting is theoretically optimal for single-path propagation in the presence of uncorrelated noise; however, its performance degrades significantly with increasing reverberation [Champagne et al., 1996]. The phase transform (PHAT) weighting is more robust against reverberation, even though it is suboptimal under reverberation-free conditions. The generalized cross-correlation with phase transform (GCC-PHAT) has been shown to perform well in realistic environments [Omologo and Svaizer, 1996, Svaizer et al., 1997, Brandstein and Silverman, 1997].
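Given two signals $x_i(k)$ and $x_{i'}(k)$, the GCC-PHAT is defined as
$$P_{ii'}(\Delta t, k) = \Re\left( \sum_{f=f_{\min}}^{f_{\max}} \frac{X_i(k, f)\,\bar{X}_{i'}(k, f)}{\left| X_i(k, f)\,\bar{X}_{i'}(k, f) \right|}\, e^{2i\pi f \Delta t} \right), \tag{2.6}$$
where $X_i(k, f)$ and $X_{i'}(k, f)$ are the STFTs of the two signals, $\Re(\cdot)$ denotes the real part, and the overbar denotes complex conjugation. The TDOA for a single source can be computed as
$$\Delta t^{\mathrm{PHAT}}_{ii'}(k) = \arg\max_{\Delta t} P_{ii'}(\Delta t, k). \tag{2.7}$$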
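The GCC-PHAT search can be sketched in NumPy as below. This is an illustration under assumptions rather than the thesis implementation: it processes a single frame with a plain FFT instead of a full STFT, it takes the PHAT-weighted cross-spectrum multiplied by exp(2iπfΔt) and summed over the frequency band as the GCC-PHAT function, and the names `gcc_phat`, `candidate_delays`, and the small constant guarding the division are hypothetical. With this sign convention, the estimated Δt is the arrival time at microphone i minus the arrival time at microphone i'.

```python
import numpy as np


def gcc_phat(x_i, x_ip, fs, candidate_delays, f_min=0.0, f_max=None):
    """Evaluate the GCC-PHAT function on a grid of candidate TDOAs (seconds)."""
    n = len(x_i)
    X_i = np.fft.rfft(x_i)
    X_ip = np.fft.rfft(x_ip)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # frequencies in Hz
    f_max = fs / 2 if f_max is None else f_max
    band = (freqs >= f_min) & (freqs <= f_max)

    cross = X_i[band] * np.conj(X_ip[band])
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.maximum(np.abs(cross), 1e-12)

    # One row per candidate delay, one column per frequency bin.
    steering = np.exp(2j * np.pi * np.outer(candidate_delays, freqs[band]))
    return np.real(steering @ cross)


# Synthetic check: x_ip is x_i circularly delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
x_i = rng.standard_normal(2048)
x_ip = np.roll(x_i, 5)

candidate_delays = np.arange(-20, 21) / fs
p = gcc_phat(x_i, x_ip, fs, candidate_delays)
tdoa = candidate_delays[np.argmax(p)]
print(tdoa * fs)  # delay in samples; negative because the sound reaches microphone i first
```

The circular shift makes the delay exact in the frequency domain, which keeps the check simple. Real recordings would need zero-padding or windowed frames, and a finer delay grid if sub-sample resolution is required.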
Time difference of arrival based methods
In the case of three or more microphones, after the TDOA between each pair of microphones has been estimated, the geometric relationship between the sound source and the microphone array can be utilized to estimate the source AoA. By applying a triangulation method to different microphone pairs, the source location can be estimated [Brutti and Nesta, 2013]. The accuracy of the AoA measurement obtained with TDOA based methods depends on the accuracy of the TDOA estimation. The geometry of the microphone array can also affect the performance of TDOA based methods. Such methods are well suited to AoA measurement over a limited spatial range when sufficient microphone data are available.
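To illustrate how several pairwise TDOAs can be combined, the sketch below uses a far-field least-squares fit rather than the triangulation method cited above, so it should be read as a simplified stand-in. It assumes a planar array, a plane-wave model in which the TDOA of pair (p, q) is the arrival time at microphone p minus that at microphone q, and a hypothetical function name `aoa_from_pairwise_tdoas`. The microphone layout in the example is invented.

```python
import numpy as np


def aoa_from_pairwise_tdoas(mic_positions, pairs, tdoas, c=343.0):
    """Least-squares azimuth (radians) from pairwise TDOAs, far-field model.

    Assumed model: tdoa_pq = t_p - t_q = -(r_p - r_q) . u / c,
    where u is the unit vector pointing from the array towards the source.
    """
    mic_positions = np.asarray(mic_positions, dtype=float)
    baselines = np.array([mic_positions[p] - mic_positions[q] for p, q in pairs])
    u, *_ = np.linalg.lstsq(-baselines / c, np.asarray(tdoas, dtype=float),
                            rcond=None)
    return np.arctan2(u[1], u[0])


# Hypothetical 3-microphone array (metres) and a source at 40 degrees azimuth.
mics = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
pairs = [(0, 1), (0, 2), (1, 2)]
true_azimuth = np.deg2rad(40.0)
u_true = np.array([np.cos(true_azimuth), np.sin(true_azimuth)])
tdoas = [-(mics[p] - mics[q]) @ u_true / 343.0 for p, q in pairs]

azimuth = aoa_from_pairwise_tdoas(mics, pairs, tdoas)
print(np.rad2deg(azimuth))
```

With noisy TDOAs the fitted vector is no longer exactly of unit length; only its direction is used here, which is one simple way of handling that.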
Steered response power based methods
The general idea of SRP based methods is to steer a spatial filter (also known as a beamformer) to a predefined spatial region or direction [Johnson and Dudgeon, 1992] by adjusting its steering parameters and then search for maximal output. The output of the beamformer is known as the steered response. The maximal SRP is obtained when the direction of the beamformer matches the location of the target source.
One of the simplest and most conventional approaches for SRP is the delay-and-sum beamformer [Johnson and Dudgeon, 1992, Flanagan et al., 1985]. Its principle is illustrated in Fig. 2.5. Delay-and-sum beamformers apply time shifts to the signals received at the microphones to compensate for the TDOA of the target source signal at each microphone. The delay-and-sum beamformer output is defined as
$$y(k, f) = \mathbf{a}^H(\alpha, f)\,\mathbf{x}(k, f), \tag{2.8}$$
where $\mathbf{a}$ is the steering vector, whose value depends on the hypothesized AoA $\alpha$, $\mathbf{x}(k, f)$ is the vector of STFT coefficients of the $M$ microphone signals, and $H$ denotes the Hermitian transpose. As a result, signal power is enhanced in the look direction $\alpha$ and attenuated in all other directions.
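A delay-and-sum SRP search based on equation (2.8) can be sketched as follows. The sketch assumes a uniform linear array, an AoA measured from broadside (consistent with the arcsine relation in equation (2.1)), and a far-field steering vector built from pure phase shifts. The names `steering_vector` and `srp_delay_and_sum`, the array geometry, and the frequency grid are illustrative assumptions.

```python
import numpy as np


def steering_vector(alpha, freq, mic_x, c=343.0):
    """Far-field steering vector of a linear array for AoA alpha (radians)."""
    delays = -mic_x * np.sin(alpha) / c  # relative arrival times in seconds
    return np.exp(-2j * np.pi * freq * delays)


def srp_delay_and_sum(X, freqs, mic_x, angles, c=343.0):
    """Steered response power per candidate angle for one frame X of shape (M, F)."""
    power = np.zeros(len(angles))
    for n, alpha in enumerate(angles):
        for f_idx, freq in enumerate(freqs):
            a = steering_vector(alpha, freq, mic_x, c)
            y = np.vdot(a, X[:, f_idx])  # a^H x; vdot conjugates its first argument
            power[n] += np.abs(y) ** 2
    return power


# Synthetic frame: one noise-free source at 30 degrees, 4 microphones 5 cm apart.
mic_x = 0.05 * np.arange(4)
freqs = np.linspace(500.0, 3000.0, 26)
true_alpha = np.deg2rad(30.0)
rng = np.random.default_rng(0)
s = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
X = np.stack([s[f] * steering_vector(true_alpha, freqs[f], mic_x)
              for f in range(len(freqs))], axis=1)

angles = np.deg2rad(np.arange(-90, 91))
power = srp_delay_and_sum(X, freqs, mic_x, angles)
alpha_hat = angles[np.argmax(power)]
print(np.rad2deg(alpha_hat))
```

The frequency range is kept below about 3.4 kHz so that the 5 cm spacing stays under half a wavelength, which avoids spatial aliasing in this toy setup.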
MUSIC-GSVD algorithm
Compared to previous decomposition strategies for MUSIC, the generalized singular value decomposition (GSVD) has a lower computational cost while maintaining noise robustness in source localization [Nakamura et al., 2012]. In the MUSIC-GSVD method, to reduce the computational cost, equation (2.16) is modified as
$$\mathbf{K}^{-1}(k, f)\,\mathbf{R}(k, f) = \mathbf{E}_l(k, f)\,\boldsymbol{\Lambda}(k, f)\,\mathbf{E}_r^{-1}(k, f), \tag{2.17}$$
where $\mathbf{E}_l(k, f)$ and $\mathbf{E}_r(k, f)$ are the matrices of left and right singular vectors, respectively. They are unitary and mutually orthogonal.
Fig. 2.7 shows the MUSIC spectra obtained by MUSIC-SEVD and MUSIC-GSVD for the localization of N = 4 speakers speaking simultaneously in the presence of noise. The observed diffuse noise and directional noise are presented in Fig. 2.7(a). As shown in Fig. 2.7(c), when using MUSIC-GSVD, the noise signal is suppressed correctly and the strong peaks correspond to the directions of the four speakers. However, when using MUSIC-SEVD in Fig. 2.7(b), there is a spurious peak corresponding to the noise. This shows that MUSIC-GSVD is more robust to noise than MUSIC-SEVD. Throughout the thesis, the MUSIC-GSVD method will be employed for estimating the source AoA.
Table of contents:
Chapter 1 Introduction
1.1 Motivation
1.1.1 Audio for robots, robots for audio
1.1.2 Audio source localization is essential
1.1.3 Conquering uncertainty
1.2 Problem
1.2.1 Problem formulation
1.2.2 General framework of source localization for robot audition
1.3 Contributions
1.4 Outline
Chapter 2 State of the art
2.1 Angle of arrival measurement
2.1.1 General concepts
2.1.1.1 Source localization cues
2.1.1.2 Far-field source
2.1.1.3 Source signal model
2.1.2 Overview of source localization methods
2.1.2.1 Generalized cross-correlation with phase transform
2.1.2.2 Time difference of arrival based methods
2.1.2.3 Steered response power based methods
2.1.2.4 Multiple signal classification based methods
2.1.3 MUSIC-GSVD algorithm
2.2 Source activity detection
2.3 Sequential filtering for a single source
2.3.1 State vector
2.3.2 Observation vector
2.3.3 Recursive Bayesian estimation
2.3.4 Nonlinear mixture Kalman filtering
2.3.5 Particle filtering
2.3.6 Occupancy grids
2.4 Sequential filtering for multiple sources
2.4.1 State vector
2.4.2 Observation vector
2.4.3 Joint probabilistic data association filter
2.4.3.1 Prediction step
2.4.3.2 Update step
2.5 Motion planning for robot audition
2.5.1 General robot motion planning
2.5.2 Motion planning for robot audition
Chapter 3 Source localization in a reverberant environment
3.1 Proposed Bayesian filtering framework
3.1.1 State vector
3.1.2 Dynamical model
3.1.2.1 Dynamical model of the robot
3.1.2.2 Dynamical model of the sound source
3.1.2.3 Full dynamical model
3.1.3 Observation vector
3.1.4 Recursive Bayesian estimation
3.2 Extended mixture Kalman filtering
3.2.1 Prediction step
3.2.2 Update step
3.2.3 Hypothesis pruning
3.2.4 Experimental evaluation
3.2.4.1 Data
3.2.4.2 Algorithm settings
3.2.4.3 Example run – Visualization
3.2.4.4 Example run – Estimated trajectories
3.2.4.5 Error rate of source location estimation
3.2.4.6 Error rate of source activity estimation
3.2.4.7 Statistical analysis
3.3 Particle filtering
3.3.1 Prediction step
3.3.2 Update step
3.3.3 Particle resampling step
3.3.4 Example run
3.4 Comparison of the extended MKF with the particle filtering
3.4.1 Data
3.4.2 Algorithm settings
3.4.3 Experimental results
3.5 Summary
Chapter 4 Multiple source localization
4.1 Learning the sensor model for multiple source localization
4.2 Proposed extended MKF with joint probabilistic data association filter
4.2.1 State and observation vectors
4.2.1.1 State vector
4.2.1.2 Observation vector
4.2.1.3 Joint associations
4.2.2 Prediction step
4.2.3 Update step
4.2.3.1 Joint association events
4.2.3.2 Update step
4.3 Experimental evaluation
4.3.1 Data
4.3.2 Algorithm settings
4.3.3 Example run
4.3.4 Statistical result
4.4 Summary
Chapter 5 Optimal motion control for robot audition
5.1 Cost function
5.1.1 Shannon entropy criterion
5.1.2 Standard deviation criterion
5.2 Monte Carlo tree search
5.2.1 Algorithm outline
5.2.2 Optimism in the face of uncertainty
5.3 Adapting MCTS for robot audition
5.3.1 Formulation
5.3.2 Selection
5.3.2.1 Bounded entropy
5.3.2.2 Bounded standard deviation
5.3.3 Expansion
5.3.4 Simulation
5.3.5 Backpropagation
5.4 Evaluation
5.4.1 Experimental protocol
5.4.2 Example trajectory
5.4.3 MCTS vs other motion planning approaches
5.4.3.1 Entropy criterion
5.4.3.2 Standard deviation criterion
5.4.4 Relation of both criteria with estimation error
5.4.5 Effect of the discount factor
5.4.5.1 Entropy criterion
5.4.5.2 Standard deviation criterion
5.5 Summary
Chapter 6 Conclusion and perspectives
6.1 Conclusion
6.2 Perspectives
Appendix A Summary in French
A.1 Introduction
A.2 State of the art
A.3 Localization of a source in a reverberant environment
A.4 Localization of multiple sources
A.5 Motion planning for audition
A.6 Conclusion and perspectives
Bibliography