3D Multiple Sound Sources Localization (SSL)

The Steered Response Power Phase Transform (SRP-PHAT) is an important and robust algorithm for localizing acoustic sound sources. However, the algorithm yields only one location estimate at a time. To extend it to multiple sources, we propose using the Degenerate Unmixing Estimation Technique (DUET) to separate the sources and then passing each separated source to the SRP-PHAT algorithm to achieve multi-source tracking.
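
SRP-PHAT is built on the generalized cross-correlation with phase transform (GCC-PHAT) between pairs of microphones. The following is a minimal NumPy sketch of that pairwise building block, not the code used in this repository; the function name `gcc_phat` and its parameters are assumptions chosen for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    Illustrative helper only; it is not part of the Multi_SSL code base.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: discard magnitude, keep phase.
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Limit the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    max_shift = max(max_shift, 1)

    # Reorder so that negative lags come first and zero lag sits at max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

A positive return value means `sig` arrives later than `ref`. A full SRP-PHAT search would sum such pairwise correlations over all microphone pairs for each candidate direction and pick the direction with the highest total power.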

Prepare an Environment

git clone https://github.com/BrownsugarZeer/Multi_SSL.git
cd Multi_SSL
python -m venv venv
venv\Scripts\activate.bat
pip install -r requirements.txt

PyAudio can be tricky to install on Windows. If the installation fails, installing an unofficial prebuilt wheel may be a viable workaround.

Hardware

The board is a far-field microphone array device capable of detecting voices up to 5 m away, even in the presence of background noise.

Running an Experiment

Using a microphone stream (online)

(venv) > python srp_phat_online.py  -s=1
Find 1 available sources.
azi:  184.4, ele:   46.4
===================================================
Find 1 available sources.
azi:  184.4, ele:   46.4
===================================================
Find 1 available sources.
azi:  276.1, ele:   39.2
===================================================
...

Using an audio file (offline)

# Automatically determine the number of sources
(venv) > python srp_phat_offline.py -s=1 -c=4 -i=None --wave=data/a0e20/50cm/a0e19_3_1b6ede00.wav
Find 1 available sources.
azi:    0.3, ele:   22.7

(venv) > python srp_phat_offline.py -s=2 -c=4 -i=None --wave=data/a0e20_a45e35/150cm/a0e19_a44e34_3_1c91d780.wav
Find 2 available sources.
azi:   50.8, ele:   43.2
azi:    2.7, ele:   26.2
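
If the console output needs to be consumed by another script, the `azi: ..., ele: ...` lines shown above can be parsed with a regular expression. This is a small sketch that assumes the output format stays exactly as in the examples; `parse_doa` is a hypothetical helper, not something the repository provides.

```python
import re

# Matches lines such as "azi:  184.4, ele:   46.4" from the example output.
DOA_LINE = re.compile(r"azi:\s*(-?\d+(?:\.\d+)?),\s*ele:\s*(-?\d+(?:\.\d+)?)")


def parse_doa(text):
    """Return a list of (azimuth, elevation) pairs in degrees found in `text`."""
    return [(float(azi), float(ele)) for azi, ele in DOA_LINE.findall(text)]
```

Calling `parse_doa` on the captured output of one run returns one tuple per reported source, in the order the script printed them.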

Visualization

To show what is going on, we use plotly to plot the DOA on a sphere whose diameter is 1 meter. The center of the sphere is the microphone array, placed at p(x=0, y=0, z=0); the dark blue dots are the Directions of Arrival (DOA), and the lighter dots are their projections onto each plane.

(venv) > python srp_visualizer.py -s=1 --wav=data/a0e20/50cm.csv
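
To make the geometry concrete, here is a rough sketch of how an azimuth/elevation pair could be mapped to a point on a sphere of diameter 1 meter (radius 0.5) centered on the array. The angle convention (azimuth counterclockwise from the +x axis in the x-y plane, elevation measured up from the x-y plane) and the use of the coordinate planes for the projections are assumptions; the visualizer script may define them differently. Both function names are hypothetical.

```python
import math


def doa_to_xyz(azi_deg, ele_deg, radius=0.5):
    """Convert azimuth/elevation in degrees to Cartesian coordinates.

    Assumed convention: azimuth from +x toward +y, elevation from the
    x-y plane toward +z. The default radius matches a 1 m diameter sphere.
    """
    azi = math.radians(azi_deg)
    ele = math.radians(ele_deg)
    x = radius * math.cos(ele) * math.cos(azi)
    y = radius * math.cos(ele) * math.sin(azi)
    z = radius * math.sin(ele)
    return x, y, z


def plane_projections(x, y, z):
    """Project a point onto the xy, xz, and yz coordinate planes (assumed)."""
    return {"xy": (x, y, 0.0), "xz": (x, 0.0, z), "yz": (0.0, y, z)}
```

Under these assumptions, a source at azimuth 0 and elevation 0 lies on the +x axis at 0.5 m from the array, and a source at elevation 90 lies directly above it.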

(Figures: plots for the 50cm, 150cm, and 250cm recordings.)

Issue

The algorithm has a high computational complexity, which makes it unsuitable for real-time applications. Estimating one source takes at least 0.3 seconds, so estimating N sources takes at least 0.3*N seconds.

References

Rickard, S. "The DUET Blind Source Separation Algorithm." Blind Speech Separation, pp. 217-241 (2007).
Dey, Ajoy Kumar, and Susmita Saha. "Acoustic Beamforming: Design and Development of Steered Response Power With Phase Transformation (SRP-PHAT)." (2011).
Ravanelli, Mirco, et al. "SpeechBrain: A General-Purpose Speech Toolkit." arXiv preprint arXiv:2106.04624 (2021).
