The Steered Response Power Phase Transform (SRP-PHAT) is an important and robust algorithm for localizing acoustic sources. On its own, however, it yields only one location estimate. To extend it to multiple sources, we use the Degenerate Unmixing Estimation Technique (DUET) to separate the sources and pass each separated signal to SRP-PHAT, which makes multi-source tracking possible.
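SRP-PHAT is built on the GCC-PHAT cross-correlation between microphone pairs: the steered response for a candidate direction is the sum of these phase-weighted correlations at the delays that direction implies. The sketch below shows only that building block for a single pair, using NumPy. It is an illustration under stated assumptions, not the code used in this repository; the function name `gcc_phat_delay` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref`, in seconds, with GCC-PHAT.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: discard magnitude, keep phase.
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Reorder so lags run from -max_shift to +max_shift.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))

    # The lag of the correlation peak is the delay estimate in samples.
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

A positive result means `sig` lags `ref`. In a full SRP-PHAT search, such correlations would be evaluated for every microphone pair and accumulated over a grid of candidate directions; in the pipeline described above, that search would run once per DUET-separated source.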
git clone https://github.com/BrownsugarZeer/Multi_SSL.git
cd Multi_SSL
python -m venv venv
venv\Scripts\activate.bat
pip install -r requirements.txt
PyAudio can be tricky to install on Windows. If the installation fails, installing an unofficial prebuilt wheel may be a workable solution.
The board is a far-field microphone array device capable of detecting voices up to 5 m away, even in the presence of background noise.
- Using a microphone stream (online)
(venv) > python srp_phat_online.py -s=1
Find 1 available sources.
azi: 184.4, ele: 46.4
===================================================
Find 1 available sources.
azi: 184.4, ele: 46.4
===================================================
Find 1 available sources.
azi: 276.1, ele: 39.2
===================================================
...
- Using an audio file (offline)
# Automatically determine the number of sources
(venv) > python srp_phat_offline.py -s=1 -c=4 -i=None --wave=data/a0e20/50cm/a0e19_3_1b6ede00.wav
Find 1 available sources.
azi: 0.3, ele: 22.7
(venv) > python srp_phat_offline.py -s=2 -c=4 -i=None --wave=data/a0e20_a45e35/150cm/a0e19_a44e34_3_1c91d780.wav
Find 2 available sources.
azi: 50.8, ele: 43.2
azi: 2.7, ele: 26.2
To make the results easy to inspect, we use Plotly to plot the DOAs on a sphere whose diameter is 1 meter. The center of the sphere is the microphone array, placed at p(x=0, y=0, z=0); the dark blue dots are the Directions of Arrival (DOA), and the lighter dots are their projections onto each plane.
(venv) > python srp_visualizer.py -s=1 --wav=data/a0e20/50cm.csv
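As a rough illustration of the geometry in the plot, the sketch below converts an (azimuth, elevation) pair into a point on a sphere of radius 0.5 m centered on the array, and computes its projections. It assumes azimuth is measured in the xy-plane from the +x axis and elevation is measured upward from the xy-plane; the visualizer's actual convention may differ. The helper names `doa_to_xyz` and `plane_projections` are hypothetical, and the projections here fall on the coordinate planes through the origin, whereas the plot may draw them on its axis walls.

```python
import math


def doa_to_xyz(azi_deg, ele_deg, radius=0.5):
    """Map a DOA in degrees to Cartesian coordinates on a sphere.

    Assumed convention (not confirmed by the repository): azimuth from +x
    in the xy-plane, elevation from the xy-plane toward +z.
    """
    azi = math.radians(azi_deg)
    ele = math.radians(ele_deg)
    x = radius * math.cos(ele) * math.cos(azi)
    y = radius * math.cos(ele) * math.sin(azi)
    z = radius * math.sin(ele)
    return x, y, z


def plane_projections(point):
    """Project a 3-D point onto the xy, xz, and yz coordinate planes."""
    x, y, z = point
    return {
        "xy": (x, y, 0.0),
        "xz": (x, 0.0, z),
        "yz": (0.0, y, z),
    }


if __name__ == "__main__":
    # One of the DOA readings from the online example output.
    point = doa_to_xyz(184.4, 46.4)
    print(point)
    print(plane_projections(point))
```

The default radius of 0.5 m matches the 1 meter diameter of the plotted sphere.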
- The algorithm has high computational complexity, which makes it unsuitable for real-time applications. Estimating one source takes at least 0.3 seconds, so estimating N sources takes at least (0.3*N) seconds.
- S. Rickard, "The DUET blind source separation algorithm." Blind Speech Separation, pp. 217-241, 2007.
- Dey, Ajoy Kumar, and Susmita Saha. "Acoustic Beamforming: Design and Development of Steered Response Power With Phase Transformation (SRP-PHAT)." (2011).
- Ravanelli, Mirco, et al. "SpeechBrain: A General-Purpose Speech Toolkit." arXiv preprint arXiv:2106.04624 (2021).