Voice Activity Detection (VAD) system based on Deep Neural Networks (DNN) and feature fusion (Gammatone, Gabor, Long-term Spectral Variability and voicing).
The system was developed as part of DARPA's RATS (Robust Automatic Transcription of Speech) program.
- open MATLAB
- run apply_vad('path/to/audio'), where the argument is a directory of WAV audio files
- a figure will appear showing the original signal and the VAD labels
- to tune the accuracy (depending on how noisy the files are), adjust the parameters p1 (speech/non-speech threshold, default 0.1) and p2 (speech region smoothing, default 20)
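A minimal sketch of a call with explicit parameters. It assumes that the toolkit folder is on the MATLAB path and that apply_vad takes (audiodir, p1, p2) positionally, in the order listed in the help text; neither is confirmed here. The directory name is a placeholder. How a change in p1 or p2 shifts the labels should be checked on your own recordings.

```matlab
% Sketch only: assumes apply_vad(audiodir, p1, p2) is the positional signature.
audiodir = 'path/to/audio';   % placeholder: directory containing WAV files
p1 = 0.1;                     % speech/non-speech threshold (documented default)
p2 = 20;                      % speech region smoothing (documented default)

% Only call the toolkit if both the function and the directory can be found.
if exist('apply_vad', 'file') == 2 && exist(audiodir, 'dir') == 7
    vadout = apply_vad(audiodir, p1, p2);   % frame-level VAD labels (10 ms)
else
    warning('apply_vad or the audio directory was not found; nothing was run.');
end
```

Starting from the defaults and changing one parameter at a time makes it easier to see which of the two is responsible for a difference in the plotted labels.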
additional info
Apply Voice Activity Detection to all files in a specified audio directory

--IN--
audiodir: directory of audio files (WAV format)
p1: speech/non-speech threshold [default: 0.1]
p2: speech region smoothing [default: 20]

--OUT--
vadout: VAD labels at frame level (10 ms)
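Frame-level labels are often easier to use as time segments. The sketch below assumes that the labels for one file form a 0/1 vector with one entry per 10 ms frame, and that frame i spans (i-1)*10 ms to i*10 ms. The help text does not say how vadout is organized across files or how frames are aligned, so the vector here is made-up example data rather than real output.

```matlab
% Hypothetical frame-level labels for one file (1 = speech, 0 = non-speech).
labels  = [0 0 1 1 1 0 0 1 1 0];
frame_s = 0.010;                      % frame step from the help text: 10 ms

% Pad with zeros so that speech runs touching either end are also detected.
edges  = diff([0, labels(:)' ~= 0, 0]);
starts = find(edges == 1);            % first frame index of each speech run
stops  = find(edges == -1) - 1;       % last frame index of each speech run

% Convert 1-based frame indices to [start, end] times in seconds
% (assumed alignment: frame i covers (i-1)*frame_s to i*frame_s).
segments = [(starts(:) - 1) * frame_s, stops(:) * frame_s];
disp(segments);
```

For the example vector this gives two speech segments, from 0.02 s to 0.05 s and from 0.07 s to 0.09 s.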
Van Segbroeck, Maarten, Andreas Tsiartas, and Shrikanth Narayanan. "A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice." INTERSPEECH. 2013.
bibtex
@inproceedings{van2013robust,
  title={A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice.},
  author={Van Segbroeck, Maarten and Tsiartas, Andreas and Narayanan, Shrikanth},
  booktitle={INTERSPEECH},
  pages={704--708},
  year={2013}
}