`model/dwavenet.py` includes a PyTorch implementation of the DNN model proposed in *A Wavenet for Speech Denoising*. However, the training code is written for multi-channel speech dereverberation, not speech denoising. For the original training code, refer to the authors' repo, which is a Keras + Theano version.
This repo requires the following data:

- The RIRs for a spherical microphone array
  - They can be generated by the SMIR generator.
  - The .mat file named `hp.dict_path[f'RIR_{room_create}']` should include `RIR_TRAIN` and `RIR_TEST`.
  - The shapes of `RIR_TRAIN` and `RIR_TEST` are [no. of microphones $\times$ length of impulse response $\times$ no. of source-microphone positions].
- The regularized inverse of the 0-th order mode strength $b_0^{-1}(kr)$ (`bEQf`)
  - It is required for monitoring validation input and output (using TensorBoard).
  - The .mat file named `hp.dict_path[f'bEQf.mat']` should include `bEQf`.
  - `bEQf` should have the shape of [no. of frequency bins $\times$ 1].
- python >= 3.7 (or 3.6 with dataclasses backport)
- numpy
- scipy
- matplotlib
- PyTorch >= 1.0
- tensorboardX >= 1.7
- PySoundFile
- librosa
- tqdm
- torchsummary
`create_mulchwave.py` calculates spherical microphone array recordings from speech sources and the prepared RIRs. Read the docstring of `create_mulchwave.py` for usage.
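The core operation behind such a script is convolving each mono source signal with the RIR of every microphone. A minimal sketch of that step for a single source position, assuming RIRs shaped [no. of microphones × length of impulse response] (`simulate_recording` is a hypothetical helper, not the script's actual API):

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_recording(speech: np.ndarray, rirs: np.ndarray) -> np.ndarray:
    """Convolve a mono speech signal with one RIR per microphone.

    speech: (n_samples,); rirs: (n_mic, len_ir) for one source position.
    Returns the array recording, shape (n_mic, n_samples + len_ir - 1).
    """
    return np.stack([fftconvolve(speech, h) for h in rirs])

speech = np.random.randn(16000)   # 1 s of audio at 16 kHz (illustrative)
rirs = np.random.randn(32, 4096)  # 32-mic array, 4096-tap RIRs (illustrative)
recording = simulate_recording(speech, rirs)
assert recording.shape == (32, 16000 + 4096 - 1)
```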
`main.py` is used to train or test DNNs. `_Hyperparameters.l_target` is the target field length defined in *A Wavenet for Speech Denoising*. Read the docstring of `main.py` for usage.
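In the paper's non-causal WaveNet, each valid output sample needs the full receptive field of input context, so an input segment must be `l_target` plus the receptive field minus one samples long. A sketch under the paper's configuration (3 stacks of 10 dilated convolutions with kernel size 3; these numbers come from the paper, not from this repo's code, and the target length below is hypothetical):

```python
def receptive_field(n_stacks: int = 3, layers_per_stack: int = 10,
                    kernel_size: int = 3) -> int:
    """Receptive field of stacked dilated convs with dilations 1, 2, ..., 2^(L-1)."""
    dilations = [2 ** i for i in range(layers_per_stack)] * n_stacks
    return sum((kernel_size - 1) * d for d in dilations) + 1

rf = receptive_field()        # 6139 samples for the paper's configuration
l_target = 1600               # hypothetical target field length
l_input = l_target + rf - 1   # samples needed per training input
assert rf == 6139
```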
Source code for PESQ, STOI, and fwSegSNR is in the `matlab_lib` directory. Frequency-domain SegSNR is implemented in `audio_utils.py`.