Sytronik/denoising-wavenet-pytorch

Multi-channel Speech Dereverberation using Denoising-Wavenet

model/dwavenet.py includes a PyTorch implementation of the DNN model proposed in A Wavenet For Speech Denoising.

However, the training code in this repo targets multi-channel speech dereverberation, not speech denoising.

For the original training code, refer to the authors' repo, which is written in Keras with Theano.

Requirements

Data

This repo requires the following data:

  • Room impulse responses (RIRs) for a spherical microphone array

    • They can be generated with the SMIR generator.
    • The .mat file named hp.dict_path[f'RIR_{room_create}'] should include RIR_TRAIN and RIR_TEST.
    • The shapes of RIR_TRAIN and RIR_TEST are [No. of microphones $\times$ length of impulse response $\times$ No. of source-microphone positions].
  • The regularized inverse of the 0-th order mode strength $b_0^{-1}(kr)$ (bEQf)

    • It is required for monitoring validation input and output (using tensorboard).
    • The .mat file named hp.dict_path[f'bEQf.mat'] should include bEQf.
    • bEQf should have the shape of [No. of frequency bins $\times$ 1].
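
As a sanity check for the array layouts described above, the following sketch writes placeholder arrays to .mat files with scipy.io.savemat and reads them back. The file names, the sizes (32 microphones, 4096-tap responses, 10 and 4 positions, 257 frequency bins), and the placeholder values are assumptions for illustration only. Real RIRs would come from the SMIR generator, and the real paths from hp.dict_path.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical sizes, chosen only for illustration.
N_MIC, L_RIR, N_LOC_TRAIN, N_LOC_TEST, N_FREQ = 32, 4096, 10, 4, 257

rng = np.random.default_rng(0)
tmp_dir = tempfile.mkdtemp()
path_rir = os.path.join(tmp_dir, "RIR_example.mat")    # hypothetical file name
path_beqf = os.path.join(tmp_dir, "bEQf_example.mat")  # hypothetical file name

# [microphones x impulse-response length x source-microphone positions]
savemat(path_rir, {
    "RIR_TRAIN": rng.standard_normal((N_MIC, L_RIR, N_LOC_TRAIN)),
    "RIR_TEST": rng.standard_normal((N_MIC, L_RIR, N_LOC_TEST)),
})
# [frequency bins x 1]; ones are placeholders, not a real mode-strength inverse.
savemat(path_beqf, {"bEQf": np.ones((N_FREQ, 1))})

# Read the files back and inspect the shapes.
rirs = loadmat(path_rir)
beqf = loadmat(path_beqf)["bEQf"]
print(rirs["RIR_TRAIN"].shape, rirs["RIR_TEST"].shape, beqf.shape)
```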

Packages

  • python >= 3.7 (or 3.6 with dataclasses backport)
  • numpy
  • scipy
  • matplotlib
  • PyTorch >= 1.0
  • tensorboardX >= 1.7
  • PySoundFile
  • librosa
  • tqdm
  • torchsummary

Dataset

create_mulchwave.py calculates spherical microphone array recordings from speech sources and the prepared RIRs.

Read the docstring of create_mulchwave.py for usage.
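
For intuition, a microphone array recording can be simulated by convolving a mono speech signal with the RIR of each microphone. The sketch below shows that idea with NumPy only. The function name apply_rirs and the array sizes are hypothetical, and this is not the code in create_mulchwave.py, which may differ (for example in normalization or in how positions are selected).

```python
import numpy as np


def apply_rirs(speech, rirs):
    """Convolve mono speech with one RIR per microphone.

    speech: array of shape [n_samples]
    rirs:   array of shape [n_mic, l_rir] for a single source position
    returns an array of shape [n_mic, n_samples + l_rir - 1]
    """
    # np.convolve is a direct convolution; for long RIRs an FFT-based
    # convolution would be faster, but the result is the same.
    return np.stack([np.convolve(speech, h) for h in rirs])


rng = np.random.default_rng(0)
speech = rng.standard_normal(1600)               # placeholder for a speech signal
rirs_one_position = rng.standard_normal((4, 256))  # placeholder RIRs
multichannel = apply_rirs(speech, rirs_one_position)
print(multichannel.shape)  # (4, 1855)
```

With the array layout described under Data, rirs_one_position would correspond to one slice along the last axis, for example RIR_TRAIN[:, :, i].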

Training and Testing

main.py is used to train or test DNNs.

_Hyperparameters.l_target is the target field length described in A Wavenet For Speech Denoising.

Read the docstring of main.py for usage.
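
To illustrate how the target field length relates to the input length, the sketch below assumes the configuration reported in the paper: three stacks of dilations 1 to 512 with kernel size 3, and a target field of 1601 samples. The helper names are hypothetical, and the defaults in this repo may differ.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field (in samples) of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def input_length(l_receptive, l_target):
    """Number of input samples needed to predict l_target output samples."""
    return l_receptive + l_target - 1


# Assumed configuration from the paper: dilations 1, 2, ..., 512, repeated 3 times.
dilations = [2 ** i for i in range(10)] * 3
l_receptive = receptive_field(dilations)
l_input = input_length(l_receptive, l_target=1601)
print(l_receptive, l_input)  # 6139 7739
```

A longer target field amortizes the receptive field over more output samples per forward pass, at the cost of a longer input segment.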

Evaluation Metrics

Source code for PESQ, STOI, and fwSegSNR is in the matlab_lib directory.

Frequency-domain SegSNR is implemented in audio_utils.py.
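
As a rough illustration, one common way to define a frequency-domain segmental SNR is to compute the SNR of each STFT frame from the spectra and average over frames. The sketch below follows that definition. The function name, the eps constant, and the absence of per-frame clipping are assumptions, and the implementation in audio_utils.py may differ.

```python
import numpy as np


def freq_segsnr(clean_spec, est_spec, eps=1e-10):
    """Mean per-frame SNR in dB for spectrograms of shape [freq, frames]."""
    signal = np.sum(np.abs(clean_spec) ** 2, axis=0)            # energy per frame
    error = np.sum(np.abs(clean_spec - est_spec) ** 2, axis=0)  # error energy per frame
    snr_db = 10.0 * np.log10((signal + eps) / (error + eps))
    return float(np.mean(snr_db))


rng = np.random.default_rng(0)
# Placeholder complex spectrogram: 257 frequency bins, 50 frames.
clean = rng.standard_normal((257, 50)) + 1j * rng.standard_normal((257, 50))
estimate = 0.9 * clean  # amplitude error of 10% in every bin
print(round(freq_segsnr(clean, estimate), 2))  # 20.0
```

Some definitions additionally clip each frame's SNR to a fixed range before averaging; that step is omitted here.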
