
DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement (NeurIPS 2023)

News

We are currently finishing an extended version of this work; the project page will be released once the extension is complete.

Brief

DOSE employs two efficient condition-augmentation techniques to address the challenge of incorporating condition information into DDPMs for speech enhancement (SE), based on two key insights:

  • We force the model to prioritize the condition factor when generating samples by training it with a dropout operation;
  • We incorporate the condition information into the sampling process by providing an informative adaptive prior.

Experiments demonstrate that our approach yields substantial improvements in the quality and stability of the generated speech, in consistency with the condition factor, and in efficiency.
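
As an intuition for the first point, a minimal training-step sketch is given below. This is our own illustration, not the repository's learner.py: the function and argument names, the use of standard dropout on the diffused input x_t, and the epsilon-prediction loss are assumptions; only the idea of dropping the diffused input so the network must lean on the noisy condition follows the description above.

import torch
import torch.nn.functional as F

def training_step(model, clean, noisy, alpha_bar, dropout_ratio=0.5):
    # clean, noisy: [batch, samples]; alpha_bar: cumulative noise schedule, shape [T]
    t = torch.randint(0, alpha_bar.shape[0], (clean.shape[0],), device=clean.device)
    noise = torch.randn_like(clean)
    a = alpha_bar[t].unsqueeze(-1)
    x_t = a.sqrt() * clean + (1.0 - a).sqrt() * noise    # forward diffusion of the clean speech
    # Drop parts of the diffused input so the network cannot rely on x_t alone
    # and is forced to prioritize the condition (the noisy speech).
    x_t = F.dropout(x_t, p=dropout_ratio, training=True)
    predicted = model(x_t, noisy, t)                     # network conditioned on the noisy speech
    return F.mse_loss(predicted, noise)                  # predict the added noise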

We provide a pre-trained model, trained on VoiceBank-DEMAND (VB) with a dropout ratio of 0.5:

On VB (step 1 = 40, step 2 = 15): CSIG 3.8357, CBAK 3.2350, COVL 3.1840, PESQ 2.5430, SSNR 8.9398, STOI 0.9335

On CHiME-4 (step 1 = 35, step 2 = 0): CSIG 2.8673, CBAK 2.1805, COVL 2.1647, PESQ 1.5709, SSNR 1.6121, STOI 0.8673

Environment Requirements

Note: be careful with package versions, especially the PESQ package.

We run the code on a machine with an RTX 3090 GPU, an i7-13700KF CPU, and 128 GB of memory. The code was tested with Python 3.8.13, PyTorch 1.13.1, and CUDA Toolkit 11.7.0. Install the dependencies via Anaconda:

# create a virtual environment
conda create --name DOSE python=3.8.13

# activate environment
conda activate DOSE

# install pytorch & cudatoolkit
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

# install speech metrics repo:
pip install https://github.com/ludlows/python-pesq/archive/master.zip
pip install pystoi
pip install librosa

# install utils (we use ``tensorboard`` for logging)
pip install tensorboard
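
Optionally, a quick check (ours, not part of the repository) that the main dependencies import and the GPU is visible:

import torch, pesq, pystoi, librosa
print(torch.__version__, torch.cuda.is_available())   # should print 1.13.1 and True on the tested setup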

How to train

Before you start training, you'll need to prepare a training dataset. The default dataset is VoiceBank-DEMAND. You can download it from the VoiceBank-DEMAND corpus and resample it to 16 kHz. By default, this implementation assumes sampling steps of 35 and 15 and a sample rate of 16 kHz. If you need to change these values, edit params.py.
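
If your copy of the dataset is not at 16 kHz, a minimal resampling pass can be done with librosa and soundfile (soundfile is installed alongside librosa). This is a sketch with hypothetical directory paths and a flat directory layout; adapt it to how you store the corpus:

import os
import librosa
import soundfile as sf

src_dir = "/path/to/voicebank_demand_orig"   # hypothetical: original wav files
dst_dir = "/path/to/voicebank_demand_16k"    # hypothetical: resampled output
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    if name.endswith(".wav"):
        audio, _ = librosa.load(os.path.join(src_dir, name), sr=16000)  # resample to 16 kHz
        sf.write(os.path.join(dst_dir, name), audio, 16000)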

We train the model by running:

python src/DOSE/__main__.py /path/to/model

How to inference

We generate audio by running:

python src/DOSE/inference.py /path/to/model /path/to/condition /path/to/output_dir
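
Conceptually, the adaptive prior from the Brief section means the reverse process does not start from pure Gaussian noise: the noisy speech itself is diffused to an intermediate timestep and used as the starting sample. The snippet below is our own sketch of that starting point, not the repository's inference.py; the variable names and the choice of timestep tau are assumptions.

import torch

def adaptive_prior(noisy, alpha_bar, tau):
    # Diffuse the noisy (conditioning) speech to timestep tau and use the result
    # as the starting point of the reverse process instead of pure Gaussian noise.
    a = alpha_bar[tau]
    return a.sqrt() * noisy + (1.0 - a).sqrt() * torch.randn_like(noisy)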

How to evaluate

We evaluate the generated samples by running:

python src/DOSE/metric.py /path/to/clean_speech /path/to/output_dir
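
For reference, PESQ and STOI for a single pair of files can also be computed directly with the pesq and pystoi packages installed above. This is a minimal sketch with a hypothetical file name, not the repository's metric.py:

import librosa
from pesq import pesq
from pystoi import stoi

clean, _ = librosa.load("/path/to/clean_speech/example.wav", sr=16000)
enhanced, _ = librosa.load("/path/to/output_dir/example.wav", sr=16000)
n = min(len(clean), len(enhanced))          # align lengths before scoring

print("PESQ (wb):", pesq(16000, clean[:n], enhanced[:n], "wb"))
print("STOI:", stoi(clean[:n], enhanced[:n], 16000, extended=False))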

Folder Structure

└── DOSE
	├── src
	│	└── DOSE
	│		├── __init__.py
	│		├── __main__.py # run the model for training
	│		├── dataset.py # preprocess the dataset and pad/crop the speech for the model
	│		├── inference.py # run the model for inference and adjust inference steps
	│		├── learner.py # load the model params for training/inference and save checkpoints
	│		├── metric.py # compute evaluation metrics on the generated samples
	│		├── model.py # the neural network of the proposed DOSE
	│		└── params.py # the diffusion, model, and speech params
	└── README.md

The code of DOSE is developed based on the code of DiffWave.

Correction of Typographical Error in Experimental Results

We have identified a typographical error in the experimental results presented in our paper. We sincerely apologize for any inconvenience this may have caused. Specifically, the error is in Table 1, under the "CHiME-4" dataset for the "Unprocessed" method. The original and corrected values are:

  • Original (incorrect) values:

    STOI: 72.3 & PESQ: 1.22 & CSIG: 2.21 & CBAK: 1.95 & COVL: 1.63

  • Corrected values:

    STOI: 87.0 & PESQ: 1.27 & CSIG: 2.61 & CBAK: 1.92 & COVL: 1.88
