Skip to content

anonymous20211004/iclr2022-train-vf

Repository files navigation

VoiceFixer

VoiceFixer is a framework for general speech restoration. We aim at the restoration of severely degraded speech and historical speech.

46dPxJ.png

Usage

Environment

# Download dataset and prepare running environment
source init.sh 

General Speech Restoration (GSR)

Here we take VF_UNet(voicefixer with unet as analysis module) as an example. Other GSR model have the similar folder structure, training entry and evaluation logic.

Train from scratch

Change directory and run training entry.

cd general_speech_restoration/voicefixer/unet
# Use default settings
source run.sh

Also you can personalize your training:

  • By modifying aug_conf in voicefixer/config.py you can setup the augmentation parameters.
  • By changing --aug_effects flag in run.sh. You can have more augmentation effects.
    • For example: You can pass in --aug_effects low_pass clip reverb_rir reverb_freeverb high_pass treble bass fade
  • ...

After that, you will automatically get a log directory that look like this

├── unet
    ├── pc_log
        └── 2021-09-30-unet-#vocalsnoise#-#vctkvd_noise#-#gsr#-voicefixer_unet-l1#1500_44100#
            └── version_0
                ├── args.pkl
                └── checkpoints 
                    └── epoch=1.ckpt
                ├── code # Pin the version of code you use
                │   ├── config.py
                │   ├── dm_sr_rand_sr_order.py
                │   ├── get_model.py
                │   ├── handler.py
                │   ├── model.py
                │   ├── model_kqq.py
                │   ├── modules.py
                │   ├── run.sh
                │   └── train.py
                ├── events.out.MacBook-Pro.local.3521.0 # tensorboard
                ├── git_version # git commit id when you run the code
                └── hparams.yaml

Evaluation

Automatic evaluation and generate .csv file for the GSR evaluation results.

First you should stay in the following directory:

cd general_speech_restoration/voicefixer/unet

For example, if you like to evaluate on all testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -d  all_test_set 

For example, if you just wanna evaluate on GSR testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  general_speech_restoration \ 
                    -d  general_speech_restoration_eval 

There are generally seven testsets you can pass to -t:

  • base: all testset
  • clip: testset with speech that have clipping threshold of 0.1, 0.25, and 0.5
  • reverb: testset with reverberate speech
  • general_speech_restoration: testset with speech that contain all kinds of random distortions
  • enhancement: testset with noisy speech
  • speech_super_resolution: testset with low resolution speech that have sampling rate of 2kHz, 4kHz, 8kHz, 16kHz, and 24kHz.

And if you would like to evaluate on a small portion of data, e.g. 10 utterance.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -d  only_ten_utterance_for_each_testset \ 
                    -l 10 

Evaluation results will be presented in the exp_results folder.

To sum, handler.py accept the following input format.

# Basic usage
python3 handler.py  -c <str, path-to-checkpoint> \
                    -t <str, testset> \ 
                    -l <int, limit-utterance-number> \ 
                    -d <str, description of this evaluation> \ 

Single Task Speech Restoration (SSR)

Here we take enhancement UNet(Enh-UNet) as an example. Other SSR model have the similar folder structure, training entry and evaluation logic.

Train from scratch

Change directory and run training entry

cd single_task_speech_restoration/enhancement/unet
# Using default setting
source run.sh 

After that, you will automatically get a log directory that look like this

├── unet
    ├── pc_log
        └── 2021-09-30-unet-#vocalsnoise#-#vctkvd_noise#-##-fixed_4k_44k_mask_gan-l1#44100_44100#
            └── version_0
                ├── args.pkl
                ├── checkpoints
                ├── code
                │   ├── config.py
                │   ├── dm_sr_rand_sr_order.py
                │   ├── get_model.py
                │   ├── handler.py
                │   ├── model.py
                │   ├── model_kqq.py
                │   ├── modules.py
                │   ├── run.sh
                │   └── train.py
                ├── git_version
                └── hparams.yaml

Evaluation

Automatic evaluation and generate .csv file for the GSR evaluation results.

cd single_task_speech_restoration/declip/unet

For example, if you like to evaluate on clipping testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  clip \ 
                    -d  declipping_model_evaluation 

If you like to evaluate on GSR testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  general_speech_restoration \ 
                    -d  declipping_model_evaluation_with_GSR_testset 

There are generally seven testsets you can pass to -t:

  • base: all testset
  • clip: testset with speech that have clipping threshold of 0.1, 0.25, and 0.5
  • reverb: testset with reverberate speech
  • general_speech_restoration: testset with speech that contain all kinds of random distortions
  • enhancement: testset with noisy speech
  • speech_super_resolution: testset with low resolution speech that have sampling rate of 2kHz, 4kHz, 8kHz, 16kHz, and 24kHz.

And if you would like to evaluate on a small portion of data, e.g. 10 utterance.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -d  all_test_set \ 
                    -l 10 

Evaluation results will be presented in the exp_results folder.

To sum, handler.py accept the following input format.

# Basic usage
python3 handler.py  -c <str, path-to-checkpoint> \
                    -t <str, testset> \ 
                    -l <int, limit-utterance-number> \ 
                    -d <str, description of this evaluation> \ 

Project Structure

.
├── dataloaders 
│   ├── augmentation # code for speech data augmentation.
│   └── dataloader # code for different kinds of dataloaders.
├── datasets 
│   ├── datasetParser # code for preparing each dataset
│   └── se # Dataset for speech enhancement (source init.sh)
│       ├── RIR_44k # Room Impulse Response 44.1kHz
│       │   ├── test
│       │   └── train
│       ├── TestSets # Evaluation datasets
│       │   ├── ALL_GSR # General speech restoration testset
│       │   │   ├── simulated
│       │   │   └── target
│       │   ├── DECLI # Speech declipping testset
│       │   │   ├── 0.1 # Different clipping threshold
│       │   │   ├── 0.25
│       │   │   ├── 0.5
│       │   │   └── GroundTruth
│       │   ├── DENOISE # Speech enhancement testset
│       │   │   └── vd_test
│       │   │       ├── clean_testset_wav
│       │   │       └── noisy_testset_wav
│       │   ├── DEREV # Speech dereverberation testset
│       │   │   ├── GroundTruth
│       │   │   └── Reverb_Speech
│       │   └── SR # Speech super resolution testset
│       │       ├── GroundTruth
│       │       └── cheby1
│       │           ├── 1000 # Different cutoff frequencies
│       │           ├── 12000
│       │           ├── 2000
│       │           ├── 4000
│       │           └── 8000
│       ├── vd_noise # Noise training dataset
│       └── wav48 # Speech training dataset
│           ├── test # Not used, included for completeness
│           └── train 
├── evaluation # The code for model evaluation
├── exp_results # The Folder that store evaluation result (in handler.py).
├── general_speech_restoration # GSR 
│   ├── unet # GSR_UNet
│   │   └── model_kqq_lstm_mask_gan
│   └── voicefixer # Each folder contains the training entry for each model.
│       ├── dnn # VF_DNN
│       ├── lstm # VF_LSTM
│       ├── unet # VF_UNet
│       └── unet_small # VF_UNet_S
├── resources 
├── single_task_speech_restoration # SSR
│   ├── declip_unet # Declip_UNet
│   ├── derev_unet # Derev_UNet
│   ├── enh_unet # Enh_UNet
│   └── sr_unet # SR_UNet
├── tools
└── callbacks

real-life-example real-life-example real-life-example

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published