VoiceFixer

VoiceFixer is a framework for general speech restoration. We aim at the restoration of severely degraded speech and historical speech.

VoiceFixer
- Usage
- Project Structure

Usage

Environment

# Download dataset and prepare running environment
source init.sh

General Speech Restoration (GSR)

Here we take VF_UNet(voicefixer with unet as analysis module) as an example. Other GSR model have the similar folder structure, training entry and evaluation logic.

Train from scratch

Change directory and run training entry.

cd general_speech_restoration/voicefixer/unet
# Use default settings
source run.sh

Also you can personalize your training:

By modifying aug_conf in voicefixer/config.py you can setup the augmentation parameters.
By changing --aug_effects flag in run.sh. You can have more augmentation effects.
- For example: You can pass in --aug_effects low_pass clip reverb_rir reverb_freeverb high_pass treble bass fade
...

After that, you will automatically get a log directory that look like this

├── unet
    ├── pc_log
        └── 2021-09-30-unet-#vocalsnoise#-#vctkvd_noise#-#gsr#-voicefixer_unet-l1#1500_44100#
            └── version_0
                ├── args.pkl
                └── checkpoints 
                    └── epoch=1.ckpt
                ├── code # Pin the version of code you use
                │   ├── config.py
                │   ├── dm_sr_rand_sr_order.py
                │   ├── get_model.py
                │   ├── handler.py
                │   ├── model.py
                │   ├── model_kqq.py
                │   ├── modules.py
                │   ├── run.sh
                │   └── train.py
                ├── events.out.MacBook-Pro.local.3521.0 # tensorboard
                ├── git_version # git commit id when you run the code
                └── hparams.yaml

Evaluation

Automatic evaluation and generate .csv file for the GSR evaluation results.

First you should stay in the following directory:

cd general_speech_restoration/voicefixer/unet

For example, if you like to evaluate on all testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -d  all_test_set

For example, if you just wanna evaluate on GSR testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  general_speech_restoration \ 
                    -d  general_speech_restoration_eval

There are generally seven testsets you can pass to -t:

base: all testset
clip: testset with speech that have clipping threshold of 0.1, 0.25, and 0.5
reverb: testset with reverberate speech
general_speech_restoration: testset with speech that contain all kinds of random distortions
enhancement: testset with noisy speech
speech_super_resolution: testset with low resolution speech that have sampling rate of 2kHz, 4kHz, 8kHz, 16kHz, and 24kHz.

And if you would like to evaluate on a small portion of data, e.g. 10 utterance.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -d  only_ten_utterance_for_each_testset \ 
                    -l 10

Evaluation results will be presented in the exp_results folder.

To sum, handler.py accept the following input format.

# Basic usage
python3 handler.py  -c <str, path-to-checkpoint> \
                    -t <str, testset> \ 
                    -l <int, limit-utterance-number> \ 
                    -d <str, description of this evaluation> \

Single Task Speech Restoration (SSR)

Here we take enhancement UNet(Enh-UNet) as an example. Other SSR model have the similar folder structure, training entry and evaluation logic.

Train from scratch

Change directory and run training entry

cd single_task_speech_restoration/enhancement/unet
# Using default setting
source run.sh

After that, you will automatically get a log directory that look like this

├── unet
    ├── pc_log
        └── 2021-09-30-unet-#vocalsnoise#-#vctkvd_noise#-##-fixed_4k_44k_mask_gan-l1#44100_44100#
            └── version_0
                ├── args.pkl
                ├── checkpoints
                ├── code
                │   ├── config.py
                │   ├── dm_sr_rand_sr_order.py
                │   ├── get_model.py
                │   ├── handler.py
                │   ├── model.py
                │   ├── model_kqq.py
                │   ├── modules.py
                │   ├── run.sh
                │   └── train.py
                ├── git_version
                └── hparams.yaml

Evaluation

Automatic evaluation and generate .csv file for the GSR evaluation results.

cd single_task_speech_restoration/declip/unet

For example, if you like to evaluate on clipping testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  clip \ 
                    -d  declipping_model_evaluation

If you like to evaluate on GSR testset.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  general_speech_restoration \ 
                    -d  declipping_model_evaluation_with_GSR_testset

There are generally seven testsets you can pass to -t:

base: all testset
clip: testset with speech that have clipping threshold of 0.1, 0.25, and 0.5
reverb: testset with reverberate speech
general_speech_restoration: testset with speech that contain all kinds of random distortions
enhancement: testset with noisy speech
speech_super_resolution: testset with low resolution speech that have sampling rate of 2kHz, 4kHz, 8kHz, 16kHz, and 24kHz.

And if you would like to evaluate on a small portion of data, e.g. 10 utterance.

python3 handler.py  -c  log/2021-09-30-xxx/version_0/checkpoints/epoch=1.ckpt \
                    -t  base \ 
                    -d  all_test_set \ 
                    -l 10

Evaluation results will be presented in the exp_results folder.

To sum, handler.py accept the following input format.

# Basic usage
python3 handler.py  -c <str, path-to-checkpoint> \
                    -t <str, testset> \ 
                    -l <int, limit-utterance-number> \ 
                    -d <str, description of this evaluation> \

Project Structure

.
├── dataloaders 
│   ├── augmentation # code for speech data augmentation.
│   └── dataloader # code for different kinds of dataloaders.
├── datasets 
│   ├── datasetParser # code for preparing each dataset
│   └── se # Dataset for speech enhancement (source init.sh)
│       ├── RIR_44k # Room Impulse Response 44.1kHz
│       │   ├── test
│       │   └── train
│       ├── TestSets # Evaluation datasets
│       │   ├── ALL_GSR # General speech restoration testset
│       │   │   ├── simulated
│       │   │   └── target
│       │   ├── DECLI # Speech declipping testset
│       │   │   ├── 0.1 # Different clipping threshold
│       │   │   ├── 0.25
│       │   │   ├── 0.5
│       │   │   └── GroundTruth
│       │   ├── DENOISE # Speech enhancement testset
│       │   │   └── vd_test
│       │   │       ├── clean_testset_wav
│       │   │       └── noisy_testset_wav
│       │   ├── DEREV # Speech dereverberation testset
│       │   │   ├── GroundTruth
│       │   │   └── Reverb_Speech
│       │   └── SR # Speech super resolution testset
│       │       ├── GroundTruth
│       │       └── cheby1
│       │           ├── 1000 # Different cutoff frequencies
│       │           ├── 12000
│       │           ├── 2000
│       │           ├── 4000
│       │           └── 8000
│       ├── vd_noise # Noise training dataset
│       └── wav48 # Speech training dataset
│           ├── test # Not used, included for completeness
│           └── train 
├── evaluation # The code for model evaluation
├── exp_results # The Folder that store evaluation result (in handler.py).
├── general_speech_restoration # GSR 
│   ├── unet # GSR_UNet
│   │   └── model_kqq_lstm_mask_gan
│   └── voicefixer # Each folder contains the training entry for each model.
│       ├── dnn # VF_DNN
│       ├── lstm # VF_LSTM
│       ├── unet # VF_UNet
│       └── unet_small # VF_UNet_S
├── resources 
├── single_task_speech_restoration # SSR
│   ├── declip_unet # Declip_UNet
│   ├── derev_unet # Derev_UNet
│   ├── enh_unet # Enh_UNet
│   └── sr_unet # SR_UNet
├── tools
└── callbacks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VoiceFixer

Usage

Environment

General Speech Restoration (GSR)

Train from scratch

Evaluation

Single Task Speech Restoration (SSR)

Train from scratch

Evaluation

Project Structure

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
callbacks		callbacks
dataloaders		dataloaders
datasets		datasets
evaluation		evaluation
general_speech_restoration		general_speech_restoration
single_task_speech_restoration		single_task_speech_restoration
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.MD		README.MD
init.sh		init.sh

License

anonymous20211004/iclr2022-train-vf

Folders and files

Latest commit

History

Repository files navigation

VoiceFixer

Usage

Environment

General Speech Restoration (GSR)

Train from scratch

Evaluation

Single Task Speech Restoration (SSR)

Train from scratch

Evaluation

Project Structure

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages