This is the Tensorflow implementation of:
- [1] "Divide and conquer: A deep CASA approach to talker-independent monaural speaker separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, pp. 2092-2102.
- [2] "Causal deep CASA for monaural talker-independent speaker separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2109-2118.
Please find demos and keynote lecture slides at ASRU-19 here.
./feat/exp_prepare_folder.sh
: prepares folders for experiments../feat/feat_gen.py
: generates STFT featuers for training, validation and test../feat/stft.py
: defines STFT and iSTFT../nn/simul_group.py
: training/validation/test of the simultaneous grouping stage for non-causal deep CASA [1]../nn/seq_group.py
: training/validation/test of the sequential grouping stage for non-causal deep CASA [1]../nn/simul_group_causal.py
: training/validation/test of the simultaneous grouping stage for causal deep CASA [2]../nn/seq_group_causal.py
: training/validation/test of the sequential grouping stage for causal deep CASA [2]../nn/utility.py
: defines various functions for training/evaluation.
This codebase has been tested on AWS EC2 p3.2xlarge nodes with Deep Learning AMI (Ubuntu 18.04) Version 38.0.
Follow instructions in turn to set up the environment and run experiments.
-
Requirements:
- Python3
- Tensorflow 1.15.4.
Activate the environment on EC2 :source activate tensorflow_p37
- gflags
pip install python-gflags
- Please install other necessary python packages if not using AWS deep Learning AMI (Ubuntu 18.04) Version 38.0.
-
Before running experiments, activate the tensorflow environment on EC2 using:
source activate tensorflow_p37
The current code only supports single-GPU training/inference, please run the following command if your device has multiple GPUs:
export CUDA_VISIBLE_DEVICES=0
-
Generate the WSJ0-2mix dataset using
http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip
. Copy the generated files to the EC2 instance. -
Start feature extraction by running the following command in the main directory:
python feat/feat_gen.py
Thre are two arguments in
feat_gen.py
,data_folder
andwav_list_folder
. Change them to where your WSJ0-2mix dataset and file list locate. -
Train the simultaneous grouping stage.
For non-causal deep CASA:TIME_STAMP=train_simul_group python nn/simul_group.py --time_stamp $TIME_STAMP --is_deploy 0 --batch_size 1
- Due to utterance-level training and limited GPU memory,
batch_size
can be set to 1 or 2. - Change
data_folder
andwav_list_folder
accordingly. - You can also change other hyperparameters, e.g., the number of epochs and learning rate, using gflags arguments.
For causal deep CASA:
TIME_STAMP=train_simul_group_causal python nn/simul_group_causal.py --time_stamp $TIME_STAMP --is_deploy 0 --batch_size 4
- Segment-level training is adopted.
batch_size
can be set to 4.
- Due to utterance-level training and limited GPU memory,
-
Run inference of simultaneous grouping (tt set).
For non-causal deep CASA:RESUME_MODEL=exp/deep_casa_wsj/models/train_simul_group/deep_casa_wsj_model.ckpt_step_1 python nn/simul_group.py --is_deploy 1 --resume_model $RESUME_MODEL
$RESUME_MODEL
is the model to be loaded for inference. Change it accordingly.- Mixtures, clean references and Dense-UNet estimates will be generated and saved in folder
./exp/deep_casa_wsj/output_tt/files/
. - Please use your own scripts to generate results in different metrics.
For causal deep CASA:
RESUME_MODEL=exp/deep_casa_wsj/models/train_simul_group_causal/deep_casa_wsj_model.ckpt_step_1 python nn/simul_group_causal.py --is_deploy 1 --resume_model $RESUME_MODEL
$RESUME_MODEL
is the model to be loaded for inference. Change it accordingly.
-
Generate temporary .npy file for the next stage (sequential grouping).
For non-causal deep CASA:RESUME_MODEL=exp/deep_casa_wsj/models/train_simul_group/deep_casa_wsj_model.ckpt_step_1 python nn/simul_group.py --is_deploy 2 --resume_model $RESUME_MODEL
- Setting
is_deploy
to 2 will generate unorganized estimates by Dense-UNet, and save them as .npy files for the sequential grouping stage. - tr, cv and tt data are generated in turn, and saved in
./exp/deep_casa_wsj/feat/
.
For causal deep CASA:
RESUME_MODEL=exp/deep_casa_wsj/models/train_simul_group_causal/deep_casa_wsj_model.ckpt_step_1 python nn/simul_group_causal.py --is_deploy 2 --resume_model $RESUME_MODEL
- Setting
-
Train the sequential grouping stage.
For non-causal deep CASA:TIME_STAMP=train_seq_group python nn/seq_group.py --time_stamp $TIME_STAMP --is_deploy 0
Change
data_folder
andwav_list_folder
accordingly. You can also change other hyperparameters, e.g., the number of epochs and learning rate, using gflags arguments.For causal deep CASA:
TIME_STAMP=train_seq_group_causal python nn/seq_group_causal.py --time_stamp $TIME_STAMP --is_deploy 0
-
Run inference of sequential grouping (tt set).
For non-causal deep CASA:RESUME_MODEL=exp/deep_casa_wsj/models/train_seq_group/deep_casa_wsj_model.ckpt_step_1 python nn/seq_group.py --is_deploy 1 --resume_model $RESUME_MODEL
$RESUME_MODEL
is the model to be loaded for inference. Change it accordingly.- Mixtures, clean references and estimates will be saved in folder
./exp/deep_casa_wsj/output_tt/files/
. - Please use your own scripts to generate results in different metrics.
For causal deep CASA:
RESUME_MODEL=exp/deep_casa_wsj/models/train_seq_group_causal/deep_casa_wsj_model.ckpt_step_1 python nn/seq_group_causal.py --is_deploy 1 --resume_model $RESUME_MODEL
$RESUME_MODEL
is the model to be loaded for inference. Change it accordingly.