
How long does training usually take in the 1-GPU (Nvidia 1080Ti) case? #26

sekigh opened this issue Feb 10, 2020 · 1 comment

sekigh commented Feb 10, 2020

I started run.sh to launch a training session (stage 2), and it is still running after a full 4 days. I checked htop and nvidia-smi, and both indicate that training is running. Does it normally take that long, or has something gone wrong? The run.sh parameter settings are the defaults, as listed below.
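For a rough sense of whether several days is plausible, a back-of-envelope estimate helps. Every number below except batch_size and epochs (which come from the script) is an assumption for illustration, not a measured value:

```python
# Back-of-envelope training-time estimate. Dataset size and per-batch
# time are assumptions for illustration, not measurements.
num_utterances = 20000      # assumed size of the wsj0-2mix min/tr set
batch_size = 3              # from run.sh
epochs = 100                # from run.sh
sec_per_batch = 0.5         # assumed forward+backward time on one 1080Ti

batches_per_epoch = num_utterances // batch_size
total_hours = batches_per_epoch * epochs * sec_per_batch / 3600
total_days = total_hours / 24
print(f"{batches_per_epoch} batches/epoch, ~{total_days:.1f} days total")
```

With these assumed numbers the estimate lands around 4 days, so multi-day wall-clock times at batch_size=3 on a single 1080Ti are in the expected range rather than a sign of a hang.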

#!/bin/bash

# Created on 2018/12
# Author: Kaituo XU

# -- START IMPORTANT
# * If you have the mixture wsj0 audio, modify data to your path that includes tr, cv and tt.
# * If you just have the original sphere-format wsj0, modify wsj0_origin to your path,
#   modify wsj0_wav to the path where the wav-format wsj0 should be written, then read and run the stage 1 part.
#   After that, modify data and run from stage 2.

wsj0_origin=/home/xxx/xxxx/Speech_Corpus/csr_1
wsj0_wav=/home/xxx/xxxxx/Speech_Corpus/wsj0-wav/wsj0
data=/home/xxx/xxxxx/Speech_Corpus/wsj-mix/2speakers/wav8k/min/
stage=2  # Modify this to control which stage to start from
# -- END

dumpdir=data  # directory to put generated json files

# -- START Conv-TasNet Config

train_dir=$dumpdir/tr
valid_dir=$dumpdir/cv
evaluate_dir=$dumpdir/tt
separate_dir=$dumpdir/tt
sample_rate=8000
segment=4 # seconds
cv_maxlen=6 # seconds

# Network config

N=256
L=20
B=256
H=512
P=3
X=8
R=4
norm_type=gLN
causal=0
mask_nonlinear='relu'
C=2

# Training config

use_cuda=1
id=0
epochs=100
half_lr=1
early_stop=0
max_norm=5

# minibatch

shuffle=1
batch_size=3
num_workers=4

# optimizer

optimizer=adam
lr=1e-3
momentum=0
l2=0

# save and visualize

checkpoint=0
continue_from=""
print_freq=10
visdom=0
visdom_epoch=0
visdom_id="Conv-TasNet Training"

# evaluate

ev_use_cuda=0
cal_sdr=1

# -- END Conv-TasNet Config

# exp tag

tag="" # tag for managing experiments.

ngpu=1 # always 1

. utils/parse_options.sh || exit 1;
. ./cmd.sh
. ./path.sh

if [ $stage -le 0 ]; then
  echo "Stage 0: Convert sphere format to wav format and generate mixture"
  local/data_prepare.sh --data ${wsj0_origin} --wav_dir ${wsj0_wav}

  echo "NOTE: You should generate the mixture by yourself now.
You can use tools/create-speaker-mixtures.zip, which is downloaded from
http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip
If you don't have Matlab and want to use Octave, I suggest replacing
all mkdir(...) in create_wav_2speakers.m with system(['mkdir -p '...]),
because mkdir in Octave does not support the 'mkdir -p' behavior.
e.g.:
mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);
->
system(['mkdir -p ' output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);"
  exit 1
fi

if [ $stage -le 1 ]; then
  echo "Stage 1: Generating json files including wav path and duration"
  [ ! -d $dumpdir ] && mkdir $dumpdir
  preprocess.py --in-dir $data --out-dir $dumpdir --sample-rate $sample_rate
fi

if [ -z ${tag} ]; then
  expdir=exp/train_r${sample_rate}_N${N}_L${L}_B${B}_H${H}_P${P}_X${X}_R${R}_C${C}_${norm_type}_causal${causal}_${mask_nonlinear}_epoch${epochs}_half${half_lr}_norm${max_norm}_bs${batch_size}_worker${num_workers}_${optimizer}_lr${lr}_mmt${momentum}_l2${l2}_`basename $train_dir`
else
  expdir=exp/train_${tag}
fi

if [ $stage -le 2 ]; then
  echo "Stage 2: Training"
  ${cuda_cmd} --gpu ${ngpu} ${expdir}/train.log \
    CUDA_VISIBLE_DEVICES="$id" \
    train.py \
    --train_dir $train_dir \
    --valid_dir $valid_dir \
    --sample_rate $sample_rate \
    --segment $segment \
    --cv_maxlen $cv_maxlen \
    --N $N \
    --L $L \
    --B $B \
    --H $H \
    --P $P \
    --X $X \
    --R $R \
    --C $C \
    --norm_type $norm_type \
    --causal $causal \
    --mask_nonlinear $mask_nonlinear \
    --use_cuda $use_cuda \
    --epochs $epochs \
    --half_lr $half_lr \
    --early_stop $early_stop \
    --max_norm $max_norm \
    --shuffle $shuffle \
    --batch_size $batch_size \
    --num_workers $num_workers \
    --optimizer $optimizer \
    --lr $lr \
    --momentum $momentum \
    --l2 $l2 \
    --save_folder ${expdir} \
    --checkpoint $checkpoint \
    --continue_from "$continue_from" \
    --print_freq ${print_freq} \
    --visdom $visdom \
    --visdom_epoch $visdom_epoch \
    --visdom_id "$visdom_id"
fi
(the rest omitted)
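As a side note on those network defaults: the TCN's temporal context follows from P, X, and R. A quick check, assuming the standard Conv-TasNet dilation scheme (dilation doubles across the X blocks within each of the R repeats) and an encoder hop of L/2 samples:

```python
# Receptive field of the TCN given the script's defaults, assuming the
# standard Conv-TasNet dilation pattern (dilations 1, 2, ..., 2**(X-1)
# in each of the R repeats) and 50% encoder window overlap.
L, P, X, R = 20, 3, 8, 4
sample_rate = 8000

# Each dilated conv block adds (P - 1) * dilation frames of context.
frames = 1 + R * sum((P - 1) * 2**i for i in range(X))
hop = L // 2                       # encoder stride in samples
seconds = frames * hop / sample_rate
print(frames, round(seconds, 2))
```

So with these defaults the network sees roughly 2.5 s of context, which fits comfortably inside the segment=4 training chunks.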


sekigh commented Feb 12, 2020

Anyway, the training and the remaining stages finished in 6 days. Is it possible to use 2 GPUs for training?
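The stock train.py trains on a single device, but PyTorch's torch.nn.DataParallel can split each minibatch across two GPUs. A minimal sketch with a stand-in model (TinySeparator below is hypothetical; in this repo you would wrap the actual ConvTasNet instance built in train.py the same way):

```python
# Sketch: wrapping a separator network in torch.nn.DataParallel so each
# minibatch is split across the visible GPUs. TinySeparator is a
# hypothetical stand-in for the real ConvTasNet model.
import torch
import torch.nn as nn

class TinySeparator(nn.Module):
    """Stand-in: maps a mixture [batch, samples] to C source estimates."""
    def __init__(self, C=2):
        super().__init__()
        self.proj = nn.Conv1d(1, C, kernel_size=1)

    def forward(self, mixture):       # mixture: [batch, samples]
        x = mixture.unsqueeze(1)      # [batch, 1, samples]
        return self.proj(x)           # [batch, C, samples]

model = TinySeparator(C=2)
if torch.cuda.device_count() > 1:
    # Run with e.g. CUDA_VISIBLE_DEVICES=0,1 instead of the script's id=0.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

batch = torch.randn(3, 32000)         # batch_size=3, 4 s at 8 kHz
if torch.cuda.is_available():
    batch = batch.cuda()
est = model(batch)
print(est.shape)                      # torch.Size([3, 2, 32000])
```

Note that DataParallel shortens wall-clock time per epoch by splitting each batch, so with batch_size=3 the split across 2 GPUs is uneven; raising batch_size to an even number uses both cards more effectively.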
