Commit

Emrys365 committed Dec 1, 2020
2 parents 8a06381 + 747c46d commit 63f88c0
Showing 106 changed files with 39,553 additions and 238 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -160,6 +160,7 @@ We list the character error rate (CER) and word error rate (WER) of major ASR tasks
| **ESPnet2** CSJ eval1/eval2/eval3 | 4.5/3.3/3.6 | N/A | [link](https://github.com/espnet/espnet/tree/master/egs2/csj/asr1#initial-conformer-results) |
| HKUST dev | 23.5 | N/A | [link](https://github.com/espnet/espnet/blob/master/egs/hkust/asr1/RESULTS.md#transformer-only-20-epochs) |
| Librispeech dev_clean/dev_other/test_clean/test_other | N/A | 1.9/4.9/2.1/4.9 | [link](https://github.com/espnet/espnet/blob/master/egs/librispeech/asr1/RESULTS.md#pytorch-large-conformer-with-specaug--speed-perturbation-8-gpus--transformer-lm-4-gpus) |
+| Switchboard (eval2000) callhm/swbd | N/A | 14.0/6.8 | [link](https://github.com/espnet/espnet/blob/master/egs/swbd/asr1/RESULTS.md#conformer-with-bpe-2000-specaug-speed-perturbation-transformer-lm-decoding) |
| TEDLIUM2 dev/test | N/A | 8.6/7.2 | [link](https://github.com/espnet/espnet/blob/master/egs/tedlium2/asr1/RESULTS.md#conformer-large-model--specaug--speed-perturbation--rnnlm) |
| TEDLIUM3 dev/test | N/A | 9.6/7.6 | [link](https://github.com/espnet/espnet/blob/master/egs/tedlium3/asr1/RESULTS.md) |
| WSJ dev93/eval92 | 3.2/2.1 | 7.0/4.7 | N/A |
6 changes: 5 additions & 1 deletion ci/install.sh
@@ -15,8 +15,12 @@ ${CXX:-g++} -v
if ${USE_CONDA}; then
./setup_anaconda.sh venv espnet ${ESPNET_PYTHON_VERSION}
else
./setup_python.sh "$(which python3)" venv
./setup_python.sh "$(command -v python3)" venv
fi
+# Temporary fix: pin the pip version to avoid PEP 440 version-parsing errors
+# See: https://github.com/pypa/pip/issues/8745
+. ./activate_python.sh
+pip3 install pip==20.2.4
make TH_VERSION="${TH_VERSION}"

make nkf.done moses.done mwerSegmenter.done pesq pyopenjtalk.done py3mmseg.done
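
The pip pin above only takes effect inside the virtualenv it was installed into; a quick post-install sanity check (a sketch, assuming the venv created by this script):

```sh
# confirm the pinned pip is the one on PATH inside the venv
. ./activate_python.sh
pip3 --version   # expected to report 20.2.4
```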
26 changes: 26 additions & 0 deletions doc/espnet2_training_option.md
@@ -164,6 +164,32 @@ for epoch in range(max_epoch):

Therefore, the training can be resumed at the start of the epoch.

## Weights & Biases integration

About Weights & Biases: https://docs.wandb.com/

1. Installation and setup

   See: https://docs.wandb.com/quickstart

   ```sh
   wandb login
   ```
1. Enable wandb

   ```sh
   python -m espnet2.bin.asr_train --use_wandb true
   ```

   then open the URL shown in the output (a combined sketch follows this list).
1. [Optional] Using an HTTPS proxy

   ```sh
   export HTTPS_PROXY=...your proxy
   export CURL_CA_BUNDLE=your.pem
   # or, to disable SSL certificate verification:
   export CURL_CA_BUNDLE=
   ```
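
Putting the steps together, a minimal sketch (`exp/demo` is a placeholder output directory; add whatever options your recipe already passes):

```sh
# log in once per machine, then enable W&B logging for a training run
wandb login
python -m espnet2.bin.asr_train \
    --use_wandb true \
    --output_dir exp/demo
```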


## Multiple GPUs

```bash
4 changes: 3 additions & 1 deletion egs/csmsc/tts1/local/data_download.sh
@@ -1,4 +1,4 @@
-#!/bin/bash -e
+#!/bin/bash

# Copyright 2019 Nagoya University (Tomoki Hayashi)
# Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
@@ -11,6 +11,8 @@ if [ $# != 1 ]; then
exit 1
fi

+set -euo pipefail

# download dataset
cwd=$(pwd)
if [ ! -e ${db}/CSMSC ]; then
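
Moving `-e` off the shebang into an explicit `set -euo pipefail` matters because shebang options only apply when the script is executed directly; a small illustration (the `downloads` argument is hypothetical):

```sh
# options on a "#!/bin/bash -e" line apply only to direct execution:
./local/data_download.sh downloads    # shebang flags honored
bash local/data_download.sh downloads # shebang flags ignored
# "set -euo pipefail" in the body protects both invocations
```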
6 changes: 3 additions & 3 deletions egs/csmsc/tts1/local/data_prep.sh
@@ -42,9 +42,9 @@ echo "Successfully finished making spk2utt."
# make text and segments
find ${db}/PhoneLabeling -name "*.interval" -follow | sort | while read -r filename;do
id="$(basename ${filename} .interval)"
-content=$(tail -n +13 ${filename} | grep "\"" | grep -v "sil" | sed -e "s/\"//g" | tr "\n" " " | sed -e "s/ $//g")
-start_sec=$(tail -n +14 ${filename} | head -n 1)
-end_sec=$(head -n -2 ${filename} | tail -n 1)
+content=$(nkf -Lu -w ${filename} | tail -n +13 | grep "\"" | grep -v "sil" | sed -e "s/\"//g" | tr "\n" " " | sed -e "s/ $//g")
+start_sec=$(nkf -Lu -w ${filename} | tail -n +14 | head -n 1)
+end_sec=$(nkf -Lu -w ${filename} | head -n -2 | tail -n 1)
echo "${id} ${content}" >> ${text}
echo "${id} ${id} ${start_sec} ${end_sec}" >> ${segments}
done
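
The added `nkf -Lu -w` stage normalizes line endings and encoding before the line-oriented parsing; a sketch of the failure it prevents (hypothetical sample file):

```sh
# without normalization, CRLF endings leave a stray "\r" on every field
printf 'label\r\n0.35\r\n' > sample.interval
tail -n +2 sample.interval | head -n 1               # -> "0.35\r"
nkf -Lu -w sample.interval | tail -n +2 | head -n 1  # -> "0.35" (LF endings, UTF-8)
```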
21 changes: 12 additions & 9 deletions egs/swbd/asr1/RESULTS.md
@@ -1,18 +1,21 @@
-# conformer with BPE 2000, specaug, LM decoding
+# conformer with BPE 2000, specaug, speed perturbation, Transformer LM decoding
 ## Models
-- model link: https://drive.google.com/file/d/1n5yD08H8g0bMLeqPXx7MZ8IU5S-o681E/view
-- training config file: `conf/train.yaml`
+- model link: https://drive.google.com/file/d/1FhDeQ4eFxBsnGitZkG7rgScZB0oNpeg7/view
+- training config file: `conf/tuning/train_pytorch_conformer_lr5.yaml`
 - preprocess config file: `conf/specaug.yaml`
 - decoding config file: `conf/decode.yaml`

## WER
```
-exp/train_nodup_pytorch_train_pytorch_conformer_specaug/decode_eval2000_model.last10.avg.best_decode_lm/scoring/hyp.callhm.ctm.filt.sys:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
-exp/train_nodup_pytorch_train_pytorch_conformer_specaug/decode_eval2000_model.last10.avg.best_decode_lm/scoring/hyp.callhm.ctm.filt.sys:| Sum/Avg | 2628 21594 | 86.3 10.3 3.4 2.5 16.2 51.9 |
-exp/train_nodup_pytorch_train_pytorch_conformer_specaug/decode_eval2000_model.last10.avg.best_decode_lm/scoring/hyp.ctm.filt.sys:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
-exp/train_nodup_pytorch_train_pytorch_conformer_specaug/decode_eval2000_model.last10.avg.best_decode_lm/scoring/hyp.ctm.filt.sys:| Sum/Avg | 4459 42989 | 89.8 7.5 2.7 1.7 11.9 46.8 |
-exp/train_nodup_pytorch_train_pytorch_conformer_specaug/decode_eval2000_model.last10.avg.best_decode_lm/scoring/hyp.swbd.ctm.filt.sys:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
-exp/train_nodup_pytorch_train_pytorch_conformer_specaug/decode_eval2000_model.last10.avg.best_decode_lm/scoring/hyp.swbd.ctm.filt.sys:| Sum/Avg | 1831 21395 | 93.4 4.6 2.1 1.0 7.7 39.4 |
+exp_sp/train_nodup_sp_pytorch_train_pytorch_conformer_lr5_specaug_resume/decode_eval2000_model.last10.avg.best_decode_train_transformer_lm_pytorch_swbd+fisher_bpe2000/scoring/hyp.callhm.ctm.filt.sys
+| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
+| Sum/Avg | 2628 21594 | 87.9 8.9 3.2 2.0 14.0 49.8 |
+exp_sp/train_nodup_sp_pytorch_train_pytorch_conformer_lr5_specaug_resume/decode_eval2000_model.last10.avg.best_decode_train_transformer_lm_pytorch_swbd+fisher_bpe2000/scoring/hyp.ctm.filt.sys
+| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
+| Sum/Avg | 4459 42989 | 91.0 6.5 2.5 1.4 10.4 44.5 |
+exp_sp/train_nodup_sp_pytorch_train_pytorch_conformer_lr5_specaug_resume/decode_eval2000_model.last10.avg.best_decode_train_transformer_lm_pytorch_swbd+fisher_bpe2000/scoring/hyp.swbd.ctm.filt.sys
+| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
+| Sum/Avg | 1831 21395 | 94.1 4.1 1.9 0.9 6.8 36.9 |
```
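
Each block above is the tail of an sclite `.sys` report; a sketch for pulling just the summary rows out of the three reports (paths as in this recipe):

```sh
# collect the Sum/Avg rows for the callhm, combined, and swbd subsets
decode_dir=exp_sp/train_nodup_sp_pytorch_train_pytorch_conformer_lr5_specaug_resume/decode_eval2000_model.last10.avg.best_decode_train_transformer_lm_pytorch_swbd+fisher_bpe2000
grep -H "Sum/Avg" "${decode_dir}"/scoring/hyp.*.ctm.filt.sys
```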

# transformer with BPE 2000, specaug, LM decoding
8 changes: 0 additions & 8 deletions egs/swbd/asr1/conf/lm.yaml

This file was deleted.

1 change: 1 addition & 0 deletions egs/swbd/asr1/conf/lm.yaml
2 changes: 1 addition & 1 deletion egs/swbd/asr1/conf/train.yaml
8 changes: 8 additions & 0 deletions egs/swbd/asr1/conf/tuning/lm.yaml
@@ -0,0 +1,8 @@
layer: 2
unit: 650
opt: sgd # or adam
sortagrad: 0 # feed samples from shortest to longest; -1: enabled for all epochs, 0: disabled, N: enabled for the first N epochs
batchsize: 256 # batch size in LM training
epoch: 20 # if the data size is large, we can reduce this
patience: 3
maxlen: 100 # if sentence length > lm_maxlen, lm_batchsize is automatically reduced
11 changes: 11 additions & 0 deletions egs/swbd/asr1/conf/tuning/lm_transformer.yaml
@@ -0,0 +1,11 @@
batchsize: 256
dropout: 0.1
epoch: 100
layer: 4
maxlen: 150
opt: sgd
patience: 0
sortagrad: 0
unit: 1024
att-unit: 256
model-module: transformer
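
Either LM config is selected through the recipe's `lm_config` option; a sketch, assuming the usual egs `run.sh` stage layout (stage 3 = LM training) and option parsing:

```sh
# train the smaller RNN LM instead of the Transformer LM
./run.sh --stage 3 --stop_stage 3 --lm_config conf/tuning/lm.yaml
```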
48 changes: 48 additions & 0 deletions egs/swbd/asr1/conf/tuning/train_pytorch_conformer_lr5.yaml
@@ -0,0 +1,48 @@
# network architecture
# encoder related
elayers: 12
eunits: 2048
# decoder related
dlayers: 6
dunits: 2048
# attention related
adim: 256
aheads: 4

# hybrid CTC/attention
mtlalpha: 0.2

# label smoothing
lsm-weight: 0.1

# minibatch related
batch-size: 32
maxlen-in: 512 # if input length > maxlen-in, batchsize is automatically reduced
maxlen-out: 150 # if output length > maxlen-out, batchsize is automatically reduced

# optimization related
sortagrad: 0 # feed samples from shortest to longest; -1: enabled for all epochs, 0: disabled, N: enabled for the first N epochs
opt: noam
accum-grad: 8
grad-clip: 5
patience: 0
epochs: 100
dropout-rate: 0.1

# transformer specific setting
backend: pytorch
model-module: "espnet.nets.pytorch_backend.e2e_asr_conformer:E2E"
transformer-input-layer: conv2d # encoder input-layer type (Conv2d subsampling)
transformer-lr: 5.0
transformer-warmup-steps: 25000
transformer-attn-dropout-rate: 0.0
transformer-length-normalized-loss: false
transformer-init: pytorch

# conformer specific setting
transformer-encoder-pos-enc-layer-type: rel_pos
transformer-encoder-selfattn-layer-type: rel_selfattn
transformer-encoder-activation-type: swish
macaron-style: true
use-cnn-module: true
cnn-module-kernel: 31
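
To train with this configuration, it is passed as the training config; a sketch, assuming the usual egs `run.sh` options (`--train_config`, `--preprocess_config`) and stage layout (stage 4 = network training):

```sh
# conformer training with SpecAugment applied via the preprocess config
./run.sh --stage 4 --stop_stage 4 \
    --train_config conf/tuning/train_pytorch_conformer_lr5.yaml \
    --preprocess_config conf/specaug.yaml
```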
31 changes: 23 additions & 8 deletions egs/swbd/asr1/run.sh
@@ -57,9 +57,9 @@ set -e
set -u
set -o pipefail

-train_set=train_nodup
-train_dev=train_dev
-recog_set="train_dev eval2000 rt03"
+train_set=train_nodup_sp
+train_dev=train_dev_trim
+recog_set="train_dev_trim eval2000 rt03"

if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
### Task dependent. You have to prepare the data in the following part by yourself.
@@ -107,10 +107,25 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
utils/fix_data_dir.sh data/${x}
done

-utils/subset_data_dir.sh --first data/train 4000 data/${train_dev} # 5hr 6min
+utils/subset_data_dir.sh --first data/train 4000 data/train_dev # 5hr 6min
n=$(($(wc -l < data/train/segments) - 4000))
utils/subset_data_dir.sh --last data/train ${n} data/train_nodev
-utils/data/remove_dup_utts.sh 300 data/train_nodev data/${train_set} # 286hr
+utils/data/remove_dup_utts.sh 300 data/train_nodev data/train_nodup # 286hr

+# remove utterances having > 2000 frames or < 10 frames,
+# or > 400 characters or 0 characters
+remove_longshortdata.sh --maxchars 400 data/train data/train_nodup_trim
+remove_longshortdata.sh --maxchars 400 data/train_dev data/${train_dev}
+
+# speed perturbation (0.9x, 1.0x, 1.1x)
+utils/perturb_data_dir_speed.sh 0.9 data/train_nodup_trim data/temp1
+utils/perturb_data_dir_speed.sh 1.0 data/train_nodup_trim data/temp2
+utils/perturb_data_dir_speed.sh 1.1 data/train_nodup_trim data/temp3
+utils/combine_data.sh --extra-files utt2uniq data/${train_set} data/temp1 data/temp2 data/temp3
+rm -r data/temp1 data/temp2 data/temp3
+steps/make_fbank_pitch.sh --cmd "$train_cmd" --nj 32 --write_utt2num_frames true \
+    data/${train_set} exp/make_fbank/${train_set} ${fbankdir}
+utils/fix_data_dir.sh data/${train_set}

# compute global CMVN
compute-cmvn-stats scp:data/${train_set}/feats.scp data/${train_set}/cmvn.ark
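
Because the 0.9/1.0/1.1 perturbation above makes three copies of the trimmed training set, the combined directory should hold roughly three times the utterances; a quick check (a sketch):

```sh
# utterance counts: train_nodup_sp should be ~3x train_nodup_trim
wc -l data/train_nodup_trim/utt2spk data/train_nodup_sp/utt2spk
```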
@@ -138,8 +153,8 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
done
fi

-dict=data/lang_char/${train_set}_${bpemode}${nbpe}_units.txt
-bpemodel=data/lang_char/${train_set}_${bpemode}${nbpe}
+dict=data/lang_char/train_nodup_${bpemode}${nbpe}_units.txt
+bpemodel=data/lang_char/train_nodup_${bpemode}${nbpe}

echo "dictionary: ${dict}"
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@@ -194,7 +209,7 @@ fi
if [ -z ${lmtag} ]; then
lmtag=$(basename ${lm_config%.*})
fi
-lmexpname=train_rnnlm_${backend}_${lmtag}_${bpemode}${nbpe}
+lmexpname=train_transformer_lm_${backend}_${lmtag}_${bpemode}${nbpe}
lmexpdir=exp/${lmexpname}
mkdir -p ${lmexpdir}
