Merge branch 'master' of github.com:espnet/espnet into dpclanddan
earthmanylf committed Feb 27, 2022
2 parents d3acdcc + 637d8c3 commit 5f7e2e7
Showing 80 changed files with 3,378 additions and 519 deletions.
19 changes: 17 additions & 2 deletions CONTRIBUTING.md
@@ -53,9 +53,9 @@ ESPnet2's recipes correspond to `egs2`. ESPnet2 applies a new paradigm without d
For ESPnet2, we do not recommend preparing each corpus's recipe stages separately; instead, use the common pipelines we provide in `asr.sh`, `tts.sh`, and
`enh.sh`. For details on creating ESPnet2 recipes, please refer to [egs2-readme](https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/README.md).

-The common pipeline of ESPnet2 recipes will take care of the `RESULTS.md` generation, model packing, and uploading. ESPnet2 models are maintained at Zenodo and Hugging Face.
+The common pipeline of ESPnet2 recipes will take care of the `RESULTS.md` generation, model packing, and uploading. ESPnet2 models are maintained at Hugging Face and Zenodo (deprecated).
You can also refer to the documentation at https://github.com/espnet/espnet_model_zoo
-To upload your model, you need first:
+To upload your model, you first need to (this is currently deprecated; uploading to the Hugging Face Hub is preferred):
1. Sign up to Zenodo: https://zenodo.org/
2. Create access token: https://zenodo.org/account/settings/applications/tokens/new/
3. Set your environment: % export ACCESS_TOKEN="<your token>"
@@ -64,6 +64,21 @@ To port models from Zenodo using the Hugging Face Hub,
1. Create a Hugging Face account - https://huggingface.co/
2. Request to be added to espnet organisation - https://huggingface.co/espnet
3. Go to `egs2/RECIPE/*/scripts/utils` and run `./upload_models_to_hub.sh "ZENODO_MODEL_NAME"`
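
For example, a hypothetical invocation (the recipe path is a placeholder; the quoted string is whatever name your model was uploaded to Zenodo under):

```sh
# Hypothetical example: port a model previously uploaded to Zenodo.
# Replace the recipe path and the quoted name with your own Zenodo model name.
cd egs2/librispeech/asr1/scripts/utils
./upload_models_to_hub.sh "ZENODO_MODEL_NAME"
```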

To upload models using `huggingface-cli`, follow these steps (a condensed command sequence is sketched after the list):
You can also refer to https://huggingface.co/docs/transformers/model_sharing
1. Create a Hugging Face account - https://huggingface.co/
2. Request to be added to espnet organisation - https://huggingface.co/espnet
3. Run `huggingface-cli login` (the token requested at this step can be created under Settings > Access Tokens > espnet token)
4. `huggingface-cli repo create your-model-name --organization espnet`
5. `git clone https://huggingface.co/username/your-model-name` (clone this outside the ESPnet tree to avoid issues, as this is itself a git repo)
6. `cd your-model-name`
7. `git lfs install`
8. Copy the contents of your recipe's `exp` directory into this directory (check other models for a similar task under the espnet organization to confirm the directory structure)
9. `git add .`
10. `git commit -m "Add model files"`
11. `git push`
12. Check that the inference demo on Hugging Face runs successfully to verify the upload
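
Condensed into a single shell session, the steps above look roughly like this (a sketch; `your-model-name` and the recipe path are placeholders, and the repository is assumed to live under the espnet organization):

```sh
# A sketch of steps 4-11; repository name and paths are placeholders.
huggingface-cli repo create your-model-name --organization espnet
git clone https://huggingface.co/espnet/your-model-name  # clone outside the ESPnet tree
cd your-model-name
git lfs install
cp -r /path/to/espnet/egs2/RECIPE/asr1/exp/* .  # copy your recipe's exp contents
git add .
git commit -m "Add model files"
git push
```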

#### 1.3.3 Additional requirements for new recipe

4 changes: 2 additions & 2 deletions doc/espnet2_tutorial.md
@@ -180,7 +180,7 @@ You need to use one of the following two ways to change the training configuration
```sh
# Give a configuration file
-./run.sh --asr_train_config conf/train_asr.yaml
+./run.sh --asr_config conf/train_asr.yaml
# Give arguments to "espnet2/bin/asr_train.py" directly
./run.sh --asr_args "--foo arg --bar arg2"
```
@@ -291,7 +291,7 @@ To use SSLRs in your task, you need to make several modifications.
### Usage
1. To reduce the time spent in the `collect_stats` step, specify `--feats_normalize uttmvn` in `run.sh` and pass it as an argument to `asr.sh` or other task-specific scripts. (Recommended)
2. In the configuration file, specify the `frontend` and `preencoder`. Taking `HuBERT` as an example:
-The `upsteam` name can be whatever supported in S3PRL. `multilayer-feature=True` means the final representation is a weighted-sum of all layers' hidden states from SSLR model.
+The `upstream` name can be any upstream supported in S3PRL. `multilayer-feature=True` means the final representation is a weighted sum of all layers' hidden states from the SSLR model.
```
frontend: s3prl
frontend_conf:
    ...
```
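
The snippet above is truncated in this diff; a fuller configuration might look like the sketch below (the upstream name, sizes, and the `preencoder` block are illustrative assumptions, not the exact contents of the elided lines):

```yaml
# A sketch of an s3prl frontend config; values are assumptions for illustration.
frontend: s3prl
frontend_conf:
    frontend_conf:
        upstream: hubert_base   # any upstream supported in S3PRL
    multilayer_feature: true    # weighted sum over all layers' hidden states
preencoder: linear              # project SSLR features to the encoder input size
preencoder_conf:
    input_size: 768             # hidden size of the chosen upstream
    output_size: 80
```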
1 change: 1 addition & 0 deletions egs/README.md
@@ -49,6 +49,7 @@ See: https://espnet.github.io/espnet/tutorial.html
| librispeech | LibriSpeech ASR corpus | ASR | EN | http://www.openslr.org/12 | |
| libritts | LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech | TTS | EN | http://www.openslr.org/60/ | |
| ljspeech | The LJ Speech Dataset | TTS | EN | https://keithito.com/LJ-Speech-Dataset/ | |
| lrs | The Lip Reading Sentences Dataset | ASR/AVSR | EN | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html | |
| m_ailabs | The M-AILABS Speech Dataset | TTS | ~5 languages | https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ |
| mucs_2021 | MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages | ASR/Code Switching | HI, MR, OR, TA, TE, GU, HI-EN, BN-EN | https://navana-tech.github.io/MUCS2021/data.html | |
| mtedx | Multilingual TEDx | ASR/Machine Translation/Speech Translation | 13 Language pairs | http://www.openslr.org/100/ |
31 changes: 31 additions & 0 deletions egs/librispeech/asr1/RESULTS.md
@@ -63,6 +63,37 @@ exp/train_960_pytorch_train_pytorch_conformer_large_specaug/decode_test_other_mo
| Sum/Avg | 2939 52343 | 95.3 4.1 0.6 0.6 5.3 44.8 |
```

# pytorch large conformer-transducer with specaug + speed perturbation (4 GPUs)

- Environments
- python version: `3.8.3 (default) [GCC 7.3.0]`
- espnet version: `espnet 0.10.7a1`
- chainer version: `chainer 6.0.0`
- pytorch version: `pytorch 1.10.0`

- Model files (archived to model.tar.gz by `$ pack_model.sh`)
  - model link: [pretrained model](https://drive.google.com/file/d/1fdadICi2w_b6lqb9_7J3wfRJc3LTnnSq/view?usp=sharing)
- training config file: `conf/tuning/transducer/train_conformer-rnn_transducer.yaml`
- decoding config file: `conf/tuning/transducer/decode.yaml`
- cmvn file: `data/train_sp/cmvn.ark`
- e2e file: `exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/results/model.last10.avg.best`
- e2e JSON file: `exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/results/model.json`
- dict file: `data/lang_char`
- Results (paste them yourself or obtain them with `$ pack_model.sh --results <results>`)
```
exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_dev_clean_model.last10.avg.best/result.wrd.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 2703 54402 | 97.6 2.2 0.2 0.3 2.7 33.0 |
exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_dev_other_model.last10.avg.best/result.wrd.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 2864 50948 | 93.7 5.7 0.6 0.7 7.0 52.8 |
exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_test_clean_model.last10.avg.best/result.wrd.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 2620 52576 | 97.4 2.3 0.3 0.3 2.9 33.1 |
exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_test_other_model.last10.avg.best/result.wrd.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 2939 52343 | 93.7 5.6 0.7 0.8 7.1 55.1 |
```

# Lightweight/Dynamic convolution results
| | | # Snt | # Wrd |Corr|Sub|Del|Ins|Err|S.Err |
Expand Down
4 changes: 4 additions & 0 deletions egs/librispeech/asr1/conf/tuning/transducer/decode.yaml
@@ -0,0 +1,4 @@
batch: 0
beam-size: 10
search-type: default
score-norm: True
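
This decoding configuration is consumed by the recipe's decoding stage; an assumed typical invocation (the stage number and option name follow common ESPnet1 recipe conventions) would be:

```sh
# Assumed usage: pass the transducer decoding config to the recipe's decoding stage.
./run.sh --stage 5 --decode-config conf/tuning/transducer/decode.yaml
```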
50 changes: 50 additions & 0 deletions egs/librispeech/asr1/conf/tuning/transducer/train_conformer-rnn_transducer.yaml
@@ -0,0 +1,50 @@
# minibatch related
batch-size: 32
maxlen-in: 512
maxlen-out: 150

# optimization related
criterion: loss
early-stop-criterion: "validation/main/loss"
sortagrad: 0
opt: noam
noam-adim: 256
transformer-lr: 1.0
transformer-warmup-steps: 25000
epochs: 100
patience: 0
accum-grad: 4
grad-clip: 5.0

# network architecture
## general
custom-enc-positional-encoding-type: rel_pos
custom-enc-self-attn-type: rel_self_attn
custom-enc-pw-activation-type: swish
## encoder related
etype: custom
custom-enc-input-layer: vgg2l
enc-block-arch:
- type: conformer
d_hidden: 512
d_ff: 2048
heads: 4
macaron_style: True
use_conv_mod: True
conv_mod_kernel: 15
dropout-rate: 0.3
att-dropout-rate: 0.3
enc-block-repeat: 12
## decoder related
dtype: lstm
dlayers: 1
dec-embed-dim: 1024
dunits: 512
dropout-rate-embed-decoder: 0.2
dropout-rate-decoder: 0.1
## joint network related
joint-dim: 512

# transducer related
model-module: "espnet.nets.pytorch_backend.e2e_asr_transducer:E2E"

39 changes: 39 additions & 0 deletions egs/lrs/asr1/RESULTS.md
@@ -0,0 +1,39 @@
## pretrain_Train_pytorch_train_specaug

* Model files (archived to model.tar.gz by <code>$ pack_model.sh</code>)
- download link: <code>https://drive.google.com/file/d/1YUePEjk2Utgznr7sP0x4KdKCcPjbMM7C/view?usp=sharing</code>
- training config file: <code>conf/train.yaml</code>
- decoding config file: <code>conf/decode.yaml</code>
- preprocess config file: <code>conf/specaug.yaml</code>
- lm config file: <code>conf/lm.yaml</code>
- cmvn file: <code>data/pretrain_Train/cmvn.ark</code>
- e2e file: <code>exp/pretrain_Train_pytorch_train_specaug/results/model.val5.avg.best</code>
- e2e json file: <code>exp/pretrain_Train_pytorch_train_specaug/results/model.json</code>
- lm file: <code>exp/pretrainedlm/rnnlm.model.best</code>
- lm JSON file: <code>exp/pretrainedlm/model.json</code>
- dict file: <code>data/lang_char/pretrain_Train_unigram5000_units.txt</code>


## Environments
- date: `Wed Feb 16 09:06:58 CET 2022`
- python version: `3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]`
- espnet version: `espnet 0.9.8`
- chainer version: `chainer 6.0.0`
- pytorch version: `pytorch 1.4.0`
- Git hash: `19aabb415657c05a45467f9d8bb612db4764f6a1`
- Commit date: `Tue Oct 19 12:00:34 2021 +0200`


### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_Test_model.val5.avg.best_decode_|1243|12648|96.3|1.6|2.1|0.2|3.9|15.8|
|decode_Val_model.val5.avg.best_decode_|1082|14858|92.7|3.2|4.1|0.9|8.2|38.2|

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_Test_model.val5.avg.best_decode_|1243|6660|96.2|2.1|1.7|0.4|4.2|15.7|
|decode_Val_model.val5.avg.best_decode_|1082|7866|91.6|4.7|3.7|1.0|9.4|38.2|
89 changes: 89 additions & 0 deletions egs/lrs/asr1/cmd.sh
@@ -0,0 +1,89 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
# --time <time>: Limit the maximum time to execute.
# --mem <mem>: Limit the maximum memory usage.
# --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
# --num-threads <nthreads>: Specify the number of CPU cores.
# --gpu <ngpu>: Specify the number of GPU devices.
# --config: Change the configuration file from default.
#
# "JOB=1:10" is used for "array jobs" and it can control the number of parallel jobs.
# The left string of "=", i.e. "JOB", is replaced by <N> (the Nth job) in the command and the log file name,
# e.g. "echo JOB" becomes "echo 3" for the 3rd job and "echo 8" for the 8th job.
# Note that the range must start with a positive number, so you can't use "JOB=0:10", for example.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl have a unified interface that does not depend on the backend.
# These options are mapped to backend-specific options, as configured by "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs fail, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
# "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================


# Select the backend used by run.sh from "local", "sge", "slurm", or "ssh"
cmd_backend='local'

# Local machine, without any Job scheduling system
if [ "${cmd_backend}" = local ]; then

# Used for the other tasks
export train_cmd="run.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="run.pl"
# Used for "*_recog.py"
export decode_cmd="run.pl"

# "qsub" (SGE, Torque, PBS, etc.)
elif [ "${cmd_backend}" = sge ]; then
# The default setting is written in conf/queue.conf.
# You must change "-q g.q" to match a queue in your environment.
# To list the queue names, type "qhost -q".
# Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler.

export train_cmd="queue.pl"
export cuda_cmd="queue.pl"
export decode_cmd="queue.pl"

# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
# The default setting is written in conf/slurm.conf.
# You must change "-p cpu" and "-p gpu" to match the partitions in your environment.
# To list the partition names, type "sinfo".
# You can use "--gpu * " by default for slurm and it is interpreted as "--gres gpu:*"
# The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

export train_cmd="slurm.pl"
export cuda_cmd="slurm.pl"
export decode_cmd="slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
# You have to create ".queue/machines" to specify the host to execute jobs.
# e.g. .queue/machines
# host1
# host2
# host3
# This assumes you can log in to them without a password, i.e., you have set up SSH keys.

export train_cmd="ssh.pl"
export cuda_cmd="ssh.pl"
export decode_cmd="ssh.pl"

# This is an example of specifying several unique options in the JHU CLSP cluster setup.
# Users can modify/add their own command options according to their cluster environments.
elif [ "${cmd_backend}" = jhu ]; then

export train_cmd="queue.pl --mem 2G"
export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/gpu.conf"
export decode_cmd="queue.pl --mem 4G"

else
echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
return 1
fi
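
As a usage sketch (the job count, paths, and data layout are illustrative assumptions), a recipe script sources this file and launches array jobs through whichever backend is selected:

```sh
# A sketch: run 8 parallel Kaldi fbank-extraction jobs via the configured backend.
. ./cmd.sh
${train_cmd} JOB=1:8 exp/make_fbank/log/make_fbank.JOB.log \
    compute-fbank-feats --config=conf/fbank.conf \
    scp:data/train/split8/JOB/wav.scp ark:data/train/raw_fbank.JOB.ark
```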
7 changes: 7 additions & 0 deletions egs/lrs/asr1/conf/decode.yaml
@@ -0,0 +1,7 @@
batchsize: 0
beam-size: 60
ctc-weight: 0.4
lm-weight: 0.6
maxlenratio: 0.0
minlenratio: 0.0
penalty: 0.0
2 changes: 2 additions & 0 deletions egs/lrs/asr1/conf/fbank.conf
@@ -0,0 +1,2 @@
--sample-frequency=16000
--num-mel-bins=80
10 changes: 10 additions & 0 deletions egs/lrs/asr1/conf/gpu.conf
@@ -0,0 +1,10 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
default gpu=0
option gpu=0
option gpu=* -l 'hostname=b1[12345678]*|c*,gpu=$0' -q g.q
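
To illustrate how queue.pl applies these mappings (an assumed expansion, derived only from the option lines above):

```sh
# Assumed expansion with the gpu.conf above: a call like
#   queue.pl --gpu 1 --mem 8G --config conf/gpu.conf JOB=1:4 exp/log/train.JOB.log <command>
# submits each array job roughly as:
#   qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* \
#        -l mem_free=8G,ram_free=8G \
#        -l 'hostname=b1[12345678]*|c*,gpu=1' -q g.q ...
```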
9 changes: 9 additions & 0 deletions egs/lrs/asr1/conf/lm.yaml
@@ -0,0 +1,9 @@
layer: 4
unit: 2048
opt: sgd # or adam
sortagrad: 0 # Feed samples from shortest to longest; -1: enabled for all epochs, 0: disabled, N: enabled for the first N epochs
batchsize: 512 # batch size in LM training
epoch: 20 # if the data size is large, we can reduce this
patience: 3
maxlen: 40 # if sentence length > lm_maxlen, lm_batchsize is automatically reduced
dropout-rate: 0.0
1 change: 1 addition & 0 deletions egs/lrs/asr1/conf/pitch.conf
@@ -0,0 +1 @@
--sample-frequency=16000
10 changes: 10 additions & 0 deletions egs/lrs/asr1/conf/queue.conf
@@ -0,0 +1,10 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
14 changes: 14 additions & 0 deletions egs/lrs/asr1/conf/slurm.conf
@@ -0,0 +1,14 @@
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0 # Recommended: allocate at least as many CPUs as GPUs
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
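
Analogously for Slurm (an assumed expansion, derived only from the option lines above):

```sh
# Assumed expansion with the slurm.conf above: a call like
#   slurm.pl --gpu 1 --mem 4G JOB=1:4 exp/log/decode.JOB.log <command>
# submits each array job roughly as:
#   sbatch --export=PATH --mem-per-cpu 4G -p gpu --gres=gpu:1 -c 1 ...
```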
16 changes: 16 additions & 0 deletions egs/lrs/asr1/conf/specaug.yaml
@@ -0,0 +1,16 @@
process:
# these three processes together are known as SpecAugment
- type: "time_warp"
max_time_warp: 5
inplace: true
mode: "PIL"
- type: "freq_mask"
F: 30
n_mask: 2
inplace: true
replace_with_zero: false
- type: "time_mask"
T: 40
n_mask: 2
inplace: true
replace_with_zero: false