MGB2 #396

Merged: 36 commits, Dec 2, 2022
Changes from 20 commits
86e1f9b
mgb2
AmirHussein96 Jun 4, 2022
68aa924
mgb2
AmirHussein96 Jun 4, 2022
8f26a13
Merge branch 'master' of https://github.com/AmirHussein96/icefall int…
AmirHussein96 Jul 5, 2022
b08f442
adding pruned transducer stateless to mgb2
AmirHussein96 Jul 5, 2022
65e1c9b
update display_manifest_statistics.py
AmirHussein96 Jul 5, 2022
64d6ec0
Merge branch 'master' of https://github.com/AmirHussein96/icefall int…
AmirHussein96 Sep 7, 2022
5f9ef7b
Merge branch 'k2-fsa:master' into mgb2
AmirHussein96 Sep 7, 2022
7c798de
Merge branch 'mgb2' of https://github.com/AmirHussein96/icefall into …
AmirHussein96 Sep 7, 2022
cb840d6
.
AmirHussein96 Sep 7, 2022
a89ae13
stateless transducer MGB-2
AmirHussein96 Sep 7, 2022
53b0b0c
Update README.md
AmirHussein96 Sep 8, 2022
fc45d7d
Update RESULTS.md
AmirHussein96 Sep 8, 2022
ec365c8
Update prepare_lang_bpe.py
AmirHussein96 Sep 8, 2022
d21dbd5
Update asr_datamodule.py
AmirHussein96 Sep 8, 2022
042d4f4
.nfs removed
AmirHussein96 Sep 8, 2022
af5a7a4
Adding symlink
AmirHussein96 Sep 8, 2022
b3b8474
.
AmirHussein96 Nov 22, 2022
f01ca63
.
AmirHussein96 Nov 22, 2022
36803e0
resolving conflicts
AmirHussein96 Nov 22, 2022
7817749
Update .gitignore
AmirHussein96 Nov 22, 2022
b555459
black formatting
AmirHussein96 Nov 22, 2022
889b4d3
Merge branch 'mgb2' of https://github.com/AmirHussein96/icefall into …
AmirHussein96 Nov 22, 2022
ffbaa8b
Merge branch 'k2-fsa:master' into mgb2
AmirHussein96 Nov 22, 2022
367771b
Merge branch 'master' of https://github.com/AmirHussein96/icefall int…
AmirHussein96 Nov 22, 2022
4702b2f
hhMerge branch 'mgb2' of https://github.com/AmirHussein96/icefall int…
AmirHussein96 Nov 22, 2022
bdb9746
Update compile_hlg.py
AmirHussein96 Nov 23, 2022
6ce9d86
Update compute_fbank_musan.py
AmirHussein96 Nov 23, 2022
11c1257
Update convert_transcript_words_to_tokens.py
AmirHussein96 Nov 23, 2022
9a4790f
Update download_lm.py
AmirHussein96 Nov 23, 2022
70c5966
Update generate_unique_lexicon.py
AmirHussein96 Nov 23, 2022
6378a1f
Merge branch 'k2-fsa:master' into mgb2
AmirHussein96 Nov 30, 2022
091abc4
Merge branch 'master' of https://github.com/AmirHussein96/icefall int…
AmirHussein96 Nov 30, 2022
84a545f
adding simlinks
AmirHussein96 Nov 30, 2022
584a81f
merge branch 'mgb2' of https://github.com/AmirHussein96/icefall into …
AmirHussein96 Nov 30, 2022
6c50f5a
fixing symbolic links
AmirHussein96 Dec 2, 2022
6a15a6e
Merge branch 'k2-fsa:master' into mgb2
AmirHussein96 Dec 2, 2022
20 changes: 20 additions & 0 deletions .gitignore
@@ -11,5 +11,25 @@ log
*.bak
*-bak
*bak.py

# Ignore Mac system files
.DS_store

# Ignore node_modules folder
node_modules

# ignore .nfs

.nfs*

# Ignore all text files
*.txt

# Ignore files related to API keys
.env

# Ignore SASS config files
.sass-cache

*.param
*.bin
3 changes: 2 additions & 1 deletion egs/librispeech/ASR/pruned_transducer_stateless2/optim.py
@@ -287,7 +287,8 @@ def get_lr(self):
factor = (
(self.batch**2 + self.lr_batches**2) / self.lr_batches**2
) ** -0.25 * (
((self.epoch**2 + self.lr_epochs**2) / self.lr_epochs**2) ** -0.25
Collaborator: @AmirHussein96 could you remove these formatting changes? Please see #692, where we updated the line length.

Contributor Author: I finished the formatting.

Collaborator: Thanks.

((self.epoch**2 + self.lr_epochs**2) / self.lr_epochs**2)
** -0.25
)
return [x * factor for x in self.base_lrs]
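
The expression in `get_lr` above multiplies every base learning rate by a factor that decays with both the batch count and the epoch count. A minimal standalone sketch of that factor follows. The helper name `lr_factor` and the numeric values for `lr_batches`, `lr_epochs`, and `base_lrs` are illustrative assumptions and do not come from this diff.

```python
def lr_factor(batch: int, epoch: int, lr_batches: float, lr_epochs: float) -> float:
    """Decay factor mirroring the expression in get_lr (hypothetical helper)."""
    batch_term = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_term = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return batch_term * epoch_term


# Illustrative values only; lr_batches, lr_epochs and base_lrs are assumptions.
base_lrs = [0.003]
for batch, epoch in [(0, 0), (5000, 1), (50000, 10)]:
    factor = lr_factor(batch, epoch, lr_batches=5000, lr_epochs=6)
    print(batch, epoch, [lr * factor for lr in base_lrs])
```

At the start of training the factor is 1, and it shrinks monotonically as either counter grows, so the reformatting in this hunk should not change the schedule.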

3 changes: 2 additions & 1 deletion egs/librispeech/ASR/pruned_transducer_stateless2/scaling.py
@@ -227,7 +227,8 @@ def forward(self, x: Tensor) -> Tensor:
if not is_jit_tracing():
assert x.shape[self.channel_dim] == self.num_channels
scales = (
torch.mean(x**2, dim=self.channel_dim, keepdim=True) + self.eps.exp()
torch.mean(x**2, dim=self.channel_dim, keepdim=True)
+ self.eps.exp()
) ** -0.5
return x * scales
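
The `forward` body above rescales the input by the inverse root of its mean square plus a learned epsilon stored in log space. A rough sketch of the same arithmetic on a single channel vector, using plain Python lists instead of tensors; the function name `basic_norm` and the sample values are assumptions made for illustration.

```python
import math
from typing import List


def basic_norm(x: List[float], log_eps: float) -> List[float]:
    """Scale x by (mean(x^2) + exp(log_eps)) ** -0.5 over one channel vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = (mean_sq + math.exp(log_eps)) ** -0.5
    return [v * scale for v in x]


# Hypothetical input; log_eps plays the role of self.eps in the diff.
print(basic_norm([3.0, 4.0], log_eps=math.log(0.25)))
```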

43 changes: 43 additions & 0 deletions egs/mgb2/ASR/README.md
@@ -0,0 +1,43 @@
# MGB2

The Multi-Dialect Broadcast News Arabic Speech Recognition (MGB-2):
The second edition of the Multi-Genre Broadcast (MGB-2) Challenge is
an evaluation of speech recognition and lightly supervised alignment
using TV recordings in Arabic. The speech data is broad and multi-genre,
spanning the whole range of TV output, and represents a challenging task for
speech technology. In 2016, the challenge featured two new Arabic tracks based
on TV data from Aljazeera. It was an official challenge at the 2016 IEEE
Workshop on Spoken Language Technology. The 1,200 hours of MGB-2 data from
Aljazeera TV programs have been manually captioned with no timing information.
The QCRI Arabic ASR system was used to recognize all programs, and its output
was used to align the manual captions and produce speech segments for
training speech recognition. More than 20 hours from 2015 programs have been
transcribed verbatim and manually segmented. This data is split into a
development set of 10 hours and a similar evaluation set of 10 hours.
Both the development and evaluation data were released in the 2016 MGB
challenge.

Official reference:

Ali, Ahmed, et al. "The MGB-2 challenge: Arabic multi-dialect broadcast media recognition."
2016 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2016.

IEEE link: https://ieeexplore.ieee.org/abstract/document/7846277

## Stateless Pruned Transducer Performance Record (after 30 epochs)

| decoding method                    | dev WER    | test WER   | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
| greedy search | 15.52 | 15.28 | --epoch 18, --avg 5, --max-duration 200 |
| modified beam search | 13.88 | 13.7 | --epoch 18, --avg 5, --max-duration 200 |
| fast beam search | 14.62 | 14.36 | --epoch 18, --avg 5, --max-duration 200 |

## Conformer-CTC Performance Record (after 40 epochs)

| Decoding method | dev WER | test WER |
|---------------------------|------------|---------|
| attention-decoder | 15.62 | 15.01 |
| whole-lattice-rescoring | 15.89 | 15.08 |
Collaborator: By the way, could you also add the results for 1best decoding and ctc_decoding?

Contributor Author: Sure, will add them.



See [RESULTS](/egs/mgb2/ASR/RESULTS.md) for details.
236 changes: 236 additions & 0 deletions egs/mgb2/ASR/RESULTS.md
@@ -0,0 +1,236 @@
# Results


### MGB2 all data BPE training results (Stateless Pruned Transducer)
Collaborator: Could you upload the pretrained model, checkpoint, and decoding results to a Hugging Face repo?

You can use
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14/tree/main
as a reference to see which files should be uploaded to the Hugging Face repo.

Contributor Author: I cannot share the stateless transducer at the current stage, as it is being used in another project and is sensitive for the party that supports me with computation resources. However, I am planning to upload it in the near future.


#### 2022-09-07

The WERs are

| decoding method                    | dev        | test       | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
| greedy search | 15.52 | 15.28 | --epoch 18, --avg 5, --max-duration 200 |
| modified beam search | 13.88 | 13.7 | --epoch 18, --avg 5, --max-duration 200 |
| fast beam search | 14.62 | 14.36 | --epoch 18, --avg 5, --max-duration 200|
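
For readers unfamiliar with the metric in the table above, here is a minimal sketch of word error rate, assuming the standard definition (word-level edit distance divided by the number of reference words, in percent). The recipe's own scoring code is not shown in this diff, so the function below is an illustration and may differ from it in details such as text normalization.

```python
from typing import List


def wer(ref: List[str], hyp: List[str]) -> float:
    """Word error rate in percent: edit distance / reference length."""
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hyp[:j]; it starts as the cost of inserting j words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            sub = prev[j - 1] + (r != h)  # match or substitution
            curr[j] = min(sub, prev[j] + 1, curr[j - 1] + 1)  # deletion, insertion
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(wer("a b c d".split(), "a x c".split()))
```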

The training command for reproducing is given below:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless5/train.py \
--world-size 4 \
--num-epochs 30 \
--start-epoch 1 \
--exp-dir pruned_transducer_stateless5/exp \
--max-duration 300 \
--num-buckets 50
```

Collaborator (on `--num-epochs 30`): 30 epochs are trained. Does the combination --epoch 18, --avg 5 produce the best WER among other combinations?

Contributor Author: Yes

The tensorboard training log can be found at
https://tensorboard.dev/experiment/YyNv45pfQ0GqWzZ898WOlw/#scalars

The decoding command is:
```
epoch=18
avg=5
for method in greedy_search modified_beam_search fast_beam_search; do
./pruned_transducer_stateless5/decode.py \
--epoch $epoch \
--beam-size 10 \
--avg $avg \
--exp-dir ./pruned_transducer_stateless5/exp \
--max-duration 200 \
--decoding-method $method \
--max-sym-per-frame 1 \
--num-encoder-layers 12 \
--dim-feedforward 2048 \
--nhead 8 \
--encoder-dim 512 \
--decoder-dim 512 \
--joiner-dim 512 \
--use-averaged-model True
done
```

### MGB2 all data BPE training results (Conformer-CTC) (after 40 epochs)

#### 2022-06-04

You can find a pretrained model, training logs, decoding logs, and decoding results at:
https://huggingface.co/AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06
Collaborator: Could you also upload pretrained.pt?
cpu_jit.pt is useful during inference time, while pretrained.pt is useful for resuming the training.

For the decoding results, could you also upload the following files:

  • errs-xxx
  • recogs-xxx

Currently, only the decoding logs log-xxx are uploaded, which do not contain the recognition results.

Also, have you tried other decoding methods, e.g., ctc decoding and 1best decoding?

Contributor Author: I have tried the whole lattice rescoring and the attention decoding. The attention decoding gave me the best results.
I uploaded the errs-xxx and recogs-xxx.

Collaborator: Could you also provide some test waves and the corresponding transcripts in the above Hugging Face repo so that we can use them to test your model in sherpa?

You can use
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14/tree/main/test_wavs
as a reference.

Contributor Author: Done


The best WERs, as of 2022-06-04, for the MGB2 dev and test sets are below.

Using whole lattice HLG decoding + n-gram LM rescoring

| | dev | test |
|-----|------------|------------|
| WER | 15.62 | 15.01 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.1 | - |


Using n-best (nbest-scale=0.5) attention decoder rescoring

| | dev | test |
|-----|------------|------------|
| WER | 15.89 | 15.08 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.01 | 0.5 |
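
The two scales weight the n-gram LM and attention-decoder scores when candidate paths are ranked. A rough sketch of that ranking follows, assuming the total score of a path is the acoustic score plus the scaled LM and attention scores. The source does not spell out the formula, so this combination is an assumption, and all names and numbers below are made up for illustration.

```python
from typing import Dict, List


def best_path(paths: List[Dict[str, float]], ngram_lm_scale: float,
              attention_scale: float) -> int:
    """Return the index of the path with the highest combined log-score."""
    def total(p: Dict[str, float]) -> float:
        # Assumed combination: acoustic + scaled LM + scaled attention score.
        return (p["am"] + ngram_lm_scale * p["lm"]
                + attention_scale * p["att"])
    return max(range(len(paths)), key=lambda i: total(paths[i]))


# Made-up log-scores for two candidate paths.
paths = [
    {"am": -10.0, "lm": -20.0, "att": -8.0},
    {"am": -10.5, "lm": -12.0, "att": -6.0},
]
print(best_path(paths, ngram_lm_scale=0.01, attention_scale=0.5))
```

Under this assumption, tuning the scales changes which path wins, which is why the best-performing pair is reported next to each WER.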


To reproduce the above result, use the following commands for training:

Note: the model was trained on V-100 32GB GPUs.

```
cd egs/mgb2/ASR
. ./path.sh
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
--lang-dir data/lang_bpe_5000 \
--att-rate 0.8 \
--lr-factor 10 \
--concatenate-cuts 0 \
--world-size 2 \
--bucketing-sampler 1 \
--max-duration 100 \
--start-epoch 0 \
--num-epochs 40
```

and the following command for nbest decoding

```
./conformer_ctc/decode.py \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--epoch 40 \
--avg 5 \
--method attention-decoder \
--nbest-scale 0.5
```
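
The `--epoch 40 --avg 5` pair selects a range of epoch checkpoints whose parameters are averaged before decoding. A minimal sketch of that idea follows, assuming the range is the last `avg` epochs ending at `epoch` and that parameters are averaged element-wise. The file-name pattern and helper names are hypothetical, and plain lists stand in for tensors.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def checkpoint_names(epoch: int, avg: int) -> List[str]:
    """Names of the `avg` checkpoints ending at `epoch` (assumed pattern)."""
    return [f"epoch-{i}.pt" for i in range(epoch - avg + 1, epoch + 1)]


def average_params(checkpoints: List[Params]) -> Params:
    """Element-wise mean of each named parameter across checkpoints."""
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


print(checkpoint_names(epoch=40, avg=5))
print(average_params([{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]))
```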

and the following command for whole-lattice decoding

```
./conformer_ctc/decode.py \
--epoch 40 \
--avg 5 \
--exp-dir conformer_ctc/exp_5000_att0.8 \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--method whole-lattice-rescoring
```


The tensorboard log for training is available at
https://tensorboard.dev/experiment/QYNzOi52RwOX8yvtpl3hMw/#scalars


### MGB2 100h BPE training results (Conformer-CTC) (after 33 epochs)

#### 2022-06-04

The best WERs, as of 2022-06-04, for the MGB2 dev and test sets are below.

Using whole lattice HLG decoding + n-gram LM rescoring

| | dev | test |
|-----|------------|------------|
| WER | 25.32 | 23.53 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.1 | - |


Using n-best (nbest-scale=0.5) HLG decoding + n-gram LM rescoring + attention decoder rescoring:

| | dev | test |
|-----|------------|------------|
| WER | 27.87 | 26.12 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.01 | 0.3 |


To reproduce the above result, use the following commands for training:

Note: the model was trained on V-100 32GB GPUs.

```
cd egs/mgb2/ASR
. ./path.sh
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
--lang-dir data/lang_bpe_5000 \
--att-rate 0.8 \
--lr-factor 10 \
--concatenate-cuts 0 \
--world-size 2 \
--bucketing-sampler 1 \
--max-duration 100 \
--start-epoch 0 \
--num-epochs 40
```

and the following command for nbest decoding

```
./conformer_ctc/decode.py \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--epoch 40 \
--avg 5 \
--method attention-decoder \
--nbest-scale 0.5
```

and the following command for whole-lattice decoding

```
./conformer_ctc/decode.py \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--epoch 40 \
--avg 5 \
--method whole-lattice-rescoring
```

The tensorboard log for training is available at
<https://tensorboard.dev/experiment/zy6FnumCQlmiO7BPsdCmEg/#scalars>



