MGB2 #396
Conversation
Thanks! How did you choose the following thresholds? (icefall/egs/mgb2/ASR/conformer_ctc/train.py, lines 640 to 649 in 68aa924)
Could you update that? Also, could you please try our pruned RNN-T recipe, which not only has a lower WER on LibriSpeech/GigaSpeech but also has a faster decoding speed with much less memory consumption? I would recommend using https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5 as a starting point.
Also, I think you are converting kaldi manifests to lhotse format. Please have a look at #391 (reply in thread). If you used a version of lhotse before lhotse-speech/lhotse#729 to extract the features, I would suggest re-extracting them using the latest lhotse, which uses lilcom_chunky storage.
Yes, my current version uses LilcomHdf5Writer.
The min=0.5 and max=30 duration boundaries are similar to what I used with Espnet, based on my experience. Segments longer than 30 s cause memory issues and model underfitting (the model needs a lot of epochs to start fitting the training data). Regarding RNN-T, I was actually considering it as my next step, so yes, I will run it as well; thank you for pointing me to the latest best RNN-T configuration.
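For context, here is a minimal sketch of how such a duration filter is typically applied to a lhotse CutSet before training (the manifest path and the exact predicate used in the MGB2 train.py are assumptions):

```python
from lhotse import load_manifest_lazy

# Hypothetical manifest path; the MGB2 recipe uses its own filenames.
cuts = load_manifest_lazy("data/fbank/cuts_train.jsonl.gz")

def remove_short_and_long_utt(c) -> bool:
    # Keep cuts between 0.5 s and 30 s: segments longer than 30 s tend to
    # cause OOM and slow convergence, very short ones carry little signal.
    return 0.5 <= c.duration <= 30.0

cuts = cuts.filter(remove_short_and_long_utt)
```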
So, to use lilcom_chunky, I should change the storage_type in the feature-extraction script?
Yes. Please see the referenced code below.
Yes, you can do that. Please see the following.
Note that the filenames end with .jsonl.gz. Also, please replace the code at icefall/egs/librispeech/ASR/tdnn_lstm_ctc/asr_datamodule.py, lines 227 to 229 and lines 424 to 426 (commit 1235e23), and use the pattern at lines 308 to 316 of the same file.
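For illustration, a hedged sketch of what the lazy-loading change in asr_datamodule.py looks like (function and path names are assumptions modelled on the librispeech recipe):

```python
from pathlib import Path
from lhotse import CutSet, load_manifest_lazy

def train_cuts(manifest_dir: Path) -> CutSet:
    # Read the chunked .jsonl.gz manifest lazily instead of loading a
    # monolithic .json.gz into memory.
    return load_manifest_lazy(manifest_dir / "cuts_train.jsonl.gz")
```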
with get_executor() as ex:  # Initialize the executor only once.
    for partition, m in manifests.items():
        if (output_dir / f"cuts_{partition}.json.gz").is_file():
-        if (output_dir / f"cuts_{partition}.json.gz").is_file():
+        if (output_dir / f"cuts_{partition}.jsonl.gz").is_file():
        cut_set = cut_set.trim_to_supervisions(
            keep_overlapping=False, min_duration=None
        )
        cut_set.to_json(output_dir / f"cuts_{partition}.json.gz")
-        cut_set.to_json(output_dir / f"cuts_{partition}.json.gz")
+        cut_set.to_file(output_dir / f"cuts_{partition}.jsonl.gz")
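Putting both suggestions together, the feature-extraction loop might look roughly like this (a hedged sketch; variable names such as manifests, output_dir, num_jobs, and ex come from the surrounding script and are assumed):

```python
from lhotse import CutSet, Fbank, FbankConfig, LilcomChunkyWriter

for partition, m in manifests.items():
    cuts_path = output_dir / f"cuts_{partition}.jsonl.gz"
    if cuts_path.is_file():
        continue
    cut_set = CutSet.from_manifests(
        recordings=m["recordings"], supervisions=m["supervisions"]
    )
    cut_set = cut_set.compute_and_store_features(
        extractor=Fbank(FbankConfig(num_mel_bins=80)),
        storage_path=f"{output_dir}/feats_{partition}",
        storage_type=LilcomChunkyWriter,  # chunked lilcom instead of HDF5
        num_jobs=num_jobs,
        executor=ex,
    )
    cut_set = cut_set.trim_to_supervisions(
        keep_overlapping=False, min_duration=None
    )
    cut_set.to_file(cuts_path)  # the .jsonl.gz suffix selects the JSONL writer
```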
Also, I think I closed the PR by mistake; @csukuangfj, can you reopen it?
After making the suggested modifications for feature storing:
Since you are doubling the max duration, the time for 6k iterations should also increase. But I am not sure whether it is normal that the time doubled. @pzelasko, do you have any comments?
I don't know whether switching to lilcom_chunky would change that.
The change in time per minibatch is pretty close to linear when you increase --max-duration, so I think that is as expected.
LilcomChunky and LilcomHdf5 should have very close performance; I don’t expect you’d win anything there. Like Dan says, if you scaled up the batch size by 2x, that can explain why it takes almost twice as long to run (unless you had a small model that underutilizes the GPU, which is likely not the case here).
Related to the slow-training discussion, @danpovey suggested a couple of things to check.
So first I double-checked the node hard drive.
As a first attempt, I tried increasing the number of workers from 2 to 8; the --max-duration is 150, similar to https://tensorboard.dev/experiment/zy6FnumCQlmiO7BPsdCmEg/#scalars, because 300 and 200 gave OOM. The speed of the iterations indeed became 4 times faster; you can find the new setup with 8 workers here: https://tensorboard.dev/experiment/WvSg4yn8SYyJlKyQGkls0A/#scalars.
Increasing num-workers increases RAM utilization but does not increase GPU memory utilization, so it should not affect the maximum --max-duration you can use. My feeling is that the issue is that he is running from an HDD, not an SSD, so the latency of disk access is quite high.
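For reference, a sketch of where --num-workers takes effect (modelled on the librispeech asr_datamodule.py; the exact wiring in the MGB2 recipe may differ):

```python
from torch.utils.data import DataLoader

def make_train_dataloader(train_dataset, train_sampler, num_workers: int = 8):
    # The lhotse sampler yields whole batches, so PyTorch must not batch again
    # (batch_size=None). More workers mainly help hide disk latency (HDD vs SSD);
    # they increase RAM usage but not GPU memory.
    return DataLoader(
        train_dataset,
        sampler=train_sampler,
        batch_size=None,
        num_workers=num_workers,
        persistent_workers=False,
    )
```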
@AmirHussein96 If your dataset is quite large, you can use the following two files as a reference; they split the dataset into smaller pieces:
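As an illustration of the idea (a hedged sketch; paths and the number of splits are placeholders, and the actual reference files may use a different API):

```python
from lhotse import load_manifest

# Split a large training CutSet into smaller pieces so that feature
# extraction can be run (and resumed) chunk by chunk.
cuts = load_manifest("data/manifests/cuts_train_raw.jsonl.gz")
num_splits = 20
for i, piece in enumerate(cuts.split(num_splits)):
    piece.to_file(f"data/manifests/cuts_train_raw.{i:02d}.jsonl.gz")
```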
Yes, I followed these changes and increased the number of workers from 2 to 8 per GPU (I am using 2 GPUs). The utilization is shown below; it is much better now: 10-12 h per epoch compared to 2 days previously.
@AmirHussein96
Recent updates: the conformer_ctc training for 45 epochs has finished; the tensorboard is here: https://tensorboard.dev/experiment/QYNzOi52RwOX8yvtpl3hMw/#scalars. I tried the following decoding methods (note that I had to reduce max_active_states from 10000 to 5000 to fit on a P100 16 GB GPU).
It looks like there is still a considerable gap compared to a similar Espnet setup (WER with decoding beam search = 20, no LM):
I tried the RNN-T on MGB2 with the following command. For some reason the RNN-T asks for a lot of memory that does not fit into a V100 16 GB; any ideas why this is happening? errors-7078120.txt
It looks like, for one of your jobs, an inf has got into the pruned_loss at some point. But this may only affect the diagnostics.
Hi @danpovey, apologies for the late reply. I have pushed the updated pruned transducer stateless config that I am using with MGB2; please check it and let me know what you think. The details of the k2 version I am using are below:
Could you please update your black settings (see https://icefall.readthedocs.io/en/latest/contributing/code-style.html) and reformat the directory using the new default line-length of 88? This can be done by running black mgb2/.
@@ -287,7 +287,8 @@ def get_lr(self):
        factor = (
            (self.batch**2 + self.lr_batches**2) / self.lr_batches**2
        ) ** -0.25 * (
            ((self.epoch**2 + self.lr_epochs**2) / self.lr_epochs**2) ** -0.25
@AmirHussein96 could you remove these formatting changes? Please see #692 where we updated the line-length.
I finished the formatting
Thanks.
    parser.add_argument(
        "--use-averaged-model",
        type=str2bool,
        default=False,
@csukuangfj told me that using this option provides significant WER improvements, and I noticed the same in my experiments. You can try it out if you have some time.
Yes, the best reported results with stateless transducer are decoded with --use-averaged-model True
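A hedged usage example (flag names follow the librispeech pruned_transducer_stateless5 decode.py; the epoch/avg values and paths are placeholders):

```bash
./pruned_transducer_stateless5/decode.py \
  --epoch 18 \
  --avg 5 \
  --use-averaged-model True \
  --exp-dir pruned_transducer_stateless5/exp \
  --max-duration 200 \
  --decoding-method modified_beam_search
```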
The style_check failed on the symbolic links which @csukuangfj asked me to add.
Here are the error logs:
Please recheck your symlink.
@@ -0,0 +1 @@
../../../librispeech/ASR/conformer_ctc/convert_transcript_words_to_tokens.py
For instance, this symlink is not correct.
Please fix other symlinks reported by the CI.
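For example, a symlink placed under egs/mgb2/ASR/local/ could be recreated like this (assuming the target script actually lives under egs/librispeech/ASR/local/):

```bash
cd egs/mgb2/ASR/local
# Three levels up reach egs/, so the relative target resolves correctly.
ln -sf ../../../librispeech/ASR/local/convert_transcript_words_to_tokens.py .
```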
@@ -0,0 +1 @@
../../../librispeech/ASR/conformer_ctc/generate_unique_lexicon.py
Also this one. It should be placed in local, not in conformer_ctc.
Ok I fixed them
@csukuangfj please let me know if there is anything else I need to do to merge the PR
@@ -0,0 +1,901 @@
#!/usr/bin/env python3
# Copyright (c) 2021 University of Chinese Academy of Sciences (author: Han Zhu)
If you don't make changes to this file, could you please replace it with a symlink to the one from the librispeech recipe?
Done
egs/mgb2/ASR/local/compile_hlg.py (outdated)
@@ -0,0 +1,157 @@
#!/usr/bin/env python3
Could you replace it with a symlink to the file from librispeech?
Done
./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --num-epochs 30 \
30 epochs are trained. Does the combination --epoch 18, --avg 5 produce the best WER among the combinations you tried?
Yes
#### 2022-06-04

You can find a pretrained model, training logs, decoding logs, and decoding results at:
https://huggingface.co/AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06
Could you also upload pretrained.pt? cpu_jit.pt is useful during inference time, while pretrained.pt is useful for resuming the training.
For the decoding results, could you also upload the following files:
- errs-xxx
- recogs-xxx
Currently, only the decoding logs log-xxx are uploaded, which do not contain the recognition results.
Also, have you tried other decoding methods, e.g., ctc decoding and 1best decoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried the whole lattice rescoring and the attention decoding. The attention gave me the best results.
I uploaded the errs-xxx and recogs-xxx.
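Regarding pretrained.pt vs cpu_jit.pt, here is a hedged sketch of how both are usually produced (based on the librispeech conformer_ctc export.py; the MGB2 export script, its flag names, and any extra required arguments are assumptions):

```bash
# Save averaged weights for resuming training / fine-tuning -> pretrained.pt
./conformer_ctc/export.py --exp-dir conformer_ctc/exp --epoch 45 --avg 5 --jit 0

# Export a TorchScript model for inference (e.g. in sherpa) -> cpu_jit.pt
./conformer_ctc/export.py --exp-dir conformer_ctc/exp --epoch 45 --avg 5 --jit 1
```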
Could you also provide some test waves and the corresponding transcripts in the above hugging face repo so that we can use them to test your model in sherpa? You can use https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14/tree/main/test_wavs as a reference.
Done
# Results

### MGB2 all data BPE training results (Stateless Pruned Transducer)
Could you upload the pretrained model, checkpoint, and decoding results to a hugging face repo? You can use https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14/tree/main as a reference to see which files should be uploaded.
I cannot share the stateless transducer at the current stage, as it is being used in another project and the matter is sensitive for the party that provides me with the computation resources. However, I am planning to upload it in the near future.
@csukuangfj I addressed all your comments. Please let me know if you have any other comments before merging the PR.
Thanks for your contribution.
| Decoding method         | dev WER | test WER |
|-------------------------|---------|----------|
| attention-decoder       | 15.62   | 15.01    |
| whole-lattice-rescoring | 15.89   | 15.08    |
By the way, could you also add the results for 1best decoding and ctc_decoding?
Sure, will add them.
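A hedged example of how those two methods could be run (method names follow the librispeech conformer_ctc decode.py; epoch/avg values and paths are placeholders):

```bash
for method in ctc-decoding 1best; do
  ./conformer_ctc/decode.py \
    --epoch 45 \
    --avg 5 \
    --method "$method" \
    --exp-dir conformer_ctc/exp \
    --max-duration 50
done
```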
By the way, you can try the pre-trained models from this PR directly in your browser; you don't need to install anything for that.
This is a pull request for MGB2 recipe.
Kindly note that the model is still running and is currently at epoch 3; see the training curves here: https://tensorboard.dev/experiment/zy6FnumCQlmiO7BPsdCmEg/#scalars.
One issue is that with the current setup, one epoch on 2 V100 32 GB GPUs with --max-duration 100 takes 2 days, which is very long compared to a similar architecture in Espnet (half a day per epoch); any ideas what could cause this?
I tried to increase --max-duration to 200, but it gave me an OOM error.
On the other hand, the WER on test = 23.53 looks reasonable given that this is still the 3rd epoch. I expect to get something close to Espnet (Transformer 14.2, Conformer 13.7).