[Ready to merge] Pruned Transducer Stateless2 for WenetSpeech (char-based) #349

Merged

Conversation

@luomingshuang (Collaborator) commented on May 6, 2022:

There is an existing PR #314 with pruned transducer stateless2 for WenetSpeech, but that PR should be regarded as a draft. This PR is intended to be merged into master. Training on the L subset is still in progress (the results are already better than other published methods, and the performance is still improving slightly; I will update the results in RESULTS.md). This does not block the code review for this PR. The greedy_search_new and modified_beam_search_new results come from PR #358.
Results:

| Subset for training | Decoding method | Epoch | Avg | Pruned RNN-T (test-net/test-meeting) | Reworked model (dev/test-net/test-meeting) |
| --- | --- | --- | --- | --- | --- |
| M | greedy_search | 29 | 11 | -/- | 10.40/11.31/19.64 |
| M | modified_beam_search | 29 | 11 | -/- | 9.85/11.04/18.20 |
| M | fast_beam_search | 29 | 11 | -/- | 10.18/11.10/19.32 |
| M | / | / | / | kaldi | 9.81/14.19/28.22 |
| S | greedy_search | 29 | 24 | -/- | 19.92/25.20/35.35 |
| S | modified_beam_search | 29 | 24 | -/- | 18.62/23.88/33.80 |
| S | fast_beam_search | 29 | 24 | -/- | 19.31/24.41/34.87 |
| S | / | / | / | kaldi | 11.70/17.47/37.27 |
| L | greedy_search | 10 | 2 | -/- | 7.80/8.78/13.49 |
| L | greedy_search_new | 10 | 2 | -/- | 7.80/8.76/13.50 |
| L | modified_beam_search | 10 | 2 | -/- | 7.76/8.82/13.41 |
| L | modified_beam_search_new | 10 | 2 | -/- | 7.76/8.72/13.41 |
| L | fast_beam_search | 10 | 2 | -/- | 7.94/8.73/13.81 |
| L | / | / | / | kaldi | 9.07/12.83/24.72 |
| L | / | / | / | wenet | 8.88/9.70/15.59 |
| L | / | / | / | espnet | 9.70/8.90/15.90 |

@luomingshuang luomingshuang changed the title [WIP] Pruned Transducer Stateless2 for WenetSpeech (char-based) Pruned Transducer Stateless2 for WenetSpeech (char-based) May 6, 2022
@luomingshuang (Collaborator, Author) commented:

The best results, trained with the L subset (better than other public results at present):

| Decoding method | dev | test-net | test-meeting | comment |
| --- | --- | --- | --- | --- |
| greedy search | 7.80 | 8.75 | 13.49 | --epoch 10, --avg 2, --max-duration 100 |
| modified beam search (beam size 4) | 7.76 | 8.71 | 13.41 | --epoch 10, --avg 2, --max-duration 100 |
| fast beam search (set as default) | 7.94 | 8.74 | 13.80 | --epoch 10, --avg 2, --max-duration 1500 |
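
For reference, a decoding run with the flags above would look roughly like the following. This is only a sketch based on the usual icefall recipe layout (`egs/wenetspeech/ASR`, `pruned_transducer_stateless2/decode.py`); the exact script path and flag names should be checked against the merged recipe.

```bash
cd egs/wenetspeech/ASR

# Hypothetical invocation matching the "modified beam search" row above.
# --epoch/--avg/--max-duration come from the table; the remaining options
# follow the recipe's usual conventions and may differ slightly.
./pruned_transducer_stateless2/decode.py \
  --epoch 10 \
  --avg 2 \
  --max-duration 100 \
  --exp-dir pruned_transducer_stateless2/exp \
  --decoding-method modified_beam_search \
  --beam-size 4
```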

@luomingshuang luomingshuang changed the title Pruned Transducer Stateless2 for WenetSpeech (char-based) [Ready to merge] Pruned Transducer Stateless2 for WenetSpeech (char-based) May 19, 2022
README.md Outdated
@@ -20,6 +20,11 @@ We provide 6 recipes at present:
- [TIMIT][timit]
- [TED-LIUM3][tedlium3]
- [GigaSpeech][gigaspeech]
<<<<<<< HEAD
A collaborator commented:

Please resolve the conflicts.

@luomingshuang (Collaborator, Author) replied:

Done.

@luomingshuang luomingshuang added ready and removed ready labels May 23, 2022
| fast beam search (set as default) | 19.31 | 24.41 | 34.87 | --epoch 29, --avg 24, --max-duration 1500 |


A pre-trained model and decoding logs can be found at <https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2>
A collaborator commented:
Could you also add the pretrained models and decoding results for the M and S subsets to Hugging Face?

Also, could you upload the training logs to Hugging Face?

@luomingshuang (Collaborator, Author) replied:

OK, no problem.
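
For anyone who wants to try the published checkpoint, a Hugging Face model repository like the one linked above can typically be fetched with git-lfs, roughly as follows (a sketch; the file layout inside the repository is not described in this thread):

```bash
# Clone the pretrained model repository mentioned above (requires git-lfs).
git lfs install
git clone https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2
```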

@csukuangfj (Collaborator) commented:

Thanks!

@csukuangfj csukuangfj merged commit 0e57b30 into k2-fsa:master May 23, 2022
@pingfengluo (Contributor) commented:

@luomingshuang how long does it take to complete WenetSpeech training with the L subset?

--training-subset L

The tensorboard training log can be found at
@luomingshuang (Collaborator, Author) replied:

You can find the tensorboard log here. Based on that log, it took about 21 days and 16.5 hours for 17 epochs. Judging from my results, training can be stopped after about 11 epochs.
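
For context, a training run on the L subset would be launched roughly like this. Apart from `--training-subset`, which appears in the recipe snippet above, the remaining flags are assumptions based on the usual icefall `train.py` options and should be adjusted to the available hardware:

```bash
cd egs/wenetspeech/ASR

# Hypothetical multi-GPU training command; tune --world-size and
# --max-duration to the number of GPUs and their memory.
./pruned_transducer_stateless2/train.py \
  --world-size 8 \
  --num-epochs 11 \
  --training-subset L \
  --exp-dir pruned_transducer_stateless2/exp \
  --max-duration 180
```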


@pingfengluo (Contributor) replied:

Thanks.
