
Refactor Emformer RNNT recipes #2212

Closed
hwangjeff wants to merge 1 commit

Conversation

hwangjeff (Contributor)

Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory.

@facebook-github-bot (Contributor)

@hwangjeff has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

hwangjeff added a commit to hwangjeff/audio that referenced this pull request Feb 9, 2022
Summary:
Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory.

Pull Request resolved: pytorch#2212

Differential Revision: D34120104

Pulled By: hwangjeff

fbshipit-source-id: 38cf6453d19e74f9851ff0544c6c7f604ec5b630
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D34120104

hwangjeff marked this pull request as ready for review on February 9, 2022 22:59
mthrok (Collaborator) left a comment


I think the demo script can be moved one directory up so that it can handle both the LibriSpeech model and the TED-LIUM model. It doesn't have to happen in this PR. @nateanl, would you like to update #2203 after this one?
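
A minimal sketch of the suggested consolidation, assuming a single demo entry point that dispatches on dataset; the `--model-type` flag, function names, and registry here are hypothetical illustrations, not taken from the recipes:

```python
import argparse
import pathlib


def run_librispeech_demo(checkpoint_path):
    ...  # placeholder: load the LibriSpeech-trained Emformer RNN-T and run streaming inference


def run_tedlium3_demo(checkpoint_path):
    ...  # placeholder: load the TED-LIUM Release 3-trained Emformer RNN-T and run streaming inference


# Hypothetical registry mapping dataset name to demo runner.
DEMOS = {
    "librispeech": run_librispeech_demo,
    "tedlium3": run_tedlium3_demo,
}


def cli_main():
    parser = argparse.ArgumentParser(description="Emformer RNN-T streaming demo")
    parser.add_argument("--model-type", choices=sorted(DEMOS), required=True)
    parser.add_argument("--checkpoint-path", type=pathlib.Path, required=True)
    args = parser.parse_args()
    DEMOS[args.model_type](args.checkpoint_path)


if __name__ == "__main__":
    cli_main()
```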


## Model Types

Currently, we have training recipes for the LibriSpeech and TED-LIUM Release 3 datasets.
Collaborator

Could you add a brief description of the differences between these models? I know vocab size is one. Are there any others?

hwangjeff (Contributor, Author)
Yeah, that's the only difference in the model architecture. The recipes themselves have some differences that are specific to the datasets; these should be apparent in the lightning modules.
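
To make the point concrete, a sketch of the two model constructions, assuming torchaudio's `emformer_rnnt_base` factory; the symbol counts below are illustrative assumptions (SentencePiece vocab plus one blank token), not values taken from the recipes:

```python
from torchaudio.models import emformer_rnnt_base

# Same architecture; only the output vocabulary size differs.
librispeech_model = emformer_rnnt_base(num_symbols=4097)  # assumed: 4096 SPM pieces + blank
tedlium3_model = emformer_rnnt_base(num_symbols=501)  # assumed: 500 SPM pieces + blank
```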

@facebook-github-bot (Contributor)

@hwangjeff has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

hwangjeff added a commit to hwangjeff/audio that referenced this pull request Feb 10, 2022
Summary:
Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory.

Pull Request resolved: pytorch#2212

Reviewed By: mthrok

Differential Revision: D34120104

Pulled By: hwangjeff

fbshipit-source-id: 60deed45731954d5f9b7df73a35d8373184c3c67
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D34120104


  assert len(idx_target_lengths) > 0

  idx_target_lengths = sorted(idx_target_lengths, key=lambda x: x[1])

  assert max_token_limit >= idx_target_lengths[-1][1]

- self.batches = _batch_by_token_count(idx_target_lengths, max_token_limit)
+ self.batches = batch_by_token_count(idx_target_lengths, max_token_limit)[:100]
Member

Why do we choose the first 100 batches here?

hwangjeff (Contributor, Author)

Good call. This was meant to speed up the training epoch for testing purposes only and should be removed.
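
For context, a minimal sketch of what a token-count batching helper like `batch_by_token_count` might do, assuming `idx_target_lengths` is a list of `(sample_index, target_length)` pairs; this is an illustration, not the recipe's actual implementation:

```python
def batch_by_token_count(idx_target_lengths, token_limit):
    """Greedily group sample indices so each batch stays within token_limit tokens."""
    batches = []
    current_batch, current_token_count = [], 0
    for idx, target_length in idx_target_lengths:
        # Start a new batch when adding this sample would exceed the limit.
        if current_batch and current_token_count + target_length > token_limit:
            batches.append(current_batch)
            current_batch, current_token_count = [], 0
        current_batch.append(idx)
        current_token_count += target_length
    if current_batch:
        batches.append(current_batch)
    return batches
```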

@nateanl (Member)

nateanl commented Feb 10, 2022

Do we want to keep the train_sentencepiece.py script in each subdirectory? Otherwise, users may overlook important details that impact model performance.

@mthrok (Collaborator)

mthrok commented Feb 14, 2022

> Do we want to keep the train_sentencepiece.py script in each subdirectory? Otherwise, users may overlook important details that impact model performance.

@nateanl The difference between the LibriSpeech model and the TED-LIUM model should be clearly outlined in the top-level README. It can be as simple as noting that the SentencePiece model also needs to be different and referring readers to the README in the respective directory, delegating the details to the subdirectory READMEs.
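
A brief sketch of what a per-dataset train_sentencepiece.py invocation might look like, using the sentencepiece Python API; the input file name and vocab size here are assumptions for illustration, not the recipes' actual settings:

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="librispeech_transcripts.txt",  # hypothetical transcript dump
    model_prefix="spm_librispeech",
    vocab_size=4096,  # assumed; the TED-LIUM recipe would use a smaller vocab
    model_type="unigram",
)
```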

default=pathlib.Path("global_stats.json"),
type=pathlib.Path,
help="Path to JSON file containing feature means and stddevs.",
required=True,
Contributor

As it has a default value, is it really necessary to specify required=True here? Also, the default value is not correct, as that file is inside the recipe directory.

I would suggest removing the default value.
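
A small sketch of the suggested fix, dropping the default so that required=True is meaningful and argparse enforces an explicit path; the flag name --global-stats-path is assumed for illustration:

```python
import argparse
import pathlib

parser = argparse.ArgumentParser()
parser.add_argument(
    "--global-stats-path",  # hypothetical flag name
    type=pathlib.Path,
    help="Path to JSON file containing feature means and stddevs.",
    required=True,  # no default, so the caller must pass an explicit path
)
```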

xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory.

Pull Request resolved: pytorch#2212

Reviewed By: mthrok

Differential Revision: D34120104

Pulled By: hwangjeff

fbshipit-source-id: 29c6e27195d5998f76d67c35b718110e73529456