Size of initial minibatch #146

Closed
danpovey opened this issue Nov 22, 2020 · 5 comments · Fixed by #148

Comments

@danpovey
Collaborator

danpovey commented Nov 22, 2020

Piotr, in our snowfall example with mini_librispeech, the first minibatch is 16 seconds long, which seems on the long side.
Is that typical of the data, or are the utterances arranged from longest to shortest?
If we want them in a nonrandom order, we probably want shortest to longest, which would be better for convergence.
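
A minimal sketch of the shortest-to-longest ordering suggested above, assuming each utterance is described only by its duration in seconds and that batches are capped by total duration. The names `batches_shortest_first` and `max_batch_seconds` are hypothetical and chosen for illustration; this is not the Lhotse or snowfall implementation.

```python
from typing import List


def batches_shortest_first(durations: List[float], max_batch_seconds: float) -> List[List[int]]:
    """Greedily pack utterance indices into minibatches, shortest utterances first.

    Hypothetical helper for illustration only; not part of Lhotse or snowfall.
    """
    # Visit utterances in ascending order of duration.
    order = sorted(range(len(durations)), key=lambda i: durations[i])

    batches: List[List[int]] = []
    current: List[int] = []
    current_total = 0.0
    for i in order:
        d = durations[i]
        # Close the current batch if adding this utterance would exceed the cap.
        if current and current_total + d > max_batch_seconds:
            batches.append(current)
            current, current_total = [], 0.0
        current.append(i)
        current_total += d
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Durations in seconds; the values are made up for the example.
    print(batches_shortest_first([16.0, 2.0, 5.0, 3.0, 9.0], max_batch_seconds=10.0))
```

With this ordering the earliest batches contain the shortest utterances, and an utterance longer than the cap ends up in a batch of its own.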

@pzelasko
Collaborator

pzelasko commented Nov 22, 2020 via email

@danpovey
Collaborator Author

danpovey commented Nov 22, 2020 via email

@pzelasko
Collaborator

By default it's twice that (which helps the heuristic pack cuts better), but it can be adjusted; see: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/speech_recognition.py#L136

@danpovey
Collaborator Author

danpovey commented Nov 22, 2020 via email

@pzelasko
Collaborator

pzelasko commented Nov 22, 2020 via email


2 participants