Fix group_lengths for short datasets #12558

Merged 1 commit on Jul 8, 2021

sgugger (Collaborator) commented Jul 7, 2021

What does this PR do?

This PR fixes the group_lengths function used in all language modeling examples so that it also works on short datasets instead of returning a dataset of length 0. The fix was discussed in the issue mentioned below.

Fixes #12438
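
For readers without the diff at hand, here is a minimal sketch of the kind of guard described above. It is not the merged code: the function name `group_into_blocks`, its signature, and the dict-of-lists batch layout are assumptions made for illustration. The idea is to truncate the concatenated length to a multiple of the block size only when at least one full block exists, so a short dataset yields one short block rather than nothing.

```python
from itertools import chain


def group_into_blocks(examples, block_size):
    """Concatenate tokenized examples and split them into fixed-size blocks.

    Hypothetical helper for illustration; `examples` is assumed to map a
    column name (e.g. "input_ids") to a list of token-id lists.
    """
    # Flatten every column into one long list of tokens.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values()))) if concatenated else 0

    # Drop the trailing remainder only when there is at least one full block.
    # Without this check, a total shorter than block_size is truncated to 0
    # and the result is empty.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size

    # Slice each column into chunks of block_size.
    return {
        k: [tokens[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }


batch = {"input_ids": [[1, 2, 3], [4, 5]]}
print(group_into_blocks(batch, 4))  # one full block, remainder dropped
print(group_into_blocks(batch, 8))  # one short block instead of an empty result
```

With this guard, the short block is shorter than `block_size`, so downstream code would still need to pad or otherwise handle variable-length inputs.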

@sgugger sgugger requested a review from LysandreJik July 7, 2021 13:14
LysandreJik (Member) left a comment

Very nice! Thanks for taking care of it @sgugger :)

@sgugger sgugger merged commit 6f1adc4 into master Jul 8, 2021
@sgugger sgugger deleted the fix_12438 branch July 8, 2021 11:23
Linked issue: IndexError: index out of bound, MLM+XLA (pre-training)