Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of end of file while reading vocab from file #1573

Merged
merged 1 commit into from
Feb 4, 2022

Conversation

abhinavarora
Copy link
Contributor

Issue

As we were onboarding pre-commit for torchtext in #1546, we added end of file newline rule. With this rule it was discovered that our experimental vocab incorrectly handled new line at the end of vocab file. Vocab simply added an empty string to the vocab due to which our unit tests failed.

Fix

Vocab builder should ignore tokens that are empty.

Testing

pytest -v test/experimental/test_with_asset.py

@abhinavarora abhinavarora self-assigned this Feb 4, 2022
@abhinavarora abhinavarora requested review from parmeet, Nayef211 and vcm2114 and removed request for parmeet February 4, 2022 01:30
Copy link
Contributor

@Nayef211 Nayef211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. Thanks for catching and fixing this bug @abhinavarora!

As a small nit, I would avoid combining formatting changes with features/bug fixes since it makes it hard to differentiate between the 2 types of changes when reviewing the PR or when referring to it in the future in case the PR introduces a bug.

c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the change for this line? On the Github UI, the diff looks the same to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Nayef211 this PR does not make any formatting change. The change in vocab_test.txt is to add a new line at the end of the file, so that the test is also able to test this condition.

@abhinavarora
Copy link
Contributor Author

This LGTM. Thanks for catching and fixing this bug @abhinavarora!

As a small nit, I would avoid combining formatting changes with features/bug fixes since it makes it hard to differentiate between the 2 types of changes when reviewing the PR or when referring to it in the future in case the PR introduces a bug.

The PR does not add any formatting changes :-) Those are in a different PR #1546,

@abhinavarora abhinavarora merged commit 7f3ed4b into pytorch:main Feb 4, 2022
@abhinavarora abhinavarora deleted the fix_eof branch February 4, 2022 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants