Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prepare datasets for new encoding kwarg. #1616

Merged
merged 1 commit into from
Feb 19, 2022

Conversation

erip
Copy link
Contributor

@erip erip commented Feb 16, 2022

This may need to wait for appropriate nightlies to be published, but @ejguan suggested this feature will make it into pytorch 1.11 so that's a good sign.

@ejguan
Copy link
Contributor

ejguan commented Feb 16, 2022

Note: The PR with the change of encoding has already been merged into release/1.11 branch.

@Nayef211
Copy link
Contributor

This may need to wait for appropriate nightlies to be published, but @ejguan suggested this feature will make it into pytorch 1.11 so that's a good sign.

I think we're using the latest main branch for torchdata for our unit tests so our tests should not be failing if the encoding kwarg is already present in the torchdata repo.

Here are the code pointers to where we are installing torchdata

@erip
Copy link
Contributor Author

erip commented Feb 17, 2022

The encoding kwarg is in pytorch proper, not torchdata.

@Nayef211
Copy link
Contributor

The encoding kwarg is in pytorch proper, not torchdata.

Ahh understood, thanks for clarifying!

@Nayef211
Copy link
Contributor

Btw can we fix the merge conflicts with some of the datasets?

@erip
Copy link
Contributor Author

erip commented Feb 17, 2022

Oops -- yep!

@erip erip force-pushed the feature/use-fileopener-encoding branch from 1248136 to 951253d Compare February 17, 2022 01:36
@parmeet
Copy link
Contributor

parmeet commented Feb 17, 2022

Great to see this PR @erip! Hopefully we should soon be in position to land these changes :)

@erip erip force-pushed the feature/use-fileopener-encoding branch from 951253d to e77522f Compare February 17, 2022 13:59
@erip
Copy link
Contributor Author

erip commented Feb 17, 2022

This is a good sign. Once nightlies with Windows (and OS X) are published, this should be gtg -- Linux nightlies have propagated and tests pass in those cases.

Copy link
Contributor

@parmeet parmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the changes LGTM! We can merge once after the windows tests are passing. Thanks.

@erip erip force-pushed the feature/use-fileopener-encoding branch from e77522f to af4da8a Compare February 19, 2022 12:38
@erip
Copy link
Contributor Author

erip commented Feb 19, 2022

New windows and OS X nightlies landed, so this should hopefully pass and be good to go. 🎆

@codecov
Copy link

codecov bot commented Feb 19, 2022

Codecov Report

Merging #1616 (af4da8a) into main (16acc71) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1616   +/-   ##
=======================================
  Coverage   85.25%   85.25%           
=======================================
  Files          58       58           
  Lines        2483     2483           
=======================================
  Hits         2117     2117           
  Misses        366      366           
Impacted Files Coverage Δ
torchtext/datasets/ag_news.py 95.23% <100.00%> (ø)
torchtext/datasets/amazonreviewfull.py 96.15% <100.00%> (ø)
torchtext/datasets/amazonreviewpolarity.py 96.00% <100.00%> (ø)
torchtext/datasets/cc100.py 95.83% <100.00%> (ø)
torchtext/datasets/conll2000chunking.py 95.83% <100.00%> (ø)
torchtext/datasets/dbpedia.py 96.00% <100.00%> (ø)
torchtext/datasets/enwik9.py 95.45% <100.00%> (ø)
torchtext/datasets/imdb.py 97.29% <100.00%> (ø)
torchtext/datasets/iwslt2016.py 89.83% <100.00%> (ø)
torchtext/datasets/iwslt2017.py 92.72% <100.00%> (ø)
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16acc71...af4da8a. Read the comment docs.

@parmeet parmeet merged commit f2da1b8 into pytorch:main Feb 19, 2022
@erip erip deleted the feature/use-fileopener-encoding branch February 19, 2022 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants