Add _PATHS to text classification datasets, add global MD5 and NUM_LINES #1155

Merged
merged 1 commit into from
Feb 15, 2021

cpuhrsch
Contributor

@cpuhrsch cpuhrsch commented Feb 13, 2021

Currently, text classification datasets are downloaded from Google Drive even if the result already exists on disk, because a Google Drive URL does not reveal the path at which the downloaded file will be stored. This PR addresses this for text classification datasets by adding _PATHS, and further introduces the torchtext.experimental.datasets.raw.MD5 and NUM_LINES dictionaries to make this meta information easier to access. It also fixes the meta information of the wrapped dataset factory functions so that it actually contains the defaults for split, root and offset.
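
For illustration, here is a minimal sketch of the idea, not the code in this PR. It assumes a per-dataset path table in the spirit of _PATHS and an MD5 table keyed by dataset name. The names `URLS`, `fetch`, `md5_of_file`, `fake_download`, the dataset name and the URL are all hypothetical, and the downloader is a stand-in that writes a fixed payload instead of contacting Google Drive.

```python
import hashlib
import os
import tempfile

# Hypothetical tables: the URL says nothing about the file name on disk,
# so the expected relative path has to be recorded separately.
URLS = {"DemoDataset": "https://drive.google.com/uc?export=download&id=HYPOTHETICAL_ID"}
_PATHS = {"DemoDataset": "demo_dataset.csv"}

PAYLOAD = b"label,text\n1,hello\n"
MD5 = {"DemoDataset": hashlib.md5(PAYLOAD).hexdigest()}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(name, root, download_fn):
    """Download a dataset only if no valid copy exists under root."""
    path = os.path.join(root, _PATHS[name])
    # Knowing the target path up front lets us skip the download.
    if os.path.exists(path) and md5_of_file(path) == MD5[name]:
        return path
    download_fn(URLS[name], path)
    if md5_of_file(path) != MD5[name]:
        raise RuntimeError("MD5 mismatch for {}".format(name))
    return path


calls = []  # records every simulated download


def fake_download(url, path):
    """Stand-in downloader (assumption): writes PAYLOAD to path."""
    calls.append(url)
    with open(path, "wb") as f:
        f.write(PAYLOAD)


with tempfile.TemporaryDirectory() as root:
    first = fetch("DemoDataset", root, fake_download)
    second = fetch("DemoDataset", root, fake_download)  # served from disk

print(len(calls))
```

The second call returns the cached file without invoking the downloader, which is the behaviour the path table makes possible. Checking the MD5 of the cached file is a design choice of this sketch; whether the PR verifies existing files this way is not stated above.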

…ult already exists. Create global MD5 and NUM_LINES dictionaries.
@codecov

codecov bot commented Feb 13, 2021

Codecov Report

Merging #1155 (351f0d9) into master (9053d95) will increase coverage by 0.14%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1155      +/-   ##
==========================================
+ Coverage   79.46%   79.61%   +0.14%     
==========================================
  Files          47       47              
  Lines        3175     3198      +23     
==========================================
+ Hits         2523     2546      +23     
  Misses        652      652              
Impacted Files                                         Coverage Δ
torchtext/experimental/datasets/raw/__init__.py        100.00% <100.00%> (ø)
torchtext/experimental/datasets/raw/common.py          88.00% <100.00%> (+0.16%) ⬆️
...t/experimental/datasets/raw/text_classification.py  87.83% <100.00%> (+0.33%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9053d95...351f0d9. Read the comment docs.

@cpuhrsch cpuhrsch merged commit dc58f9a into pytorch:master Feb 15, 2021
@cpuhrsch cpuhrsch changed the title from "Add _PATHS to text classification datasets, add global MD5 and NUM_LINES." to "Add _PATHS to text classification datasets, add global MD5 and NUM_LINES" on Feb 16, 2021
2 participants