Add _PATHS to text classification datasets, add global MD5 and NUM_LINES #1155

cpuhrsch · 2021-02-13T05:17:16Z

Currently, text classification datasets are downloaded from Google Drive even if the result already exists on disk, because a Google Drive URL does not reveal the path at which the downloaded file will be stored. This PR addresses this for text classification datasets by adding _PATHS, and further introduces the torchtext.experimental.datasets.raw.MD5 and NUM_LINES dictionaries for easier access to this meta information. It also fixes the meta information of the wrapped dataset factory function so that it actually contains the defaults for split, root and offset.
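To illustrate the idea, here is a minimal sketch of how a per-dataset path table can let a loader skip a download whose result is already on disk. It is not the code from this PR: the dataset name `DemoDataset`, the file name, the `fetch` and `file_md5` helpers, and the `fake_download` callable are all hypothetical, and the dictionaries only mirror the names `_PATHS`, `MD5` and `NUM_LINES` mentioned above.

```python
import hashlib
import os
import tempfile

# Hypothetical metadata tables keyed by dataset name. The file name and the
# line count are made up for illustration.
_PATHS = {"DemoDataset": "demo_dataset.csv"}
NUM_LINES = {"DemoDataset": 2}
MD5 = {}  # filled in below once the demo file's checksum is known


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(name, root, download):
    """Return the local path for `name`, calling `download` only if needed.

    `download` is a hypothetical callable that writes the file to the given
    path; it stands in for the real network fetch.
    """
    path = os.path.join(root, _PATHS[name])
    if not os.path.exists(path):
        # The URL alone does not tell us where the file lands, so the
        # expected location comes from the _PATHS table instead.
        download(path)
    expected = MD5.get(name)
    if expected is not None and file_md5(path) != expected:
        raise RuntimeError(f"checksum mismatch for {path}")
    return path


calls = []  # records every simulated download


def fake_download(path):
    calls.append(path)
    with open(path, "w") as handle:
        handle.write("1,first row\n2,second row\n")


with tempfile.TemporaryDirectory() as root:
    first = fetch("DemoDataset", root, fake_download)
    MD5["DemoDataset"] = file_md5(first)
    # The second call finds the file on disk and does not download again.
    second = fetch("DemoDataset", root, fake_download)
    with open(second) as handle:
        line_count = sum(1 for _ in handle)
```

In this sketch the second `fetch` call reuses the cached file, and the global `MD5` and `NUM_LINES` tables can be consulted without constructing a dataset.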


codecov · 2021-02-13T05:47:06Z

Codecov Report

Merging #1155 (351f0d9) into master (9053d95) will increase coverage by 0.14%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1155      +/-   ##
==========================================
+ Coverage   79.46%   79.61%   +0.14%     
==========================================
  Files          47       47              
  Lines        3175     3198      +23     
==========================================
+ Hits         2523     2546      +23     
  Misses        652      652

Impacted Files	Coverage Δ
torchtext/experimental/datasets/raw/__init__.py	`100.00% <100.00%> (ø)`
torchtext/experimental/datasets/raw/common.py	`88.00% <100.00%> (+0.16%)`	⬆️
...t/experimental/datasets/raw/text_classification.py	`87.83% <100.00%> (+0.33%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9053d95...351f0d9.

Add _PATHS to text classification datasets to prevent download if res…

351f0d9

…ult already exists. Create global MD5 and NUM_LINES dictionaries.

facebook-github-bot added the cla signed label Feb 13, 2021

cpuhrsch merged commit dc58f9a into pytorch:master Feb 15, 2021

cpuhrsch changed the title ~~Add _PATHS to text classification datasets, add global MD5 and NUM_LINES.~~ Add _PATHS to text classification datasets, add global MD5 and NUM_LINES Feb 16, 2021
