Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove torchdata dependency from package and from CI #2241

Merged
merged 11 commits into from
Mar 22, 2024

Conversation

NicolasHug
Copy link
Member

This PR removes the dependency on torchdata:

  • from the torchtext package (setup.py / requirements.txt, etc.)
  • from the CI: all unittest and build jobs.

The files that still have references to torchdata after this PR are:

(base) ➜  text git:(remove_torchdata) ✗ git grep -i -l torchdata 
CONTRIBUTING_DATASETS.md
README.rst
docs/source/datasets.rst
examples/torcharrow/README.md
examples/tutorials/sst2_classification_non_distributed.py
examples/tutorials/t5_demo.py
test/torchtext_unittest/datasets/common.py
test/torchtext_unittest/datasets/test_agnews.py
test/torchtext_unittest/datasets/test_amazonreviews.py
test/torchtext_unittest/datasets/test_cc100.py
test/torchtext_unittest/datasets/test_cnndm.py
test/torchtext_unittest/datasets/test_cola.py
test/torchtext_unittest/datasets/test_conll2000chunking.py
test/torchtext_unittest/datasets/test_dbpedia.py
test/torchtext_unittest/datasets/test_enwik9.py
test/torchtext_unittest/datasets/test_imdb.py
test/torchtext_unittest/datasets/test_iwslt2016.py
test/torchtext_unittest/datasets/test_iwslt2017.py
test/torchtext_unittest/datasets/test_mnli.py
test/torchtext_unittest/datasets/test_mrpc.py
test/torchtext_unittest/datasets/test_multi30k.py
test/torchtext_unittest/datasets/test_penntreebank.py
test/torchtext_unittest/datasets/test_qnli.py
test/torchtext_unittest/datasets/test_qqp.py
test/torchtext_unittest/datasets/test_rte.py
test/torchtext_unittest/datasets/test_sogounews.py
test/torchtext_unittest/datasets/test_squads.py
test/torchtext_unittest/datasets/test_sst2.py
test/torchtext_unittest/datasets/test_stsb.py
test/torchtext_unittest/datasets/test_udpos.py
test/torchtext_unittest/datasets/test_wikitexts.py
test/torchtext_unittest/datasets/test_wnli.py
test/torchtext_unittest/datasets/test_yahooanswers.py
test/torchtext_unittest/datasets/test_yelpreviews.py
torchtext/datasets/ag_news.py
torchtext/datasets/amazonreviewfull.py
torchtext/datasets/amazonreviewpolarity.py
torchtext/datasets/cc100.py
torchtext/datasets/cnndm.py
torchtext/datasets/cola.py
torchtext/datasets/conll2000chunking.py
torchtext/datasets/dbpedia.py
torchtext/datasets/enwik9.py
torchtext/datasets/imdb.py
torchtext/datasets/iwslt2016.py
torchtext/datasets/iwslt2017.py
torchtext/datasets/mnli.py
torchtext/datasets/mrpc.py
torchtext/datasets/multi30k.py
torchtext/datasets/penntreebank.py
torchtext/datasets/qnli.py
torchtext/datasets/qqp.py
torchtext/datasets/rte.py
torchtext/datasets/sogounews.py
torchtext/datasets/squad1.py
torchtext/datasets/squad2.py
torchtext/datasets/sst2.py
torchtext/datasets/stsb.py
torchtext/datasets/udpos.py
torchtext/datasets/wikitext103.py
torchtext/datasets/wikitext2.py
torchtext/datasets/wnli.py
torchtext/datasets/yahooanswers.py
torchtext/datasets/yelpreviewfull.py
torchtext/datasets/yelpreviewpolarity.py

The torchtext.datasets namespace still relies on torchdata but that's OK: users who still need it can just install torchdata manually. Similarly I did not update/remove the tests/datasets/* files and instead just ignored all the dataset tests in pytest.ini.

I will be submitting follow-up PRs to also address the user-facing docs, raise proper warnings, etc.

@NicolasHug NicolasHug requested a review from atalman March 22, 2024 11:49
Copy link

pytorch-bot bot commented Mar 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/text/2241

Note: Links to docs will display an error until the docs builds have been completed.

❌ 19 New Failures, 5 Unrelated Failures

As of commit 9e13774 with merge base 52c0d85 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.


if channel == "nightly":
validateTorchdataVersion()

print("torchtext version is ", torchtext.__version__)
Copy link
Contributor

@atalman atalman Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could include 1-2 basic torchtext functionalities tests, here to see if they work. This can be done in followup pr

@NicolasHug NicolasHug merged commit 51f373d into pytorch:main Mar 22, 2024
20 of 51 checks passed
@NicolasHug NicolasHug deleted the remove_torchdata branch March 22, 2024 14:34
PaliC added a commit that referenced this pull request Mar 22, 2024
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Fix torchdata import error (#2242)
huydhn added a commit that referenced this pull request Mar 23, 2024
* Remove torchdata dependency from package and from CI (#2241)

* Fix torchdata import error (#2242)

* Remove stuff

* stuff

* lint

---------

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants