Add Dataloader2 training loop example with torch text #670

Closed
wants to merge 1 commit into from

@dahsh dahsh commented Jul 21, 2022

Summary:

  • Add a use case and example of DataLoader2 with open source datasets/datapipes.
    TorchText provides several standard NLP datasets; here we use its SST2 OSS dataset
    in the DataLoader2 train loop.

  • We will add more examples to showcase the advantages:
    (1) Using DLv2 with popular open source datasets.
    (2) Integrating datasets/datapipes with different reading services.
    (3) Datapipe manipulation, for example batch, collate, and map.
    (4) Distributed usage, with examples of features such as sharding_filter for sharding.
    (5) Eventually, adding these examples to the PyTorch tutorials.
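
For readers who want a feel for items (3) and (4) without installing anything, here is a rough sketch of the map, shard, batch, and collate steps feeding a train loop. It is not the example added in this PR and it does not call torchdata or torchtext: plain Python generators stand in for the datapipes, `TOY_SAMPLES` is made-up data in the (text, label) shape that SST2-style samples take, and every function name below is hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Sample = Tuple[str, int]  # (sentence, label)

# Hypothetical toy data standing in for a sentiment dataset such as SST2.
TOY_SAMPLES: List[Sample] = [
    ("A charming film", 1),
    ("Dull and lifeless", 0),
    ("Worth every minute", 1),
    ("A tedious mess", 0),
    ("Quietly moving", 1),
    ("Painfully slow", 0),
]


def shard(source: Iterable[Sample], num_shards: int, shard_id: int) -> Iterator[Sample]:
    """Round-robin sharding: keep samples whose index maps to this shard."""
    for index, sample in enumerate(source):
        if index % num_shards == shard_id:
            yield sample


def map_samples(source: Iterable[Sample], fn: Callable[[Sample], Sample]) -> Iterator[Sample]:
    """Apply fn to every sample lazily."""
    for sample in source:
        yield fn(sample)


def batch(source: Iterable[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Group consecutive samples into lists of at most batch_size."""
    current: List[Sample] = []
    for sample in source:
        current.append(sample)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current  # keep the final partial batch


def collate(samples: List[Sample]) -> Dict[str, list]:
    """Turn a list of (text, label) pairs into a dict of parallel lists."""
    texts, labels = zip(*samples)
    return {"text": list(texts), "label": list(labels)}


def lowercase(sample: Sample) -> Sample:
    text, label = sample
    return text.lower(), label


def build_pipeline(num_shards: int, shard_id: int, batch_size: int) -> Iterator[Dict[str, list]]:
    """Chain shard -> map -> batch -> collate, yielding one dict per batch."""
    stream = shard(TOY_SAMPLES, num_shards, shard_id)
    stream = map_samples(stream, lowercase)
    for group in batch(stream, batch_size):
        yield collate(group)


def train_one_epoch(num_shards: int = 2, shard_id: int = 0, batch_size: int = 2) -> int:
    """Iterate the pipeline once and return the number of samples seen."""
    seen = 0
    for batch_dict in build_pipeline(num_shards, shard_id, batch_size):
        # A real loop would run the forward pass, loss, and optimizer step here.
        seen += len(batch_dict["label"])
    return seen
```

With two shards, each worker sees half of the samples, so `train_one_epoch(num_shards=2, shard_id=0)` and `train_one_epoch(num_shards=2, shard_id=1)` together cover the toy dataset exactly once. Sharding is placed before the map step here so that each worker only transforms its own samples; that ordering is a choice made for this sketch.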

Reviewed By: ejguan

Differential Revision: D37938017

@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Jul 21, 2022
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D37938017


dahsh added a commit to dahsh/data that referenced this pull request Jul 21, 2022
fbshipit-source-id: f58076b6d5de73577c4d9cdd423e2c5be88ef041

@NivekT NivekT left a comment

LGTM! Thanks!

fbshipit-source-id: e00c6f7af63b5a6d33ed138c563f4b20881b5ad6
