DataLoader tutorial does not handle num_workers > 0 #352
Comments
Thanks for the feedback!!!
Yeah, for example where users should put sharding in the datapipe graph. Also, the discrepancy in results should be mentioned as well in #302.
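One consideration for where sharding goes in the graph can be sketched in pure Python. This is only an illustration and does not use torch or torchdata: `shard`, `expensive`, and `run` are hypothetical stand-ins, and the two-worker setup is an assumption. It compares sharding before a costly per-sample step with sharding after it.

```python
from itertools import islice

# Count how often the costly step runs for each placement.
calls = {"early": 0, "late": 0}


def expensive(x, key):
    """Stand-in for costly per-sample work such as decoding (hypothetical)."""
    calls[key] += 1
    return x * x


def shard(iterable, worker_id, num_workers):
    """Keep every num_workers-th item, starting at offset worker_id."""
    return islice(iterable, worker_id, None, num_workers)


def run(key, num_workers=2, n=10):
    """Simulate one epoch across workers; each worker owns a full pipeline copy."""
    out = []
    for worker_id in range(num_workers):
        if key == "early":
            # Shard first, then do the expensive work only on this worker's share.
            pipe = (expensive(x, key) for x in shard(range(n), worker_id, num_workers))
        else:
            # Do the expensive work on everything, then discard other workers' items.
            pipe = shard((expensive(x, key) for x in range(n)), worker_id, num_workers)
        out.extend(pipe)
    return sorted(out)


early = run("early")
late = run("late")
print(early == late)   # both placements yield the same samples
print(calls)           # but the late placement repeats the expensive step per worker
```

Under these assumptions both placements produce the same set of samples, while sharding late makes every worker repeat the expensive step on data it then throws away.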
It would be nice to see an updated tutorial; I don't know how to do this in an easy way 😭 Using num_workers > 0 is a very common use case, so it would be nice to have a default solution. On top of that, the datapipe examples provided do not consider this case (except https://github.com/pytorch/torchrec/blob/main/torchrec/datasets/criteo.py). Maybe a quick explanation here could suffice for now 😜
Updating the tutorial and README with more relevant/correct information Fixes #352 Differential Revision: [D36645515](https://our.internmc.facebook.com/intern/diff/D36645515) [ghstack-poisoned]
Updating the tutorial and README with more relevant/correct information. Minor fix to one part of `MapDataPipe` documentation as well. Fixes #352 Differential Revision: [D36645515](https://our.internmc.facebook.com/intern/diff/D36645515) [ghstack-poisoned]
Updating the tutorial and README with more relevant/correct information. Minor fix to one part of `MapDataPipe` documentation as well. Fixes #352
The tutorial has been updated. If anyone is still experiencing issues with the nightly version, feel free to re-open.
I just wanted to document an issue with the tutorial: https://pytorch.org/data/beta/tutorial.html#working-with-dataloader
The code in the tutorial will not work when running multiple DataLoader processes as the datapipe will be duplicated across workers:
gives
Even though this is still in beta, it may be worth letting users know about such pitfalls.
Also, since there are various ways to achieve the sharding, it could be useful to settle on a definite canonical way of handling all this.
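The duplication described above can be illustrated with a minimal pure-Python sketch. It is not the tutorial's code and does not use torch or torchdata: `source`, `shard`, and `simulate_workers` are hypothetical helpers that mimic how each DataLoader worker receives its own copy of the pipeline, and how filtering by worker index restores a single pass over the data. In real code the worker index would typically come from `torch.utils.data.get_worker_info()`, and torchdata offers a `sharding_filter` datapipe for the same purpose.

```python
from itertools import islice


def source():
    """Stand-in for a datapipe that yields ten samples."""
    return iter(range(10))


def shard(iterable, worker_id, num_workers):
    """Keep every num_workers-th item, starting at offset worker_id."""
    return islice(iterable, worker_id, None, num_workers)


def simulate_workers(num_workers, sharded):
    """Collect what all workers would yield in one epoch (hypothetical helper)."""
    seen = []
    for worker_id in range(num_workers):
        pipe = source()  # each worker holds its own full copy of the pipeline
        if sharded:
            pipe = shard(pipe, worker_id, num_workers)
        seen.extend(pipe)
    return sorted(seen)


duplicated = simulate_workers(num_workers=2, sharded=False)
deduplicated = simulate_workers(num_workers=2, sharded=True)
print(len(duplicated))   # 20: every sample appears once per worker
print(deduplicated)      # each of the ten samples appears exactly once
```

Round-robin filtering by worker index is only one of the possible sharding schemes; it is used here because it needs no coordination between workers.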