[Doc] Add data ingestion clarification for AIR converting existing pytorch code example #32058
Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter>
e62c7c3 to 55e066b
Thanks @woshiyyya! Left a comment
"Then we download the data:"
"Then we download the data: \n",
"\n",
"Assumption for this tutorial: your existing code is using the `torchvision.datasets` native to PyTorch. This tutorial continues to use `torchvision.datasets` to allow you to make as few code changes as possible. **Everything in this tutorial is also possible if you choose to use Ray Data, and you will also get the benefits of efficient preprocessing and multi-worker batch prediction.** See [here](train-datasets) for resources to get started with Ray Data."
- It can be any PyTorch DataLoader, not necessarily torchvision datasets.
- The benefit is parallel preprocessing. You can still use Ray Data for batch prediction without using it for training (as this tutorial already does).
- Maybe link to this tutorial instead? https://docs.ray.io/en/latest/ray-air/examples/torch_image_example.html
Thanks for the points! I have modified the description accordingly.
Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter>
doc/source/ray-air/examples/convert_existing_pytorch_code_to_ray_air.ipynb
…ay_air.ipynb Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
Thanks!
…torch code example (ray-project#32058) The example under Ray AI Runtime/Example section directly used native PyTorch datasets for data loading. It's good to clarify that the current approach is for simplicity, the more recommended approach is to use the Ray dataset. Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: Yunxuan Xiao <yunxuanx@Yunxuans-MacBook-Pro.local> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Why are these changes needed?
The example in the Ray AI Runtime examples section uses native PyTorch datasets directly for data loading. This PR clarifies that the example takes this approach for simplicity and that the recommended approach is to use Ray Data.
Related issue number
Checks
- Signed off every commit (`git commit -s`) in this PR.
- Ran `scripts/format.sh` to lint the changes in this PR.