Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Doc] Add data ingestion clarification for AIR converting existing pytorch code example #32058

Merged

Conversation

woshiyyya
Copy link
Member

@woshiyyya woshiyyya commented Jan 30, 2023

Why are these changes needed?

The example under Ray AI Runtime/Example section directly used native PyTorch datasets for data loading. It's good to clarify that the current approach is for simplicity, the more recommended approach is to use the Ray dataset.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter>
@woshiyyya woshiyyya force-pushed the doc/clarify_pytorch_demo_data_ingest branch from e62c7c3 to 55e066b Compare January 30, 2023 19:50
@woshiyyya woshiyyya marked this pull request as ready for review January 30, 2023 22:32
Copy link
Contributor

@amogkam amogkam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @woshiyyya! Left a comment

"Then we download the data:"
"Then we download the data: \n",
"\n",
"Assumption for this tutorial: your existing code is using the `torchvision.datasets` native to PyTorch. This tutorial continues to use `torchvision.datasets` to allow you to make as few code changes as possible. **Everything in this tutorial is also possible if you choose to use Ray Data, and you will also get the benefits of efficient preprocessing and multi-worker batch prediction.** See [here](train-datasets) for resources to get started with Ray Data."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It can be any PyTorch DataLoader, not necessarily torchvision datasets
  2. The benefit is for parallel preprocessing-- you can still use Ray Data for batch prediction without using for training (as this tutorial already does)
  3. Maybe link to this tutorial instead: https://docs.ray.io/en/latest/ray-air/examples/torch_image_example.html?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the points! I have modified the description accordingly.

Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter>
woshiyyya and others added 2 commits February 6, 2023 13:09
…ay_air.ipynb

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
Copy link
Contributor

@amogkam amogkam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@amogkam amogkam merged commit 91940e3 into ray-project:master Feb 13, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…torch code example (ray-project#32058)

The example under Ray AI Runtime/Example section directly used native PyTorch datasets for data loading. It's good to clarify that the current approach is for simplicity, the more recommended approach is to use the Ray dataset.

Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter>
Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
Co-authored-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: Yunxuan Xiao <yunxuanx@Yunxuans-MacBook-Pro.local>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants