[docs/data] Add download to key user journeys in documentation
#59417
Code Review
This pull request introduces documentation for the download expression in Ray Data, which is a valuable addition. The examples are clear and helpful for users looking to download data from URIs within their datasets. I've identified a couple of minor areas for improvement in the code examples to enhance clarity by removing unused imports. The other changes in this PR, which correct file paths, are accurate and necessary.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
NUM_GPU_NODES = 8
- INPUT_PATH = "s3://anonymous@ray-example-data/imagenet/metadata_file"
+ INPUT_PATH = "s3://anonymous@ray-example-data/imagenet/metadata_file.parquet"
Bug: Inconsistent path update between benchmark comparison scripts
The INPUT_PATH in ray_data_main.py was updated to include .parquet, but the companion benchmark file daft_main.py in the same directory still uses the old path s3://anonymous@ray-example-data/imagenet/metadata_file without the extension. These two files are meant to compare Ray Data vs Daft performance on the same image classification workload, so they need to read from the same data path. This inconsistency will cause either one benchmark to fail (if only one path exists) or the benchmarks to read from different datasets, making comparisons invalid.
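One way to catch this kind of drift is a small check that reads the `INPUT_PATH` assignment out of each script and compares the two values. The following is only a sketch under stated assumptions: the helper names `find_input_path` and `paths_match` are hypothetical, the two source strings are stand-ins rather than the real file contents, and it assumes `daft_main.py` also assigns the path to a module-level variable named `INPUT_PATH`.

```python
import ast
from typing import Optional


def find_input_path(source: str, name: str = "INPUT_PATH") -> Optional[str]:
    """Return the string literal assigned to `name` at module level, or None."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == name:
                value = node.value
                if isinstance(value, ast.Constant) and isinstance(value.value, str):
                    return value.value
    return None


def paths_match(source_a: str, source_b: str) -> bool:
    """True only if both sources define the same non-missing INPUT_PATH."""
    path_a = find_input_path(source_a)
    path_b = find_input_path(source_b)
    return path_a is not None and path_a == path_b


# Stand-in snippets (assumed, not copied from the repository) that mirror
# the state described in the comment above.
RAY_DATA_MAIN = (
    "NUM_GPU_NODES = 8\n"
    'INPUT_PATH = "s3://anonymous@ray-example-data/imagenet/metadata_file.parquet"\n'
)
DAFT_MAIN = (
    'INPUT_PATH = "s3://anonymous@ray-example-data/imagenet/metadata_file"\n'
)

if __name__ == "__main__":
    print("ray_data_main.py:", find_input_path(RAY_DATA_MAIN))
    print("daft_main.py:    ", find_input_path(DAFT_MAIN))
    print("paths match:     ", paths_match(RAY_DATA_MAIN, DAFT_MAIN))
```

In practice the two strings would be replaced by the contents of the actual files, and the check could run in CI so the benchmark scripts cannot silently diverge.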
…-project#59417) Shows users how to use `download` to download from URI tables. --------- Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: kriyanshii <kriyanshishah06@gmail.com>
Shows users how to use `download` to download from URI tables.