-
Notifications
You must be signed in to change notification settings - Fork 7k
[Data] Remove deprecated read_parquet_bulk API #58970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Remove deprecated read_parquet_bulk API #58970
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively removes the deprecated read_parquet_bulk API, which helps reduce maintenance and prevent user confusion. The changes are comprehensive, covering the function's implementation, tests, documentation, and internal references. The code removal is clean and I only have one minor suggestion to improve the clarity of an updated docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Orphaned parametrize decorators stacked on unrelated test function
The @pytest.mark.parametrize decorators for the deleted test_parquet_read_bulk and test_parquet_read_bulk_meta_provider functions were not removed along with the functions. These orphaned decorators (lines 234-249 and 250-265) are now stacked on top of test_parquet_read_partitioned, causing that test to have three parametrize decorators instead of one. This results in the test running with a Cartesian product of parameters from all three decorators, dramatically increasing test execution time and potentially causing failures from duplicate parameter name conflicts.
python/ray/data/tests/test_parquet.py#L233-L265
ray/python/ray/data/tests/test_parquet.py
Lines 233 to 265 in bbfae94
| @pytest.mark.parametrize( | |
| "fs,data_path", | |
| [ | |
| (None, lazy_fixture("local_path")), | |
| (lazy_fixture("local_fs"), lazy_fixture("local_path")), | |
| (lazy_fixture("s3_fs"), lazy_fixture("s3_path")), | |
| ( | |
| lazy_fixture("s3_fs_with_space"), | |
| lazy_fixture("s3_path_with_space"), | |
| ), # Path contains space. | |
| ( | |
| lazy_fixture("s3_fs_with_anonymous_crendential"), | |
| lazy_fixture("s3_path_with_anonymous_crendential"), | |
| ), | |
| ], | |
| ) | |
| @pytest.mark.parametrize( | |
| "fs,data_path", | |
| [ | |
| (None, lazy_fixture("local_path")), | |
| (lazy_fixture("local_fs"), lazy_fixture("local_path")), | |
| (lazy_fixture("s3_fs"), lazy_fixture("s3_path")), | |
| ( | |
| lazy_fixture("s3_fs_with_space"), | |
| lazy_fixture("s3_path_with_space"), | |
| ), # Path contains space. | |
| ( | |
| lazy_fixture("s3_fs_with_anonymous_crendential"), | |
| lazy_fixture("s3_path_with_anonymous_crendential"), | |
| ), | |
| ], | |
| ) |
bveeramani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ty for the contribution! Overall LGTM, just left a couple comments
Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com>
bbfae94 to
5619e3f
Compare
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
bveeramani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚢
|
@rushikeshadhav as a follow up, would you be interested in removing |
Yes, I would love to. |
## Description >This PR removes the deprecated read_parquet_bulk API from Ray Data, along with its implementation and documentation. This function was deprecated in favor of read_parquet, which now covers all equivalent use cases. The deprecation warning stated removal after May 2025, and that deadline has passed — so this cleanup reduces maintenance burden and prevents user confusion. Summary of changes - Removed read_parquet_bulk from read_api.py and __init__.py - Deleted ParquetBulkDatasource + its file - Removed related tests and documentation - Updated references and docstrings mentioning the deprecated API ## Related issues > Fixes ray-project#58969 --------- Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Description
Summary of changes
Related issues