@rushikeshadhav
Contributor

Description

This PR removes the deprecated read_parquet_bulk API from Ray Data, along with its implementation and documentation. This function was deprecated in favor of read_parquet, which now covers all equivalent use cases. The deprecation warning stated removal after May 2025, and that deadline has passed — so this cleanup reduces maintenance burden and prevents user confusion.

Summary of changes

  • Removed read_parquet_bulk from read_api.py and __init__.py
  • Deleted ParquetBulkDatasource + its file
  • Removed related tests and documentation
  • Updated references and docstrings mentioning the deprecated API

Related issues

Fixes #58969

@rushikeshadhav rushikeshadhav requested a review from a team as a code owner November 25, 2025 12:06
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively removes the deprecated read_parquet_bulk API, which helps reduce maintenance and prevent user confusion. The changes are comprehensive, covering the function's implementation, tests, documentation, and internal references. The code removal is clean and I only have one minor suggestion to improve the clarity of an updated docstring.

@cursor cursor bot left a comment

Bug: Orphaned parametrize decorators stacked on unrelated test function

The @pytest.mark.parametrize decorators for the deleted test_parquet_read_bulk and test_parquet_read_bulk_meta_provider functions were not removed along with the functions. These orphaned decorators (lines 234-249 and 250-265) are now stacked on top of test_parquet_read_partitioned, causing that test to have three parametrize decorators instead of one. This results in the test running with a Cartesian product of parameters from all three decorators, dramatically increasing test execution time and potentially causing failures from duplicate parameter name conflicts.

python/ray/data/tests/test_parquet.py#L233-L265

@pytest.mark.parametrize(
    "fs,data_path",
    [
        (None, lazy_fixture("local_path")),
        (lazy_fixture("local_fs"), lazy_fixture("local_path")),
        (lazy_fixture("s3_fs"), lazy_fixture("s3_path")),
        (
            lazy_fixture("s3_fs_with_space"),
            lazy_fixture("s3_path_with_space"),
        ),  # Path contains space.
        (
            lazy_fixture("s3_fs_with_anonymous_crendential"),
            lazy_fixture("s3_path_with_anonymous_crendential"),
        ),
    ],
)
@pytest.mark.parametrize(
    "fs,data_path",
    [
        (None, lazy_fixture("local_path")),
        (lazy_fixture("local_fs"), lazy_fixture("local_path")),
        (lazy_fixture("s3_fs"), lazy_fixture("s3_path")),
        (
            lazy_fixture("s3_fs_with_space"),
            lazy_fixture("s3_path_with_space"),
        ),  # Path contains space.
        (
            lazy_fixture("s3_fs_with_anonymous_crendential"),
            lazy_fixture("s3_path_with_anonymous_crendential"),
        ),
    ],
)
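One way to catch this kind of leftover before it reaches CI is a small static check that flags any function carrying more than one `parametrize` decorator. The following is a minimal sketch using only the standard library; the helper name `stacked_parametrize` and the `SAMPLE` source are hypothetical and not part of the Ray test suite. Stacking is legitimate when the decorators parametrize different arguments, so a hit is a prompt for review, not necessarily a bug. When the same argument names repeat, as in the snippet above, pytest is likely to reject the test at collection time instead of silently running a product.

```python
import ast


def stacked_parametrize(source: str) -> dict[str, list[str]]:
    """Map each function with 2+ parametrize decorators to their argnames strings."""
    flagged: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        argnames = []
        for dec in node.decorator_list:
            # Match calls of the form `<anything>.parametrize("names", ...)`.
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and dec.func.attr == "parametrize"
                and dec.args
                and isinstance(dec.args[0], ast.Constant)
                and isinstance(dec.args[0].value, str)
            ):
                argnames.append(dec.args[0].value)
        if len(argnames) > 1:
            flagged[node.name] = argnames
    return flagged


# Hypothetical, simplified stand-in for the situation described in this comment.
SAMPLE = '''
import pytest

@pytest.mark.parametrize("fs,data_path", [(None, "a")])
@pytest.mark.parametrize("fs,data_path", [(None, "b")])
@pytest.mark.parametrize("fs,data_path", [(None, "c")])
def test_parquet_read_partitioned(fs, data_path):
    pass

@pytest.mark.parametrize("x", [1, 2])
def test_ok(x):
    pass
'''

flagged = stacked_parametrize(SAMPLE)
for name, names in flagged.items():
    print(f"{name}: {len(names)} parametrize decorators -> {names}")
```

The source is only parsed, never imported, so the check runs without pytest or the test fixtures installed.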


@ray-gardener ray-gardener bot added the docs, data, and community-contribution labels Nov 25, 2025
Member

@bveeramani bveeramani left a comment

ty for the contribution! Overall LGTM, just left a couple comments

Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com>
@rushikeshadhav rushikeshadhav force-pushed the rushikesh/remove-read-parquet-bulk-api branch from bbfae94 to 5619e3f Compare November 26, 2025 05:31
rushikeshadhav and others added 2 commits November 26, 2025 11:02
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Member

@bveeramani bveeramani left a comment

LGTM! 🚢

@bveeramani
Member

@rushikeshadhav as a follow up, would you be interested in removing FastFileMetadataProvider?

@bveeramani bveeramani changed the title from "data: remove deprecated read_parquet_bulk API" to "[Data] Remove deprecated read_parquet_bulk API" Nov 26, 2025
@bveeramani bveeramani enabled auto-merge (squash) November 26, 2025 08:19
@github-actions github-actions bot added the go label Nov 26, 2025
@bveeramani bveeramani self-assigned this Nov 26, 2025
@bveeramani bveeramani merged commit 2fbb0bd into ray-project:master Nov 26, 2025
7 of 8 checks passed
@rushikeshadhav
Contributor Author

> @rushikeshadhav as a follow up, would you be interested in removing FastFileMetadataProvider?

Yes, I would love to.

SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>