
@omatthew98
Contributor

Description

Two of our tests in test_operator_fusion.py use read_parquet. This PR switches them to use range, which makes the tests less brittle.


Signed-off-by: Matthew Owen <mowen@anyscale.com>
omatthew98 requested a review from a team as a code owner October 22, 2025 18:10
Contributor

gemini-code-assist bot left a comment


Code Review

This pull request refactors two tests in test_operator_fusion.py to use ray.data.range instead of ray.data.read_parquet. This is a good improvement: removing the dependency on file I/O makes the tests less brittle. The changes are correct and align with the goal of improving test quality. I've added a couple of minor suggestions to improve variable naming for readability. As a minor follow-up, the temp_dir fixture is now unused in test_read_with_map_batches_fused_successfully and test_map_batches_with_batch_size_specified_fusion and could be removed from their signatures.

Comment on lines +262 to +263
n = 10
ds = ray.data.range(n)
Contributor


medium

For better readability and maintainability, consider using a more descriptive variable name instead of n. For example, num_rows would make the purpose of the variable clearer.

Suggested change
- n = 10
- ds = ray.data.range(n)
+ num_rows = 10
+ ds = ray.data.range(num_rows)

Comment on lines +380 to +381
n = 10
ds = ray.data.range(n)
Contributor


medium

For better readability and maintainability, consider using a more descriptive variable name instead of n. For example, num_rows would make the purpose of the variable clearer.

Suggested change
- n = 10
- ds = ray.data.range(n)
+ num_rows = 10
+ ds = ray.data.range(num_rows)

omatthew98 requested a review from bveeramani October 22, 2025 18:12
Comment on lines +262 to +263
n = 10
ds = ray.data.range(n)
Member


Nit: Minimal passing tests are good practice because they make the intent of the test clearer.

Suggested change
- n = 10
- ds = ray.data.range(n)
+ ds = ray.data.range(1)

ray-gardener bot added the data label (Ray Data-related issues) Oct 22, 2025
omatthew98 added the go label (add ONLY when ready to merge, run all tests) Oct 22, 2025
bveeramani merged commit 1b1bd91 into ray-project:master Oct 22, 2025
6 checks passed
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
## Description
Two of our tests in `test_operator_fusion.py` use `read_parquet`. This
switches them to use `range`, which makes the tests less brittle.

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>

Labels

data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests


Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants