Conversation

@bveeramani
Member

@bveeramani bveeramani commented Nov 10, 2025

Description

This PR refactors tests in test_download_expression.py to make them easier to maintain and less brittle. Some of the previous tests were more complex than necessary and relied on assumptions, such as a fixed output order, that could occasionally cause spurious failures.

Key updates:

  • Reduce flaky behavior: Added explicit sorting by ID in test_download_expression_handles_failed_downloads to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures.
  • Simplify test logic: Reduced test_download_expression_failed_size_estimation from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run.
  • Improve readability: Replaced pa.Table.from_arrays() with ray.data.from_items(), which makes the test setup more straightforward for future maintainers.
  • Remove redundancy: Deleted test_download_expression_mixed_valid_and_invalid_size_estimation, since its behavior is already covered by the other tests.
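The ordering fix in the first bullet can be sketched in plain Python. This is a minimal illustration rather than the actual test code: the `rows` list is a hypothetical stand-in for whatever the dataset returns, and `sorted_by_id` is a helper name invented here.

```python
import random

# Hypothetical rows standing in for the output of a download step.
# A row whose download failed carries None in its "bytes" column.
rows = [
    {"id": 0, "bytes": b"valid content"},
    {"id": 1, "bytes": None},
    {"id": 2, "bytes": None},
]


def sorted_by_id(results):
    """Return the rows ordered by their "id" column."""
    return sorted(results, key=lambda row: row["id"])


# Simulate an execution engine that returns rows in arbitrary order.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

# Indexing into `shuffled` directly would depend on the arrival order.
# Sorting first makes positional assertions deterministic.
results = sorted_by_id(shuffled)
assert results[0]["bytes"] == b"valid content"
assert results[1]["bytes"] is None
assert results[2]["bytes"] is None
```

Sorting on a stable key before asserting keeps the check independent of how the rows happen to be scheduled.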

Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified.
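
To illustrate why a single failing URI is enough to exercise the divide-by-zero case, here is a small sketch. The `estimate_average_size` function is hypothetical and is not Ray Data's implementation; it only assumes that an average is computed over the sizes that were successfully sampled.

```python
from typing import Optional, Sequence


def estimate_average_size(sizes: Sequence[Optional[int]]) -> Optional[float]:
    """Average the known sizes (in bytes); None marks a failed lookup.

    Returns None when no lookup succeeded, instead of dividing by zero.
    """
    known = [size for size in sizes if size is not None]
    if not known:
        # Without this guard, sum(known) / len(known) raises ZeroDivisionError.
        return None
    return sum(known) / len(known)


# One failing URI reaches the same guarded branch as thirty failing URIs.
single_failure = estimate_average_size([None])
many_failures = estimate_average_size([None] * 30)
assert single_failure is None
assert many_failures is None
```

Because the zero-sample branch is taken as soon as every lookup fails, the number of failing inputs does not change which code path is tested.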

Related issue

#58464 (comment)

Refactor test_download_expression.py to follow unit testing best practices:
- Remove assumptions about output ordering by explicitly sorting results
- Reduce test complexity by using minimal inputs that verify the behavior
- Improve code clarity by using from_items() instead of from_arrow()
- Remove redundant test that didn't add coverage beyond existing tests

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner November 10, 2025 23:43
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request does a great job of refactoring the download expression tests to be simpler, more robust, and easier to maintain. The changes, such as using ray.data.from_items for better readability, adding sorting to eliminate flakiness, and simplifying complex tests, are all positive improvements. I have a couple of minor suggestions to further improve the robustness of the tests by consistently using pytest's tmp_path fixture, which will help avoid potential filesystem permission issues in different environments.

- ],
- names=["uri"],
+ {"uri": f"local://{valid_file}", "id": 0},
+ {"uri": "local:///nonexistent.txt", "id": 1},
Contributor

medium

Using an absolute path like local:///nonexistent.txt could lead to permission issues in restricted environments. It's better practice to use the tmp_path fixture for creating test file paths, even for non-existent files. This ensures that all file operations are contained within the temporary directory managed by pytest.

Suggested change
- {"uri": "local:///nonexistent.txt", "id": 1},
+ {"uri": f"local://{tmp_path}/nonexistent.txt", "id": 1},
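
The suggestion above can be tried outside pytest with the standard library. This is a rough sketch: `tempfile.TemporaryDirectory` stands in for pytest's `tmp_path` fixture, and `missing_file_uri` is a hypothetical helper, not part of the test file.

```python
import tempfile
from pathlib import Path


def missing_file_uri(base_dir: Path, name: str = "nonexistent.txt") -> str:
    """Build a local:// URI for a file that does not exist under base_dir."""
    path = base_dir / name
    if path.exists():
        raise FileExistsError(f"{path} unexpectedly exists")
    return f"local://{path}"


# The temporary directory plays the role of pytest's tmp_path here.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    uri = missing_file_uri(tmp_path)
    # The path stays inside the temporary directory and is never created.
    assert not (tmp_path / "nonexistent.txt").exists()
```

Keeping the missing path under a directory the test owns avoids touching the filesystem root, which is the concern raised in the review comment.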

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Nov 11, 2025
@bveeramani bveeramani added the go add ONLY when ready to merge, run all tests label Nov 12, 2025
),
],
names=["uri"],
{"uri": str(valid_file), "id": 0},
Contributor

Not sure why you would change the format of the URI from f"local://{valid_file}" to str(valid_file).

Contributor

It works without the local:// prefix in this case, but I think it's better to follow the pattern of the other tests in the same file.


ds = ray.data.from_arrow(table)
# Create URIs that will fail size estimation (non-existent files).
ds = ray.data.from_items([{"uri": str(tmp_path / "nonexistent.txt")}])
Contributor

ditto

assert results[2]["bytes"] is None

- def test_download_expression_all_size_estimations_fail(self):
+ def test_download_expression_all_size_estimations_fail(self, tmp_path):
Contributor

Change the name of the test, since we only test one row here.

Update test name and annotation to explain the purpose of the test

Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
@xyuzh xyuzh changed the title from "[Data] Simplify download expression error handling tests" to "[Data] Simplify and remove the ordering dependency of download expression error handling tests" Nov 13, 2025
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
@xyuzh xyuzh self-assigned this Nov 13, 2025
xyuzh and others added 2 commits November 13, 2025 19:40
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
@robertnishihara robertnishihara merged commit a7926ae into master Nov 14, 2025
6 checks passed
@robertnishihara robertnishihara deleted the data-simplify-download-expression-tests branch November 14, 2025 06:05
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Nov 14, 2025
…sion error handling tests (ray-project#58518)

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
ArturNiederfahrenhorst pushed a commit to ArturNiederfahrenhorst/ray that referenced this pull request Nov 16, 2025
…sion error handling tests (ray-project#58518)

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…sion error handling tests (ray-project#58518)

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…sion error handling tests (ray-project#58518)

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…sion error handling tests (ray-project#58518)

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

4 participants