[Data] Simplify and remove the ordering dependency of download expression error handling tests #58518


Merged

robertnishihara merged 6 commits into master from data-simplify-download-expression-tests

Nov 14, 2025

+21 −66

Member

bveeramani commented Nov 10, 2025 • edited


Description

This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less brittle. Some of the previous tests were more complex than necessary and relied on assumptions (such as a fixed output order) that could occasionally cause spurious failures.

Key updates:

* **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn't guaranteed and could sometimes cause intermittent failures.
* **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don't trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run.
* **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers.
* **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests.
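A minimal, Ray-free sketch of the ordering fix. The rows are hypothetical stand-ins for what the test collects from the dataset (for example with `ds.take_all()`), shuffled to mimic the unspecified output order mentioned above. Sorting by `id` before indexing makes the assertions independent of that order; `sort_rows_by_id` is an illustrative helper, not code from the PR.

```python
import random


def sort_rows_by_id(rows):
    """Return rows ordered by their "id" field.

    The output order of the dataset is not guaranteed, so the test imposes
    its own order before indexing into the results.
    """
    return sorted(rows, key=lambda row: row["id"])


# Hypothetical results: one successful download and one failed download.
expected = [
    {"id": 0, "bytes": b"hello"},
    {"id": 1, "bytes": None},
]

# Shuffle a copy to mimic an order that can vary from run to run.
shuffled = expected[:]
random.shuffle(shuffled)

# After sorting, positional assertions are stable regardless of input order.
results = sort_rows_by_id(shuffled)
assert results[0]["bytes"] == b"hello"
assert results[1]["bytes"] is None
```

Sorting the collected Python list is the simplest option for a handful of rows; `Dataset.sort("id")` would be an alternative that sorts inside the pipeline instead.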

Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified.
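The readability point in the list above comes down to row-oriented versus column-oriented setup. `pa.Table.from_arrays()` takes one array per column, so values that belong together (a URI and its ID) live in separate lists and stay paired only by position; `ray.data.from_items()` takes one dict per row, so each URI is written next to its ID. The pure-Python sketch below needs neither Ray nor PyArrow; `rows_to_columns` is a hypothetical helper that shows the two layouts carry the same data.

```python
def rows_to_columns(rows):
    """Convert a list of row dicts into a dict of column lists.

    Hypothetical helper mirroring the relationship between the row-oriented
    input of ray.data.from_items() and the column-oriented input of
    pa.Table.from_arrays(). Assumes every row has the same keys.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row-oriented: each URI sits next to its ID.
rows = [
    {"uri": "local:///tmp/valid.txt", "id": 0},
    {"uri": "local:///tmp/nonexistent.txt", "id": 1},
]

# Column-oriented: same data, but pairing depends on list positions.
columns = rows_to_columns(rows)
print(columns)
# {'uri': ['local:///tmp/valid.txt', 'local:///tmp/nonexistent.txt'], 'id': [0, 1]}
```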

Related issue

#58464 (comment)


          [Data] Simplify download expression error handling tests

1f9c443

Refactor test_download_expression.py to follow unit testing best practices:
- Remove assumptions about output ordering by explicitly sorting results
- Reduce test complexity by using minimal inputs that verify the behavior
- Improve code clarity by using from_items() instead of from_arrow()
- Remove redundant test that didn't add coverage beyond existing tests

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
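The "minimal inputs" bullet rests on a simple observation: a divide-by-zero in size estimation can only happen when no sampled URI succeeds, and a single failing URI already produces that state. Ray's actual estimator is not shown in this PR, so `estimate_avg_size` below is a hypothetical stand-in that illustrates the guard such a test exercises.

```python
def estimate_avg_size(sizes):
    """Average the sizes of successfully probed URIs.

    Hypothetical sketch: `sizes` has one entry per sampled URI, with None
    marking a URI whose size could not be determined. Returning None when
    nothing succeeded avoids dividing by zero.
    """
    known = [s for s in sizes if s is not None]
    if not known:
        # Every probe failed, so there is nothing to average.
        return None
    return sum(known) / len(known)


# One failing URI reaches the zero-successes branch just as 30 would.
assert estimate_avg_size([None]) is None
assert estimate_avg_size([None] * 30) is None
assert estimate_avg_size([100, None, 300]) == 200.0
```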

bveeramani requested a review from a team as a code owner

November 10, 2025 23:43

gemini-code-assist bot reviewed


Contributor

gemini-code-assist bot left a comment

Code Review

This pull request does a great job of refactoring the download expression tests to be simpler, more robust, and easier to maintain. The changes, such as using ray.data.from_items for better readability, adding sorting to eliminate flakiness, and simplifying complex tests, are all positive improvements. I have a couple of minor suggestions to further improve the robustness of the tests by consistently using pytest's tmp_path fixture, which will help avoid potential filesystem permission issues in different environments.

python/ray/data/tests/test_download_expression.py Outdated

    -    ],
    -    names=["uri"],
    +        {"uri": f"local://{valid_file}", "id": 0},
    +        {"uri": "local:///nonexistent.txt", "id": 1},

Contributor

gemini-code-assist bot Nov 10, 2025

Using an absolute path like local:///nonexistent.txt could lead to permission issues in restricted environments. It's better practice to use the tmp_path fixture for creating test file paths, even for non-existent files. This ensures that all file operations are contained within the temporary directory managed by pytest.

Suggested change

    -{"uri": "local:///nonexistent.txt", "id": 1},
    +{"uri": f"local://{tmp_path}/nonexistent.txt", "id": 1},
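A sketch of the suggestion: build even the nonexistent path under pytest's `tmp_path` fixture (a per-test temporary directory given as a `pathlib.Path`), so the test never points outside that directory. `nonexistent_uri` is a hypothetical helper, and the example substitutes `tempfile.TemporaryDirectory` for the fixture so it runs outside pytest.

```python
import tempfile
from pathlib import Path


def nonexistent_uri(base_dir: Path) -> str:
    """Return a local:// URI for a file that does not exist under base_dir.

    Hypothetical helper. On POSIX systems base_dir is an absolute path such
    as pytest's tmp_path, so the result looks like
    local:///<abs path>/nonexistent.txt.
    """
    return f"local://{base_dir / 'nonexistent.txt'}"


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)  # stands in for the pytest tmp_path fixture
    uri = nonexistent_uri(tmp_path)
    path = Path(uri[len("local://"):])
    # The file would live inside the temp dir and was never created.
    assert path.parent == tmp_path
    assert not path.exists()
```

Inside a pytest test that requests the fixture, this would be called as `nonexistent_uri(tmp_path)`.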

python/ray/data/tests/test_download_expression.py Outdated

bveeramani mentioned this pull request

[Data] Add exception handling for invalid URIs in download operation #58464

Merged


          Address review comments

f7a6427

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

ray-gardener bot added the data label

bveeramani added the go label

xyuzh reviewed


python/ray/data/tests/test_download_expression.py

    -        ),
    -    ],
    -    names=["uri"],
    +        {"uri": str(valid_file), "id": 0},

Contributor

xyuzh Nov 11, 2025

Not sure why you changed the format of the URI from `f"local://{valid_file}"` to `str(valid_file)`.

Contributor

xyuzh Nov 12, 2025

It works without the `local://` prefix in this case, but I'd still follow the pattern of the other tests in the same file.
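One way to follow the reviewer's suggestion is to route every test path through a single formatting helper, so the `local://` form is used uniformly across the file. `local_uri` and the example path are hypothetical. The sketch checks only the string shape; the reviewer notes that the bare path also works here, but this PR does not show how Ray resolves either form.

```python
from pathlib import Path


def local_uri(path) -> str:
    """Format a filesystem path as a local:// URI, matching the other tests."""
    return f"local://{Path(path)}"


valid_file = Path("/tmp/data/valid.txt")  # hypothetical test file path

# Both spellings from the review thread name the same path string; the
# helper keeps the prefixed form consistent wherever test rows are built.
prefixed = local_uri(valid_file)
bare = str(valid_file)
assert prefixed == "local:///tmp/data/valid.txt"
assert prefixed[len("local://"):] == bare

row = {"uri": local_uri(valid_file), "id": 0}
```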

python/ray/data/tests/test_download_expression.py Outdated

    -ds = ray.data.from_arrow(table)
    +# Create URIs that will fail size estimation (non-existent files).
    +ds = ray.data.from_items([{"uri": str(tmp_path / "nonexistent.txt")}])

Contributor

xyuzh Nov 11, 2025

ditto

xyuzh reviewed


python/ray/data/tests/test_download_expression.py Outdated

             assert results[2]["bytes"] is None
    -    def test_download_expression_all_size_estimations_fail(self):
    +    def test_download_expression_all_size_estimations_fail(self, tmp_path):

Contributor

xyuzh Nov 12, 2025

Change the name of the test, since we only test one row here.


          Rename test to reflect handling of zero valid uri

b92a382

Update test name and annotation to explain the purpose of the test

Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>

xyuzh changed the title from ~~[Data] Simplify download expression error handling tests~~ to [Data] Simplify and remove the ordering dependency of download expression error handling tests


          Merge branch 'master' into data-simplify-download-expression-tests

231de1b

Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>

xyuzh self-assigned this

xyuzh and others added 2 commits

November 13, 2025 19:40


          Add test case for graceful failure in download expression

03bfb37

Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>


          Update test_download_expression.py

e50ef96

Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>

robertnishihara approved these changes


robertnishihara merged commit a7926ae into master

6 checks passed

robertnishihara deleted the data-simplify-download-expression-tests branch

November 14, 2025 06:05

justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request


          [Data] Simplify and remove the ordering dependency of download expression error handling tests (ray-project#58518)

1e136b7

## Description

This PR refactors tests in `test_download_expression.py` to make them
easier to maintain and less prone to brittle failures. Some of the
previous tests were more complex than necessary and relied on
assumptions that could occasionally cause false negatives.

### Key updates:
* **Reduce flaky behavior**: Added explicit sorting by ID in
`test_download_expression_handles_failed_downloads` to avoid relying on
a specific output order, which isn’t guaranteed and could sometimes
cause intermittent failures.
* **Simplify test logic**: Reduced
`test_download_expression_failed_size_estimation` from 30 URIs to just
1. A single failing URI is sufficient to confirm that failed downloads
don’t trigger divide-by-zero errors, and this change makes the test
easier to understand and faster to run.
* **Improve readability**: Replaced `pa.Table.from_arrays()` with
`ray.data.from_items()`, which makes the test setup more straightforward
for future maintainers.
* **Remove redundancy**: Deleted
`test_download_expression_mixed_valid_and_invalid_size_estimation`,
since its behavior is already covered by the other tests.

Overall, these updates streamline the test suite, making it faster,
clearer, and more robust while keeping the key behaviors fully verified.

## Related issue

ray-project#58464 (comment)

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

ArturNiederfahrenhorst pushed a commit to ArturNiederfahrenhorst/ray that referenced this pull request


          [Data] Simplify and remove the ordering dependency of download expres…

55f9d17


Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request


          [Data] Simplify and remove the ordering dependency of download expres…

87356e3

Signed-off-by: Aydin Abiar <aydin@anyscale.com>

ykdojo pushed a commit to ykdojo/ray that referenced this pull request


          [Data] Simplify and remove the ordering dependency of download expres…

4d30028

Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>

SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request


          [Data] Simplify and remove the ordering dependency of download expres…

8f505aa



Labels

data go