-
Notifications
You must be signed in to change notification settings - Fork 7k
[Data] Simplify and remove the ordering dependency of download expression error handling tests #58518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Simplify and remove the ordering dependency of download expression error handling tests #58518
Conversation
Refactor test_download_expression.py to follow unit testing best practices: - Remove assumptions about output ordering by explicitly sorting results - Reduce test complexity by using minimal inputs that verify the behavior - Improve code clarity by using from_items() instead of from_arrow() - Remove redundant test that didn't add coverage beyond existing tests Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request does a great job of refactoring the download expression tests to be simpler, more robust, and easier to maintain. The changes, such as using ray.data.from_items for better readability, adding sorting to eliminate flakiness, and simplifying complex tests, are all positive improvements. I have a couple of minor suggestions to further improve the robustness of the tests by consistently using pytest's tmp_path fixture, which will help avoid potential filesystem permission issues in different environments.
| ], | ||
| names=["uri"], | ||
| {"uri": f"local://{valid_file}", "id": 0}, | ||
| {"uri": "local:///nonexistent.txt", "id": 1}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an absolute path like local:///nonexistent.txt could lead to permission issues in restricted environments. It's better practice to use the tmp_path fixture for creating test file paths, even for non-existent files. This ensures that all file operations are contained within the temporary directory managed by pytest.
| {"uri": "local:///nonexistent.txt", "id": 1}, | |
| {"uri": f"local://{tmp_path}/nonexistent.txt", "id": 1}, |
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
| ), | ||
| ], | ||
| names=["uri"], | ||
| {"uri": str(valid_file), "id": 0}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why would you change the format of the uri from f"local://{valid_file}" to str(valid_file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok to work without local:// prefix in this case, but I assume it's good to follow the pattern of other tests in the same file
|
|
||
| ds = ray.data.from_arrow(table) | ||
| # Create URIs that will fail size estimation (non-existent files). | ||
| ds = ray.data.from_items([{"uri": str(tmp_path / "nonexistent.txt")}]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| assert results[2]["bytes"] is None | ||
|
|
||
| def test_download_expression_all_size_estimations_fail(self): | ||
| def test_download_expression_all_size_estimations_fail(self, tmp_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the name of the test as we only test 1 row here
Update test name and annotation to explain the purpose of the test Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
…sion error handling tests (ray-project#58518) ## Description This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives. ### Key updates: * **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures. * **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run. * **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers. * **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests. Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified. ## Related issue ray-project#58464 (comment) --------- Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Signed-off-by: Robert Nishihara <robertnishihara@gmail.com> Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Co-authored-by: Robert Nishihara <robertnishihara@gmail.com> Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
…sion error handling tests (ray-project#58518) ## Description This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives. ### Key updates: * **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures. * **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run. * **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers. * **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests. Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified. ## Related issue ray-project#58464 (comment) --------- Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Signed-off-by: Robert Nishihara <robertnishihara@gmail.com> Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
…sion error handling tests (ray-project#58518) ## Description This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives. ### Key updates: * **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures. * **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run. * **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers. * **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests. Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified. ## Related issue ray-project#58464 (comment) --------- Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Signed-off-by: Robert Nishihara <robertnishihara@gmail.com> Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Co-authored-by: Robert Nishihara <robertnishihara@gmail.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…sion error handling tests (ray-project#58518) ## Description This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives. ### Key updates: * **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures. * **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run. * **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers. * **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests. Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified. ## Related issue ray-project#58464 (comment) --------- Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Signed-off-by: Robert Nishihara <robertnishihara@gmail.com> Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Co-authored-by: Robert Nishihara <robertnishihara@gmail.com> Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
…sion error handling tests (ray-project#58518) ## Description This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives. ### Key updates: * **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures. * **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run. * **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers. * **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests. Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified. ## Related issue ray-project#58464 (comment) --------- Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Signed-off-by: Robert Nishihara <robertnishihara@gmail.com> Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Description
This PR refactors tests in
test_download_expression.pyto make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives.Key updates:
test_download_expression_handles_failed_downloadsto avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures.test_download_expression_failed_size_estimationfrom 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run.pa.Table.from_arrays()withray.data.from_items(), which makes the test setup more straightforward for future maintainers.test_download_expression_mixed_valid_and_invalid_size_estimation, since its behavior is already covered by the other tests.Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified.
Related issue
#58464 (comment)