Skip to content

Commit 1e136b7

Browse files
bveeramanixyuzhrobertnishihara
authored andcommitted
[Data] Simplify and remove the ordering dependency of download expression error handling tests (ray-project#58518)
## Description This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives. ### Key updates: * **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures. * **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run. * **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers. * **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests. Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified. ## Related issue ray-project#58464 (comment) --------- Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Signed-off-by: Robert Nishihara <robertnishihara@gmail.com> Co-authored-by: Xinyu Zhang <60529799+xyuzh@users.noreply.github.com> Co-authored-by: Robert Nishihara <robertnishihara@gmail.com> Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
1 parent 80eb240 commit 1e136b7

File tree

1 file changed

+21
-66
lines changed

1 file changed

+21
-66
lines changed

python/ray/data/tests/test_download_expression.py

Lines changed: 21 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -299,6 +299,9 @@ def test_download_expression_with_malformed_uris(self, tmp_path):
299299
300300
This tests that various malformed URIs are caught and return None
301301
instead of crashing.
302+
303+
All of the URIs should be malformed in order to test the ZeroDivisionError
304+
described in https://github.com/ray-project/ray/issues/58462.
302305
"""
303306
malformed_uris = [
304307
f"local://{tmp_path}/nonexistent.txt", # File doesn't exist
@@ -324,78 +327,30 @@ def test_download_expression_with_malformed_uris(self, tmp_path):
324327
for result in results:
325328
assert result["bytes"] is None
326329

327-
def test_download_expression_all_size_estimations_fail(self):
328-
"""Test download expression when all URI size estimations fail.
329-
330-
This tests the failed download does not cause division by zero error.
331-
"""
332-
# Create URIs that will fail size estimation (non-existent files)
333-
# Using enough URIs to trigger size estimation sampling
334-
invalid_uris = [
335-
f"local:///nonexistent/path/file_{i}.txt"
336-
for i in range(30) # More than INIT_SAMPLE_BATCH_SIZE (25)
337-
]
330+
def test_download_expression_mixed_valid_and_invalid_uris(self, tmp_path):
331+
"""Test download expression when some but not all of the URIs are invalid."""
332+
# Create one valid file
333+
valid_file = tmp_path / "valid.txt"
334+
valid_file.write_bytes(b"valid content")
338335

339-
table = pa.Table.from_arrays(
340-
[pa.array(invalid_uris)],
341-
names=["uri"],
336+
# Create URIs: one valid and one non-existent file.
337+
ds = ray.data.from_items(
338+
[
339+
{"uri": str(valid_file), "id": 0},
340+
{"uri": str(tmp_path / "nonexistent.txt"), "id": 1},
341+
]
342342
)
343-
344-
ds = ray.data.from_arrow(table)
345343
ds_with_downloads = ds.with_column("bytes", download("uri"))
346344

347-
# Should not crash with divide-by-zero error
348-
# The PartitionActor should handle all failed size estimations gracefully
349-
# and fall back to using the number of rows in the block as partition size
350-
results = ds_with_downloads.take_all()
351-
352-
# All downloads should fail gracefully (return None)
353-
assert len(results) == 30
354-
for result in results:
355-
assert result["bytes"] is None
356-
357-
def test_download_expression_mixed_valid_and_invalid_size_estimation(
358-
self, tmp_path
359-
):
360-
"""Test download expression with mix of valid and invalid URIs for size estimation.
361-
362-
This tests that size estimation handles partial failures correctly.
363-
"""
364-
# Create some valid files
365-
valid_files = []
366-
for i in range(10):
367-
file_path = tmp_path / f"valid_{i}.txt"
368-
file_path.write_bytes(b"x" * 100) # 100 bytes each
369-
valid_files.append(str(file_path))
370-
371-
# Mix valid and invalid URIs
372-
mixed_uris = []
373-
for i in range(30):
374-
if i % 3 == 0 and i // 3 < len(valid_files):
375-
# Every 3rd URI is valid (for first 10)
376-
mixed_uris.append(f"local://{valid_files[i // 3]}")
377-
else:
378-
# Others are invalid
379-
mixed_uris.append(f"local:///nonexistent/file_{i}.txt")
345+
# Should not crash - failed downloads return None
346+
results = sorted(ds_with_downloads.take_all(), key=lambda row: row["id"])
347+
assert len(results) == 2
380348

381-
table = pa.Table.from_arrays(
382-
[pa.array(mixed_uris)],
383-
names=["uri"],
384-
)
349+
# First URI should succeed
350+
assert results[0]["bytes"] == b"valid content"
385351

386-
ds = ray.data.from_arrow(table)
387-
ds_with_downloads = ds.with_column("bytes", download("uri"))
388-
389-
# Should not crash - should handle mixed valid/invalid gracefully
390-
results = ds_with_downloads.take_all()
391-
assert len(results) == 30
392-
393-
# Verify valid URIs downloaded successfully
394-
for i, result in enumerate(results):
395-
if i % 3 == 0 and i // 3 < len(valid_files):
396-
assert result["bytes"] == b"x" * 100
397-
else:
398-
assert result["bytes"] is None
352+
# Second URI should fail gracefully (return None)
353+
assert results[1]["bytes"] is None
399354

400355

401356
class TestDownloadExpressionIntegration:

0 commit comments

Comments
 (0)