Description
Test Name
test_limit_pushdown_conservative
Test Location
python/ray/data/tests/test_execution_optimizer_limit_pushdown.py:82
Issue Description
The test fails intermittently when checking limit pushdown behavior because it assumes a specific ordering of results that is not guaranteed by the Ray Data interface.
Root Cause
The test creates a dataset with override_num_blocks=100, applies operations, limits to 5 rows, and expects the result to be [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]. However, when tasks finish out of order, the actual result can be [{"id": 0}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 1}], causing the assertion to fail.
The issue is that take_all() returns rows in the order they are produced by the distributed execution, which depends on task scheduling and completion timing.
Example Failure
AssertionError: assert [{'id': 0}, {'id': 2}, {'id': 3}, {'id': 4}, {'id': 1}] == [{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
At index 1 diff: {'id': 2} != {'id': 1}
Proposed Fix
Rewrite the assertions so they no longer depend on row order, for example by sorting both the actual and expected rows before comparing them, or by comparing them as multisets. The change applies to every test case in the function that goes through _check_valid_plan_and_result().
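A minimal sketch of an order-insensitive comparison, in plain Python so it runs without Ray. The helper names `_row_key` and `rows_equal_unordered` are hypothetical and not part of the existing test file, and the sketch assumes each row is a flat dict with hashable, mutually comparable values. How this would be wired into _check_valid_plan_and_result() is left open.

```python
from collections import Counter


def _row_key(row):
    # Dicts are unhashable, so turn each row into a sorted tuple of items.
    # Assumes flat rows whose values are hashable (hypothetical helper).
    return tuple(sorted(row.items()))


def rows_equal_unordered(actual, expected):
    # Compare as multisets: same rows, same multiplicities, any order.
    return Counter(map(_row_key, actual)) == Counter(map(_row_key, expected))


# The rows from the example failure above.
actual = [{"id": 0}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 1}]
expected = [{"id": i} for i in range(5)]

assert actual != expected                      # the ordered comparison fails
assert rows_equal_unordered(actual, expected)  # the unordered one passes
```

A multiset comparison, unlike converting to a set, still catches duplicated or missing rows. It only verifies which rows come back, so any test case that is meant to check ordering would need a separate, explicit assertion.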
Additional Context
This issue occurs because the test assumes deterministic ordering from distributed operations, which can vary based on task scheduling and completion timing.