[Data][Flaky] test_limit_pushdown_conservative fails due to non-deterministic task ordering #58561

@bveeramani

Description

Test Name

test_limit_pushdown_conservative

Test Location

python/ray/data/tests/test_execution_optimizer_limit_pushdown.py:82

Issue Description

The test fails intermittently when checking limit pushdown behavior because it assumes a specific ordering of results that is not guaranteed by the Ray Data interface.

Root Cause

The test creates a dataset with override_num_blocks=100, applies operations, limits to 5 rows, and expects the result to be [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]. However, when tasks finish out of order, the actual result can be [{"id": 0}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 1}], causing the assertion to fail.

The issue is that take_all() returns rows in the order they are produced by the distributed execution, which depends on task scheduling and completion timing.
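
The effect can be illustrated without Ray at all. The sketch below is only an analogy built on the standard library's thread pool, not Ray Data's actual scheduler: `produce_row`, `collect_in_completion_order`, and the `DELAYS` table are hypothetical names and values chosen to mimic the ordering reported in this issue. It collects results as tasks complete rather than in submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-task delays (seconds), chosen only to mimic the ordering
# reported in this issue; real task timings are not known.
DELAYS = {0: 0.0, 1: 0.4, 2: 0.1, 3: 0.2, 4: 0.3}


def produce_row(i):
    """Simulate a task that takes DELAYS[i] seconds to produce one row."""
    time.sleep(DELAYS[i])
    return {"id": i}


def collect_in_completion_order():
    """Submit tasks in id order, but gather rows as each task finishes."""
    with ThreadPoolExecutor(max_workers=len(DELAYS)) as pool:
        futures = [pool.submit(produce_row, i) for i in DELAYS]
        # as_completed yields futures in completion order, not submission order.
        return [f.result() for f in as_completed(futures)]


rows = collect_in_completion_order()
print(rows)
```

With these delays the slow task for id 1 usually finishes last, so the printed list tends to put it at the end, but the exact order depends on timing. Only the set of rows is stable.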

Example Failure

AssertionError: assert [{'id': 0}, {'id': 2}, {'id': 3}, {'id': 4}, {'id': 1}] == [{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
  At index 1 diff: {'id': 2} != {'id': 1}

Proposed Fix

Rewrite the assertion so that it no longer assumes results are returned in a specific order. The change affects multiple test cases in the function that use _check_valid_plan_and_result().
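
One possible shape for such an assertion is sketched below. The helper name `assert_rows_equal_unordered` is hypothetical and is not part of the Ray test suite. The sketch also assumes that the set of returned rows is deterministic and only their order varies, which matches the failure shown above but is not established by this issue for every case.

```python
def assert_rows_equal_unordered(actual, expected):
    """Assert two lists of row dicts hold the same rows, ignoring order.

    Hypothetical helper for illustration. Rows are sorted by their
    (column, value) pairs, so values must be mutually comparable.
    """
    def sort_key(row):
        return sorted(row.items())

    assert sorted(actual, key=sort_key) == sorted(expected, key=sort_key), (
        f"rows differ after sorting: {actual!r} vs {expected!r}"
    )


# The out-of-order result from the failure above now passes.
actual = [{"id": 0}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 1}]
expected = [{"id": i} for i in range(5)]
assert_rows_equal_unordered(actual, expected)
```

Sorting both sides keeps duplicate rows significant, which a set-based comparison would not.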

Additional Context

This issue occurs because the test assumes deterministic ordering from distributed operations, which can vary based on task scheduling and completion timing.

Metadata

Labels

P1: Issue that should be fixed within a few weeks
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
good-first-issue: Great starter issue for someone just starting to contribute to Ray
