[Data][Flaky] test_hanging_detector_detects_issues intermittently fails to detect hanging #58562

@bveeramani

Description

Test Name

TestHangingExecutionIssueDetector.test_hanging_detector_detects_issues

Test Location

python/ray/data/tests/test_issue_detection.py:136

Issue Description

The test intermittently fails to detect the intentionally created hanging execution, causing the assertion to fail when checking for the expected warning message in the logs.

Root Cause

The test builds a pipeline in which one task sleeps for 1 second while the others complete quickly, and configures the hanging detector with aggressive settings so that it flags the slow task. However, detection depends on wall-clock timing and a statistical threshold (mean + std), both of which are sensitive to:

  1. System timing variations
  2. Task scheduling delays
  3. The exact timing of when the detector runs
  4. Resource contention on CI machines

The hanging detector may not fire if:

  • Tasks complete too quickly relative to detection intervals
  • The statistical threshold isn't met due to timing variations
  • The detection interval misses the hanging window
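
To illustrate the sensitivity described above, here is a minimal sketch of a mean + std threshold. The function names, the multiplier `k`, and the sample durations are all hypothetical and chosen for illustration; this is not the detector's actual code, only a model of how the same in-flight task can be flagged on a quiet machine and missed on a loaded one.

```python
import statistics


def hanging_threshold(durations, k=1.0):
    """Hypothetical threshold: mean + k * std of completed task durations (seconds)."""
    return statistics.fmean(durations) + k * statistics.pstdev(durations)


def is_flagged(elapsed, durations, k=1.0):
    """Return True if a task running for `elapsed` seconds exceeds the threshold."""
    return elapsed > hanging_threshold(durations, k)


# Illustrative durations only, not measurements from the failing test.
# Quiet machine: the fast tasks finish in about 10 ms each.
quiet = [0.010, 0.012, 0.011, 0.009]
# Loaded CI machine: scheduling delays inflate both the mean and the spread.
loaded = [0.010, 0.400, 0.900, 0.050]

# Suppose the detector happens to run 0.5 s into the 1 s sleep.
print(is_flagged(0.5, quiet))   # flagged: threshold is roughly 0.012 s
print(is_flagged(0.5, loaded))  # not flagged: threshold is roughly 0.70 s
```

In the loaded case the slow task is only caught if the detector runs again between about 0.70 s and the 1 s mark, which is the "detection interval misses the hanging window" scenario.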

Example Failure

AssertionError: <log output without expected hanging detection messages>

assert False

The test expects to find:

  • "has been running for"
  • "longer than the average task duration"

But these messages don't appear in the captured logs.
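
Because the failure surfaces only as `assert False`, it is hard to tell which message was missing. One possible way to make the check more informative is sketched below; `missing_fragments` and `EXPECTED_FRAGMENTS` are hypothetical names, and only the two quoted strings come from the test's expectations.

```python
# The two fragments the test expects to find in the captured logs.
EXPECTED_FRAGMENTS = (
    "has been running for",
    "longer than the average task duration",
)


def missing_fragments(log_output, expected=EXPECTED_FRAGMENTS):
    """Return the expected fragments that do not occur in `log_output`."""
    return [fragment for fragment in expected if fragment not in log_output]
```

A test could then use `missing = missing_fragments(log_output)` followed by `assert not missing, f"missing {missing} in logs:\n{log_output}"`, so the failure names the absent message instead of printing a bare `False`.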

Proposed Fix

Make the test more robust to timing variations. Consider whether this functionality needs an integration test at all, or whether it would be better covered by unit tests that use mocked time.
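
As a rough sketch of the mocked-time approach, the example below drives a detector with an injected fake clock so that no real sleeping or scheduling is involved. `FakeClock` and `SketchHangingDetector` are hypothetical stand-ins written for this illustration, not Ray's classes, and the threshold formula and message wording are assumptions built around the two fragments the test checks for.

```python
import statistics


class FakeClock:
    """Manually advanced clock, so the test never depends on wall-clock time."""

    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class SketchHangingDetector:
    """Hypothetical detector: flags tasks running longer than mean + k * std."""

    def __init__(self, clock, k=1.0):
        self._clock = clock
        self._k = k
        self._start_times = {}  # task_id -> start timestamp of in-flight tasks
        self._durations = []    # durations of completed tasks

    def task_started(self, task_id):
        self._start_times[task_id] = self._clock()

    def task_finished(self, task_id):
        self._durations.append(self._clock() - self._start_times.pop(task_id))

    def detect(self):
        if len(self._durations) < 2:
            return []  # not enough history to compute a threshold
        threshold = statistics.fmean(self._durations) + self._k * statistics.pstdev(
            self._durations
        )
        now = self._clock()
        return [
            f"Task {task_id} has been running for {now - start:.2f}s, "
            "longer than the average task duration"
            for task_id, start in self._start_times.items()
            if now - start > threshold
        ]


def test_detects_hanging_task_deterministically():
    clock = FakeClock()
    detector = SketchHangingDetector(clock)
    # Three fast tasks of 10 ms each establish the baseline.
    for task_id in ("a", "b", "c"):
        detector.task_started(task_id)
        clock.advance(0.01)
        detector.task_finished(task_id)
    # The slow task is below the threshold shortly after it starts ...
    detector.task_started("slow")
    clock.advance(0.005)
    assert detector.detect() == []
    # ... and is flagged once the fake clock moves past the threshold.
    clock.advance(1.0)
    messages = detector.detect()
    assert len(messages) == 1
    assert "has been running for" in messages[0]
    assert "longer than the average task duration" in messages[0]
```

The design choice here is to pass the clock in as a dependency. Whether the real detector can accept an injected clock, or would need its time source patched instead, is something to confirm against the actual implementation.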

Additional Context

This test is checking infrastructure for detecting hanging executions, which is inherently timing-dependent. The test may need to be marked as a known flaky test or potentially redesigned to be more deterministic.

Metadata

Labels

  • P1: Issue that should be fixed within a few weeks
  • bug: Something that is supposed to be working; but isn't
  • data: Ray Data-related issues
  • good-first-issue: Great starter issue for someone just starting to contribute to Ray
