[data] Adding in test for issue detection. #58292

omatthew98 · 2025-10-29T22:27:04Z

Description

Adding missing test for issue detection

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: Matthew Owen <mowen@anyscale.com>

gemini-code-assist

Code Review

This pull request adds a test file for issue detection that was missed during a previous upstreaming. The new tests cover hanging execution and high memory usage scenarios. I've provided a few suggestions to improve the test code's robustness and readability, such as ensuring logger cleanup, removing unused fixtures, and replacing magic numbers with constants.

gemini-code-assist · 2025-10-29T22:33:28Z

python/ray/data/tests/test_issue_detection.py

+        # Set up logging capture
+        log_capture = io.StringIO()
+        handler = logging.StreamHandler(log_capture)
+        logger = logging.getLogger("ray.data._internal.issue_detection")
+        logger.addHandler(handler)
+
+        # Set up mock stats to return values that will trigger adaptive threshold
+        mocked_mean = 2.0  # Increase from 0.5 to 2.0 seconds
+        mocked_stddev = 0.2  # Increase from 0.05 to 0.2 seconds
+        mock_stats = mock_stats_cls.return_value
+        mock_stats.count.return_value = 20  # Enough samples
+        mock_stats.mean.return_value = mocked_mean
+        mock_stats.stddev.return_value = mocked_stddev
+
+        # Set a short issue detection interval for testing
+        ctx = DataContext.get_current()
+        detector_cfg = ctx.issue_detectors_config.hanging_detector_config
+        detector_cfg.detection_time_interval_s = 0.00
+
+        # test no hanging doesn't log hanging warning
+        def f1(x):
+            return x
+
+        _ = ray.data.range(1).map(f1).materialize()
+
+        log_output = log_capture.getvalue()
+        warn_msg = (
+            r"A task of operator .+ with task index .+ has been running for [\d\.]+s"
+        )
+        assert re.search(warn_msg, log_output) is None, log_output
+
+        # # test hanging does log hanging warning
+        def f2(x):
+            time.sleep(5.0)  # Increase from 1.1 to 5.0 seconds to exceed new threshold
+            return x
+
+        _ = ray.data.range(1).map(f2).materialize()
+
+        log_output = log_capture.getvalue()
+        assert re.search(warn_msg, log_output) is not None, log_output


The logging.StreamHandler added to the logger is not removed after the test, which can leak the handler and cause side effects in subsequent tests (e.g., duplicate log messages). It's best practice to ensure cleanup. Using a try...finally block guarantees that logger.removeHandler(handler) is always called, making the test more robust.

        # Set up logging capture
        log_capture = io.StringIO()
        handler = logging.StreamHandler(log_capture)
        logger = logging.getLogger("ray.data._internal.issue_detection")
        logger.addHandler(handler)

        try:
            # Set up mock stats to return values that will trigger adaptive threshold
            mocked_mean = 2.0  # Increase from 0.5 to 2.0 seconds
            mocked_stddev = 0.2  # Increase from 0.05 to 0.2 seconds
            mock_stats = mock_stats_cls.return_value
            mock_stats.count.return_value = 20  # Enough samples
            mock_stats.mean.return_value = mocked_mean
            mock_stats.stddev.return_value = mocked_stddev

            # Set a short issue detection interval for testing
            ctx = DataContext.get_current()
            detector_cfg = ctx.issue_detectors_config.hanging_detector_config
            detector_cfg.detection_time_interval_s = 0.00

            # test no hanging doesn't log hanging warning
            def f1(x):
                return x

            _ = ray.data.range(1).map(f1).materialize()

            log_output = log_capture.getvalue()
            warn_msg = (
                r"A task of operator .+ with task index .+ has been running for [\d\.]+s"
            )
            assert re.search(warn_msg, log_output) is None, log_output

            # # test hanging does log hanging warning
            def f2(x):
                time.sleep(5.0)  # Increase from 1.1 to 5.0 seconds to exceed new threshold
                return x

            _ = ray.data.range(1).map(f2).materialize()

            log_output = log_capture.getvalue()
            assert re.search(warn_msg, log_output) is not None, log_output
        finally:
            logger.removeHandler(handler)
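If several tests need the same capture-and-cleanup pattern, one option is to wrap it in a small context manager so the `finally` lives in one place. The following is only a sketch using the standard library: the helper name `capture_logs` and the logger name `demo.capture` are hypothetical and do not appear in the PR.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def capture_logs(logger_name):
    """Yield a StringIO receiving records from `logger_name`.

    Hypothetical helper (not part of the PR). The handler is always
    detached on exit, even if the body raises.
    """
    logger = logging.getLogger(logger_name)
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        yield stream
    finally:
        # Runs on both normal exit and exceptions, so no handler leaks.
        logger.removeHandler(handler)


# Usage: the buffer stays readable after the handler has been removed.
with capture_logs("demo.capture") as buf:
    logging.getLogger("demo.capture").warning("task has been running for 5.0s")

captured = buf.getvalue()
```

In the test above, the body of the `try` block would move inside `with capture_logs("ray.data._internal.issue_detection") as log_capture:`, assuming that logger name is still the right one to attach to.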

gemini-code-assist · 2025-10-29T22:33:29Z

python/ray/data/tests/test_issue_detection.py

+    def test_realistic_hanging_detection(
+        self,
+        ray_start_10_cpus,
+        caplog,
+        propagate_logs,
+        restore_data_context,
+    ):


The caplog and propagate_logs fixtures are included in the test function signature but are not used. They should be removed to keep the code clean.

    def test_realistic_hanging_detection(
        self,
        ray_start_10_cpus,
        restore_data_context,
    ):

gemini-code-assist · 2025-10-29T22:33:29Z

python/ray/data/tests/test_issue_detection.py

+@pytest.mark.parametrize(
+    "configured_memory, actual_memory, should_return_issue",
+    [
+        # User has appropriately configured memory, so no issue.
+        (4 * 1024**3, 4 * 1024**3, False),
+        # User hasn't configured memory correctly and memory use is high, so issue.
+        (None, 4 * 1024**3, True),
+        (1, 4 * 1024**3, True),
+        # User hasn't configured memory correctly but memory use is low, so no issue.
+        (None, 4 * 1024**3 - 1, False),
+    ],
+)


The magic number 4 * 1024**3 is used multiple times in the test parametrization. This value corresponds to _MEMORY_PER_CORE_ESTIMATE in HighMemoryIssueDetector. To improve readability and maintainability, it's better to define a constant for this value and use it in the parametrize decorator.

_MEMORY_PER_CORE_ESTIMATE = 4 * 1024**3


@pytest.mark.parametrize(
    "configured_memory, actual_memory, should_return_issue",
    [
        # User has appropriately configured memory, so no issue.
        (_MEMORY_PER_CORE_ESTIMATE, _MEMORY_PER_CORE_ESTIMATE, False),
        # User hasn't configured memory correctly and memory use is high, so issue.
        (None, _MEMORY_PER_CORE_ESTIMATE, True),
        (1, _MEMORY_PER_CORE_ESTIMATE, True),
        # User hasn't configured memory correctly but memory use is low, so no issue.
        (None, _MEMORY_PER_CORE_ESTIMATE - 1, False),
    ],
)
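To make the boundary behaviour of these four cases easier to see, here is a self-contained sketch of a decision rule that is consistent with the table. It is inferred from the parametrization alone and is an assumption, not the actual `HighMemoryIssueDetector` logic; the function name `should_flag_high_memory` is hypothetical.

```python
from typing import Optional

# 4 GiB, the value the review comment attributes to _MEMORY_PER_CORE_ESTIMATE.
MEMORY_PER_CORE_ESTIMATE = 4 * 1024**3


def should_flag_high_memory(
    configured_memory: Optional[int], actual_memory: int
) -> bool:
    """Hypothetical rule inferred from the test table, not Ray's implementation."""
    # Usage below the per-core estimate is never flagged.
    if actual_memory < MEMORY_PER_CORE_ESTIMATE:
        return False
    # High usage is flagged when memory is unset or set below actual usage.
    return configured_memory is None or configured_memory < actual_memory


# The same four cases as the parametrization above.
CASES = [
    (MEMORY_PER_CORE_ESTIMATE, MEMORY_PER_CORE_ESTIMATE, False),
    (None, MEMORY_PER_CORE_ESTIMATE, True),
    (1, MEMORY_PER_CORE_ESTIMATE, True),
    (None, MEMORY_PER_CORE_ESTIMATE - 1, False),
]

results = [should_flag_high_memory(c, a) for c, a, _ in CASES]
```

The last case sits exactly one byte under the estimate, which is why a named constant reads more clearly there than `4 * 1024**3 - 1`.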


adding in test from downstream

fdf1c6d

Signed-off-by: Matthew Owen <mowen@anyscale.com>

omatthew98 requested a review from a team as a code owner October 29, 2025 22:27

omatthew98 requested a review from goutamvenkat-anyscale October 29, 2025 22:27

omatthew98 added the "go" label (add ONLY when ready to merge, run all tests) Oct 29, 2025


gemini-code-assist bot reviewed Oct 29, 2025


goutamvenkat-anyscale changed the title ~~[data] Adding in test from downstream for issue detection.~~ [data] Adding in test for issue detection. Oct 29, 2025

goutamvenkat-anyscale approved these changes Oct 29, 2025


ray-gardener bot added the "data" label (Ray Data-related issues) Oct 30, 2025

remove redundant test, move to better location

d6537ff

Signed-off-by: Matthew Owen <mowen@anyscale.com>

bveeramani merged commit c6ecc92 into ray-project:master Nov 6, 2025
6 checks passed
