@CaiZhanqi
Contributor

Description

Fixed the "dictionary changed size during iteration" RuntimeError raised when shutdown() iterates over task_status_dict while background threads modify it concurrently.
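
For readers unfamiliar with this failure mode, here is a minimal single-threaded sketch of it. The function name and the task IDs are made up for illustration. The in-loop insert only stands in for a background thread writing to the dict; it is not the actual SparkJobServer code.

```python
def iterate_while_mutating() -> str:
    """Mutate a dict during iteration and return the resulting error message."""
    # Hypothetical contents; the real task_status_dict layout is not shown in this PR.
    task_status_dict = {"task-0": "RUNNING", "task-1": "RUNNING"}
    try:
        for task_id in task_status_dict:
            # Stand-in for a background thread registering a new task mid-loop.
            task_status_dict[f"{task_id}-retry"] = "PENDING"
    except RuntimeError as exc:
        return str(exc)
    return "no error"


print(iterate_while_mutating())  # dictionary changed size during iteration
```

The error is raised on the iteration step that follows the size change, so in a threaded program it only shows up when a write happens to land while the loop is running.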

Additional information

Why not use a thread lock? The bug is confined to shutdown(), and no other code iterates over task_status_dict.
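
For comparison, a lock-based alternative might look roughly like the sketch below. LockedTaskRegistry, set_status, and the body of shutdown() are hypothetical and only illustrate the trade-off. The point is that a lock helps only if every writer also acquires it, which would mean touching code outside shutdown().

```python
import threading


class LockedTaskRegistry:
    """Hypothetical stand-in; not the actual SparkJobServer code."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.task_status_dict = {}

    def set_status(self, task_id: str, status: str) -> None:
        # Every writer must take the lock, or the reader below is still unprotected.
        with self._lock:
            self.task_status_dict[task_id] = status

    def shutdown(self) -> list:
        cancelled = []
        # The lock is held for the whole loop, so writers block until it finishes.
        with self._lock:
            for task_id in self.task_status_dict:
                cancelled.append(task_id)
        return cancelled
```

Because only one place iterates over the dict, copying the keys at that one site is the smaller change.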

…ration

Signed-off-by: Cai Zhanqi <zhanqi.cai@shopee.com>
@CaiZhanqi requested a review from a team as a code owner November 21, 2025 09:40
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a RuntimeError: dictionary changed size during iteration in SparkJobServer.shutdown(). The error occurs because task_status_dict is iterated over while potentially being modified by background threads. The fix correctly resolves this race condition by creating a copy of the dictionary's keys before iteration using list(self.task_status_dict.keys()). This is a standard and effective way to prevent such errors. The change is simple, targeted, and correct. Good job!
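
A minimal sketch of the snapshot pattern described above, assuming a simplified class: TaskRegistry and the cleanup done inside the loop are hypothetical, and only the list(self.task_status_dict.keys()) call is taken from this PR.

```python
class TaskRegistry:
    """Hypothetical stand-in; not the actual SparkJobServer code."""

    def __init__(self) -> None:
        self.task_status_dict = {}

    def shutdown(self) -> list:
        cancelled = []
        # Snapshot the keys so later inserts or deletes cannot invalidate the loop.
        for task_id in list(self.task_status_dict.keys()):
            # A task may have been removed by another thread after the snapshot.
            if self.task_status_dict.pop(task_id, None) is not None:
                cancelled.append(task_id)
        return cancelled


registry = TaskRegistry()
registry.task_status_dict.update({"a": "RUNNING", "b": "RUNNING"})
print(registry.shutdown())  # ['a', 'b']
```

The loop walks a private list, so entries added after the snapshot are simply not visited, and entries removed after it have to be tolerated, which is why the sketch uses pop() with a default.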

@ray-gardener (bot) added the labels data (Ray Data-related issues) and community-contribution (Contributed by the community) on Nov 21, 2025
@edoakes added the label go (add ONLY when ready to merge, run all tests) on Nov 21, 2025
@edoakes edoakes enabled auto-merge (squash) November 21, 2025 16:42
@edoakes
Collaborator

edoakes commented Nov 21, 2025

Thanks @CaiZhanqi!

@edoakes
Collaborator

edoakes commented Nov 21, 2025

I've triggered premerge CI tests and the PR will auto-merge if they pass. If not, please ping me.

@edoakes edoakes merged commit 62dd09a into ray-project:master Nov 21, 2025
7 checks passed
400Ping pushed a commit to 400Ping/ray that referenced this pull request Nov 21, 2025
…ration (ray-project#58888)

## Description
> Fixed "dictionary changed size during iteration" error that occurs
when shutdown() iterates over task_status_dict while background threads
modify it concurrently.

## Additional information
> Why not use thread lock? The bug is in the shutdown part, and no other
parts iterate it.

Signed-off-by: Cai Zhanqi <zhanqi.cai@shopee.com>
Co-authored-by: Cai Zhanqi <zhanqi.cai@shopee.com>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…ration (ray-project#58888)

Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…ration (ray-project#58888)
