Skip to content

Conversation

@jjyao
Copy link
Collaborator

@jjyao jjyao commented Oct 24, 2025

Description

Symptom

reference_counter.cc:1269: An unexpected system state has occurred. You have likely discovered a bug in Ray. Please report this issue at https://github.com/ray-project/ray/issues and we'll work with you to fix it. Check failed: inserted

Root cause

The ray check assumption is not right: the object_id can be inserted to stored_in_objects multiple times in the case of task retry due to network failures. It's fine since stored_in_objects is a map.

Related issues

Closes #57997

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao jjyao requested a review from a team as a code owner October 24, 2025 17:20
@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Oct 24, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a RAY_CHECK failure within the reference counter, which occurred due to an incorrect assumption about object ID insertions during task retries with network failures. The fix correctly removes the assertion, allowing for idempotent insertions into the stored_in_objects map. A new regression test has been added to specifically simulate this network failure scenario and validate the fix. The changes are well-targeted, the fix is correct, and the accompanying test ensures this issue will not regress. The removal of an unused import is also a good cleanup. Overall, this is a solid contribution.

jjyao added 2 commits October 24, 2025 11:27
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@ray-gardener ray-gardener bot added the core Issues that should be addressed in Ray Core label Oct 24, 2025
Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you also audited the rest of the codepath for idempotence?

We should add C++ tests for idempotence of the relevant ref counter methods (following @Sparks0219's playbook)

jjyao added 2 commits October 24, 2025 12:30
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao
Copy link
Collaborator Author

jjyao commented Oct 24, 2025

Assuming you also audited the rest of the codepath for idempotence?

I checked this particular borrow case but I haven't checked all possible borrow cases yet and I won't be surprised that there might be other issues there.

@jjyao jjyao requested a review from edoakes October 24, 2025 21:44
@jjyao jjyao enabled auto-merge (squash) October 24, 2025 22:28
@jjyao jjyao merged commit 73deda6 into ray-project:master Oct 24, 2025
7 checks passed
@jjyao jjyao deleted the jjyao/borrowing branch October 25, 2025 04:16
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
…58092)

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…58092)

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests

Projects

None yet

3 participants