-
Notifications
You must be signed in to change notification settings - Fork 6.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core][Object Store]Fix object reconstruction hang on arguments pending creation. #47607
base: master
Are you sure you want to change the base?
[Core][Object Store]Fix object reconstruction hang on arguments pending creation. #47607
Conversation
debd360
to
d52a62e
Compare
if (pending_creation) { | ||
RAY_LOG(WARNING) << "Object[" << object_id << "] has already been " | ||
<< "marked as failed and cannot continue pending creation"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should never happen?
RAY_CHECK_OK(PutInLocalPlasmaStore(object, object_id, /*pin_object=*/true)); | ||
if (mark_object_failed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to have mark_object_failed
boolean: we can always set pending_creation to false whenever put_in_local_plasma_callback
is called.
@@ -811,6 +813,8 @@ class ReferenceCounter : public ReferenceCounterInterface, | |||
|
|||
/// Whether or not this object was spilled. | |||
bool did_spill = false; | |||
|
|||
bool mark_object_failed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we need this.
@jjyao The cause of this issue is that when the actor task is resubmitted, it is actually because the actor has already failed, which prevents the task from being dispatched. However, The reason I chose to add |
@Catch-Bull I'm trying a different fix #47645. PTAL |
Why are these changes needed?
@jjyao @rynewang Do you have any other concerns about this fix? If you generally agree on this PR, I will add the UT later and start the formal review.
Related issue number
Closes #47606
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.