@Sushisource
Member

What was changed

Avoid retrying worker shutdown RPCs and also fix a shutdown hang

Why?

Avoid hangs on shutdown

Checklist

  1. Closes [Feature Request] Use shorter/lesser retry on shutdown worker call #1045

  2. How was this tested:
    Added a test. It easily reproduces the shutdown hang that I think Andrew is looking into.

  3. Any docs updates needed?

@Sushisource Sushisource requested a review from a team as a code owner November 8, 2025 04:45
Comment on lines +280 to +283
// Bump the workflow stream with a pointless input, since if a client initiates shutdown
// and then immediately blocks waiting on a workflow activation poll, it's possible that
// there may not be any more inputs ever, and that poll will never resolve.
self.workflows.send_get_state_info_msg();
Member Author

This is the real fix for the shutdown hang. There was a race where "ever polled" could be set to true while nothing else ever entered the stream. The stream is only driven with bonus messages once we reach shutdown itself, and shutdown is not reached until the pollers have returned (at least in some test setups, and possibly in some actual SDK loops too).
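The shape of the hang and of the fix can be sketched in isolation. This is a minimal illustration using a standard-library channel, not the actual workflow stream: `Input`, `poll_until_shutdown`, and `initiate_shutdown` are hypothetical names, and the only assumption carried over from the change is that a no-op message is sent when shutdown is initiated so that a blocked poller wakes up.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread;

/// Hypothetical stream inputs.
enum Input {
    /// A no-op message whose only job is to wake the processing loop.
    GetStateInfo,
    /// Some real unit of work.
    Work,
}

/// Hypothetical stand-in for a blocked activation poll: the shutdown flag is
/// only re-checked when an input arrives on the channel.
fn poll_until_shutdown(rx: Receiver<Input>, shutdown: Arc<AtomicBool>) -> u32 {
    let mut processed = 0;
    while let Ok(input) = rx.recv() {
        if let Input::Work = input {
            processed += 1;
        }
        if shutdown.load(Ordering::SeqCst) {
            break;
        }
    }
    processed
}

/// Set the flag, then bump the stream. Without the send, a poller already
/// blocked in `recv()` would never observe the flag if no more inputs arrive.
fn initiate_shutdown(tx: &Sender<Input>, shutdown: &AtomicBool) {
    shutdown.store(true, Ordering::SeqCst);
    // Ignore the error: the poller may already have exited.
    let _ = tx.send(Input::GetStateInfo);
}
```

Because the flag is stored before the wake-up message is sent, a poller that receives the message is guaranteed to see the flag set.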

Comment on lines +96 to +99
let hb_callbacks = {
    heartbeat_map_clone.read().values().cloned().collect::<Vec<_>>()
};
for heartbeat_callback in hb_callbacks {
Member Author

This is just some bonus cleanup so that the lock is held only while cloning the callbacks out of the map, not while the callbacks are invoked.
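The same pattern can be shown as a self-contained sketch. It assumes `std::sync::RwLock` and a hypothetical callback type that returns a `String`; the real map, lock type, and callback signature may differ.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical callback type: produces a heartbeat payload for one worker.
type HeartbeatCallback = Arc<dyn Fn() -> String + Send + Sync>;

/// Clone the callbacks out of the map inside a short scope so the read guard
/// is dropped before any callback runs.
fn collect_heartbeats(map: &RwLock<HashMap<u32, HeartbeatCallback>>) -> Vec<String> {
    let callbacks: Vec<HeartbeatCallback> = {
        let guard = map.read().expect("heartbeat map lock poisoned");
        guard.values().cloned().collect()
    }; // read guard dropped here

    // The callbacks run without the lock, so a writer is not blocked for the
    // duration of this loop.
    callbacks.iter().map(|cb| cb()).collect()
}
```

Cloning an `Arc` is cheap, so the extra `Vec` costs little compared with holding the lock across arbitrary callback code.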

Member

@cretz cretz left a comment

I don't think we have seen a need for customizable retries from lang yet on any call, though it is nice that this exists now if we do get a request. Today on a per-call basis they can only opt in/out of retries and set timeouts.
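The difference between a retried call and a non-retried one, such as the worker shutdown RPC in this change, can be sketched as follows. `RetryPolicy` and `call_with_retry` are hypothetical names for illustration only and are not the SDK's actual client API.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical per-call retry settings.
#[derive(Clone, Copy)]
struct RetryPolicy {
    max_attempts: u32,
    backoff: Duration,
}

impl RetryPolicy {
    /// A single attempt, as one might want for a shutdown call.
    const NO_RETRY: RetryPolicy = RetryPolicy {
        max_attempts: 1,
        backoff: Duration::ZERO,
    };
}

/// Run `call` until it succeeds or the attempt budget is exhausted,
/// returning the last error in the latter case.
fn call_with_retry<T, E>(
    policy: RetryPolicy,
    mut call: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= policy.max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(policy.backoff);
                attempt += 1;
            }
        }
    }
}
```

With `RetryPolicy::NO_RETRY`, a failing shutdown call returns its error after one attempt instead of delaying shutdown.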

@Sushisource Sushisource merged commit 9a87ebf into master Nov 10, 2025
33 of 34 checks passed
@Sushisource Sushisource deleted the no-retry-worker-shutdown branch November 10, 2025 17:33