Update critical_service_loop to throw a runtime error on failure #9267

Merged
merged 2 commits into from
Apr 20, 2023

@zanieb zanieb commented Apr 20, 2023

Fixes bug where the agent hangs when only one of its critical service loops fails.

We've been observing this for a while, but the cause was unclear. It turns out the function simply returned on error, on the assumption that it was used in a context where returning would cause an exit. Because the agent now checks for cancelled runs and for scheduled runs in separate loops, one loop could fail while the other kept going, so the agent continued running without reporting the failure. This almost definitely patches the same bug for workers.

Loosely related to some behavior in #7442 e.g. #7442 (comment)
cc @jawnsy who first reported this behavior
Closes #9052
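
For illustration, the failure mode described above can be reproduced with a minimal sketch. This is not the actual `critical_service_loop` code: `service_loop_sketch`, its parameters, and the `healthy`/`broken` workloads are hypothetical stand-ins, and plain `asyncio` is assumed in place of whatever concurrency primitives the agent really uses.

```python
import asyncio


async def service_loop_sketch(workload, interval=0.01, max_failures=3,
                              raise_on_failure=True):
    """Hypothetical, simplified stand-in for a critical service loop."""
    failures = 0
    while True:
        try:
            await workload()
            failures = 0  # a success resets the consecutive-failure count
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                if raise_on_failure:
                    # Behavior after the fix: surface the failure to the caller.
                    raise RuntimeError("Service loop exceeded failure limit")
                # Behavior before the fix: stop quietly and assume the caller exits.
                return
        await asyncio.sleep(interval)


async def healthy():
    # Hypothetical workload that always succeeds.
    await asyncio.sleep(0)


async def broken():
    # Hypothetical workload that always fails, e.g. the API is unreachable.
    raise ConnectionError("API unreachable")


async def run_agent(raise_on_failure, timeout=0.2):
    """Run two loops side by side, as the agent does for scheduled and cancelled runs."""
    loops = asyncio.gather(
        service_loop_sketch(healthy, raise_on_failure=raise_on_failure),
        service_loop_sketch(broken, raise_on_failure=raise_on_failure),
    )
    try:
        await asyncio.wait_for(loops, timeout)
    except asyncio.TimeoutError:
        return "still running (failure went unreported)"
    except RuntimeError as exc:
        return f"exited: {exc}"


if __name__ == "__main__":
    print(asyncio.run(run_agent(raise_on_failure=False)))
    print(asyncio.run(run_agent(raise_on_failure=True)))
```

With `raise_on_failure=False` the broken loop returns while the healthy loop keeps the process alive, which matches the reported hang. With `raise_on_failure=True` the error reaches the caller, which can then exit.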

@zanieb zanieb added the fix A fix for a bug in an existing feature label Apr 20, 2023
Comment on lines -67 to -68
    except KeyboardInterrupt:
        return

We should never swallow keyboard interrupts
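
A small sketch of why the removed handler was a problem, using hypothetical helper names (`run_swallowing`, `run_propagating`, `interrupted`) that do not come from the PR: when `KeyboardInterrupt` is caught and turned into a plain `return`, the caller cannot tell an interrupt from normal completion.

```python
def run_swallowing(workload):
    """Mirrors the removed lines: the interrupt is caught and discarded."""
    try:
        workload()
    except KeyboardInterrupt:
        return  # the caller sees an ordinary return


def run_propagating(workload):
    """No handler: a KeyboardInterrupt raised by the workload reaches the caller."""
    workload()


def interrupted():
    # Hypothetical workload that simulates the user pressing Ctrl+C.
    raise KeyboardInterrupt


def caller_saw_interrupt(runner):
    """Report whether the interrupt made it out of the runner."""
    try:
        runner(interrupted)
    except KeyboardInterrupt:
        return True
    return False


if __name__ == "__main__":
    print(caller_saw_interrupt(run_swallowing))
    print(caller_saw_interrupt(run_propagating))
```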

@zanieb zanieb marked this pull request as ready for review April 20, 2023 15:10
@zanieb zanieb requested a review from a team as a code owner April 20, 2023 15:10
@zanieb zanieb requested a review from a team April 20, 2023 15:10
@desertaxle

This should also resolve #9052

@zanieb

zanieb commented Apr 20, 2023

@desertaxle thank you that's the one I was looking for!

@desertaxle desertaxle left a comment

LGTM!

@zanieb zanieb merged commit e5ef50d into main Apr 20, 2023
@zanieb zanieb deleted the fix-service-loop branch April 20, 2023 15:48
asmundo pushed a commit to asmundo/prefect that referenced this pull request May 11, 2023
Development

Successfully merging this pull request may close these issues.

Temporary network connectivity problems cause Worker to stop working