gRPC plugin framework should be able to recover from panics #1742
Comments
The plugin framework is just https://github.com/hashicorp/go-plugin. I don't know if it supports recovery from plugin crashes.
I'd like to pick this one up. I have some experience with this.
I'm having some issues with a gRPC storage plugin I'm writing and really struggling to get good reason/stack data out. @radekg or anyone else -- any suggestions on how to get better visibility?
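As a stopgap for visibility, the issue description mentions wrapping plugin functions in defer/recover. A minimal sketch of that idea follows. The helper name safeCall is hypothetical and is not part of Jaeger or go-plugin, and "WriteSpan" in the usage note is only an illustrative label.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// safeCall runs fn and converts a panic into an ordinary error.
// The panic value and stack trace are written to stderr so that whoever
// owns the plugin process can collect them.
// safeCall is a hypothetical helper for illustration only.
func safeCall(name string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Fprintf(os.Stderr, "panic in %s: %v\n%s", name, r, debug.Stack())
			err = fmt.Errorf("panic in %s: %v", name, r)
		}
	}()
	return fn()
}
```

A plugin method could then be written as `return safeCall("WriteSpan", func() error { ... })`, which keeps the process alive and returns the panic to the caller as an error. It only covers panics on the goroutine that runs fn.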
I think @gouthamve had very similar issues. @gouthamve: do you have a couple of minutes to share your experience?
Just saw #2071, which might help with this.
@radekg what solution were you thinking about? I'm wondering if it would be best to simply crash/exit with an error code, so k8s/systemd would restart jaeger and normal alerting would detect the issue.
On the other hand, since the plugins are isolated, one could argue that we don't have to worry about the first case, and the second case might not be relevant because the process can't do anything useful without a working storage plugin anyway.
Hi @chlunde. It's been a while since I last looked at this. go-plugin implements a ping message in its protocol. Apparently that message can be used to find out whether the plugin is running. That is the route I would have taken. IMHO, restarting the complete collector seems a bit heavy-handed.
Requirement - what kind of business use case are you trying to solve?
We are implementing a custom gRPC-based storage plugin as per this doc.
Problem - what in Jaeger blocks you from solving the requirement?
There are two related problems:
Impact:
defer/recover to prevent it from fully crashing and display helpful debug info
Proposal - what do you suggest to solve the problem or improve the existing situation?
The reason and a stack trace should already be dumped to the plugin process's stderr at the time of the panic, so Jaeger should be able to capture and log them.
Crashed plugins should be restarted.
Any open questions to address