Fix panic when servers return a wrapped error with status OK #6374
I'm in agreement with the spirit of this change, i.e., grpc should better protect itself from misbehaving wrapped errors that implement the Also, @dfawley is OOO and is back mid next week. I would like to get his thoughts on this one too. Backporting it to as many releases as necessary shouldn't be an issue. @atollena: Is your production service still affected? How badly/frequently? Or do you have a workaround for now? Thanks. |
I think the only trace of documentation for the
We mitigated that problem by reverting to v1.54.1, so we are only affected in that we are stuck on an older version. The problem is infrequent but exists. We fixed the occurrences that we found, some of them in third-party libraries: googleapis/google-cloud-go#8127 (in case of errors retrying to GCS, I think). It's not trivial to audit the codebase to be 100% sure that there are no other occurrences. |
LGTM modulo some minor nits.
When wrapping an error that has a gRPC status, we reuse the underlying status message field and put the error message in there. If the returned gRPC status is nil, then there is a nil dereference of the status Message field. This change fixes this behavior by returning Unknown status and the error message when the wrapped error has status OK.
@easwars I took your suggestion, thank you. I also updated the RELEASE NOTES string in the PR description. I saw that you approved, but in the past I've let you take care of merging. Should I merge, or should I let you do it? Thanks! |
I'm confused about this.
How is this not an error in usage? Why/when are you wrapping a "not-an-error" status error ( |
There was an occurrence in the Google Cloud SDK (it's been fixed in the latest release). This made services that use Google Cloud Storage crash on retry after the grpc upgrade, while they were working fine with 1.54.1. Even though, as you point out, this is an error in usage, I do think this change in behaviour is dangerous. Do you think we should just fix all call sites as we discover them? (Note that this can be difficult when it involves third-party libraries, as was the case for us, but we could hope that this never happens again.) |
Thanks for the explanation. I agree we shouldn't panic when this happens even if it's an error in usage, and so this change is probably good as-is. I just wanted to make sure you weren't actually doing this kind of thing intentionally. Is the fix in |
You're right (and looking at the code again it took me a moment to understand why Could you merge this PR please, now that it got 2 approvals? Or do you want any other changes applied? Thanks! |
IIUC this needs to be backported to the v1.56.x branch too? |
It would be very useful to us, so if it isn't too much to ask, yes please. |
I'll backport this to 1.55 and 1.56 today :) |
When wrapping an error that has a gRPC status, we reuse the underlying status message field and put the error message in there. If the returned gRPC status is nil, then there is a nil dereference of the status Message field, causing a panic.
This change fixes this behavior by returning an Unknown status code and the error message whenever the wrapped error has status OK.
RELEASE NOTES:
status.FromError now returns an error with codes.Unknown when the error implements the GRPCStatus() method, and calling GRPCStatus() returns nil.