feat: [RMS-1524] add DeadlineExceeded status code #416
@@ -68,6 +68,9 @@ const (
// Deprecated: In reality server-side errors should fall into one of the above 3 errors, and this inclusion was
// a mistake. It's not worth a breaking change to revoke at this time, though, so it shall live on.
UnknownError StatusCode = 803
// DeadlineExceeded is for when the server is unable to complete the request within the configured deadline and the
// request times out. This error is retriable, but the duration for backoff is unknown.
DeadlineExceeded StatusCode = 804
I feel that this would better fall under "Unavailable", since "DeadlineExceeded" doesn't really provide any more color to an error than Unavailable. The client would also respond to it in the same way. I think, if the goal is to provide different metrics, an entirely new status code isn't the right approach.
I guess returning timeouts as Unavailable can lead to some misunderstandings in the future, as Unavailable is generally meant for temporary downtime of the whole service, while DeadlineExceeded is meant as a failure of just the one particular request. The gRPC status codes also define these as two separate cases.
I think even the FE can show two different messages to the user to better explain the situation.
- Unavailable: Service is temporarily unavailable, please try again later
- DeadlineExceeded: The request took too long, please try again later
In the second case the user can understand that the action they're performing is probably resource-intensive and can try to optimize the request, for example.
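A rough sketch of that mapping, written in Go only to match the diff (the FE would presumably use its own language). The function `userMessage`, the fallback message, and the numeric value of `Unavailable` are assumptions for illustration; only `DeadlineExceeded = 804` is taken from this PR.

```go
package main

// StatusCode mirrors the type used in the diff above.
type StatusCode int

const (
	// Placeholder value: the real Unavailable constant is not shown in this diff.
	Unavailable StatusCode = 0
	// DeadlineExceeded uses the value added in this PR.
	DeadlineExceeded StatusCode = 804
)

// userMessage is a hypothetical helper that picks the user-facing text
// for a status code, using the two messages proposed in this thread.
func userMessage(code StatusCode) string {
	switch code {
	case Unavailable:
		return "Service is temporarily unavailable, please try again later"
	case DeadlineExceeded:
		return "The request took too long, please try again later"
	default:
		// Assumed generic fallback; not discussed in the thread.
		return "Something went wrong, please try again later"
	}
}
```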
That's a fair point. I'm for this 😄
In my opinion this case does not need a dedicated status code.
If the server fails to process a user request within a certain timeout, it is an InternalServerError: a transient condition that can and should be retried by the client.
The Eng reaction to it should be either:
- increase the timeout because it is too low
- improve the performance of request processing (DB optimization, caching, etc.)
But this opinion is not blocking.
My intention for this status code was mainly to track it separately in our gRPC dashboards, but since it has wider implications I will start a discussion with the FE team and PM as well.
I talked with RME and our PM and confirmed that the FE can benefit from this and show a more concrete message to the user.
🎉 This PR is included in version 1.85.0 🎉
The release is available on GitHub release
Your semantic-release bot 📦🚀
What this PR does / why we need it
This PR adds the DeadlineExceeded status code. It is an error type that is quite different from other server-side errors, and we would like to track and measure it separately (in our dashboards etc.).
Jira ID
RMS-1524
Notes for your reviewers