feat: [RMS-1524] add DeadlineExceeded status code #416

Merged
dundee merged 2 commits into main from dundee/feat/err-deadline-exceeded on Dec 8, 2023

Conversation

dundee
Contributor

@dundee dundee commented Nov 28, 2023

What this PR does / why we need it

This PR adds a DeadlineExceeded status code. It is an error type that is quite different from other server-side errors, and we would like to track and measure it separately (for example, in our dashboards).

Jira ID

RMS-1524

Notes for your reviewers

@dundee dundee marked this pull request as ready for review November 28, 2023 16:08
@dundee dundee requested a review from a team as a code owner November 28, 2023 16:08
@@ -68,6 +68,9 @@ const (
 	// Deprecated: In reality server-side errors should fall into one of the above 3 errors, and this inclusion was
 	// a mistake. It's not worth a breaking change to revoke at this time, though, so it shall live on.
 	UnknownError StatusCode = 803
+	// DeadlineExceeded is for when the server is unable to complete the request within the configured deadline and the
+	// request times out. This error is retriable, but the duration for backoff is unknown.
+	DeadlineExceeded StatusCode = 804
Contributor

@jaredallard jaredallard Nov 28, 2023

I feel that this would better fall under "Unavailable", since "DeadlineExceeded" doesn't really add any more color to an error than Unavailable does.

The client would also respond to this error the same way it responds to Unavailable. If the goal is to provide different metrics, I don't think an entirely new status code is the right approach.

Contributor Author

I guess returning timeouts as Unavailable can lead to misunderstandings in the future: Unavailable is generally meant for temporary downtime of the whole service, while DeadlineExceeded is meant for the failure of one particular request. The gRPC status codes also define these as two separate cases.

Contributor Author

I think even the FE can show two different messages to the user to better explain the situation.

  • Unavailable - Service is temporarily unavailable, please try again later
  • DeadlineExceeded - The request took too long, please try again later

In the second case, the user can understand that the action they are performing is probably resource-intensive and can, for example, try to optimize the request.
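
The two messages above could be selected with a simple switch on the status code. This is only a sketch: `userMessage` is a hypothetical helper, the local `StatusCode` type stands in for the package's own, the numeric value of Unavailable is assumed, and the fallback message is a placeholder that does not come from this discussion.

```go
package main

import "fmt"

// StatusCode is a local stand-in for the package's StatusCode type.
type StatusCode int

const (
	// Unavailable's numeric value is an assumption made for this sketch.
	Unavailable StatusCode = 802
	// DeadlineExceeded uses the value added in this PR.
	DeadlineExceeded StatusCode = 804
)

// userMessage is a hypothetical helper that picks the user-facing text
// for a status code. The default branch is a placeholder.
func userMessage(code StatusCode) string {
	switch code {
	case Unavailable:
		return "Service is temporarily unavailable, please try again later"
	case DeadlineExceeded:
		return "The request took too long, please try again later"
	default:
		return "Something went wrong, please try again later"
	}
}

func main() {
	fmt.Println(userMessage(Unavailable))
	fmt.Println(userMessage(DeadlineExceeded))
}
```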

Contributor

That's a fair point. I'm for this 😄

Contributor

In my opinion, this case does not need a dedicated status code.
If the server fails to process a user request within a certain timeout, that is an InternalServerError: a transient condition that can and should be retried by the client.

The engineering reaction to it should be either:

  • increase timeout because it is too low
  • improve performance of the request processing (DB optimization, caching, etc.)

But this opinion is not blocking.

Contributor Author

My intention for this status code was mainly to track it separately in our gRPC dashboards, but since it has wider implications, I will start a discussion with the FE team and PM as well.
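
To illustrate what tracking it separately could mean, here is a minimal in-memory counter keyed by status code. The `codeCounter` type is hypothetical and only stands in for whatever metrics client feeds the dashboards; the two status code values are the ones shown in the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// StatusCode is a local stand-in for the package's StatusCode type.
type StatusCode int

const (
	UnknownError     StatusCode = 803
	DeadlineExceeded StatusCode = 804
)

// codeCounter is a hypothetical stand-in for a real metrics client.
// It keeps one count per status code.
type codeCounter struct {
	mu     sync.Mutex
	counts map[StatusCode]int
}

func newCodeCounter() *codeCounter {
	return &codeCounter{counts: make(map[StatusCode]int)}
}

// observe records one response with the given status code.
func (c *codeCounter) observe(code StatusCode) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[code]++
}

// count returns how many responses were recorded for the given code.
func (c *codeCounter) count(code StatusCode) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[code]
}

func main() {
	c := newCodeCounter()
	c.observe(DeadlineExceeded)
	c.observe(DeadlineExceeded)
	c.observe(UnknownError)
	fmt.Println(c.count(DeadlineExceeded), c.count(UnknownError))
}
```

With a dedicated code, timeouts get their own series; without one, they would be folded into whichever existing code the server chose to return.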

Contributor Author

@dundee dundee Dec 4, 2023

I talked with RME and our PM and confirmed that the FE can benefit from this and show a more concrete message to the user.

@dundee dundee enabled auto-merge (squash) December 8, 2023 10:13
@dundee dundee merged commit 58dca12 into main Dec 8, 2023
2 checks passed
@dundee dundee deleted the dundee/feat/err-deadline-exceeded branch December 8, 2023 10:15
@getoutreach-ci-1
Contributor

🎉 This PR is included in version 1.85.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

3 participants