-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change GRPC message limit SQL error type #6630
Change GRPC message limit SQL error type #6630
Conversation
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
this is ... mabye a bit much with the factoring out but maybe not Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
@harshit-gangal I applied suggested error code changes; I believe the unit test that is failing is unrelated since, afaict, it's erroring out with relation to failing to connect to a mysql instance. Possible due to bad auth? |
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
@setassociative this could break applications that depend on the exact error code? Can you summarize the net effect in a |
@@ -120,7 +120,7 @@ func NewSQLErrorFromError(err error) error { | |||
case vtrpcpb.Code_UNAUTHENTICATED: | |||
num = ERAccessDeniedError | |||
case vtrpcpb.Code_RESOURCE_EXHAUSTED: | |||
num = ERTooManyUserConnections | |||
num = demuxResourceExhaustedErrors(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still required after changing to through mysql error directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- if we directly map all ResourceExhausted into ERNetPacketTooLarge then we end up in the same position where we are grouping errors that don't necessarily make sense together.
Specifically when we overflow a connection pool (vttablet: rpc error: code = ResourceExhausted desc = pool ConnPool waiter count exceeded
) and the like we want to distinguish that as something that can be solved via backoff (ResourceExhausted) instead of queries that are inherently less retryable because they are likely to generate "bad" data (ErNetPacketTooLarge).
go/mysql/sql_error_test.go
Outdated
testCase{"grpc: received message larger than max (99282 vs. 1234): trailer", ERTooManyUserConnections}, | ||
testCase{"grpc: received message larger than max (1234 vs. 1234)", ERNetPacketTooLarge}, | ||
testCase{"header: grpc: received message larger than max (1234 vs. 1234)", ERNetPacketTooLarge}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of them should say ERNetPacketTooLarge. you should change the pattern matcher. I do not see a reason why the first test case should not say ERNetPacketTooLarge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Sure thing
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
* impl+log Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * correctly escape regex Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * simple tests, remove logging Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * years are dumb Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * handle the other tow RE cases. this is ... mabye a bit much with the factoring out but maybe not Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * fix up comments, move impls around Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * better better error comment <_< Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * assert! Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * to run test suite Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * fix up testsv Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * tests pass; remove dead code-as-comments Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * pull out unnecessary processing Signed-off-by: Richard Bailey <rbailey@slack-corp.com> * don't differentiate trailing vs leading clarification Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
…, but only for inserts, selects of too much data would still report ERTooManyUserConnections Signed-off-by: Jacques Grove <aquarapid@gmail.com>
…, but only for inserts, selects of too much data would still report ERTooManyUserConnections Signed-off-by: Jacques Grove <aquarapid@gmail.com>
Summary
There are a collection of errors bundled under the generic GRPC ResourceExhausted error. Several of them are not necessarily related to resource exhausted but to limits that are being exceeded in a way that backing off and waiting for reduced load will not resolve.
This change introduces a processing step in the GRPC -> mysql error conversion that rewrites those errors from the current
1203 - ERTooManyUserConnections
to a (somewhat) more appropriate1153 - ERNetPacketTooLarge
. Two other errors are also recategorized in the vtgate executor to be treated as the 1153 instead of 1203.The three errors being changed:
grpc_max_message_size
)Before this change each of these would the SQL error
ERTooManyUserConnections
when using the MySQL interface. This is a fairly unfortunate conflation of that error and doesn't make sense to be consumed as "too many users" when the actual error is various flavors of "too much data."Further in a situation where you want to use error codes to apply back pressure to callers having these two classes of error combined means that it's not a great signal.
We discussed briefly whether we should also change the GRPC error so that ResourceExhausted isn't overloaded but it seems that those errors need to match GRPC error set (ref). Additionally it would be a much larger change because ResourceExhausted is used for some session management. If we decide later to revisit this decision that will be fine as the GRPC and SQL errors are independent (also, even if it does feel like what we'd want to do I would prefer it to be a separate change from this from a pragmatic standpoint).
Misc
1️⃣ This change exceeds our initial plan which covered only the GRPC message limit case and I'm open for pushback on the other two but I do think that all these cases are better suited as 1153 than 1203.
2️⃣ The error message checks are extremely factored and I'm happy to pull them out of
vterrors
for another home or deal with that differently if folks have opinions. I didn't know what was the most reasonable location since they needed to be shared between sql_errors and the vtgateTesting
ERROR 1153
in the following:Reference
We talked about this in the Vitess slack and decided on
ERNetPacketTooLarge
as the new error code there.Release Note Required
A MySQL error code 1153 is now being returned in the following cases:
Previously these would be returned as MySQL 1203 if received over the MySQL connector or
Code_RESOURCE_EXHAUSTED
over GRPC.