Change GRPC message limit SQL error type #6630

Merged
merged 14 commits into from
Sep 17, 2020

Conversation

setassociative
Contributor

@setassociative setassociative commented Aug 26, 2020

Summary

There is a collection of errors bundled under the generic GRPC ResourceExhausted error. Several of them are not really about resource exhaustion; they are about limits being exceeded in a way that backing off and waiting for reduced load will not resolve.

This change introduces a processing step in the GRPC -> mysql error conversion that rewrites those errors from the current 1203 - ERTooManyUserConnections to a (somewhat) more appropriate 1153 - ERNetPacketTooLarge. Two other errors are also recategorized in the vtgate executor to be treated as the 1153 instead of 1203.

The three errors being changed:

  • When a GRPC message coming back to the vtgate exceeds the configured size limit (configured via grpc_max_message_size)
  • When a write attempts to make a change with a payload larger than the configured allowable size
  • When a scatter query attempts to process too many rows in memory within the vtgate to construct a response
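To make the first case concrete, here is a minimal sketch of the kind of message-based check the conversion step performs. The function name `demuxResourceExhausted` and the exact pattern are illustrative assumptions, not the code in this PR; the message text is taken from the GRPC error shown under Testing.

```go
package main

import (
	"fmt"
	"regexp"
)

// MySQL error numbers named in this PR description.
const (
	ERNetPacketTooLarge      = 1153
	ERTooManyUserConnections = 1203
)

// Hypothetical pattern for the GRPC "message too large" text. It is
// deliberately unanchored so prefixes such as "vttablet: rpc error: ..."
// do not prevent a match.
var grpcMessageTooLarge = regexp.MustCompile(`grpc: received message larger than max \(\d+ vs\. \d+\)`)

// demuxResourceExhausted is an illustrative stand-in: it picks a MySQL
// error number for an error that already carries the ResourceExhausted code.
func demuxResourceExhausted(msg string) int {
	if grpcMessageTooLarge.MatchString(msg) {
		return ERNetPacketTooLarge
	}
	// Everything else (for example pool exhaustion) keeps the old mapping.
	return ERTooManyUserConnections
}

func main() {
	tooLarge := "vttablet: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (12579 vs. 3072)"
	poolFull := "vttablet: rpc error: code = ResourceExhausted desc = pool ConnPool waiter count exceeded"
	fmt.Println(demuxResourceExhausted(tooLarge)) // prints 1153
	fmt.Println(demuxResourceExhausted(poolFull)) // prints 1203
}
```

The sketch only covers the GRPC message limit; per the summary, the other two cases are recategorized in the vtgate executor rather than in the conversion step.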

Before this change each of these would return the SQL error ERTooManyUserConnections when using the MySQL interface. This is a fairly unfortunate conflation: it doesn't make sense to consume these as "too many users" when the actual error is various flavors of "too much data."

Further, in a situation where you want to use error codes to apply back pressure to callers, having these two classes of error combined means the code is not a great signal.

We discussed briefly whether we should also change the GRPC error so that ResourceExhausted isn't overloaded, but it seems that those errors need to match the GRPC error set (ref). Additionally, it would be a much larger change because ResourceExhausted is used for some session management. If we decide later to revisit this decision that will be fine, as the GRPC and SQL errors are independent. Even if that does turn out to be what we want, I would prefer it to be a separate change from this one for pragmatic reasons.

Misc

1️⃣ This change exceeds our initial plan which covered only the GRPC message limit case and I'm open for pushback on the other two but I do think that all these cases are better suited as 1153 than 1203.

2️⃣ The error message checks are extremely factored and I'm happy to pull them out of vterrors for another home or deal with that differently if folks have opinions. I didn't know what the most reasonable location was, since they needed to be shared between sql_errors and the vtgate.

Testing

  • Added unit tests for the method that processes the resource exhausted errors
  • Built and deployed a modified vtgate and observed expected (new) error code when running with an artificially low grpc message limit (3k). Note ERROR 1153 in the following:
mysql> select id, data from test_table limit 100;
ERROR 1153 (HY000): vtgate: http://vtgate-dev/: target: keyspace.-.master, used tablet: <tablet_uid> (<tablet-host>): vttablet: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (12579 vs. 3072)
  • Observed row-count limit related errors returning 1153 mysql error:
mysql> select id, data from test_table limit 10000;
ERROR 1153 (HY000): vtgate: http://vtgate-dev/: in-memory row count exceeded allowed limit of 100
  • Explicitly: I did not test the payload-too-large case and am relying on unit tests 🤠

Reference

We talked about this in the Vitess slack and decided on ERNetPacketTooLarge as the new error code there.

Release Note Required

A MySQL error code 1153 is now being returned in the following cases:

  • When a GRPC message coming back to the vtgate exceeds the configured size limit
  • When a write attempts to make a change with a payload larger than the configured allowable size
  • When a scatter query attempts to process too many rows in memory within the vtgate while constructing the result set

Previously these would be returned as MySQL 1203 if received over the MySQL connector or Code_RESOURCE_EXHAUSTED over GRPC.

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
@setassociative setassociative changed the title Change GRPC message limit SQL error type [draft] Change GRPC message limit SQL error type Aug 26, 2020
@zmagg zmagg requested a review from dweitzman August 26, 2020 19:30
@setassociative setassociative removed the request for review from sougou August 26, 2020 19:41
@setassociative setassociative marked this pull request as draft August 26, 2020 23:45
this is ... maybe a bit much with the factoring out but maybe not

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
@setassociative setassociative changed the title [draft] Change GRPC message limit SQL error type Change GRPC message limit SQL error type Aug 27, 2020
@setassociative setassociative marked this pull request as ready for review August 27, 2020 02:11
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
go/mysql/sql_error_test.go Outdated
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
go/vt/vtgate/executor.go Outdated
go/vt/vtgate/scatter_conn.go Outdated
Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
@setassociative
Contributor Author

@harshit-gangal I applied the suggested error code changes. I believe the failing unit test is unrelated since, afaict, it's erroring out because it fails to connect to a mysql instance. Possibly due to bad auth?

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
@deepthi
Member

deepthi commented Sep 14, 2020

@setassociative this could break applications that depend on the exact error code? Can you summarize the net effect in a Release Note Required section so that we can put it into the release notes?

@@ -120,7 +120,7 @@ func NewSQLErrorFromError(err error) error {
 	case vtrpcpb.Code_UNAUTHENTICATED:
 		num = ERAccessDeniedError
 	case vtrpcpb.Code_RESOURCE_EXHAUSTED:
-		num = ERTooManyUserConnections
+		num = demuxResourceExhaustedErrors(err.Error())
Member

Is this still required after changing to throw the MySQL error directly?

Contributor Author

@setassociative setassociative Sep 15, 2020

Yes -- if we directly map all ResourceExhausted into ERNetPacketTooLarge then we end up in the same position where we are grouping errors that don't necessarily make sense together.

Specifically, when we overflow a connection pool (vttablet: rpc error: code = ResourceExhausted desc = pool ConnPool waiter count exceeded) and the like, we want to distinguish that as something that can be solved via backoff (ResourceExhausted) from queries that are inherently less retryable because they are likely to generate "bad" data (ERNetPacketTooLarge).

Comment on lines 33 to 35
testCase{"grpc: received message larger than max (99282 vs. 1234): trailer", ERTooManyUserConnections},
testCase{"grpc: received message larger than max (1234 vs. 1234)", ERNetPacketTooLarge},
testCase{"header: grpc: received message larger than max (1234 vs. 1234)", ERNetPacketTooLarge},
Member

All of them should say ERNetPacketTooLarge. You should change the pattern matcher. I do not see a reason why the first test case should not say ERNetPacketTooLarge.

Contributor Author

@setassociative setassociative Sep 15, 2020

✅ Sure thing
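For readers following the thread above, one way the first test case could come out differently from the other two is a pattern anchored at the end of the message. That is an assumption about the earlier revision, not something shown here; both patterns below are hypothetical. The sketch contrasts an end-anchored pattern with an unanchored one on the three test strings from the review.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are hypothetical. The anchored one requires the message
// to end right after the size comparison; the unanchored one does not.
var (
	anchored   = regexp.MustCompile(`grpc: received message larger than max \(\d+ vs\. \d+\)$`)
	unanchored = regexp.MustCompile(`grpc: received message larger than max \(\d+ vs\. \d+\)`)
)

func main() {
	// The three inputs from the reviewed test cases.
	inputs := []string{
		"grpc: received message larger than max (99282 vs. 1234): trailer",
		"grpc: received message larger than max (1234 vs. 1234)",
		"header: grpc: received message larger than max (1234 vs. 1234)",
	}
	for _, in := range inputs {
		fmt.Printf("anchored=%t unanchored=%t %q\n",
			anchored.MatchString(in), unanchored.MatchString(in), in)
	}
}
```

With the unanchored pattern all three inputs match, so leading or trailing context no longer changes the resulting error code.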

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
@harshit-gangal harshit-gangal merged commit fd43218 into vitessio:master Sep 17, 2020
@askdba askdba added this to the v8.0 milestone Oct 6, 2020
setassociative added a commit to tinyspeck/vitess that referenced this pull request Nov 7, 2020
* impl+log

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* correctly escape regex

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* simple tests, remove logging

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* years are dumb

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* handle the other two RE cases.

this is ... maybe a bit much with the factoring out but maybe not

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* fix up comments, move impls around

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* better better error comment <_<

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* assert!

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* to run test suite

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* fix up tests

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* tests pass; remove dead code-as-comments

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* pull out unnecessary processing

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>

* don't differentiate trailing vs leading clarification

Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
aquarapid added a commit to planetscale/vitess that referenced this pull request Sep 20, 2021
…, but only

for inserts, selects of too much data would still report ERTooManyUserConnections

Signed-off-by: Jacques Grove <aquarapid@gmail.com>
aquarapid added a commit to planetscale/vitess that referenced this pull request Oct 5, 2021
…, but only

for inserts, selects of too much data would still report ERTooManyUserConnections

Signed-off-by: Jacques Grove <aquarapid@gmail.com>