Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backend: send the error to the client when the handler encounters an error #187

Merged
merged 13 commits into from
Jan 16, 2023

Conversation

djshow832
Copy link
Collaborator

@djshow832 djshow832 commented Jan 11, 2023

What problem does this PR solve?

Issue Number: close #180

Problem Summary:
Currently, when the serverless tier encounters an error, the error is logged but not returned to the client. The client cannot figure out the error easily.
We need to wrap the error into a MySQL error and return it to the client.

What is changed and how it works:

  • Write a MySQL error to the client when the fetcher and handler return errors

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Start a TiProxy without any TiDB instances. Then start a MySQL client to connect to the TiProxy:

mysql -h127.1 -uroot -P6000 test
ERROR 1105 (HY000): No available TiDB instances, please check TiDB cluster

Notable changes

  • Has configuration change
  • Has HTTP API interfaces change (Don't forget to add the declarative for API)
  • Has tiproxyctl change
  • Other user behavior changes

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@djshow832 djshow832 requested a review from xhebox January 11, 2023 08:07
Copy link
Collaborator

@xhebox xhebox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 42 to 48
func WriteUnknownError(clientIO *pnet.PacketIO, err error, lg *zap.Logger) {
if err != nil {
if writeErr := clientIO.WriteErrPacket(mysql.NewErr(mysql.ErrUnknown, err.Error())); writeErr != nil {
lg.Error("writing error to client failed", zap.NamedError("mysql_err", err), zap.NamedError("write_err", writeErr))
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is always called with err != nil. It should be renamed to something like TryWriteUnknownError with the extra judge of err != nil.

Copy link
Collaborator Author

@djshow832 djshow832 Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's always ensured err != nil from the callers. But I want to make another check to ensure that.

Comment on lines 232 to 234
if err != nil && mgr.clientIO != nil {
WriteUnknownError(mgr.clientIO, err, mgr.logger)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be possible to do something like this:

var v *backoff.PermanentError
if errors.As(err, &v) {
   err = v.Unwrap()
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's OK, because:

func (e *PermanentError) Error() string {
	return e.Err.Error()
}

pkg/proxy/backend/backend_conn_mgr.go Outdated Show resolved Hide resolved
pkg/proxy/backend/backend_conn_mgr.go Outdated Show resolved Hide resolved
lib/config/proxy.go Outdated Show resolved Hide resolved
@djshow832 djshow832 requested a review from xhebox January 12, 2023 02:43
Copy link
Collaborator

@xhebox xhebox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not wrap so many errors, it is fine to wrap errors before L137/frontend cap negotiation with ErrBeforeClientResp:

if errors.Is(err, ErrBeforeClientResp) {
  // only log
} else if errors.As(err, &UserError) {
  // log and send custom error, or 'client cap', 'no instance'
} else if errors.Is(err, Deadline) {
  // log and send `timeout, bad network situation` 
} else {
  // log and send `cluster configuration/topo wrong, contact admin or check proxy log`, or whatever
}

In fact, I think only client cap negotiation, no instance, dial timeout and possible errors from gateway should be sent to user.

All other errors are likely mistakes of cluster configuration, which could be solved by auditing tiproxy log and fixing cluster topology. And on tidb cloud, that is impossible for users.

}

if err := auth.verifyBackendCaps(logger, backendCapability); err != nil {
return err
return WrapUserError(err, capabilityErrMsg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wrap the error in verifyBackendCaps. BTW, i guess frontend cap verify should also be checked.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between wrapping it here or in verifyBackendCaps? I don't want to call WrapUserError everywhere.

Copy link
Collaborator

@xhebox xhebox Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because verifyBackendCaps is invoked in handshakeSecondTime, too. It is causing inconsistent errors somewhat. We are already invoke WrapUserError everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If handshakeSecondTime fails, the error should not return to the client. The proxy just uses the original backend and won't switch backends.

@@ -146,7 +146,7 @@ func (auth *Authenticator) handshakeFirstTime(logger *zap.Logger, cctx ConnConte

clientResp := pnet.ParseHandshakeResponse(pkt)
if err = handshakeHandler.HandleHandshakeResp(cctx, clientResp); err != nil {
return err
return WrapUserError(err, err.Error())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to invoke WrapUserError here.

I mean, why not just let the handler/serverless-tier to wrap errors. Maybe they want to return abnormal and normal errors at the same time, from one single function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just makes the API too complicated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. They clearly will have internal and non-internal errors, just like us.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can just log internal errors and return user errors.

userMsg string
}

func WrapUserError(err error, userMsg string) *UserError {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is duplicated with errors.Wrapf(err, "Ggg"), which is basically the samething. And you can get the msg by errors.Unwrap(wrapedErr).Error().

We just need to define ErrUserError to reuse lib/util/errors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean:
To wrap: err = errors.Wrapf(errors.Wrapf(ErrUserError, err), userMsg)
To get userMsg: errors.Unwrap(err).Error()
To get logMsg: err.Error() or errors.Unwrap(errors.Unwrap(err)).Error()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean:

To wrap: err = errors.Wrapf(ErrUserError, userMsg)
To get userMsg: errors.Unwrap(user_err).Error()

If it is user error, then it is also safe to log. If it is not, the internal error should be logged by either gateway or tiproxy. I mean logMsg and userMsg are the samething for internal errors.

Copy link
Collaborator Author

@djshow832 djshow832 Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal error msg like dial timeout, and EOF, are lost? Where do I log them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, no instance and dial timeout are both replaced by ErrUserError, then when I log the msg, I don't know the original error.
A walkaround is to log the internal errors whenever they are generated, but this brings more code.

pkg/proxy/backend/backend_conn_mgr.go Outdated Show resolved Hide resolved
@@ -156,23 +156,29 @@ func (auth *Authenticator) handshakeFirstTime(logger *zap.Logger, cctx ConnConte
// In case of testing, backendIO is passed manually that we don't want to bother with the routing logic.
backendIO, err := getBackendIO(cctx, auth, clientResp, 5*time.Second)
if err != nil {
return err
return WrapUserError(err, connectErrMsg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically wraps all errors in getBackendIO to connectErrMsg.

While the underlying errors may contain messages from handshakeHandler.GetRouter(), errors.As will stop at the most outside error, i.e. the error wrapped this line. We should not wrap user errors multiple times: invoke errors.As onTiproxy wrapped -> backoff wrapped -> handler wrapped -> real msg will stop at Tiproxy wrapped.

Copy link
Collaborator Author

@djshow832 djshow832 Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked it in WrapUserError:

if ue, ok := err.(*UserError); ok {
	return ue
}

It won't wrap multiple times.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if WrapUserError(errors.Errorf("%w", WrapUserError())))? It can be a long chain since we can't decide how gateway use this function. Either we just don't export the function, or we just let them call, IMO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a problem, I can replace err.(*UserError) with errors.As.

@djshow832
Copy link
Collaborator Author

I think we should not wrap so many errors, it is fine to wrap errors before L137/frontend cap negotiation with ErrBeforeClientResp:

if errors.Is(err, ErrBeforeClientResp) {
  // only log
} else if errors.As(err, &UserError) {
  // log and send custom error, or 'client cap', 'no instance'
} else if errors.Is(err, Deadline) {
  // log and send `timeout, bad network situation` 
} else {
  // log and send `cluster configuration/topo wrong, contact admin or check proxy log`, or whatever
}

In fact, I think only client cap negotiation, no instance, dial timeout and possible errors from gateway should be sent to user.

All other errors are likely mistakes of cluster configuration, which could be solved by auditing tiproxy log and fixing cluster topology. And on tidb cloud, that is impossible for users.

I didn't wrap errors before L137, so they will be logged currently.
I should report errors when parsing the HandshakeResp or verifying client capability fails, just like TiDB does. And the error code and messages should also be the same as TiDB. But I didn't, it will make this PR too long.
no instance and dial timeout are internal logic. There are similar errors like deadline exceeds. I wrapped them all together into one kind of error. The user should check the cluster in whichever case.

@xhebox xhebox merged commit 8288910 into pingcap:main Jan 16, 2023
@djshow832 djshow832 deleted the mysql_error branch January 16, 2023 07:18
xhebox pushed a commit to xhebox/TiProxy that referenced this pull request Mar 7, 2023
xhebox pushed a commit to xhebox/TiProxy that referenced this pull request Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Send the error to the client when the handler encounters an error
2 participants