backend: send the error to the client when the handler encounters an error #187

djshow832 · 2023-01-11T08:07:03Z

What problem does this PR solve?

Issue Number: close #180

Problem Summary:
Currently, when the serverless tier encounters an error, the error is logged but not returned to the client. The client cannot figure out the error easily.
We need to wrap the error into a MySQL error and return it to the client.

What is changed and how it works:

Write a MySQL error to the client when the fetcher and handler return errors

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Start a TiProxy without any TiDB instances. Then start a MySQL client to connect to the TiProxy:

mysql -h127.1 -uroot -P6000 test
ERROR 1105 (HY000): No available TiDB instances, please check TiDB cluster

Notable changes

Has configuration change
Has HTTP API interfaces change (Don't forget to add the declarative for API)
Has tiproxyctl change
Other user behavior changes

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

xhebox

LGTM

xhebox · 2023-01-11T08:13:10Z

pkg/proxy/backend/util.go

+func WriteUnknownError(clientIO *pnet.PacketIO, err error, lg *zap.Logger) {
+	if err != nil {
+		if writeErr := clientIO.WriteErrPacket(mysql.NewErr(mysql.ErrUnknown, err.Error())); writeErr != nil {
+			lg.Error("writing error to client failed", zap.NamedError("mysql_err", err), zap.NamedError("write_err", writeErr))
+		}
+	}
+}


I think it is always called with err != nil. Given the extra err != nil check, it should be renamed to something like TryWriteUnknownError.

Yes, the callers always ensure err != nil, but I want an extra check here to be safe.
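A minimal sketch of the "try" naming discussed above, assuming a hypothetical packetWriter interface in place of pnet.PacketIO and a plain string in place of the MySQL error packet. The names packetWriter, recordingWriter, tryWriteError and errNoInstance are illustrative only and do not come from the PR:

```go
package main

import (
	"errors"
	"fmt"
)

// packetWriter is a hypothetical stand-in for the client connection.
type packetWriter interface {
	WriteErrPacket(msg string) error
}

// recordingWriter remembers every message written so the example can inspect it.
type recordingWriter struct{ sent []string }

func (w *recordingWriter) WriteErrPacket(msg string) error {
	w.sent = append(w.sent, msg)
	return nil
}

// errNoInstance is an example error used only in this sketch.
var errNoInstance = errors.New("no available TiDB instances")

// tryWriteError writes err to the client only when err is non-nil.
// It reports whether a write was attempted and any error from the write.
func tryWriteError(w packetWriter, err error) (attempted bool, writeErr error) {
	if err == nil {
		return false, nil
	}
	return true, w.WriteErrPacket(err.Error())
}

func main() {
	w := &recordingWriter{}
	tryWriteError(w, nil)           // nothing is written
	tryWriteError(w, errNoInstance) // one packet is written
	fmt.Println(len(w.sent), w.sent)
}
```

With the guard inside the helper, callers that already check err != nil stay correct, and a caller that forgets the check does not send an empty error packet.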

xhebox · 2023-01-11T08:20:16Z

pkg/proxy/backend/backend_conn_mgr.go

+	if err != nil && mgr.clientIO != nil {
+		WriteUnknownError(mgr.clientIO, err, mgr.logger)
+	}


Should be possible to do something like this:

var v *backoff.PermanentError
if errors.As(err, &v) {
	err = v.Unwrap()
}

Actually it's OK, because:

func (e *PermanentError) Error() string { return e.Err.Error() }
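To illustrate why the unwrap is unnecessary for the client message, here is a small sketch. PermanentError below is a local stand-in modeled on the Error method quoted above, not the type from the backoff package, and clientMessage and unwrapPermanent are hypothetical helpers:

```go
package main

import (
	"errors"
	"fmt"
)

// PermanentError is a local stand-in for the wrapper type discussed above.
type PermanentError struct{ Err error }

// Error forwards to the wrapped error, so the wrapper adds no text of its own.
func (e *PermanentError) Error() string { return e.Err.Error() }

// Unwrap exposes the wrapped error to errors.Is and errors.As.
func (e *PermanentError) Unwrap() error { return e.Err }

// errDial is an example cause used only in this sketch.
var errDial = errors.New("dial timeout")

// clientMessage returns the text that would be placed in the error packet.
func clientMessage(err error) string { return err.Error() }

// unwrapPermanent strips one PermanentError layer if the chain contains one.
func unwrapPermanent(err error) error {
	var pe *PermanentError
	if errors.As(err, &pe) {
		return pe.Unwrap()
	}
	return err
}

func main() {
	wrapped := &PermanentError{Err: errDial}
	// Both forms produce the same client-visible text.
	fmt.Println(clientMessage(wrapped) == clientMessage(unwrapPermanent(wrapped)))
}
```

The explicit unwrap would only matter if the wrapper's Error method added a prefix or suffix to the message.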


xhebox

I think we should not wrap so many errors. It is fine to wrap errors that occur before L137/frontend cap negotiation with ErrBeforeClientResp:

if errors.Is(err, ErrBeforeClientResp) {
  // only log
} else if errors.As(err, &UserError) {
  // log and send custom error, or 'client cap', 'no instance'
} else if errors.Is(err, Deadline) {
  // log and send `timeout, bad network situation` 
} else {
  // log and send `cluster configuration/topo wrong, contact admin or check proxy log`, or whatever
}

In fact, I think only client cap negotiation, no instance, dial timeout, and possible errors from the gateway should be sent to the user.

All other errors are likely cluster configuration mistakes, which can be solved by auditing the TiProxy log and fixing the cluster topology. On TiDB Cloud, users cannot do that.
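The dispatch proposed above could be sketched roughly as follows. This is only an illustration under assumptions: UserError here is a simplified local type, context.DeadlineExceeded stands in for the Deadline placeholder, logging is omitted, and classify is a hypothetical helper name:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrBeforeClientResp marks errors that happen before the client response is parsed.
var ErrBeforeClientResp = errors.New("error before client response")

// UserError is a simplified stand-in carrying a message that is safe to show the client.
type UserError struct{ Msg string }

func (e *UserError) Error() string { return e.Msg }

// Example errors used only by this sketch.
var (
	errEarly   = fmt.Errorf("read handshake: %w", ErrBeforeClientResp)
	errUser    = fmt.Errorf("route: %w", &UserError{Msg: "no available TiDB instances"})
	errTimeout = fmt.Errorf("dial backend: %w", context.DeadlineExceeded)
	errOther   = errors.New("unexpected EOF")
)

// classify returns the message for the client and whether anything should be sent.
func classify(err error) (msg string, send bool) {
	var ue *UserError
	switch {
	case errors.Is(err, ErrBeforeClientResp):
		return "", false // only log
	case errors.As(err, &ue):
		return ue.Msg, true // custom message from the proxy or the handler
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout, bad network situation", true
	default:
		return "cluster configuration or topology is wrong, check the proxy log", true
	}
}

func main() {
	for _, err := range []error{errEarly, errUser, errTimeout, errOther} {
		msg, send := classify(err)
		fmt.Printf("send=%v msg=%q\n", send, msg)
	}
}
```

Note that errors.As needs a pointer to a variable of the target type (here &ue), not the type name itself.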

xhebox · 2023-01-12T07:50:25Z

pkg/proxy/backend/authenticator.go

 	}

 	if err := auth.verifyBackendCaps(logger, backendCapability); err != nil {
-		return err
+		return WrapUserError(err, capabilityErrMsg)


Maybe wrap the error in verifyBackendCaps. BTW, I guess the frontend cap verification should also be checked.

What's the difference between wrapping it here and in verifyBackendCaps? I don't want to call WrapUserError everywhere.

There is a difference, because verifyBackendCaps is invoked in handshakeSecondTime, too, so the errors end up somewhat inconsistent. And we are already invoking WrapUserError everywhere.

If handshakeSecondTime fails, the error should not return to the client. The proxy just uses the original backend and won't switch backends.

xhebox · 2023-01-12T07:55:30Z

pkg/proxy/backend/authenticator.go

@@ -146,7 +146,7 @@ func (auth *Authenticator) handshakeFirstTime(logger *zap.Logger, cctx ConnConte

 	clientResp := pnet.ParseHandshakeResponse(pkt)
 	if err = handshakeHandler.HandleHandshakeResp(cctx, clientResp); err != nil {
-		return err
+		return WrapUserError(err, err.Error())


I don't think we need to invoke WrapUserError here.

I mean, why not just let the handler/serverless tier wrap the errors? They may want to return both abnormal and normal errors from a single function.

This just makes the API too complicated.

I don't think so. They clearly will have internal and non-internal errors, just like us.

They can just log internal errors and return user errors.

xhebox · 2023-01-12T08:08:56Z

pkg/proxy/backend/error.go

+	userMsg string
+}
+
+func WrapUserError(err error, userMsg string) *UserError {


I think this duplicates errors.Wrapf(err, "Ggg"), which is basically the same thing. And you can get the msg by errors.Unwrap(wrappedErr).Error().

We just need to define ErrUserError to reuse lib/util/errors.

Do you mean:
To wrap: err = errors.Wrapf(errors.Wrapf(ErrUserError, err), userMsg)
To get userMsg: errors.Unwrap(err).Error()
To get logMsg: err.Error() or errors.Unwrap(errors.Unwrap(err)).Error()

I mean:

To wrap: err = errors.Wrapf(ErrUserError, userMsg)
To get userMsg: errors.Unwrap(user_err).Error()

If it is a user error, then it is also safe to log. If it is not, the internal error should be logged by either the gateway or TiProxy. I mean logMsg and userMsg are the same thing for internal errors.

Then internal error messages such as dial timeout and EOF are lost? Where do I log them?

For example, if no instance and dial timeout are both replaced by ErrUserError, then when I log the msg, I don't know the original error.
A workaround is to log the internal errors wherever they are generated, but this requires more code.
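The trade-off in this thread can be sketched as below. Only the userMsg field and the WrapUserError signature appear in the diff; the err field, the methods, sentinelWrap and causeSurvives are assumptions for illustration, and fmt.Errorf with %w stands in for the Wrapf helper in lib/util/errors, whose exact behavior is not shown here:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUserError is the sentinel from the alternative proposal.
var ErrUserError = errors.New("user error")

// errDial is an example internal cause used only in this sketch.
var errDial = errors.New("dial tcp 10.0.0.1:4000: i/o timeout")

// sentinelWrap attaches a user message to the sentinel; the internal cause is not part of the chain.
func sentinelWrap(userMsg string) error {
	return fmt.Errorf("%s: %w", userMsg, ErrUserError)
}

// UserError keeps both the internal cause (for the log) and the user-facing text.
type UserError struct {
	err     error
	userMsg string
}

// WrapUserError pairs an internal error with a message that is safe to send to the client.
func WrapUserError(err error, userMsg string) *UserError {
	return &UserError{err: err, userMsg: userMsg}
}

func (e *UserError) Error() string   { return e.err.Error() } // text for the log
func (e *UserError) UserMsg() string { return e.userMsg }     // text for the client
func (e *UserError) Unwrap() error   { return e.err }

// causeSurvives reports whether the original dial error is still reachable in the chain.
func causeSurvives(err error) bool { return errors.Is(err, errDial) }

func main() {
	fmt.Println(causeSurvives(sentinelWrap("connect failed")))           // false
	fmt.Println(causeSurvives(WrapUserError(errDial, "connect failed"))) // true
}
```

With the dedicated type, one value carries both messages, so the caller can log Error() and send UserMsg() without logging at every site where an internal error is produced.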

xhebox · 2023-01-12T08:15:11Z

pkg/proxy/backend/authenticator.go

@@ -156,23 +156,29 @@ func (auth *Authenticator) handshakeFirstTime(logger *zap.Logger, cctx ConnConte
 	// In case of testing, backendIO is passed manually that we don't want to bother with the routing logic.
 	backendIO, err := getBackendIO(cctx, auth, clientResp, 5*time.Second)
 	if err != nil {
-		return err
+		return WrapUserError(err, connectErrMsg)


This basically wraps all errors from getBackendIO into connectErrMsg.

The underlying errors may contain messages from handshakeHandler.GetRouter(), but errors.As stops at the outermost error, i.e. the error wrapped on this line. We should not wrap user errors multiple times: invoking errors.As on the chain TiProxy wrapped -> backoff wrapped -> handler wrapped -> real msg will stop at TiProxy wrapped.

I checked it in WrapUserError:

if ue, ok := err.(*UserError); ok { return ue }

It won't wrap multiple times.

What about WrapUserError(errors.Errorf("%w", WrapUserError()))? It can be a long chain, since we can't decide how the gateway uses this function. IMO, either we don't export the function, or we just let them call it.

If this is a problem, I can replace err.(*UserError) with errors.As.
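A short sketch of the difference between the two checks, assuming a simplified UserError with hypothetical err and userMsg fields; wrapByAssertion and wrapByAs are illustrative names for the two variants of the guard inside WrapUserError:

```go
package main

import (
	"errors"
	"fmt"
)

// UserError is a simplified stand-in for the type added in this PR.
type UserError struct {
	err     error
	userMsg string
}

func (e *UserError) Error() string { return e.err.Error() }
func (e *UserError) Unwrap() error { return e.err }

// wrapByAssertion reuses an existing UserError only when it is the outermost error.
func wrapByAssertion(err error, userMsg string) *UserError {
	if ue, ok := err.(*UserError); ok {
		return ue
	}
	return &UserError{err: err, userMsg: userMsg}
}

// wrapByAs reuses an existing UserError found anywhere in the chain.
func wrapByAs(err error, userMsg string) *UserError {
	var ue *UserError
	if errors.As(err, &ue) {
		return ue
	}
	return &UserError{err: err, userMsg: userMsg}
}

// inner is a handler-produced user error; chained hides it behind another wrapper.
var (
	inner   = &UserError{err: errors.New("router not found"), userMsg: "message from handler"}
	chained = fmt.Errorf("backoff: %w", inner)
)

func main() {
	fmt.Println(wrapByAssertion(chained, "message from proxy").userMsg) // message from proxy
	fmt.Println(wrapByAs(chained, "message from proxy").userMsg)        // message from handler
}
```

The type assertion misses the handler's UserError once any other wrapper sits on top of it, so a second UserError is created and its message wins; errors.As walks the chain and keeps the handler's message.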

djshow832 · 2023-01-12T09:41:41Z

I think we should not wrap so many errors. It is fine to wrap errors that occur before L137/frontend cap negotiation with ErrBeforeClientResp:
if errors.Is(err, ErrBeforeClientResp) {
  // only log
} else if errors.As(err, &UserError) {
  // log and send custom error, or 'client cap', 'no instance'
} else if errors.Is(err, Deadline) {
  // log and send `timeout, bad network situation` 
} else {
  // log and send `cluster configuration/topo wrong, contact admin or check proxy log`, or whatever
}
In fact, I think only client cap negotiation, no instance, dial timeout, and possible errors from the gateway should be sent to the user.

All other errors are likely cluster configuration mistakes, which can be solved by auditing the TiProxy log and fixing the cluster topology. On TiDB Cloud, users cannot do that.

I didn't wrap errors before L137, so for now they are only logged.
I should report an error to the client when parsing the HandshakeResp or verifying the client capability fails, just as TiDB does, with the same error codes and messages as TiDB. I didn't do that here because it would make this PR too long.
no instance and dial timeout are internal logic, and there are similar errors such as deadline exceeded. I wrapped them all into one kind of error. The user should check the cluster in any of these cases.

write mysql error

0cefe52

djshow832 requested a review from xhebox January 11, 2023 08:07

djshow832 added 2 commits January 11, 2023 16:15

fix error format

79a6c69

fix test

c0539f8

xhebox reviewed Jan 11, 2023

djshow832 added 3 commits January 11, 2023 17:27

move WriteUnknownError to authenticator

db3b7c9

add UserError

e56382b

revert some code

1765d35

djshow832 requested a review from xhebox January 12, 2023 02:43

djshow832 added 6 commits January 12, 2023 11:09

add handshakeErrMsg

8118267

return handshake error

3f18602

fix write error

0dcafdb

add user error

a71f93a

fix TestNetworkError

5ffeb8e

add more tests

63554ae

xhebox reviewed Jan 12, 2023

refine getBackendIO

ca04f04

xhebox approved these changes Jan 16, 2023

xhebox merged commit 8288910 into pingcap:main Jan 16, 2023

djshow832 deleted the mysql_error branch January 16, 2023 07:18

xhebox pushed a commit to xhebox/TiProxy that referenced this pull request Mar 7, 2023

backend: send the error to the client when the handler encounters an …

6686c0f

…error (pingcap#187)

xhebox pushed a commit to xhebox/TiProxy that referenced this pull request Mar 13, 2023

backend: send the error to the client when the handler encounters an …

eeb0187

…error (pingcap#187)
