Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachpb: use EncodedError in Error #56603

Merged
merged 1 commit into from
Nov 13, 2020
Merged

Conversation

tbg
Copy link
Member

@tbg tbg commented Nov 12, 2020

roachpb: use EncodedError in Error

roachpb.Error has long been poor to work with. It could only transport
structured errors for a hard-coded enum message (ErrorDetail) and had
a complex and circuituous interaction with the standard library's
error type: pErr.GoError() would return an error that was
sometimes just pErr itself (hidden behind an identical type); other
times it was an UnhandledRetryableError wrapping pErr, and then
of course sometimes it would be the structured error from which pErr
was created in the first place.

Thanks to cockroachdb/errors, there's a much better option around,
namely errorspb.EncodedError which has powerful capabilities to
essentially carry any error across RPC boundaries (and notably does so
"out of the box" for protobuf-backed errors) while retaining its
original structure, as long as the error type is known on both sides of
the RPC.

The simplifications that result from this primitive would be enough to
motivate this change in itself, but additionally after this change we
will be able to reliably detect context cancellations, etc, without
relying on error-prone and insecure string matching.

Going into more detail, this commit makes the following changes:

  • adds an EncodedError to Error
  • deprecates the following fields on Error, which are henceforth
    deduced from the EncodedError where present:
    • transaction_restart
    • message
    • detail
      These fields are populated for the time being so that 20.2 nodes
      understand the errors emitted by 21.1 nodes. In 21.2, we can drop
      them all.
  • deprecates (*Error).SetDetail, which has only one real caller
    anyway (and that caller will be able to replace EncodedError instead
    once 21.2 comes around).
  • makes (*Error).SafeFormat redact properly based on EncodedError
    (yay!), though there's a larger TODO that's left for a follow-up
    change (around the fact that error details can depend on the
    surrounding Error for rendering, particularly relevant when SetTxn
    is called)

We can capitalize on EncodedError before the 21.2 cycle, but this
is left for future PRs.

Fixes #54939.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re the several initial commits:

if Error has Equal() codegen'ed for it, it needs to be present
on EncodedError as well, which it isn't.

It should! there's no obstacle against it. I would rather go that route and trim these commits. reflect.DeepEqual is fraught with peril.

Regarding the rest: I see you are keeping the roachpb.Error a protobuf message. This blocks you from embedding a Go error field inside. This also prevents you from implementing the Cause() / Unwrap() interface. That makes me sad somewhat. Is this really what we want?

I could see a world where the roachpb.Error message is renamed roachpb.ErrorProto, and roachpb.Error becomes a regular go struct, which embeds its details as an error, and only translate to/from roachpb.ErrorProto at RPC boundaries. What do you think?

Reviewed 7 of 7 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, 2 of 2 files at r4, 14 of 14 files at r5, 6 of 6 files at r6, 9 of 9 files at r7, 1 of 1 files at r8, 16 of 17 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/ccl/backupccl/backup_processor.go, line 227 at r9 (raw file):

				rawRes, pErr := kv.SendWrappedWith(ctx, flowCtx.Cfg.DB.NonTransactionalSender(), header, req)
				if pErr != nil {
					if _, ok := pErr.GetDetail().(*roachpb.WriteIntentError); ok {

errors.As?


pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 611 at r9 (raw file):

			ctx, abortedErr, roachpb.NormalUserPriority, tc.clock)
		return roachpb.NewError(roachpb.NewTransactionRetryWithProtoRefreshError(
			abortedErr.String(), tc.mu.txn.ID, newTxn))

are we comfortable embedding the string, and not the full error object?


pkg/roachpb/errors.go, line 334 at r9 (raw file):

	if e.EncodedError != (errorspb.EncodedError{}) {
		err := errors.DecodeError(context.Background(), e.EncodedError)
		if iface, ok := err.(transactionRestartError); ok {

errors.As?


pkg/sql/execinfra/processorsbase.go, line 670 at r9 (raw file):

				// TransactionRetryWithProtoRefreshErrors) don't have any uncertainty.
				if ure := (*roachpb.UnhandledRetryableError)(nil); errors.As(err, &ure) {
					if _, uncertain := ure.PErr.GetDetail().(*roachpb.ReadWithinUncertaintyIntervalError); uncertain {

errors.As?

@knz
Copy link
Contributor

knz commented Nov 12, 2020

cockroachdb/errors#57

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should! there's no obstacle against it. I would rather go that route and trim these commits. reflect.DeepEqual is fraught with peril.

I don't mind adding gogoproto.equal to errors, but I still want to remove all of this useless codegen. It actually makes me uneasy for quite some of the structs that someone should want to compare them. If a struct legit needs to be compared - sure, no problem, but I haven't found that to be true for any of the ones removed in that PR.

Reviewed 6 of 7 files at r1, 1 of 3 files at r2, 2 of 2 files at r4, 10 of 14 files at r5, 4 of 6 files at r6, 9 of 9 files at r7, 1 of 1 files at r8, 17 of 17 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/ccl/backupccl/backup_processor.go, line 227 at r9 (raw file):

Previously, knz (kena) wrote…

errors.As?

I want to do a full pass against GetDetail() to make it return an error (instead of an ErrorDetail). Then your error linter will kick in and I'll fix the 5000 other instances of this that aren't captured in this diff. Not in this PR though.


pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 611 at r9 (raw file):

Previously, knz (kena) wrote…

are we comfortable embedding the string, and not the full error object?

What do you mean? The message is identical to String() at the time of writing. I don't like that we're stuffing it in that way, but it's the old behavior and I don't want to solve all of the paper cuts in this PR. Note that I left a TODO on NewTransactionRetryWithProtoRefreshError about this, curious what you think of it.


pkg/roachpb/errors.go, line 334 at r9 (raw file):

Previously, knz (kena) wrote…

errors.As?

Done.


pkg/sql/execinfra/processorsbase.go, line 670 at r9 (raw file):

Previously, knz (kena) wrote…

errors.As?

See other comment, keeping it mechanical for now

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/kv/kvserver/store_send.go, line 247 at r10 (raw file):

			}
			t.AppendRangeInfo(ctx, desc, l)
			// We have to write `t` back to `pErr` so that it picks up the changes.

This was made obvious to me through a test failure. I will need to run a full audit (yikes) before merging.
Essentially, we were mutating something returned from GetDetail() and expecting that to affect the backing pErr. But now with EncodedError GetDetail() basically makes a fresh copy for you. Which - maybe that is bad and we'll want to look into it for perf reasons, but I still think that pattern remains bad and we need to be explicit about mutating the original pErr through it and not through some handle into its innards.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't mind adding gogoproto.equal to errors, but I still want to remove all of this useless codegen

Discussed offline - yes, agreed. However I think we should then have some rule that reflect.DeepEqual should not be used throughout production code, just in tests.

LGTM otherwise - modulo completion of your investigation about interior mutability of the error detail.

Reviewed 5 of 5 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 611 at r9 (raw file):

Previously, tbg (Tobias Grieger) wrote…

What do you mean? The message is identical to String() at the time of writing. I don't like that we're stuffing it in that way, but it's the old behavior and I don't want to solve all of the paper cuts in this PR. Note that I left a TODO on NewTransactionRetryWithProtoRefreshError about this, curious what you think of it.

Understood thanks.


pkg/kv/kvserver/store_send.go, line 247 at r10 (raw file):

I still think that pattern remains bad and we need to be explicit about mutating the original pErr through it and not through some handle into its innards.

Agreed. Thanks for figuring it out.

@tbg tbg requested a review from knz November 12, 2020 16:23
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - yes, agreed. However I think we should then have some rule that reflect.DeepEqual should not be used throughout production code, just in tests.

I am not opposed to that, though it's somewhat orthogonal here. Even if .Equal is defined, someone might use DeepEqual in production.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/kv/kvserver/store_send.go, line 247 at r10 (raw file):

Previously, knz (kena) wrote…

I still think that pattern remains bad and we need to be explicit about mutating the original pErr through it and not through some handle into its innards.

Agreed. Thanks for figuring it out.

I did do the audit (at least for the ~40 or so prod callers) and this was the one bad apple.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not opposed to that, though it's somewhat orthogonal here

ok works for me!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the rest: I see you are keeping the roachpb.Error a protobuf message. This blocks you from embedding a Go error field inside. This also prevents you from implementing the Cause() / Unwrap() interface. That makes me sad somewhat. Is this really what we want?

I haven't thought this far ahead, but generally speaking I would like that. A separation at the RPC boundaries makes sense and by having a bare struct everywhere else we can avoid the repeated calls to DecodeError that the current implementation incurs. I'll save it for another PR, though. Also, I might be missing something, but I don't see why we can't implement Cause and Unwrap as is.

I've also started thinking about pulling the protos apart a little bit. Ideally, you would have a kvpb package that has all the possible structured errors that kv can emit. I think there is a big migration snag here though, in that the current errors would need to be kept in two locations for the time being, which I don't love. Given that and the limited mandate for making something happen here, I'll probably stop short of that, especially since a lot of actual code cleanup remains to be done. But I wanted to dream at least for a second!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we can't implement Cause and Unwrap as is.

The only way to implement them in your current framework would be to re-instantiate an error every time they are called, via Decode. This is rather expensive - these functions are called all over the place in error predicates.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)

@tbg tbg force-pushed the roachpb-error-redo branch 3 times, most recently from 067254f to 740cd22 Compare November 13, 2020 10:49
@tbg tbg marked this pull request as ready for review November 13, 2020 11:12
@tbg tbg requested a review from a team November 13, 2020 11:12
@tbg tbg requested a review from a team as a code owner November 13, 2020 11:12
@tbg tbg requested review from pbardea and removed request for a team November 13, 2020 11:12
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r11.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pbardea and @tbg)

`roachpb.Error` has long been poor to work with. It could only transport
structured errors for a hard-coded enum message (`ErrorDetail`) and had
a complex and circuituous interaction with the standard library's
`error` type: `pErr.GoError()` would return an `error` that was
sometimes just `pErr` itself (hidden behind an identical type); other
times it was an `UnhandledRetryableError` wrapping `pErr`, and then
of course sometimes it would be the structured error from which `pErr`
was created in the first place.

Thanks to `cockroachdb/errors`, there's a much better option around,
namely `errorspb.EncodedError` which has powerful capabilities to
essentially carry any error across RPC boundaries (and notably does so
"out of the box" for protobuf-backed errors) while retaining its
original structure, as long as the error type is known on both sides of
the RPC.

The simplifications that result from this primitive would be enough to
motivate this change in itself, but additionally after this change we
will be able to reliably detect context cancellations, etc, without
relying on error-prone and insecure string matching.

Going into more detail, this commit makes the following changes:

- adds an `EncodedError` to `Error`
- deprecates the following fields on `Error`, which are henceforth
  deduced from the `EncodedError` where present:
  - `transaction_restart`
  - `message`
  - `detail`
  These fields are populated for the time being so that 20.2 nodes
  understand the errors emitted by 21.1 nodes. In 21.2, we can drop
  them all.
- deprecates `(*Error).SetDetail`, which has only one real caller
  anyway (and that caller will be able to replace EncodedError instead
  once 21.2 comes around).
- makes `(*Error).SafeFormat` redact properly based on `EncodedError`
  (yay!), though there's a larger TODO that's left for a follow-up
  change (around the fact that error details can depend on the
  surrounding Error for rendering, particularly relevant when SetTxn
  is called)

We can capitalize on EncodedError before the 21.2 cycle, but this
is left for future PRs.

Fixes cockroachdb#54939.

Release note: None
@tbg
Copy link
Member Author

tbg commented Nov 13, 2020

bors r=knz

@craig
Copy link
Contributor

craig bot commented Nov 13, 2020

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Nov 13, 2020

Build succeeded:

@craig craig bot merged commit e7a5761 into cockroachdb:master Nov 13, 2020
@tbg tbg deleted the roachpb-error-redo branch November 13, 2020 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachpb: use cockroachdb/errors.EncodedError inside of internalError
3 participants