Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For discussion: rework error plumbing when proto.Call is involved #1403

Closed
tamird opened this issue Jun 17, 2015 · 16 comments
Closed

For discussion: rework error plumbing when proto.Call is involved #1403

tamird opened this issue Jun 17, 2015 · 16 comments
Assignees
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.

Comments

@tamird
Copy link
Contributor

tamird commented Jun 17, 2015

We currently have an inconsistent error handling story where proto.Response is concerned (usually via proto.Call). Some grep of all the functions which handle these arguments (excluding tests) is below. Note that many of these have an explicit error return.

$ git grep -E 'proto\.(Call|Response)' | grep func | grep -v _test
client/batch.go:func (b *Batch) InternalAddCall(call proto.Call) {
client/db.go:func (db *DB) send(calls ...proto.Call) (err error) {
client/http_sender.go:func (s *httpSender) Send(_ context.Context, call proto.Call) {
client/http_sender.go:func (s *httpSender) post(call proto.Call) error {
client/rpc/rpc_sender.go:func (s *Sender) Send(_ context.Context, call proto.Call) {
client/sender.go:type SenderFunc func(context.Context, proto.Call)
client/sender.go:func (f SenderFunc) Send(ctx context.Context, c proto.Call) {
client/txn.go:func (ts *txnSender) Send(ctx context.Context, call proto.Call) {
client/txn.go:func (txn *Txn) send(calls ...proto.Call) error {
client/txn.go:func (txn *Txn) updateState(calls []proto.Call) {
kv/db.go:func createArgsAndReply(method string) (proto.Request, proto.Response) {
kv/db.go:func (s *rpcDBServer) executeCmd(args proto.Request, reply proto.Response) error {
kv/dist_sender.go:func (ds *DistSender) sendAttempt(desc *proto.RangeDescriptor, call proto.Call) (retry.Status, error) {
kv/dist_sender.go:func (ds *DistSender) getDescriptors(call proto.Call) (*proto.RangeDescriptor, *proto.RangeDescriptor, error) {
kv/dist_sender.go:func (ds *DistSender) Send(_ context.Context, call proto.Call) {
kv/local_sender.go:func (ls *LocalSender) Send(ctx context.Context, call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) Send(_ context.Context, call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) sendOne(call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) updateResponseTxn(argsHeader *proto.RequestHeader, replyHeader *proto.ResponseHeader) {
server/node.go:func (n *nodeServer) executeCmd(args proto.Request, reply proto.Response) error {
server/status/feed.go:func (nef NodeEventFeed) CallComplete(args proto.Request, reply proto.Response) {
storage/range.go:var TestingCommandFilter func(proto.Request, proto.Response) bool
storage/range.go:func (r *Range) AddCmd(ctx context.Context, call proto.Call, wait bool) error {
storage/range.go:func (r *Range) addAdminCmd(ctx context.Context, args proto.Request, reply proto.Response) error {
storage/range.go:func (r *Range) addReadOnlyCmd(ctx context.Context, args proto.Request, reply proto.Response) error {
storage/range.go:func (r *Range) addWriteCmd(ctx context.Context, args proto.Request, reply proto.Response, wait bool) error {
storage/range.go:func (r *Range) proposeRaftCommand(ctx context.Context, args proto.Request, reply proto.Response) (<-chan error, *pendingCmd) {
storage/range.go:func (r *Range) applyRaftCommand(ctx context.Context, index uint64, originNodeID proto.RaftNodeID, args proto.Request, reply proto.Response) error {
storage/range_command.go:func (r *Range) executeCmd(batch engine.Engine, ms *proto.MVCCStats, args proto.Request, reply proto.Response) ([]proto.Intent, error) {
storage/response_cache.go:func (rc *ResponseCache) GetResponse(cmdID proto.ClientCmdID, reply proto.Response) (bool, error) {
storage/response_cache.go:func (rc *ResponseCache) PutResponse(cmdID proto.ClientCmdID, reply proto.Response) error {
storage/response_cache.go:func (rc *ResponseCache) shouldCacheResponse(reply proto.Response) bool {
storage/store.go:func (s *Store) ExecuteCmd(ctx context.Context, call proto.Call) error {

Some methods return the error outright, some set it on the proto.Response directly, and some do a mix of the two. We should resolve this inconsistency because it leads to confusing code and a very high probability of subtle bugs. There are two approaches we could take:

standardize on ResponseHeader.SetGoError()

This requires that we remove error returns from all methods that operate on proto.Responses. In addition, it will mean that we will need to remember to check .GoError() each time we call such a method. This is fragile, and we will have no tool support.

standardize on error returns

This requires that we create a new structure to be obtained by unwrapping the proto.Response which will not be able to carry an error. All methods which currently operate on proto.Response (with the exception of those that live at process boundaries) will need to change to return explicit errors rather than attaching errors directly to their proto.Response argument.

This option is my preference. It provides for tooling support via errcheck and is more idiomatic. That being said, the floor is open. Please 🚲 🏡 !

@tbg
Copy link
Member

tbg commented Jun 17, 2015

I don't think the first option is really there because it means that any error must be a proto error, on top of its other drawbacks.

In the second option, proto.Response will still carry an error (after all, the client needs to get an error sometimes) but we can make it a point to only set it in one (or a few) consistent places just before we're done with the request and I think it's a good idea to try to streamline error handling that way.

@bdarnell
Copy link
Contributor

What do you mean by "unwrapping the proto.Response"? Currently a Response contains the response header (and error).

One other consideration is batches: in a batch, each response carries its own error (and the batch has a whole has an error which is defined to be the first non-nil error in the batch). This is more difficult to express with returned errors than with errors in the header (although maybe batches can be treated as something like a transport boundary)

@tamird
Copy link
Contributor Author

tamird commented Jun 17, 2015

What do you mean by "unwrapping the proto.Response"? Currently a Response contains the response header (and error).

I meant that the proto.Response is the wrapper, so we unwrap it to recover the headerless message, operate on that within the process, and then write the error to the header at the process boundary.

One other consideration is batches: in a batch, each response carries its own error (and the batch has a whole has an error which is defined to be the first non-nil error in the batch). This is more difficult to express with returned errors than with errors in the header (although maybe batches can be treated as something like a transport boundary)

I think this is orthogonal. We would treat each request/response in the batch as described, and do the batch special magic in a central place (process boundary anyway).

@bdarnell
Copy link
Contributor

But proto.Response is not currently a wrapper; it's an interface implemented by all the concrete response type. We don't currently have a type corresponding to the headerless message. Are you proposing a refactoring where proto.Response is no longer an interface but a concrete message containing a ResponseHeader and a byte array (representing an encoded headerless response of an unspecified type)?

@tamird
Copy link
Contributor Author

tamird commented Jun 17, 2015

The end result should be that we can decompose the equivalent of today's proto.Response to the header and the "rest". I'm not ready to suggest a specific implementation yet.

@tbg
Copy link
Member

tbg commented Jun 17, 2015

Could simply have split Response into interfaces ResponseWithoutHeader and Header (depending on what's needed, and named more aptly) and pass that down to calls. Sure, one could type-assert back but who would.

@bdarnell
Copy link
Contributor

ResponseWithoutHeader is just a gogoproto.Message which is pretty useless on its own; type-asserting it to a concrete type (PutResponse, etc) is the common case.

@tamird
Copy link
Contributor Author

tamird commented Jun 24, 2015

@bdarnell I think the point of ResponseWithoutHeader is to be a "marker" interface, so that the list of implementers is suitably short. We can probably just do this by requiring a Method method.

@bdarnell
Copy link
Contributor

Yes, but how does that marker interface help? Any method that takes a ResponseWithoutHeader is going to convert it to a concrete type like PutResponse which has no protections against setting the error in the header.

@tamird
Copy link
Contributor Author

tamird commented Jun 24, 2015

I was thinking that Response would become a concrete struct containing ResponseWithoutHeader and Header, where each of {Get,Put,...}Response only implement ResponseWithoutHeader.

@bdarnell
Copy link
Contributor

So you would change the protos themselves so that PutResponse and friends no longer contain a header field?

@tamird
Copy link
Contributor Author

tamird commented Jun 24, 2015

Yeah, I think that's the only way to get the properties described here

@vivekmenezes
Copy link
Contributor

I think the proposal is to have Response contain a Header, and a oneof with all the different types, where all the different types themselves have a KVHeader that contains the keys, etc and not the error. That's a pretty big change.

Another idea is to just enforce that error is not set at the boundary in our code. Not enforced by the language but an easier change to make.

@tamird tamird self-assigned this Jul 22, 2015
@tamird tamird added the C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. label Jul 22, 2015
@tamird
Copy link
Contributor Author

tamird commented Jul 30, 2015

Update on this: the only remaining non-conformances are Sender.Send and most of the Batch API. On the Sender side there is an open question: what do we want the return values to be? We may inherently have both transmission and application errors arising from these methods; do we want to distinguish the two? If an error occurs, do we still want the reply returned?

$ git grep -F 'proto.Call' | grep -F func | grep -vF _test
client/batch.go:func (b *Batch) InternalAddCall(call proto.Call) {
client/db.go:func (db *DB) send(calls ...proto.Call) (err error) {
client/http_sender.go:func (s *httpSender) Send(_ context.Context, call proto.Call) {
client/rpc/rpc_sender.go:func (s *Sender) Send(_ context.Context, call proto.Call) {
client/sender.go:type SenderFunc func(context.Context, proto.Call)
client/sender.go:func (f SenderFunc) Send(ctx context.Context, c proto.Call) {
client/txn.go:func (ts *txnSender) Send(ctx context.Context, call proto.Call) {
client/txn.go:func endTxnCall(commit bool) proto.Call {
client/txn.go:func (txn *Txn) send(calls ...proto.Call) error {
client/txn.go:func (txn *Txn) updateState(calls []proto.Call) {
kv/dist_sender.go:func (ds *DistSender) getDescriptors(call proto.Call) (*proto.RangeDescriptor, *proto.RangeDescriptor, error) {
kv/dist_sender.go:func (ds *DistSender) Send(ctx context.Context, call proto.Call) {
kv/local_sender.go:func (ls *LocalSender) Send(ctx context.Context, call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) Send(ctx context.Context, call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) sendOne(ctx context.Context, call proto.Call) {

@tamird tamird removed their assignment Sep 2, 2015
@tamird
Copy link
Contributor Author

tamird commented Sep 24, 2015

@tschottdorf we might want to extend this issue to the removal of errors from non-batch responses; what do you think?

@tbg
Copy link
Member

tbg commented Sep 24, 2015

Yes, I think that's a good idea.
On Sep 24, 2015 20:13, "Tamir Duberstein" notifications@github.com wrote:

@tschottdorf https://github.com/tschottdorf we might want to extend
this issue to the removal of errors from non-batch responses; what do you
think?


Reply to this email directly or view it on GitHub
#1403 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
Projects
None yet
Development

No branches or pull requests

4 participants