Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

context: add WithoutCancel #40221

Closed
CAFxX opened this issue Jul 15, 2020 · 58 comments
Closed

context: add WithoutCancel #40221

CAFxX opened this issue Jul 15, 2020 · 58 comments

Comments

@CAFxX
Copy link
Contributor

CAFxX commented Jul 15, 2020

What

I propose to add a WithoutCancel function in context that, given a parent Context returns a new child Context with the same values of the parent, but that is not canceled when the parent is canceled. If needed, although it seems not entirely warranted given the widespread use of similar functions (see section below), this new function could initially live in x/net/context and be migrated to context later.

In addition to the above, we should clarify a point that is only laid out implicitly in the context API docs, i.e., that values attached to the Context can also be used after the Context has been canceled.

https://go.dev/cl/459016 contains the proposed implementation.

Why

This is useful in multiple frequently recurring and important scenarios:

  • handling of rollback/cleanup operations in the context of an event (e.g., HTTP request) that has to continue regardless of whether the triggering event is canceled (e.g., due to timeout or the client going away)
  • handling of long-running operations triggered by an event (e.g., HTTP request) that terminates before the termination of the long-running operation

This is doable today by not propagating the triggering event's context and replacing it instead with a new context obtained from context.Background(). This is problematic though, as the new context does not contain any of the values attached to the context of the triggering event, and these values are important to e.g., ensure correct authentication/logging/tracing/error recovery functionality (a common scenario when using a middleware-based approach to request/event handling).

As noted below with @davecheney, a nice consequence that naturally falls out of this approach is that it effectively turns cancellation into a regular context value that can be overridden in children contexts. It doesn't solve the problem of cancellation being conflated with the intent of context being just a "bag of values" (that would almost certainly require breaking changes to solve) but it's an effective step into alleviating the situation, and it's backward compatible.

As noted further below with @martisch the benefit of this approach is that it's pretty much as minimal, composable, and in line with the current design of the context package as possible, requiring a single new public API and eschewing conflating additional mechanisms (goroutines).

An important point to be made is that all existing implementations of this (see below) rely on an internal/undocumented guarantee of cancelCtx. If this proposal is shot down at least that guarantee should be explicitly documented in the exported API.

Existing implementations

Looking around it is possible to find multiple reimplementations of this proposal, almost identical but with different names. I'm not advocating for a specific name here.

Implementations are trivial and would add a single public function to context.

@gopherbot gopherbot added this to the Proposal milestone Jul 15, 2020
@ianlancetaylor ianlancetaylor changed the title Proposal: context: add a way to detach from parent cancellation proposal: context: add a way to detach from parent cancellation Jul 15, 2020
@davecheney
Copy link
Contributor

I can't help but feel that this request is a direct by product of context's conflation of cancellation and a skiplist of values. I agree with the rationale for this, but also feel that this is pushing the use case of context as a bag of values beyond its intention.

@CAFxX
Copy link
Contributor Author

CAFxX commented Jul 15, 2020

I can't help but feel that this request is a direct by product of context's conflation of cancellation and a skiplist of values.

Yes, without said conflation this proposal wouldn't be needed.

I agree with the rationale for this, but also feel that this is pushing the use case of context as a bag of values beyond its intention.

While my primary intent here (that is much more immediate) wasn't to move that discussion forward... in a way, this proposal could also be seen as a first step in sidestepping (undoing?) the conflation, since it effectively turns cancellation into a regular copy-on-write value like all other values carried by the Context. It doesn't magically solve all the other problems with cancellation, but it's nevertheless a step in the direction of alleviating the problems with conflation.

@martisch
Copy link
Contributor

martisch commented Jul 15, 2020

I also have seen a detach package before with a function FromContext that takes a parent context and creates a new context without propagation of the cancelation deadline and returns a new context that keeps all the values of the parent.

It might however be better to encapsulate the specifc use case of detached background tasks by having a Task struct and a Go(ctx, f(ctx)) Task method that takes a function f to execute in a new goroutine. That Task struct can then offer methods itself e.g. useful for canceling and checking if the task is finished. The Go function can have an internal mechanisms with registration or an additional parameter to decide what values to expose in the detached context.

This might also avoid adding more exposed API to context itself.

@CAFxX
Copy link
Contributor Author

CAFxX commented Jul 15, 2020

@martisch i wouldn't add goroutines to the mix. There are legitimate use cases (rollback/cleanup operations that can run in a defer or inside an if err != nil {} being an example) where they are not needed.

Also, your counterproposal seem to be way more complex in terms of number of new exposed APIs (one new package, one new struct/interface and at least one new method) compared to the one proposed here. You're right though it wouldn't need to be in context, although I question the intuitiveness of that (since all packages in the wild are basically called some close variation of context, I have to assume that's where people do expect such a function to live).

To be 100% clear though, I completely agree that the current design of context is less than ideal due to the conflation mentioned above. And it would be ideal if a new context design solved that.

But, as mentioned above, my intent here was not really to propose a new design for context: it's just to address an omission that fails to cover important use cases, within the framework of the current context design. To that end, creating a new/separated "task" package with a design that seem to differ significantly from that of context could lead to a significantly less intuitive and discoverable mechanism (but maybe this is just because I misunderstood your idea... in which case an example/sketch of what you have in mind may help clarify that).

It's worth noting that a new context design that solved the conflation would naturally address the scenarios mentioned in the proposal. Therefore, even if this proposal was accepted, it would not pose additional issues in case users had to migrate to the new design. As such, there's not much for the argument that these scenarios must wait for a context redesign to be addressed.

@navytux
Copy link
Contributor

navytux commented Jul 17, 2020

Directly related:

#19643
#36503 (comment)

@bcmills
Copy link
Contributor

bcmills commented Jul 17, 2020

Note that the detach package mentioned by @martisch is essentially the same API as was proposed in #19643 (thanks @navytux for the cross-reference!). See #19643 (comment) in particular for some discussion of why that API is needed.

To summarize: values stored in a Context, such as logging or tracing entries, may be scoped to a specific call or operation and flushed or finalized at the end of that call or operation. The API for detaching from such a Context should provide a means to extend or annotate such a value while it is still valid.

@CAFxX
Copy link
Contributor Author

CAFxX commented Jul 17, 2020

values stored in a Context, such as logging or tracing entries, may be scoped to a specific call or operation and flushed or finalized at the end of that call or operation

Do we have any actual example of this need? I am thinking at all middlewares we are using and can't think of any that would need that (including tracing and logging).

Furthermore, even if it was really needed, middleware (again, thinking mostly of logging and tracing) will normally already happily let you override the current values (e.g. a logger or trace/span), so I'm not really convinced about the need for a specific, more complicated to cover it. Maybe a concrete example may help understand.

@bcmills
Copy link
Contributor

bcmills commented Jul 17, 2020

Do we have any actual example of this need?

...and presumably others: anything with a local buffer of logs or traces, or a local tree of spans or regions, that needs the buffer or parts of the tree to be flushed to a disk or a remote log server when they are “complete”.

@bcmills
Copy link
Contributor

bcmills commented Jul 17, 2020

Furthermore, even if it was really needed, middleware (again, thinking mostly of logging and tracing) will normally already happily let you override the current values (e.g. a logger or trace/span)

You can only override values that you know about. The point of Context.Value is that most parts of the program don't need to know what's in it. You should be able to use a library without telling that library in particular about which (if any) logging and/or tracing libraries the rest of your program (above and below it) happens to be using.

In practice, that means one of two things:

  1. Synchronous functions and operations, as (potentially) in the case of your “rollback/cleanup operations” use-case. But those should generally be agnostic to cancellation anyway: either they cannot fail or block (in which case there is nothing to cancel), or they must be robust to failure to clean up (in which case why bother cleaning up at all?), or they leak arbitrary resources (in which case we are back to “failure to clean up” by way of an OOM signal).

    • “Detach” would be a misnomer for such an API, because it is not actually detaching from the parent context or the parent operation — rather, it is ignoring cancellation for a task that is still logically part of and scoped within the same operation.
  2. Some global, standardized notion of “the [non-]copyable parts of a context.Context” that decouples ”a library spawning an asynchronous task” from “the set of scoped logging systems in use in the program”.

@CAFxX
Copy link
Contributor Author

CAFxX commented Jul 20, 2020

Do we have any actual example of this need?

...and presumably others: anything with a local buffer of logs or traces, or a local tree of spans or regions, that needs the buffer or parts of the tree to be flushed to a disk or a remote log server when they are “complete”.

Thanks for the list. I am familiar with some, but not all, of these... can you help me understand exactly what you wouldn't be able to do with the current proposal?

My current impression is that flushing a logger when the triggering request completes is not a very convincing example, as whatever middleware added the logger to the context could trivially flush the logger when the request terminates; similarly the asynchronous task knows which logger it is using, so it can flush it when the task completes (I am setting aside for a second the fact that flushing logs at every request is probably not such a common requirement; all loggers and tracers I worked with normally flush based on time or buffer size, and at process shutdown).

The other use-case I read somewhere is IIUC ensuring that values attached to the context that are needed before cancellation but not after are not retained while the derived child context stays alive. I must say I never faced this problem... maybe some concrete examples of where this was needed could help me understand better.

“Detach” would be a misnomer for such an API, because it is not actually detaching from the parent context or the parent operation — rather, it is ignoring cancellation for a task that is still logically part of and scoped within the same operation.

Clarified I did not specifically argue for a specific name, thanks! (our internal impl is not called Detach, just to give context)

You can only override values that you know about. The point of Context.Value is that most parts of the program don't need to know what's in it. You should be able to use a library without telling that library in particular about which (if any) logging and/or tracing libraries the rest of your program (above and below it) happens to be using.

Doesn't this apply in reverse to #19643 as well? The code that registers a value as "preservable" should be aware of all possible uses of a certain context value.

Also, in that design, what would happen if different code paths were to require different values to be preserved?

Synchronous functions and operations, as (potentially) in the case of your “rollback/cleanup operations” use-case. But those should generally be agnostic to cancellation anyway: either they cannot fail or block (in which case there is nothing to cancel), or they must be robust to failure to clean up (in which case why bother cleaning up at all?), or they leak arbitrary resources (in which case we are back to “failure to clean up” by way of an OOM signal).

I'm not sure about the point here; the use case I'm trying to cover is the non-synchronous one (w.r.t. parent cancellation; if the parent has already been cancelled because the client connection broke there's no real difference between running in a separate goroutine or not)

Some global, standardized notion of “the [non-]copyable parts of a context.Context” that decouples ”a library spawning an asynchronous task” from “the set of scoped logging systems in use in the program”.

I understand that there's a pre-existing design in #19643, but I think it's worth noting that it was declined because of lack of evidence it was the correct one. Has this changed?

@bcmills
Copy link
Contributor

bcmills commented Jul 20, 2020

Thanks for the list. I am familiar with some, but not all, of these... can you help me understand exactly what you wouldn't be able to do with the current proposal?

Depends on the library, but in general you end up with some form of use-after-Close bug when you invoke some library that adds a logging or trace span during an asynchronous call. That could manifest as a silent failure to log something you intended to log, or an explicit error return from a call you expected to always succeed, or in extreme cases a run-time panic.

@bcmills
Copy link
Contributor

bcmills commented Jul 20, 2020

Doesn't this apply in reverse to #19643 as well? The code that registers a value as "preservable" should be aware of all possible uses of a certain context value.

Also, in that design, what would happen if different code paths were to require different values to be preserved?

Different code paths cannot require different values. (Values should be preserved by intersecting the set of preservable keys with the set of keys found in the parent context.)

The preservation semantics are based on the meaning of the context value, not any detail about its usage, so the “preserver” code is independent of the point of use. For example, if you have a tracing library, “detaching” for an asynchronous operation should be an event in the parent trace that results in its own (related but distinct) tracing span: that is the only coherent way to “preserve” continuity of the trace, and it should be correct to do any time a context is used asynchronously.

@bcmills
Copy link
Contributor

bcmills commented Jul 20, 2020

I understand that there's a pre-existing design in #19643, but I think it's worth noting that it was declined because of lack of evidence it was the correct one. Has this changed?

I think we have pretty solid evidence that using a context asynchronously is incorrect, or at least error-prone, but I'm not sure that we have any more evidence than we did about whether the API proposed in #19643 is ideal.

@bsiegert
Copy link
Contributor

FWIW, speaking for the Frameworks Go team within Google, having a way to detach from the parent context would be enormously beneficial for us. We want the trace context, request IDs and some other values preserved in all work that is started from a request handler, or from the framework-specific Task primitive.

In fact, we have static analysis that flags the use of context.Background in this kind of handler, and it keeps tripping up on open-source code that the internal stuff depends on.

@CAFxX
Copy link
Contributor Author

CAFxX commented Feb 4, 2021

I think we have pretty solid evidence that using a context asynchronously is incorrect, or at least error-prone

To advance the conversation it would be ideal to have the evidence about being "error-prone" be available for discussion.

As for incorrect, I am not sure where this may come from, as I see nothing in the Context contract that seem to suggest that. If anything, it seems to (indirectly) suggest the opposite:

Use context Values only for request-scoped data that transits processes and APIs, not for passing optional parameters to functions.

The same Context may be passed to functions running in different goroutines; Contexts are safe for simultaneous use by multiple goroutines.

I'm assuming that "request-scoped" can't be interpreted to mean that as soon as the request is "over" the values in the Context can't be used anymore: otherwise, it would be illegal to e.g. use a logger attached to a context as soon as the context is canceled (since your code may be running in a different goroutine, and the response may already have been fully completed by the time your code attempts to use said logger). That interpretation would basically make the use of any value inherently racy since there would be no way for a user to ensure that a value remains valid until they are done with it.

Depends on the library, but in general you end up with some form of use-after-Close bug when you invoke some library that adds a logging or trace span during an asynchronous call. That could manifest as a silent failure to log something you intended to log, or an explicit error return from a call you expected to always succeed, or in extreme cases a run-time panic.

If we were to agree on the previous point, then this would be an issue with the specific library - as it's relying on unspecified behavior, no? True, if you have already flushed something it's unrealistic to pretend to unflush it so in some cases you won't be able to satisfy the call (I don't really consider the other two cases, error and panic, as valid; in the former, it would be on the user that ignores the returned error, in the latter it would be a bug in the library for relying on unspecified behavior).

But, crucially, this is not something that is allowed today because anyway all values are lost when people are forced to use context.Background() to achieve similar "detach semantics" - so it's not like we would be breaking existing code. Only new code that were to use the proposed Detach would be affected.

That being said, I agree it we may at least consider the possibility of having values attached to a context opt-in to (or opt-out of) propagation to a detached context (e.g. via an optional interface on the context key? or with yet another value in the context?)

@matttproud
Copy link
Contributor

matttproud commented Feb 26, 2021

evidence about being "error-prone" be available for discussion.

I have not seen a particularly cogent demonstration of these to date (at least in defect reports), which is not to say they could exist. From my own experience, however, I have seen many cases of folks attempting to "work around" cancellation and simulate detachment by using context.Background inappropriately (in non-program root contexts) and create other errors that are easier to induce in production. Such errors are particularly poignant with loss of a comprehensible cancellation or shutdown semantics. Those problems are material and concrete today.

But generally speaking even the loss of context metadata for (not exclusively for but rather citing it because it is concrete) tracing is particularly bad when using context.Background in non-root circumstances, and each metadata-setting library needs to provide a way to let a user re-attach the data as needed, and I have not seem a holistic solution beyond our in-house detachment APIs.

func (b *BankService) TransferMoney(ctx context.Context, in *transferpb.Request) (out *transferpb.Response, error) {
  // Validate request: sufficient funds, source and destination accounts are correct.
  
  // Low latency and internally batched for later if failures arise.
  err := b.moveMoney(ctx, in.GetSourceIBAN(), in.GetDestIBAN(), in.GetDate(), in.GetAmount())
  if err != nil {
    return nil, err
  }

  // Keep data store tidy as a way to minimize transfer kiting (https://en.wikipedia.org/wiki/Check_kiting).
  // 
  // It is slow and requires a partial data set audit and compaction of records and should not be done
  // directly in the critical path.  Execution is best-effort, so we do not care about errors.  If it fails,
  // periodic data store reconciliation/cleansing efforts find, report, and act on problems.  This is merely
  // to provide a more realistic view of account standing.
  //
  // The operation itself is derived from the request TransferMoney receives, so it should still receive its
  // authorization credentials that were attached to the ctx as well as any trace spans for purposes of
  // operational forensics.
  //
  // But because the data store reconciliation can be slow and is run at batch priority (don't want to starve)
  // user-interactive requests, its execution often outlives the remaining bits of TransferMoney.
  if in.GetAmount() > outstandingLimit {
    // Dilemma:
    // 1. use context.Background() and lose any sense of metadata and have to do a bespoke re-attachment dance for
    //     all relevant data
    // 2. use local ctx and run the risk of compaction not running to completion
    go b.compactRecords(ctx, in.GetSourceIBAN(), in.GetDestIBAN())
  }

  // Skim the millicents from transfers involving Initech clients.
}

In short, I am a proponent of either an API in the /x/ hierarchy or something that is a first-class companion of package context that enables formal detachment capability. A formal API to do this is a critical part of a well-lit path of the user journey to service these workloads. On quick survey of our product's users, I estimate we have about several hundred uses for long-running operation (LRO) alone.

request-scoped data

I am glad you raised that excerpt. This has been something I have wanted to correct in the documentation, at least propose. In short, that it is an overspecification. My bread and butter are servers, so you'll see that bias what I write. Typical servers deal with two scopes (possibly three):

  • server-scoped: bound to the lifetime of the server itself. This is relevant for setting up the domain types used by the server for its lifetime (e.g., open up a well-known database to service user queries).

  • request-scoped: bound to individual requests in HTTP or RPC handlers or similar.

  • less common ones, that may be derived from either server- or request-scoped contexts:

    • session scoped: oriented around a persistent user connection over multiple operations
    • pipeline scoped: a long set of batch operation that may outlive the initiating request

Contexts remain valid for all of these situations, and the documentation indicating otherwise does it a disservice.

@kylelemons
Copy link
Contributor

kylelemons commented Nov 13, 2021

I'd like to raise another use-case for detaching a context from cancellation and most values that I don't think has been raised yet, and that would be security context -- authentication, for example. As a framework author, I very explicitly do not want users to have to think about this, but code running under the scope of the context should carry and propagate this information, even if they need to do an opportunistic roll-back when a context is cancelled.

To be more concrete with the example: an RPC services a user request. The RPC framework translates the user closing the connection into context cancellation. The RPC framework propagates security credentials (incoming and outgoing) to/from out of band RPC metadata and internally via context. To service the request, a transaction is opened, and a rollback operation (which is another RPC, and thus requires a context) is deferred. The backend server in question enforces credentials from both our server (available to all contexts generated by the framework) and the calling user (available within the request handler). Not every language has the cancellation semantics Go has, so the API presumes you will still send these when rolling back, as opposed to giving you a token or something that can be used for rollback without the out of band RPC auth metadata.

Now, in Go, I've seen multiple approaches to handling this:

  • The first is to try to subtract and leave some time within the context deadline to complete cleanup tasks. This doesn't work in the face of cancellation, only deadline timeout.
  • context.Background, which for this specific case doesn't work, because it "drops privileges".
  • Passing through all values ala x/tools/internal/xdetach will probably work as long as middleware waits for me to return and I don't try to do any of it asynchronously, but I could see some middleware transparently running it's inner in a goroutine.

The other option, which I tend to prefer, is to have a "detach" library that knows how to copy only opted-in context values, and they can register a function to copy themselves. This would be ideal: it works both for this use-case, but also for the long-running task use case.

Unfortunately, this only works if there is a single registry for such values. Otherwise you can't propagate values outside of a single ecosystem. So, I'd propose something like:

package detach
func FromContext(context.Context) context.Context
func RegisterValue[T any](key interface{}, derive func(orig T) T)

This addresses the core need to be able to copy, and possibly derive, detachable values like security contexts.

This doesn't address logging (that buffers), traces (that want a new span), etc. One way to handle this would be to add APIs that let them spin op a goroutine when the sub-context is done:

func WithValuesFrom(ctx, donor context.Context) context.Context
func RegisterCopy(copy func(ctx, orig context.Context) context.Context)

This means extra goroutines though, which doesn't excite me. Maybe something like this would be better:

type Task func(context.Context) error
func Run(ctx context.Context, f Task, additional ...Wrapper) error

type Wrapper func(ctx, orig context.Context, next Task) Task
func RegisterWrap(wrap Wrapper)

(orig would be a derived context from the original, probably pre-cancelled to avoid confusion)

This would allow cleaning up when the task completes, which is more useful for span-style context values. It would also allow for adding local wrappers to apply timeouts, etc on a per-call basis.

So FromContext would be the API for in-band detach, and Run would be the API for long running tasks.

@srowles
Copy link

srowles commented Jul 28, 2022

I ran into this today while discussing on the Gophers slack.

Having the ability to detach a context so that you can keep, for example, trace and other user information in the context that is used by downstream calls would be useful.

We have our own implementation that we use in a reasonably large number of places in our code base, but it would be nice if the standard library supported this directly.

@rsc rsc moved this from Active to Likely Accept in Proposals Feb 1, 2023
@CAFxX CAFxX closed this as completed Feb 2, 2023
@CAFxX CAFxX reopened this Feb 2, 2023
@rsc rsc moved this from Likely Accept to Accepted in Proposals Feb 9, 2023
@rsc
Copy link
Contributor

rsc commented Feb 9, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: context: add WithoutCancel context: add WithoutCancel Feb 9, 2023
@rsc rsc modified the milestones: Proposal, Backlog Feb 9, 2023
@CAFxX
Copy link
Contributor Author

CAFxX commented Feb 10, 2023

CL ready for review: https://go-review.googlesource.com/c/go/+/459016

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/479918 mentions this issue: context: add WithoutCancel

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/486535 mentions this issue: doc: add release notes for new context functions

gopherbot pushed a commit that referenced this issue Apr 20, 2023
For #40221
For #56661
For #57928

Change-Id: Iaf7425bb26eeb9c23235d13c786d5bb572159481
Reviewed-on: https://go-review.googlesource.com/c/go/+/486535
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Sameer Ajmani <sameer@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
@davidljubadev22
Copy link

CL ready for review

https://stackoverflow.com/a/75883438/1901067

Tx @CAFxX !

@dmitshur dmitshur modified the milestones: Backlog, Go1.21 Jun 4, 2023
eric pushed a commit to fancybits/go that referenced this issue Aug 23, 2023
WithoutCancel returns a copy of parent that is not canceled when parent is canceled.
The returned context returns no Deadline or Err, and its Done channel is nil.
Calling Cause on the returned context returns nil.

API changes:
+pkg context, func WithoutCancel(Context) Context

Fixes golang#40221

Change-Id: Ide29631c08881176a2c2a58409fed9ca6072e65d
Reviewed-on: https://go-review.googlesource.com/c/go/+/479918
Run-TryBot: Sameer Ajmani <sameer@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@rsc rsc removed this from Proposals Apr 4, 2024
@golang golang locked and limited conversation to collaborators Jun 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests