
proposal: context: add Merge #36503

Closed

navytux opened this issue Jan 10, 2020 · 81 comments

Comments

@navytux
Contributor

navytux commented Jan 10, 2020

EDIT 2023-02-05: Last try: #36503 (comment).
EDIT 2023-01-18: Updated proposal with 2 alternatives: #36503 (comment).
EDIT 2023-01-06: Updated proposal: #36503 (comment).
EDIT 2020-07-01: The proposal was amended to split cancellation and values concerns: #36503 (comment).


( This proposal is an alternative to #36448. It proposes to add context.Merge instead of exposing a general context API for linking up third-party contexts into the parent-children tree, for efficiency )

The current context package API provides primitives to derive new contexts from one parent - WithCancel, WithDeadline and WithValue. This functionality covers many practical needs, but not merging - the case where it is necessary to derive a new context from multiple parents. While it is possible to implement merge functionality in a third-party library (e.g. lab.nexedi.com/kirr/go123/xcontext), with the current state of the context package such implementations are inefficient, as they need to spawn an extra goroutine to propagate cancellation from parents to child.

To solve this inefficiency I propose to add Merge functionality to the context package. The other possibility would be to expose a general mechanism to glue arbitrary third-party contexts into the context tree. However, since a) Merge is a well-defined concept, and b) there are (currently) no other well-known cases where a third-party context would need to allocate its own done channel (see #28728; this is the case where an extra goroutine for cancel propagation currently needs to be spawned), I tend to think it makes more sense to add Merge support to the context package directly instead of exposing a general mechanism for gluing arbitrary third-party contexts.

Below is a description of the proposed API and the rationale:

---- 8< ----

Merging contexts

Merge could be handy in situations where a spawned job needs to be canceled whenever either of two contexts becomes done. This frequently arises with service methods that accept a context as argument, while the service itself, on another control line, can be instructed to become non-operational. For example:

func (srv *Service) DoSomething(ctx context.Context) (err error) {
	defer xerr.Contextf(&err, "%s: do something", srv)

	// srv.serveCtx is context that becomes canceled when srv is
	// instructed to stop providing service.
	origCtx := ctx
	ctx, cancel := xcontext.Merge(ctx, srv.serveCtx)
	defer cancel()

	err = srv.doJob(ctx)
	if err != nil {
		if ctx.Err() != nil && origCtx.Err() == nil {
			// error due to service shutdown
			err = ErrServiceDown
		}
		return err
	}

	...
}

func Merge

func Merge(parent1, parent2 context.Context) (context.Context, context.CancelFunc)

Merge merges 2 contexts into 1.

The result context:

  • is done when parent1 or parent2 is done, or when cancel is called, whichever happens first,
  • has deadline = min(parent1.Deadline, parent2.Deadline),
  • has associated values merged from parent1 and parent2, with parent1 taking precedence.

Canceling this context releases resources associated with it, so code should call cancel as soon as the operations running in this Context complete.

---- 8< ----

To merge the done channels of ctx and srv.serveCtx, the current implementation has to allocate its own done channel and spawn a corresponding goroutine:

https://lab.nexedi.com/kirr/go123/blob/5667f43e/xcontext/xcontext.go#L90-118
https://lab.nexedi.com/kirr/go123/blob/5667f43e/xcontext/xcontext.go#L135-150

context.WithCancel, when called on the resulting merged context, will also have to spawn its own propagation goroutine.
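For concreteness, here is a minimal sketch of what such a third-party merge has to do today, assuming both parents are plain context.Context values (mergeCancel is a hypothetical helper; deadline and value merging are omitted):

func mergeCancel(parent1, parent2 context.Context) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent1)
	go func() {
		select {
		case <-parent2.Done():
			cancel() // forward parent2's cancellation to the merged context
		case <-ctx.Done():
			// merged context is already done; stop watching parent2
		}
	}()
	return ctx, cancel
}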

For reference, here is the context.Merge implementation in Pygolang, which does the parents-child binding purely via data:

https://lab.nexedi.com/kirr/pygolang/blob/64765688/golang/context.cpp#L74-76
https://lab.nexedi.com/kirr/pygolang/blob/64765688/golang/context.cpp#L347-352
https://lab.nexedi.com/kirr/pygolang/blob/64765688/golang/context.cpp#L247-251
https://lab.nexedi.com/kirr/pygolang/blob/64765688/golang/context.cpp#L196-226
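The essence of that data-only binding, sketched in Go (a schematic for illustration, not the stdlib's or Pygolang's actual code): every cancelable context keeps a set of children, a merged context registers itself as a child of each of its parents, and cancellation simply walks that set - no goroutine involved.

type cancelCtx struct {
	mu       sync.Mutex
	done     chan struct{}
	err      error
	children map[*cancelCtx]struct{} // contexts to cancel when this one is canceled
}

func (c *cancelCtx) cancel(err error) {
	c.mu.Lock()
	if c.err != nil {
		c.mu.Unlock()
		return // already canceled
	}
	c.err = err
	close(c.done)
	children := c.children
	c.children = nil
	c.mu.Unlock()

	// propagate purely via data: whichever parent is canceled first
	// reaches the merged child through this loop
	for child := range children {
		child.cancel(err)
	}
}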

/cc @Sajmani, @rsc, @bcmills

@navytux
Contributor Author

navytux commented Feb 6, 2020

Judging by #33502 this proposal seems to have been missed. Could someone please add it to the Proposals/Incoming project? Thanks.

@dweomer

dweomer commented May 20, 2020

	ctx, cancel := xcontext.Merge(ctx, srv.serveCtx)

Isn't the struct-held reference to a context a smell (regardless of it being long-lived)? If your server must be cancellable, isn't it better practice to establish a "done" channel for it (and select on that + ctx in the main thread) and write to that when the server should be done? This does not incur an extra goroutine.

@navytux
Contributor Author

navytux commented May 21, 2020

@dweomer, as I already explained in the original proposal description, there are two cancellation sources: 1) the server can be requested to shut down by its operator, and 2) a request can be canceled by the client who issued it. This means that any handler spawned to serve a request must be canceled whenever either "1" or "2" triggers. How does "select on done + ctx in main thread" help here? Which context should one pass into a request handler when spawning it? Or do you propose we pass both ctx and done into all handlers and add done into every select where previously only ctx was there? If that is indeed what you are proposing, I perceive Merge as a much cleaner solution, because handlers still receive only one ctx and the complexity of merging cancellation channels is not exposed to users.

Re smell: I think it is not. Go itself actually uses this approach in database/sql, net/http (2, 3, 4, 5, 6) and os/exec. I suggest reading Go and Dogma as well.

@seebs
Contributor

seebs commented May 22, 2020

I just reinvented this independently. The situation is that I have two long-lived things: a shared work queue which several things could be using, and the individual things themselves. It's conceptually possible to want to close the shared work queue and make a new one for the things to use... And then there's an operation where the work queue scans through one of the things to look for extra work. That operation should shut down if either the thing it's scanning shuts down, or the shared work queue in general gets shut down.

Of course, context.Merge wouldn't quite help as one of them currently exposes a chan struct{}, not a Context.
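For what it's worth, a close-on-shutdown channel can be wrapped as a cancel-only Context with a small adapter (a hypothetical sketch; doneCh stands in for whatever channel the work queue exposes):

// chanCtx presents a plain channel as a Context that carries no values and no
// deadline; Done and Err are driven by the wrapped channel.
type chanCtx struct {
	context.Context // context.Background(); supplies Deadline and Value
	doneCh          <-chan struct{}
}

func (c chanCtx) Done() <-chan struct{} { return c.doneCh }

func (c chanCtx) Err() error {
	select {
	case <-c.doneCh:
		return context.Canceled
	default:
		return nil
	}
}

Usage would then look like ctx := chanCtx{context.Background(), doneCh}.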

navytux added a commit to navytux/go123 that referenced this issue May 27, 2020
golang/go#36503
golang/go#36448

They are there for 6 months already without being considered for real
and so the progress is very unlikely, but still...
@rsc
Contributor

rsc commented Jun 10, 2020

I think I understand the proposal.

I'm curious how often this comes up. The Merge operation is significantly more complex to explain than any existing context constructor we have. On the other hand, the example of "server has a shutdown context and each request has its own, and have to watch both" does sound pretty common. I guess I'm a little confused about why the request context wouldn't already have the server context as a parent to begin with.

I'm also wondering whether Merge should take a ...context.Context instead of hard-coding two (think io.MultiReader, io.MultiWriter).

@seebs
Contributor

seebs commented Jun 10, 2020

I do like the MultiReader/MultiWriter parallel; that seems like it's closer to the intent.

In our case, we have a disk-intensive workload that wants to be mediated, and we might have multiple network servers running independently, all of which might want to do some of that kind of work. So we have a worker that sits around waiting for requests, which come from those servers. And then we want to queue up background scans for "work that got skipped while we were too busy but we wanted to get back to it". The background scan of any given individual network server's workload is coming in parented by the network server, but now it also wants to abort if the worker decides it's closing. But the worker's not really contingent on the network server, and in some cases could be stopped or restarted without changing the network servers.

It's sort of messy, and I'm not totally convinced that this design is right. I think it may only actually matter during tests, because otherwise we wouldn't normally be running multiple network servers like this at once in a single process, or even on a single machine.

@bcmills
Contributor

bcmills commented Jun 10, 2020

@seebs, if the background work is continuing after the network handler returns, it's generally not appropriate to hold on to arbitrary values from the handler's Context anyway. (It may include stale values, such as tracing or logging keys, and could end up preventing a lot of other data reachable via ctx.Value() from being garbage-collected.)

@seebs
Contributor

seebs commented Jun 10, 2020

... I think I mangled my description. That's true, but it doesn't happen in this case.

Things initiated from the network requests don't keep the network request context if any of their work has to happen outside of that context. They drop something in a queue and wander off.

The only thing that has a weird hybrid context is the "background scanning", because the background scanning associated with a given network server should stop if that server wants to shut down, but it should also stop if the entire worker queue wants to shut down even when the network server is running. But the background scanning isn't triggered by network requests, it's something the network server sets up when it starts. It's just that it's contingent on both that server and the shared background queue which is independent from all the servers.

@navytux
Contributor Author

navytux commented Jun 16, 2020

@rsc, thanks for feedback.

Yes, as you say, the need for merge should be pretty common - practically in almost all client-server cases, on both the client and the server side.

I guess I'm a little confused about why the request context wouldn't already have the server context as a parent to begin with.

For the networked case - when client and server interoperate via some connection where messages go serialized - it is relatively easy to derive the handler context from the server's base context and manually merge it with the context of the request:

  • the client serializes context values into the message on the wire;
  • the client sends a corresponding message when the client-side context is canceled;
  • the server creates a context for the handler, deriving it from the server's base context;
  • the server applies the decoded values from the wire message to the derived context;
  • the server stores the derived context's cancel in a data structure associated with the stream through which the request was received;
  • the server calls handler.cancel() when it receives, through the stream, a message corresponding to request cancellation.

Here merging can happen manually because the client's request arrives at the server in serialized form.
The cancellation linking for the client-server branch is implemented via message passing and the serve loop. The data structures used for the gluing resemble what Merge would do internally.

In other cases - where requests are not serialized/deserialized - the merge is needed for real, for example:

  1. on the server, a handler might need to call another internal in-process service run with its own context;
  2. the client and server live in the same process, each run with their own contexts;
  3. on the client, every RPC stub that is invoked with a client-provided context needs to make sure to send an RPC-cancellation whenever either that user-provided context is canceled, or the underlying stream is closed;
  4. etc...

Since "1" and "2", even though they are found in practice, might be viewed as a bit artificial, let's consider "3", which happens in practice all the time:

Consider any client method for, e.g., an RPC call - it usually looks like this:

func (cli *Client) DoSomething(ctx context.Context, ...) {
    cli.conn.Invoke(ctx, "DoSomething", ...)
}

conn.Invoke needs to make sure to issue the request to the server under a context that is canceled whenever ctx is canceled, or whenever cli.conn is closed. For e.g. gRPC, cli.conn is a multiplexed stream over an HTTP/2 transport, and the stream itself must be closed whenever the transport link is closed or brought down. This is usually implemented by associating corresponding contexts with the stream and the link, and canceling stream.ctx from link.ctx on link close/down. cli.conn.Invoke(ctx, ...) should thus do exactly what Merge(ctx, cli.conn.ctx) does.

Now, since there is no Merge, everyone implements this functionality by hand, either with an extra goroutine, or by doing something like

reqCtx, reqCancel := context.WithCancel(ctx)

keeping a registry of issued requests with their cancels in the link / stream data structures, and explicitly invoking all those cancels when the link / stream goes down.
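Spelled out, that registry-based pattern looks roughly like this (a simplified illustration, not gRPC's actual code; the conn type, its fields and c.send are made-up names):

type conn struct {
	mu       sync.Mutex
	closed   bool
	nextID   int64
	inflight map[int64]context.CancelFunc // cancels of issued requests
}

// Invoke registers the request's cancel so that Close can reach it later.
func (c *conn) Invoke(ctx context.Context, method string) error {
	reqCtx, reqCancel := context.WithCancel(ctx)
	defer reqCancel()

	c.mu.Lock()
	if c.closed {
		c.mu.Unlock()
		return errors.New("conn is closed")
	}
	c.nextID++
	id := c.nextID
	c.inflight[id] = reqCancel
	c.mu.Unlock()

	defer func() {
		c.mu.Lock()
		delete(c.inflight, id)
		c.mu.Unlock()
	}()

	return c.send(reqCtx, method) // placeholder for the actual wire call
}

// Close brings the link down and explicitly cancels every in-flight request.
func (c *conn) Close() {
	c.mu.Lock()
	c.closed = true
	for _, cancel := range c.inflight {
		cancel()
	}
	c.inflight = nil
	c.mu.Unlock()
}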

Here is, e.g., how gRPC implements it:

And even though such explicit gluing can be implemented by users, people get tired of it and start to use the "extra goroutine" approach at some point:

In other words, the logic and complexity that Merge could handle internally, once and for everyone, is - without Merge - scattered across every user and intermixed with the rest of the application-level logic.

On my side I would need Merge, e.g., on the client,

and on the server, where the context of spawned handlers is controlled by messages from another server, which can tell the first server to stop being operational (and can as well later tell it, via a similar message, to resume providing operational service):

https://lab.nexedi.com/kirr/neo/blob/85658a2c/go/neo/storage.go#L52-56
https://lab.nexedi.com/kirr/neo/blob/85658a2c/go/neo/storage.go#L422-431
https://lab.nexedi.com/kirr/neo/blob/85658a2c/go/neo/storage.go#L455-457
https://lab.nexedi.com/kirr/neo/blob/85658a2c/go/neo/storage.go#L324-343

and in many other places...


I often see simplicity as complexity put under control and wrapped into simple interfaces.
From this point of view Merge is a perfect candidate, because 1) it is a well-defined concept, 2) it offloads users from spreading that complexity throughout their libraries/applications, and 3) it kind of completes the closure of the group of context operations, which was incomplete without it.

On "3" I think the following analogies are appropriate:

Without Merge the context package is like

  • Git with commit and branches, but no merge;
  • Go with go and channels, but no select;
  • SSA without φ nodes,
  • ...

In other words Merge is a fundamental context operation.

Yes, Merge requires willingness from the Go team to take that complexity and absorb it inside the Go API. Given that we often see reluctance to do so in other cases, I, sadly, realize that it is very unlikely to happen. On the other hand there is still a tiny bit of hope on my side, so I would be glad to actually be wrong on this...

Kirill

P.S. I tend to agree about converting Merge to accept (parentv ...context.Context) instead of (parent1, parent2 context.Context).

P.P.S. merging was also discussed a bit in #30694 where @taralx wrote: "While it is possible to do this by wrapping the handler and merging the contexts, this is error-prone and requires an additional goroutine to properly merge the Done channels."

@rsc changed the title from "proposal: context: Add Merge" to "proposal: context: add Merge" on Jun 17, 2020
@rsc
Contributor

rsc commented Jun 17, 2020

@Sajmani and @bcmills, any thoughts on whether we should add context.Merge as described here? (See in particular the top comment.)

@rsc
Contributor

rsc commented Jun 17, 2020

/cc @neild @dsnet as well for more context opinions

@neild
Contributor

neild commented Jun 17, 2020

Within Google's codebase, where the context package originated, we follow the rule that a context.Context should only be passed around via the call stack.

From https://github.com/golang/go/wiki/CodeReviewComments#contexts:

Don't add a Context member to a struct type; instead add a ctx parameter to each method on that type that needs to pass it along. The one exception is for methods whose signature must match an interface in the standard library or in a third party library.

This rule means that at any point in the call stack, there should be exactly one applicable Context, received as a function parameter. When following this pattern, the merge operation never makes sense.

While merging context cancellation signals is straightforward, merging context values is not. Contexts can contain trace IDs and other information; which value would we pick when merging two contexts?

I also don't see how to implement this efficiently without runtime magic, since it seems like we'd need to spawn a goroutine to wait on each parent context. Perhaps I'm missing something.

@bcmills
Contributor

bcmills commented Jun 18, 2020

For values, Merge would presumably bias toward one parent context or the other. I don't see that as a big problem.

I don't think runtime magic is needed to avoid goroutines, but we would at least need some (subtle) global lock-ordering for the cancellation locks, since we could no longer rely on the cancellation graph being tree-structured. It would at least be subtle to implement and test, and might carry some run-time overhead.

@Sajmani
Contributor

Sajmani commented Jun 19, 2020

Context combines two somewhat-separable concerns: cancelation (via the Deadline, Done, and Err methods) and values. The proposed Merge function combines these concerns again, defining how cancelation and values are merged. But the example use case only relies on cancelation, not values: https://godoc.org/lab.nexedi.com/kirr/go123/xcontext#hdr-Merging_contexts

I would feel more comfortable with this proposal if we separated these concerns by providing two functions, one for merging two cancelation signals, another for merging two sets of values. The latter came up in a 2017 discussion on detached contexts: #19643 (comment)

For the former, we'd want something like:

ctx = context.WithCancelContext(ctx, cancelCtx)

which would arrange for ctx.Done to be closed when cancelCtx.Done is closed and ctx.Err to be set from cancelCtx.Err, if it's not set already. The returned ctx would have the earlier Deadline of ctx and cancelCtx.

We can bikeshed the name of WithCancelContext, of course. Other possibilities include WithCanceler, WithCancelFrom, CancelWhen, etc. None of these capture Deadline, too, though.

@rsc
Contributor

rsc commented Jun 24, 2020

@navytux, what do you think about Sameer's suggestion to split the two operations of WithContextCancellation and WithContextValues (with better names, probably)?

@navytux
Contributor Author

navytux commented Jul 1, 2020

@Sajmani, @rsc, everyone, thanks for feedback.

First of all I apologize for the delay in replying, as I'm very busy these days and it is hard to find the time to do this properly. This issue was filed 7 months ago, when things were very different on my side. Anyway, I quickly looked into what @Sajmani referenced in #36503 (comment), and into what the others say; my reply is below:

Indeed, Context combines two things in one interface: cancellation and values. Those things, however, are orthogonal. While merging cancellation is straightforward, merging values is not: in general, merging values requires a merging strategy that defines how to combine values from multiple sources, and in the general case that strategy is custom and application-dependent.

My initial proposal uses a simple merging strategy, with values from parent1 taking precedence over values from parent2. It is the simple strategy I came up with while trying to make Merge work universally. However the values part of my proposal is, as others have noted, indeed the weakest, as that merging strategy is not always appropriate.

Looking into what @Sajmani said in #19643 (comment) and #19643 (comment), and with the idea to separate the cancellation and values concerns, I propose to split the Context interface into a cancellation part and a values part, and to rework the proposal as something like the following:

// CancelCtx carries deadline and cancellation signal across API boundaries.
type CancelCtx interface {
        Deadline() (deadline time.Time, ok bool)
        Done() <-chan struct{}
        Err() error
}

// CancelFunc activates CancelCtx telling an operation to abandon its work.
type CancelFunc func()

// Values carries set of key->value pairs across API boundaries.
type Values interface {
        Value(key interface{}) interface{}
}

// Context carries deadline, cancellation signal, and other values across API boundaries.
type Context interface {
        CancelCtx
        Values
}

// ... (unchanged)
func WithCancel   (parent Context) (ctx Context, cancel CancelFunc)
func WithDeadline (parent Context, d  time.Time) (ctx Context, cancel CancelFunc)
func WithTimeout  (parent Context, dt time.Duration) (ctx Context, cancel CancelFunc)
func WithValue    (parent Context, key, val interface{}) Context


// MergeCancel merges cancellation from parent and set of cancel contexts.
//
// It returns copy of parent with new Done channel that is closed whenever
//
//      - parent.Done is closed, or
//      - any of CancelCtx from cancelv is canceled, or
//      - cancel called
//
// whichever happens first.
//
// The returned context has its Deadline set to the earliest of parent's
// deadline and the deadlines of the cancel contexts.
// The returned context inherits values from parent only.
func MergeCancel(parent Context, cancelv ...CancelCtx) (ctx Context, cancel CancelFunc)

// WithNewValues returns a Context with a fresh set of Values. 
//
// It returns a Context that satisfies Value calls using vs.Value instead of parent.Value.
// If vs is nil, the returned Context has no values. 
//
// Returned context inherits deadline and cancellation only from parent. 
//
// Note: WithNewValues can be used to extract "only-cancellation" and
// "only-values" parts of a Context via
//
//      ctxNoValues := WithNewValues(ctx, nil)           // only cancellation
//      ctxNoCancel := WithNewValues(Background(), ctx)  // only values
func WithNewValues(parent Context, vs Values) Context 

Values and WithNewValues essentially come from #19643. Merge is reworked into MergeCancel, which merges only the cancellation signal, not values. This separates the values and cancellation concerns, is general (it does not hardcode any merging strategy for values), and can be implemented without an extra goroutine.

For reference, here is how the originally-proposed Merge could be implemented in terms of MergeCancel and WithNewValues:

// Merge shows how to implement Merge from https://github.com/golang/go/issues/36503
// in terms of MergeCancel and WithNewValues.
func Merge(parent1, parent2 Context) (Context, CancelFunc) {
        ctx, cancel := MergeCancel(parent1, parent2)
        v12 := &vMerge{[]Values{parent1, parent2}}
        ctx = WithNewValues(ctx, v12)
        return ctx, cancel
}

// vMerge implements a simple merging strategy: values from vv[i] take
// precedence over values from vv[j] for i < j.
type vMerge struct {
        vv []Values
}

func (m *vMerge) Value(key interface{}) interface{} {
        for _, v := range m.vv {
                val := v.Value(key)
                if val != nil {
                        return val
                }
        }
        return nil
}

Regarding the implementation: it was already linked to in my original message but, since people still raise concerns about whether the "no extra goroutine" property is achievable, and about lock ordering, here once again is how libgolang implements cancellation merging without an extra goroutine and without any complex lock ordering:

https://lab.nexedi.com/nexedi/pygolang/blob/0e3da017/golang/context.h
https://lab.nexedi.com/nexedi/pygolang/blob/0e3da017/golang/context.cpp

Maybe I'm missing something, and of course it will have to be adapted to MergeCancel and WithNewValues, but to me the implementation is relatively straightforward.

Kirill

/cc @zombiezen, @jba, @ianlancetaylor, @rogpeppe for #19643

@rsc
Contributor

rsc commented Jul 8, 2020

Thanks for the reply. We're probably not going to split the Context interface as a whole at this point.
Note that even the ...CancelCtx would not accept a []Context, so that would be a stumbling block for users.

The value set merge can be done entirely outside the context package without any inefficiency. And as @neild points out, it's the part that is the most problematic.
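For illustration, such a value-only merge can indeed live outside the package as a small wrapper (a sketch under that assumption; valueMerge is a made-up name):

// valueMerge answers Value lookups from the primary context first and then the
// fallback, while Done, Deadline and Err come from the primary alone.
type valueMerge struct {
	context.Context // primary
	fallback        context.Context
}

func (m valueMerge) Value(key interface{}) interface{} {
	if v := m.Context.Value(key); v != nil {
		return v
	}
	return m.fallback.Value(key)
}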

The cancellation merge needs to be inside context, at least with the current API, or else you'd have to spend a goroutine on each merged context. (And we probably don't want to expose the API that would be required to avoid that.)

So maybe we should focus only on the cancellation merge and ignore the value merge entirely.

It still doesn't seem like we've converged on the right new API to add, though.

@bradfitz points out that not just the cancellation but also the timeouts get merged, right?
(And the error that distinguishes between those two cases gets propagated?)
So it's not really only merging cancellation.

It does seem like the signature is

func SOMETHING(parent Context, cancelv ...CancelCtx) (ctx Context, cancel CancelFunc)

Or maybe the op to expose is being able to cancel one context when another becomes done, like:

// Link arranges for context x to become done when context y does.
func Link(x, y Context) 

(with better names).

?

It seems like we're not yet at an obviously right answer.

@navytux
Contributor Author

navytux commented Jul 15, 2020

@rsc, thanks for feedback. I think I need to clarify my previous message:

  • last time I did not propose to include Merge - I only showed how it could be possible to implement the Merge functionality in a third-party place with what comes with the proposal;

  • my proposal consists only of context.MergeCancel and context.WithNewValues;

  • context.WithNewValues comes directly from proposal: context: add API for performing tasks after Context is Done #19643 (comment) and proposal: context: add API for performing tasks after Context is Done #19643 (comment) - it is exactly what @Sajmani proposed there;

  • context.WithNewValues, and thinking about context.Context as being composed of two parts (cancellation + values), comes from @Sajmani's request to do so in proposal: context: add Merge #36503 (comment), where he says:

    "Context combines two somewhat-separable concerns: cancelation (via the Deadline, Done, and Err methods) and values
    ...
    I would feel more comfortable with this proposal if we separated these concerns by providing two functions, one for merging two cancelation signals, another for merging two sets of values..."

  • as @Sajmani explains, WithNewValues cannot be efficiently implemented outside of the context package; that's why I brought it in, to show the full picture.

  • regarding the merging of cancellation, it is

    func MergeCancel(parent Context, cancelv ...CancelCtx) (ctx Context, cancel CancelFunc)

    the only potential drawback here is that the automatic conversion of []Context to []CancelCtx does not work.

    However there is the same issue with e.g. io.MultiReader(readerv ...io.Reader): if, for example, someone has a []io.ReadCloser, or a []OTHERTYPE with OTHERTYPE implementing io.Reader, it won't be possible to pass that slice to io.MultiReader directly without an explicit conversion:

    package main
    
    import "io"
    
    func something(rv ...io.Reader) {}
    
    func f2() {
            var xv []io.ReadCloser
            something(xv...)
    }
    ./argv.go:9:11: cannot use xv (type []io.ReadCloser) as type []io.Reader in argument to something
    

    From this point of view, personally, I think it is ok for cancelv to be ...CancelCtx, because

    • explicit usage with direct arguments - even of Context type - without ... will work;
    • for the rare cases where users want to use ..., they will have to explicitly convert to []CancelCtx, which is currently generally unavoidable, as the io.MultiReader example shows.

    That said, I'm ok if we change MergeCancel to accept ...Context, or even just one extra explicit argument MergeCancel(parent Context, cancelCtx Context). However I think that would be worse overall because it loses generality.

  • timeouts are indeed merged as part of cancellation, because timeouts are directly related to cancellation and are included in CancelCtx for that reason. They are not values. We cannot skip merging timeouts when merging cancellation.

    For example after

    func (srv *Service) doSomething(ctx context.Context) {
        ctx, cancel := context.Merge(ctx, srv.serveCtx)

    I, as a user, expect

    • ctx to be cancelled, in particular, when srv.serveCtx is cancelled;
    • ctx to have a deadline not later than the deadline of srv.serveCtx.

    If we merge only the cancel signal, but not the deadline, the resulting context can be cancelled early - due to the srv.serveCtx timeout - while its deadline could be infinity if the original ctx is e.g. background. To me it is not good to report deadline=∞ when it is known in advance that the operation will be cancelled due to a timeout.

    That's my rationale for treating and merging deadlines together with cancellation.

    I think it coincides with @Sajmani's treatment that cancellation is constituted by Deadline, Done and Err, instead of only Done and Err without Deadline.
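A small sketch of that deadline rule (an illustrative helper, not part of the proposed API):

// mergedDeadline reports the earlier of the two parents' deadlines, and
// reports "no deadline" only when neither parent has one.
func mergedDeadline(a, b context.Context) (time.Time, bool) {
	da, oka := a.Deadline()
	db, okb := b.Deadline()
	switch {
	case !oka:
		return db, okb
	case !okb:
		return da, true
	case da.Before(db):
		return da, true
	default:
		return db, true
	}
}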


Regarding Link - I think it is better if we indeed avoid exposing this general functionality in the API. Link can create cycles, and besides that it is not possible to implement Link for an arbitrary third-party context, because with only the Context interface there is no way to cancel it, even via an extra goroutine or whatever - at least not without introducing other extra interfaces that a context must expose to be linkable. Contrary to that, MergeCancel is a well-defined operation and can be implemented generally - efficiently if all arguments are native to the context package, and via an extra goroutine to propagate cancellation for contexts coming from third-party places.

What do you think? Does my feedback clarify anything?
It would be good to also see what @Sajmani thinks.

Kirill

@rsc
Contributor

rsc commented Jul 15, 2020

@navytux,

FWIW, @Sajmani's comment from 2017 #19643 (comment) is out of date. WithNewValues can be implemented efficiently outside the context package, after changes we made recently.

Re: MergeCancel(parent Context, cancelCtx ...Context) being "worse overall because it loses generality", what generality does it lose? No one has any values of type CancelCtx today, so there is nothing to generalize. Even if we added the CancelCtx type, wouldn't every instance be a Context anyway? Certainly the only ones we can handle efficiently would be contexts.

It does sound like we're converging on

//  MergeCancel returns a copy of parent with additional deadlines and cancellations applied
// from the list of extra contexts. The returned context's Done channel is closed
// when the returned cancel function is called or when parent's Done channel is closed,
// or when any of the extra contexts' Done channels are closed.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
func MergeCancel(parent Context, extra ...Context) (ctx Context, cancel CancelFunc)

Does anyone object to those semantics? Maybe it should be MergeDone? Some better name?
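For illustration, the Service example from the top of this issue rewritten against this signature (hypothetical usage, assuming a context.MergeCancel with these semantics existed):

func (srv *Service) DoSomething(ctx context.Context) error {
	// done whenever the caller's ctx or the server's serve context is done
	ctx, cancel := context.MergeCancel(ctx, srv.serveCtx)
	defer cancel()
	return srv.doJob(ctx)
}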

@adamluzsi

I have a use case with context as the carrier for the cancellation signal.
It doesn't help MergeCancel's case, but it's an example of when the context is used to signal shutdown.

It allows me to achieve simplicity by decoupling my component from application lifecycle management.
For example, I have a high-level component to handle shutdown signals and background jobs,
and these jobs use a context as a parent context for all sub-interactions.
If the shutdown is signalled, all cleanup is taken care of as the synchronous process calls finish their execution and return.

Here is an example package that describes the approach in more detail:
https://github.com/adamluzsi/frameless/tree/main/pkg/jobs

@navytux
Contributor Author

navytux commented Feb 5, 2023

@Sajmani, @rsc, everyone, thanks for feedback.
I could find the time to reply only today. I apologize for the delay.

The main criticisms of this proposal are

a) that there is no real compelling use case for MergeCancel, and
b) that the number of places where MergeCancel could be useful is small.

Let's go through those:

The need for MergeCancel

Even though the original overview of this proposal does not say that the Service
is stopped via context cancellation, the argument that it is not ok to bind a
Service's lifetime to a context was used against this proposal. Given that
argument, I believe it makes sense to describe everything once again from
scratch:

Imagine we are implementing a Service. This service has dedicated methods to
be created, started and stopped. For example:

type Service struct { ... }
func NewService() Service
func (srv *Service) Run()
func (srv *Service) Stop()

The Service provides operations to its users, e.g.

func (srv *Service) DoSomething(ctx context.Context) error

and we want those operations to be canceled on both

a) cancellation of the ctx provided by the user, and
b) the service being stopped by way of srv.Stop called from another goroutine.

So let's look at how that could be implemented. Since DoSomething might need to
invoke arbitrary code, including code from other packages, it needs to organize
a context that is cancelled on "a" or "b" and invoke that, potentially
third-party, code with that context.

To organize a context that is cancelled whenever either the user-provided ctx is
cancelled or Service.Stop is called, we could create a new context via
WithCancel and register its cancel to be invoked by Service.Stop:

type Service struct {
	...

	// Stop invokes the cancel funcs registered in cancelOnStop
	stopMu       sync.Mutex
	stopped      bool
	lastID       int64
	cancelOnStop map[int64]context.CancelFunc
}

func (srv *Service) Stop() {
	srv.stopMu.Lock()
	defer srv.stopMu.Unlock()

	srv.stopped = true
	for _, cancel := range srv.cancelOnStop {
		cancel()
	}

	...
}

func (srv *Service) DoSomething(ctx context.Context) (err error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	srv.stopMu.Lock()
	if srv.stopped {
		srv.stopMu.Unlock()
		return ErrServiceDown
	}
	srv.lastID++
	id := srv.lastID
	srv.cancelOnStop[id] = cancel
	srv.stopMu.Unlock()

	defer func() {
		srv.stopMu.Lock()
		defer srv.stopMu.Unlock()
		delete(srv.cancelOnStop, id)

		if err != nil && srv.stopped {
			err = ErrServiceDown
		}
	}()

	return thirdparty.RunJob(ctx)
}

This pattern is scattered around and used practically everywhere, including in the gRPC internals I
explained in detail in #36503 (comment).

It is also not always implemented in such a simple form: people sometimes use a
dedicated registry for the cancels, and sometimes they keep them on other objects
and retrieve the created cancel funcs indirectly from there, which makes the
logic more scattered and harder to follow.

But what all this code is doing, in essence, is duplicating the functionality
of the context package by manually propagating cancellation from Stop to every
spawned job. Let me say it once again:

_Every_ service implementation has to _manually_ propagate the
cancellation from its `Stop` or `Shutdown` to _every_ spawned job in
_every one_ of its methods.

Even though this can be done, doesn't the need to manually duplicate the
context functionality suggest that this functionality should be provided by the
context package in the first place?

And because this propagation is usually scattered, searching code for it does
not yield results with simple queries like Merge(Context). Most people keep
copying the pattern from one place to another, preferring not to go against the
official guideline not to store contexts in structures: keeping the context and
the propagation logic in this expanded form hides the problem.

For reference: a Detach operation is significantly easier to implement and does
not need to go against the style-guide advice that a context should not be
stored anywhere. That's why searching for it yields more results. But the
interpretation of such a search should be normalized for the difficulty of, and
the willingness to, go against the existing recommendation while risking
"obey the dogma" feedback.

For reference 2: Go itself actually stores contexts inside structures, e.g. in database/sql, net/http (2, 3, 4, 5, 6) and os/exec.

Actually in 2017 in #22602 (comment) @Sajmani wrote

I had also pushed for the explicit parameter restriction so that we could
more easily automate refactorings to plumb context through existing code. But
seeing as we've failed to produce such tools, we should probably loosen up
here and deal with the tooling challenges later.

so I guess the restriction that a context should not be kept in structs should not be that strong now (/cc @cespare).

For reference 3: even if the context.OnDone proposal is accepted, then without
adding MergeCancel to the standard context package, the scheme of manual cancel
propagation from Stop to operations will need to be kept in exactly the same
form as it is now, and everywhere, because people still get the message that
"storing serverCtx in a struct is not ok". By the way, why is it context.OnDone
instead of chan.OnDone? If we had a way to attach a callback to channel
operations, many things would become possible without any need to modify
context or other packages. Is that path worth going down? I mean, internally
OnDone might be useful, but is it a good idea to expose it as public API?

For reference 4: for cancelling operations on a net.Conn the solution, from my
point of view, should be to extend all IO operations to also take a ctx
parameter and to cancel the IO on ctx cancel. This is the path that e.g. the
xio package takes. Internally, after an IO operation is submitted, the
implementation of that IO on the Go side selects on IO completion or ctx
cancel, and if ctx is cancelled it issues a cancel command via io_uring. That's
how it should work. True, we could use wrappers over the existing Reader and
Writer that call Close via ctx.OnDone. While that somewhat works, closing the
IO link on read cancel is not what the read canceler really expects. And going
this way also exposes the inner details of the context machinery as public API.
While that, once again, might work in the short term, in my view it won't be a
good evolution in the longer run.


Another critical note regarding this proposal was that no proof-of-concept
implementation has been shown.

But the proof-of-concept implementation is there, and it was pointed out right
in the first message of this proposal. Here it is once again:

https://lab.nexedi.com/kirr/pygolang/blob/39dde7eb/golang/context.h
https://lab.nexedi.com/kirr/pygolang/blob/39dde7eb/golang/context.cpp
https://lab.nexedi.com/kirr/pygolang/blob/39dde7eb/golang/context_test.py
https://lab.nexedi.com/kirr/pygolang/blob/39dde7eb/golang/libgolang.h

It is in C++, not Go, but it shows that the implementation is straightforward.


At this stage I have very little hope that this proposal will be accepted. As
it looks it will likely be declined next Wednesday and closed. That's sad, but
I will hopefully try to find my way.

Kirill

@powerman

powerman commented Feb 7, 2023

To organize a context that is cancelled whenever either user-provided ctx is
cancelled, or whenever Service.Stop is called we could create a new context
via WithCancel, and register its cancel to be invoked by Service.Stop:

@navytux While I like the MergeCancel idea, it should be noted that the actual implementation can be much simpler - at the cost of adding 1 extra goroutine per method call (the same 1 extra goroutine which is currently created anyway by third-party MergeCancel implementations):

type Service struct {
	...
	stopped      chan struct{}
}

func (srv *Service) Stop() {
	close(srv.stopped)
}

func (srv *Service) DoSomething(ctx context.Context) (err error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	go func() {
		select {
		case <-srv.stopped:
			cancel()
		case <-ctx.Done():
		}
	}()

	return thirdparty.RunJob(ctx)
}

@rsc
Contributor

rsc commented Feb 8, 2023

Will move back to active instead of likely decline.

@rsc
Contributor

rsc commented Feb 8, 2023

Note that #57928 may be a better or equally good answer, since it would let people implement Merge efficiently outside the standard library.

@navytux
Contributor Author

navytux commented Feb 8, 2023

Thanks, Russ.

Regarding your note, I'd like to point out that "let people implement X outside the project" might not be a universally good criterion. For example, from this point of view we could also say that "with callbacks there is no need for channels and goroutines to be built in, since with callbacks those primitives could be implemented outside". I might be missing something, but in my view what makes sense is to provide a carefully selected and carefully thought-out, reasonably small set of high-level primitives instead of lower-level ones for "doing anything outside".

In #36503 (comment) I already explained that MergeCancel is a fundamental context operation, paralleling it to merges in git, select in Go and φ in SSA. From this point of view adding MergeCancel makes sense because it makes the group of context operations a kind of full closure, which was previously incomplete.

Kirill

P.S. @powerman, thanks. I was implicitly assuming we want to avoid that extra goroutine cost, but you are right that it would be better to state this explicitly and to show the plain solution as well.

@ChrisHines
Contributor

ChrisHines commented Feb 8, 2023

@navytux, thanks for the detailed scenario for helping to think about this proposal.

When people have asked me about how to handle these situations in the past I have usually encouraged them to figure out how to get the two contexts to have a common parent context rather than try to figure out how to merge unrelated contexts.

In your example it doesn't appear that is possible at first glance because the context passed to Service.DoSomething isn't related to anything inside Service.Stop. But maybe it's not as difficult as it seems.

Isn't there always a higher scope in the application that manages the lifetime of the Service and also the code that calls DoSomething? It seems possible that the higher scope can ensure that there is a base context that gets canceled when Service.Stop is called.

For example:

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    svc := NewService()
    go func() {
        svc.Run() // FWIW, I usually write Run(ctx) here, and avoid Stop methods.
    }()

    go func() {
        <-ctx.Done()
        svc.Stop()
    }()

    DoThingsWithService(ctx, svc)
}

func DoThingsWithService(ctx context.Context, svc *Service) {
    ctx, cancel := context.WithTimeout(ctx, time.Second)
    defer cancel()

    // ....

    svc.DoSomething(ctx)
}

Granted, the higher scope has to orchestrate more this way, but I haven't found that difficult to manage in the code I've worked on. I am curious if there is a fundamental reason why it shouldn't be done this way?

@neild
Contributor

neild commented Feb 8, 2023

Abstract examples are difficult to work with. What is a Service? Why does it stop? Should stopping a service (possibly optionally) block until outstanding requests have been canceled? Should stopping a service (possibly optionally) let outstanding requests finish before stopping? If either of these features are desired, then context cancellation isn't sufficient; the service needs to be aware of the lifetimes of outstanding requests, which contexts don't provide.

I'd expect any network server, such as an HTTP or RPC server, to support graceful shutdown where you stop accepting new requests but let existing ones run to completion. But perhaps a Service in this example is something else.

I still feel that the motivating case for MergeCancel is unconvincing. However, even granting a compelling use case, MergeCancel confuses the context model by separating cancellation and value propagation, cannot be implemented efficiently in the general case, and does not address common desires such as bounding a network operation or condition variable wait by a context's lifetime. context.OnDone/context.AfterFunc as proposed in #57928 fits neatly within the context model, lets us efficiently implement operations on third-party contexts that are inefficient today, and can be easily used to implement MergeCancel or the equivalent.

@rsc
Contributor

rsc commented Feb 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Active in Proposals Feb 9, 2023
@powerman

powerman commented Feb 9, 2023

I'd expect any network server, such as an HTTP or RPC server, to support graceful shutdown where you stop accepting new requests but let existing ones run to completion.

@neild Please don't take this reply as one "for MergeCancel" and "against context.After" - I just like to clarify this use case for you.

For trivial fast RPC calls - you're right, it's usually preferable to complete current requests on graceful shutdown instead of interrupting them. But some RPCs can be slow, and RPCs can also be streaming (i.e. never-ending) - in these cases cancelling them via a context is really the natural way to go.

Also, once again, it's worth remembering the less common but valid use case where we need to cancel some group of requests but not all of them - i.e. not a graceful shutdown case. It may be the requests of some user account that was blocked by an admin or ran out of money, requests from some external service/client whose certificate has just expired, requests related to some deprecated API whose deprecation period happens to start right now, cancelling a group of jobs, etc. And, yeah, here I'm still talking about the slow/long-running/streaming type of requests.

@rsc
Contributor

rsc commented Mar 1, 2023

Waiting on #57928, but assuming we do find a good answer there, it will probably make sense to let that one go out and get used before we return to whether Merge needs to be in the standard library.

@rsc
Contributor

rsc commented Apr 5, 2023

Having accepted #57928, it seems like we can decline this and let people build it as needed with AfterFunc.
(They could build it before using a goroutine, but now AfterFunc lets them avoid the goroutine.)

Do I have that right?

@neild
Contributor

neild commented Apr 5, 2023

I believe that's right; AfterFunc should make it simple to efficiently implement MergeCancel.
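As a sketch of what that could look like with the accepted API (mergeCancel is a user-level helper, not a proposed stdlib addition; deadline and value merging are left out):

// mergeCancel returns a context derived from parent that is additionally
// canceled when other becomes done, using context.AfterFunc (Go 1.21+)
// instead of a dedicated goroutine.
func mergeCancel(parent, other context.Context) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancelCause(parent)
	stop := context.AfterFunc(other, func() {
		cancel(context.Cause(other)) // propagate other's cancellation cause
	})
	return ctx, func() {
		stop()
		cancel(context.Canceled)
	}
}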

@shabbyrobe

shabbyrobe commented Apr 7, 2023

Apologies folks, if you got this via email: I got stung by GitHub's "oops you hit control-enter and now it's sent" misfeature and sent this too early. I have since edited it to clean up some of the clumsy wording.

Seems better to leave it on hold to me. Merge is a very useful building block and even if it's built on AfterFunc, I think it still merits inclusion, but I don't think we'll know until AfterFunc has been out in the wild for a bit.

The shutdown propagation thing is the most valuable use case I've encountered. I'm aware that there are other preferences for how to implement that, but nothing to me has been cleaner or easier to explain to other engineers than merged contexts. Graceful shutdowns in complex systems are hard enough that I rarely see them even attempted in practice, much less done well.

I think there are many ways of composing context cancellations that are underserved by the standard library, not just Merge. Merge is a useful primitive building block that I think can help unlock this space a bit (along with relaxing the guidance on context retention, which I think is overly restrictive, in favour of something more nuanced). And if it's simple enough to implement on top of AfterFunc, I can't help but wonder if that's actually a reason to include it, not a reason not to, especially given there's a lot of interest here, a lot of examples, and a lot of +1s.

Part of the problem with some of the examples given in this thread, all of which seem somewhat familiar to me in some ways and very alien in others, is that it's really hard to give a good, narrow example of exactly how we have seen a need for something like this. It's usually in the context of a much bigger problem, with constraints and nuances that are very hard to relay within the bounds of a small example. It's really one of those things where you have to feel the fiber of the fabric of it between your fingers to know. This can lead us to a lot of strawmanning of examples, and to suggestions that inappropriate alternatives are the Absolute Right Way, even though we don't really know enough about somebody's problem, and whether this would in fact be a good solution, to make that judgement. This feels like a case where the challenge has been to come up with a small-enough demonstration to be convincing. Maybe that's a sign that in this case "is there a concrete use case?" is the wrong question, and a better one might be "do enough folks see value in the potential unlocked by something composable, and is the implementation simple enough to just let the flowers bloom?"

Just my 2c, but I think it's best to leave this on hold for a while, see how things shake out after AfterFunc has had a chance to percolate, then re-evaluate with a less hard-edged approach to demanding perfectly sculpted examples. If it is indeed simple enough to build on top of AfterFunc, I think that's a reason to include it.

@rsc
Contributor

rsc commented Apr 12, 2023

We can decline this for now and then feel free to open a new proposal for Merge in a year or so if there is new evidence.

@rsc
Contributor

rsc commented Apr 12, 2023

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals Apr 12, 2023
@rsc rsc moved this from Likely Decline to Declined in Proposals Apr 19, 2023
@rsc
Contributor

rsc commented Apr 19, 2023

No change in consensus, so declined.
— rsc for the proposal review group
