Propagating context information to child-tasks and remote calls? #35757

Closed
oschulz opened this issue May 5, 2020 · 81 comments
Labels
parallelism Parallel or distributed computation

Comments

@oschulz
Contributor

oschulz commented May 5, 2020

@tkf and I have been discussing ways to propagate information about available workers (or resources in general) in distributed hierarchical computations:

https://discourse.julialang.org/t/propagation-of-available-assigned-worker-ids-in-hierarchical-computations

Adding resource arguments to every function call would be impractical, and using Cassette & friends to add context by rewriting the whole computation would be very heavy-handed, since it might touch large code stacks (and may also not be a complete solution when remote calls are involved?).

Could we add something like a context field to Task, in addition to the storage field - with the difference that context is automatically propagated to spawned tasks and remote calls? Adding the possibility to propagate a context through a computation in this fashion could be useful for other use cases too, I think.
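
A minimal sketch of the behaviour asked for here, emulated in user code rather than in the runtime. The helper `spawn_with_context` and the storage key `:context_sketch` are hypothetical names chosen for illustration; nothing like them exists in Base, and the actual request is for the copy to happen inside task creation itself.

```julia
# Hypothetical key under which the context dictionary lives in task-local storage.
const CONTEXT_KEY = :context_sketch

# Context of the current task (empty if none has been set).
current_context() = get(task_local_storage(), CONTEXT_KEY, Dict{Symbol,Any}())

# Hypothetical spawn wrapper: snapshot the parent's context and install it in the child.
function spawn_with_context(f)
    ctx = copy(current_context())      # snapshot taken in the parent task
    Threads.@spawn begin
        task_local_storage(CONTEXT_KEY, ctx)
        f()
    end
end

# The parent declares which workers it was assigned ...
task_local_storage(CONTEXT_KEY, Dict{Symbol,Any}(:workers => [2, 3]))

# ... and the child can read that without it being passed as an argument.
child = spawn_with_context() do
    current_context()[:workers]
end
result = fetch(child)
```

The limitation is the one raised in the issue: code that spawns tasks with plain `@spawn` or `@async` would not go through the wrapper, so the context would be lost there.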

@vtjnash
Member

vtjnash commented May 5, 2020

C.f. #34543

@tkf
Member

tkf commented May 5, 2020

Thanks for opening the issue.

I'm actually against copying the task_local_storage API. I think something similar to contextvars in Python (see PEP 567 for discussion) would be better:

const var1 = ContextVar(:var1, 42)  # with default
@contextvar var1 = 42 # possible sugar

const var2 = ContextVar{Lockable{Vector{Int}}}(:var2)  # without default, with type
@contextvar var2::Lockable{Vector{Int}} # possible sugar
# Lockable from https://github.com/JuliaLang/julia/pull/34400

const var3 = ContextVar(:var3)  # equivalent to ContextVar{Any}(:var3)
@contextvar var3 # possible sugar


function f()
    @sync @async @show var1[] # var1[] => 42
    var1[] = 0
    @sync @async @show var1[] # var1[] => 0

    setting(var1 => 1) do
        @show var1[] # var1[] => 1
    end
    @show var1[] # var1[] => 0
end

In contrast to task_local_storage(:var):

  1. var[] can be inferred
  2. var is forced to be namespaced (i.e., it has to exist in some module name space)
  3. var can be backed up by an efficient concrete key type (e.g., UUID)
  4. var allows small-size optimization when the value type can be inlined into the context storage

We may want to use something like HAMT (as in PEP 567) for the context storage.


A possible/reference implementation of ContextVar may be something like

using UUIDs

struct ContextVar{T}
    name::Symbol  # only for human readability
    key::UUID     # (for example)
    default::T    # or `Union{Some{T},Nothing}`?
    has_default::Bool
end

# Assumes the context mapping is stored under the task-local key :CONTEXT.
Base.getindex(var::ContextVar{T}) where {T} =
    if var.has_default
        get(task_local_storage(:CONTEXT), var.key, var.default)
    else
        task_local_storage(:CONTEXT)[var.key]
    end :: T
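
For readers who want to try the idea, here is a self-contained sketch along the same lines. It is not the proposed implementation: the storage key `:ctxvar_sketch`, the helper `ctx_storage`, and the function `with_context` are assumptions made for illustration, and the variable always has a default to keep the code short. It relies only on `task_local_storage` from Base and on the UUIDs standard library.

```julia
using UUIDs

struct ContextVar{T}
    name::Symbol   # only for human readability
    key::UUID      # lookup key
    default::T
end
ContextVar(name::Symbol, default::T) where {T} = ContextVar{T}(name, uuid4(), default)

# Hypothetical task-local key holding the UUID => value mapping.
const CTX = :ctxvar_sketch
ctx_storage() = get(task_local_storage(), CTX, Dict{UUID,Any}())::Dict{UUID,Any}

# The storage is untyped, but the assertion makes `var[]` inferable as `T`.
Base.getindex(var::ContextVar{T}) where {T} = get(ctx_storage(), var.key, var.default)::T

# Run `f` with `var` bound to `value`; the previous mapping is restored afterwards.
function with_context(f, binding::Pair{<:ContextVar})
    var, value = binding
    updated = copy(ctx_storage())
    updated[var.key] = value
    task_local_storage(f, CTX, updated)
end

const var1 = ContextVar(:var1, 42)

outside = var1[]
inside = with_context(var1 => 1) do
    var1[]
end
after = var1[]
```

Copying the dictionary on every `with_context` call is the simplest way to keep bindings scoped; a persistent structure such as the HAMT mentioned above would avoid that copy.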

@tkf
Member

tkf commented May 7, 2020

@oschulz
Contributor Author

oschulz commented May 7, 2020

Oh, nice! Will this work across remote-call boundaries?

Maybe use with_context instead of with_variables? Variable is such a generic term. :-)

@tkf
Member

tkf commented May 7, 2020

Yeah, with_context sounds like a much better name. :)

Will this work across remote-call boundaries?

I think it'd work with local context variables (i.e., captured in closures) but not with global const context variables. But I think it's possible in principle. We need to tweak Distributed (or you need to propagate it manually) though.

Notes: Global const doesn't work because I'm using uuid4 (which is based on the global RNG) to generate the key. I can use uuid5 to make the key deterministic (i.e., hashing the package's UUID, module namespaces, and the variable name). We need a macro to make this easier.
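
A sketch of what such a deterministic key could look like, using `uuid5` from the UUIDs standard library. The namespace UUID below is a made-up stand-in for a package UUID, and `contextvar_key` is a hypothetical helper, not part of any package.

```julia
using UUIDs

# Made-up namespace UUID standing in for the owning package's UUID.
pkg_uuid = UUID("7e2c1f3a-5b6d-4c8e-9f01-23456789abcd")

# Hash the package UUID, the module path, and the variable name into one key.
contextvar_key(pkg::UUID, mod::Module, name::Symbol) =
    uuid5(pkg, string(join(fullname(mod), '.'), '.', name))

k1 = contextvar_key(pkg_uuid, Main, :var1)
k2 = contextvar_key(pkg_uuid, Main, :var1)   # same inputs, same key
k3 = contextvar_key(pkg_uuid, Main, :var2)   # different name, different key
```

Because the key depends only on its inputs and not on the global RNG, every process that loads the same module would compute the same key for the same variable.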

@c42f
Member

c42f commented May 8, 2020

Yes we need something better here, thanks again for opening this issue! I feel an alternative name for this issue could be "taking dynamic scope seriously" ;-) Though perhaps "context variable" is a more straightforward name for this.

If we somehow had efficient codegen for context variables, there's a lot of interesting language facilities which can be improved based on this. For example:

At #35690 (comment) I wondered about one possible way to specialize code based on context variables. As noted in the OP, specializing a whole call stack based on the presence/type of a context variable is just really heavy handed. But on the other hand, I feel like specializing a leaf function (or a few innermost inlined frames) on a context var could be very powerful, and possibly of acceptable cost.

Getting this idea to be sound and practical within inference seems quite tricky. I guess it would require a context calling convention to allow inference to reason about the innermost frames in a systematic way. If it actually panned out, I'm imagining the compiler could hoist context variable access out of innermost loops and across inlined function calls.

Kotlin seems to have some interesting APIs around context and coroutines (and they have taken structured concurrency seriously!) so we might learn something from examining that closer: https://kotlinlang.org/docs/reference/coroutines/coroutine-context-and-dispatchers.html

@oschulz
Contributor Author

oschulz commented May 8, 2020

Getting this idea to be sound and practical within inference seems quite tricky.

I guess it would depend on how deep we want to take it. Context attached to tasks could be done with just a few small, lightweight changes. I think that would cover many use cases already.

Context at a deeper level, propagating through all function calls, will come with a higher cost, I guess. There are of course quite a few use cases that would need that, though I think they are less concerned with scheduling and resource allocation questions.

@tkf
Member

tkf commented May 8, 2020

@c42f What do you think about the design of ContextVar{T} I posted above? I think it solves the problem of inferrability. There is still dynamic code inside getindex(::ContextVar{T}) but LICM'ing it (or maybe rather get(::ContextVar{T}) :: Union{Some{T}, Nothing} which won't throw) sounds much easier with the current infrastructure than implementing the new effect specialization facility. I think ContextVar{T} is a neat design that completely eliminates the type-instability on the user side of the code.

Also, I don't think it's possible to eliminate some dynamic typing. Since we need to allow arbitrary value types, and we can't specialize for the whole set of the context variables currently set (i.e., equivalent to using NamedTuple as the context), the valtype would be Any. So I think type-stabilizing at use-site is a good strategy.

(I wonder if it can be generalized to the strategy for implementing dynamically scoped effect handlers. That is to say, type information is statically "hoisted out" to the global/lexical scope and then there are tiny bits of run-time information invoking those pre-defined entrypoints.) (Edit: actually, I guess this happens automatically anyway in effect handlers like chain-of-custody error handling.)

I also thought about supporting the deterministic parallel RNG. It probably is possible via implementing some kind of hook system via AbstractContextVar. But I fear there will be too many dynamic dispatches on task creation. Though it looks like Kotlin's context system allows user-defined restoreThreadContext/updateThreadContext? I wonder how they make it efficient. Is it doable because Kotlin is statically typed or something?

@c42f
Member

c42f commented May 8, 2020

What do you think about the design of ContextVar{T} I posted above?

Well, I like it! It's simple and can be implemented right away with only some small changes to the runtime. My comments above about effect systems, etc, are all pretty much pie-in-the-sky speculation at this stage, though it would be nice to have a rough feeling for where things might go.

So anyway, if we provided ContextVar, that at least gives us strongly typed task local storage which would be reasonably efficient, convenient and solve some problems with context propagation which are currently unsolvable for users. It also doesn't seem too far-fetched to teach the Julia optimizer about the scoping semantics so that some form of LICM will work in the medium term. Specialization based on type of context vars could eventually be introduced as an optimization (if it's even feasible).

@oschulz
Contributor Author

oschulz commented May 8, 2020

I also thought about supporting the deterministic parallel RNG

That's actually something I already have in BAT.jl

https://github.com/bat/BAT.jl/blob/master/src/rngs/rng_init.jl

This provides reproducible random numbers for hierarchical parallel applications, via hierarchical partitioning of counter-based RNGs.

I'll make that more widely available via ParallelProcessingTools.jl. It's just in BAT.jl currently because I needed it fast and didn't have time to do a nice API at the time. I'll get on it.

@oschulz
Contributor Author

oschulz commented May 8, 2020

Context variables would be super-useful to propagate those RNG partitions!

@JeffBezanson
Member

I think the core problem is the cost of the key lookup. I believe it would be much too slow for something like RNG.

@oschulz
Contributor Author

oschulz commented May 8, 2020

I think the core problem is the cost of the key lookup. I believe it would be much too slow for something like RNG.

Well, I guess in many use cases, the task would get its RNG once, and then pass it on to the functions it calls as an explicit parameter. And then pass partitions of that RNG to sub-tasks, when they spawn. I really wasn't trying to go for a full resource-injection solution with this issue, just something on the task-level for low/medium-frequency access.

The RNG-story is a bit different from the worker-availability story, though. Which RNG is used for which part of the computation shouldn't depend on the number of workers/threads/tasks, for reproducibility, so that often may need to be handled explicitly.

@JeffBezanson JeffBezanson added the parallelism Parallel or distributed computation label May 8, 2020
@tkf
Member

tkf commented May 8, 2020

My comments above about effect systems, etc, are all pretty much pie-in-the-sky speculation at this stage, though it would be nice to have a rough feeling for where things might go.

@c42f Yeah, I do like pie-in-the-sky speculations and am very interested in effect systems! Also, I think a context variable API helps with implementing POC effect handlers, even though it won't be as efficient as we'd want. Some effect handlers don't require too much optimization when the overhead of the handler itself is large (e.g., a process+thread pool abstracting Distributed and Threads). I think it'd be nice to have a building block for playing with effect handler interfaces and assessing the programming experience with it, before we start thinking about optimizing the hell out of it (though it's also important to think about a compiler-friendly interface at the same time).

@tkf
Member

tkf commented May 9, 2020

This provides reproducible random numbers for hierarchical parallel applications, via hierarchical partitioning of counter-based RNGs.

@oschulz It's awesome that you already have counter-based RNGs integrated into a parallel program! But don't you need to do something at @spawn? I guess your other comment confirms that?:

And then pass partitions of that RNG to sub-tasks, when they spawn.

If so, don't you need to create custom @spawn-like API anyway? I'm trying to understand how context variables can be useful in implementing deterministic parallel RNGs in "user land." I thought it'd be rather useless.

@c42f
Member

c42f commented May 9, 2020

I think the core problem is the cost of the key lookup. I believe it would be much too slow for something like RNG.

Yeah. To make this fast enough I guess we need to be able to hoist the load of the ContextVar for the global RNG out of any inner loop, and also have a fixed concrete type for that RNG. (The fixed type is somewhat limiting, but no worse than what we have now.) I feel like it should be feasible to model dynamic scope in the optimizer if all setting and getting of ContextVars goes through a well defined interface? That could include eliding the ContextVar load for inlined frames where the innermost ContextVar store is visible. And even eliding the store if the frame is a leaf?

@oschulz
Contributor Author

oschulz commented May 9, 2020

@oschulz It's awesome that you already have counter-based RNGs integrated into a parallel program! But don't you need to do something at @spawn? I guess your other comment confirms that?:

Not for the RNG, no, because all functions I use take an RNG as an explicit argument. Since this is pretty much a standard in most of the Julia ecosystem, it's easy to do that consistently, also with the third-party packages I use. In the beginning, I also had explicit function arguments for resources like threads in BAT.jl, but that was unwieldy and I got rid of it when partr came around.

Like I wrote, RNG distribution/partitioning can't always be automatic if the computation should be reproducible independent of the parallel execution strategy. But it would still be great to have the option of propagating the RNG via context - users don't always like to have to pass the RNG explicitly - but at points in between, algorithms that distribute computation will need to do some explicit RNG handling/partitioning.

@tkf
Member

tkf commented May 9, 2020

But it would still be great to have the option of propagating the RNG via context

Hmm.... I'm not sure if I understand. Let's consider this snippet:

@contextvar state = 0

function demo()
    @sync begin
        @async println("at task1: ", state[])
        @async println("at task2: ", state[])
        state[] += 1
        println("at task0: ", state[])
    end
end

Then calling demo() will print

at task0: 1
at task1: 0
at task2: 0

in some order. If state were the state of RNG, task1 and task2 will have the same stream. That's not what you want, right?

task1 and task2 can call your functions that use RNG correctly. But, unless every use of @async/@spawn is aware of the RNG state that is in the task context, you can't be sure that you have "independent" streams.
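
The concern can be reproduced without any context-variable machinery. The sketch below uses `MersenneTwister` from the Random standard library; copying the parent's RNG into each child stands in for "the child inherits the parent's state". The per-child seeding scheme (`child_rng`) is a hypothetical illustration only: hashing a seed together with an index gives different streams but makes no statistical independence guarantee, which is what counter-based RNG partitioning is meant to address.

```julia
using Random

parent_rng = MersenneTwister(1234)

# Each child inherits a copy of the parent's state: both draw the same numbers.
t1 = @async rand(copy(parent_rng), 3)
t2 = @async rand(copy(parent_rng), 3)
same_stream = fetch(t1) == fetch(t2)

# Explicit handling at spawn time: derive a distinct RNG per child (hypothetical scheme).
child_rng(i) = MersenneTwister(hash((1234, i)))
t3 = @async rand(child_rng(1), 3)
t4 = @async rand(child_rng(2), 3)
distinct_streams = fetch(t3) != fetch(t4)
```

The second half only works because the spawning code knows about the RNG, which is the point being made: plain propagation of a value is not enough for this use case.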

@tkf
Member

tkf commented May 10, 2020

I think there are two design questions:

(1) Should Distributed handle it automatically? That is to say, should context variable values be restricted to serializable values? It may be handy to put non-serializable states like files and locks in it so I don't think we should add this restriction. But this would make it impossible to implement automatic handling by Distributed. (Note that users still can implement context propagation by wrapping Distributed API and manually propagating known-to-be-safe context variables.)

A semi-automatic way may be to add an overloadable function API Distributed.is_serializable_context_value(x) :: Bool or something. This way, Distributed can copy the context for objects that are marked serializable-safe explicitly.
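
A sketch of how such an opt-in trait might behave. The function name follows the suggestion above, but it is defined locally here; it is not an existing function in Distributed, and the set of types marked as safe is an arbitrary choice for the example.

```julia
# Hypothetical trait: nothing is forwarded unless a method opts in.
is_serializable_context_value(::Any) = false
is_serializable_context_value(::Union{Number,Symbol,AbstractString}) = true
is_serializable_context_value(x::Tuple) = all(is_serializable_context_value, x)

# Keep only the entries that are explicitly marked as safe to send.
remote_context(ctx::AbstractDict) =
    filter(kv -> is_serializable_context_value(kv.second), ctx)

ctx = Dict{Symbol,Any}(
    :nworkers => 4,
    :label    => "run-1",
    :lock     => ReentrantLock(),   # process-local, must not be sent
)
sent = remote_context(ctx)
```

A package that owns a value type would add its own method to opt in, so the decision stays with the code that knows whether the value is meaningful on another process.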

(2) Should it be possible to list context variables currently set? With the current design (only storing a mapping uuid => value in Task), it's not possible to get this. We can store a mapping uuid => (var, value) in Task or something equivalent to get the list but I'm not sure about the performance impact of this idea.

My preference is to add the context variable API without these features and see if we can get away without implementing them.

@tkf
Member

tkf commented May 10, 2020

OK so here is a full set of API I'd propose https://tkf.github.io/ContextVariables.jl/dev/. It includes a quick tutorial.

@oschulz
Contributor Author

oschulz commented May 10, 2020

If state were the state of RNG, task1 and task2 will have the same stream. That's not what you want, right?

Oh, yes - we will of course, in general, need variants of @spawn, @remotecall, @async and so on that allow for specifying a partial new context (which should be merged with the current context before sending it on).

With my partitioned RNGs, the situation is a bit different, though. An RNGPartition can be passed on as it is, because the receiving end will instantiate the actual RNG it should use (e.g. via AbstractRNG(rngpart::RNGPartition{R}, i::Integer)) based on which part of the calculation it will handle next, not based on its position in the worker/task hierarchy.

However, parts of the calculation that are not aware of RNGs (e.g. some 3rd party package), should pass on the RNG or RNG partition unchanged automatically, if possible.

Should Distributed handle it automatically?

I think yes.

The idea would be that a hierarchical calculation will, at certain points, need to control how resources (workers, RNG, etc.) are to be partitioned

@c42f
Member

c42f commented May 10, 2020

OK so here is a full set of API I'd propose https://tkf.github.io/ContextVariables.jl/dev/. It includes a quick tutorial.

I haven't had time to read all the implementation yet, but the API looks nice. I particularly like the section on data races. I think you've nailed the correct semantics there: the runtime must ensure that the ContextVar get/set is data race free between Tasks, but the use of any context var values will need to be made threadsafe by the user.

For implementation of storage we could add an extra context field on Task, which, similar to logstate, is copied at task creation but is otherwise free to use in this work. Then store something like ImmutableDict in that context field. (Ideally logstate would eventually be removed and replaced with a ContextVar.)
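
To illustrate why an immutable dictionary fits this role, here is a small sketch using `Base.ImmutableDict`. The variable names are only for the example; the point is that adding a binding creates a new dictionary and leaves the one a parent task holds untouched.

```julia
# Context held by the parent task.
parent_ctx = Base.ImmutableDict{Symbol,Any}()
parent_ctx = Base.ImmutableDict(parent_ctx, :var1 => 0)

# A child (or a nested scope) shadows :var1 without mutating the parent's dictionary.
child_ctx = Base.ImmutableDict(parent_ctx, :var1 => 1)

in_parent = parent_ctx[:var1]
in_child = child_ctx[:var1]
```

Copying such a context at task creation is just copying a reference, but lookups walk a linked list, so cost grows with the number of bindings.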

Note that users still can implement context propagation by wrapping Distributed API and manually propagating known-to-be-safe context variables

This seems like it can be worked around by the application author but it's going to be ugly; in principle they can make a big list of ContextVars they care about. But it seems like libraries making use of ContextVar internally won't be able to compose with libraries using Distributed?

It's a tricky problem because in general remote calls can't know which of the context vars will actually be used. So should it be an error to have a contextvar installed which can't be serialized, but which won't be used on the remote side? I would think not. Perhaps it could be made to work by allowing the remote call and sending all context vars, but with the value of any non-serializable context var poisoned so that any use of it on the remote side will result in an exception.
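
A sketch of the poisoning idea, under stated assumptions: `Poisoned`, `sendable`, `poison`, and `lookup` are hypothetical names, and `sendable` is a crude stand-in for whatever rule would decide serializability.

```julia
# Hypothetical placeholder for a value that could not be sent to the remote side.
struct Poisoned
    name::Symbol
end

# Stand-in predicate; a real rule would have to be decided per type.
sendable(x) = x isa Union{Number,Symbol,AbstractString}

# Build the context that would be shipped with a remote call.
poison(ctx::AbstractDict) =
    Dict{Symbol,Any}(k => (sendable(v) ? v : Poisoned(k)) for (k, v) in ctx)

# Access on the remote side: only touching a poisoned entry raises an error.
function lookup(ctx::AbstractDict, name::Symbol)
    value = ctx[name]
    value isa Poisoned && error("context variable $(value.name) was not propagated")
    return value
end

local_ctx = Dict{Symbol,Any}(:nworkers => 4, :lock => ReentrantLock())
remote_ctx = poison(local_ctx)
```

With this scheme the remote call itself succeeds even though `:lock` cannot be sent; the failure is deferred to the moment remote code actually reads it.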

Should it be possible to list context variables currently set

I agree we might get away without this. The proposed API seems to be one where module authors are likely to create a ContextVar for their own internal purposes within a module. So if nobody else can list those it seems fine.

On the other hand, it would be really useful to be able to list the attached ContextVars during debugging. But for that a separate global lookup table based on uuid might suffice anyway?

@tkf
Member

tkf commented May 10, 2020

The idea would be that a hierarchical calculation will, at certain points, need to control how resources (workers, RNG, etc.) are to be partitioned

@oschulz Yeah, I agree it'd be nice to be automatic for these cases. But, as I mentioned, it has some undesirable consequences. For example, some objects like files and locks can't cross process boundaries. It'd be problematic if a user accidentally put huge arrays in the context variable.

(Another way may be to use default_worker_pool to handle context variables that describe computation resources. Discussed below.)

For implementation of storage we could add an extra context field on Task, which, similar to logstate, is copied at task creation but is otherwise free to use in this work.

@c42f Yeah, I actually have already implemented it like this :) https://github.com/tkf/julia/commits/ctxvars

It's a tricky problem because in general remote calls can't know which of the context vars will actually be used.

One solution may be to make Distributed.default_worker_pool backed by a context-local variable. That is to say, the process pool becomes a context-dependent thing (but still local to a process). This way, it's (probably) possible to implement context propagation across process boundaries by implementing a custom worker pool in user space. Then such a custom worker pool can decide what context variables to propagate. I'm not sure if the worker and worker pool interface is designed to allow this, but, if we are to implement context propagation across processes, it sounds like a nice hook point to make it work. (I guess it's kind of like how LoggingExtras.jl builds a composable stack in user land.)

On the other hand, it would be really useful to be able to list the attached ContextVars during debugging. But for that a separate global lookup table based on uuid might suffice anyway?

Yeah, having an optional lookup table for debugging sounds like a good idea.

@tkf
Member

tkf commented May 11, 2020

I opened PR #35833 to add this API so that it'd be easier to comment on implementation/design specific to the API I'm proposing.

@oschulz
Contributor Author

oschulz commented May 11, 2020

OK so here is a full set of API I'd propose

I'm not so sure that we should directly inject variables into the scope of the child-tasks. There may be name clashes and it's a potential security issue too - mainly I'm worried about name conflicts though. I think it would be better if context variables were retrieved with an explicit mechanism.

@oschulz
Contributor Author

oschulz commented May 11, 2020

Yeah, I agree it'd be nice to be automatic for these cases. But, as I mentioned, it has some undesirable consequences.

Hm, I would hope those problems can be overcome somehow - if only by the user being reasonable.

However, for the original scope of this issue - propagation of available workers - we'd most certainly want fully automatic propagation, so that the information is not lost when using packages that don't use the context-API to spawn local/remote tasks.

@tkf
Member

tkf commented May 11, 2020

Thanks for having a look into the API docs.

There may be name clashes

Perhaps the documentation should be fixed to emphasize this but there will be no name clashes, by design. (Unless you can invoke a collision of UUIDs.)

However, for the original scope of this issue - propagation of available workers

The set of available workers is very dynamic information and I don't think propagating it via a "static" mechanism like a context variable is a good idea. It would mean that a function doesn't get any update after it is called via @spawn or remotecall. I think a better approach is to create a worker pool abstraction [*1] and then propagate the currently active worker pool using the context variable.

This is why @c42f and I are discussing the effect system here. Task scheduler is a special case of the effect handler (#33248 (comment)) and we need dynamic scoping to implement this (or rather a small subset of effect handler that is still enough for task scheduler). Context variable just provides dynamic scoping and some effect handlers can be implemented on top of it.

[*1] I guess it would need to implement something like the work-stealing approach on top of Threads and Distributed.

@oschulz
Contributor Author

oschulz commented May 25, 2020

There's actually precedent for tasks inheriting information from each other: Tasks do inherit logstate from their parent. I think it would make a lot of sense if tasks also had resources (should probably be a dict) that they inherit, and that the user can add content to.

@oschulz
Contributor Author

oschulz commented Jul 9, 2020

We could always check if the information is "too big" to be forwarded via remote call.

@tkf
Member

tkf commented Jul 10, 2020

I think that would be possible (and preferable) if we do propagate across task and remote-call boundaries.

It's not possible and not preferable because:

  • There are Julia objects that are inherently process-local. Pointers and file descriptors are basic examples.
  • It is not possible to know if a context variable holding a heap-allocated data structure is meant to capture the identity or the value.
    • For example, if I have x::Vector{Vector{Int}}, is there some code expecting x[1] to be mutated in-place in a specific way (i.e., this context is propagating the identity)? Or, is the program correct if I do x[1] = copy(x[1]) at random points (i.e., this context is propagating the value)?
  • Not all objects are serialization-safe (e.g., they may need a finalizer).
  • The serialization overhead of large objects.
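
The identity-versus-value point can be seen with the Serialization standard library, which is what a remote call would rely on. This is only an illustration of the semantics, not of how Distributed would transport a context.

```julia
using Serialization

x = [[1, 2], [3]]

# Round-trip through serialization, as would happen for a remote call.
buf = IOBuffer()
serialize(buf, x)
seekstart(buf)
y = deserialize(buf)

# Mutating the original in place is invisible to the deserialized copy.
push!(x[1], 99)
shares_identity = y[1] === x[1]
```

Code that relies on seeing the in-place update through the context would silently break once the value crosses a process boundary, while code that only needs the value would be fine.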

I have proposed various solutions to this problem. I think the discussion would be more productive if you explain why they don't work.

@oschulz
Contributor Author

oschulz commented Jul 10, 2020

I have proposed various solutions to this problem. I think the discussion would be more productive if you explain why they don't work.

Uh, maybe we misunderstood each other: I'm with you for rejecting things that can't be forwarded, requiring big objects to be wrapped in a Shared wrapper or so, etc.

Maybe this was a misunderstanding on my side, I had the impression that you didn't want non-explicit propagation to tasks and remotes at all anymore, because you wrote "I don't think automatic propagation to remote workers is reasonable". And I was wondering how many use cases would still work if context had to be propagated explicitly - since the code that distributes work to tasks and remotes (say, Transducers :-) ) will often not know about the semantics of the whole context.

I would assume that in the future, we'll have more and more automatic/transparent multi-threaded and also multi-process code execution. So the code that "declares" the context, and the code that "consumes" the context will often not be aware of the task/remote-call barrier in between, and may not share stack. But the code that does the parallelization (say, a multi-threaded broadcast implementation) will not know/care about what's in the context - except for the parts of the context that control parallelization.

So that's why I think context must, in principle, always be automatically forwarded to spawned tasks and to remote calls. But of course we can reject/filter certain types of content, resp. require them to be wrapped appropriately - context should, in my opinion, not be abused as a data store for substantial amounts of data, and that should be discouraged.

@oschulz
Contributor Author

oschulz commented Jul 10, 2020

It is not possible to know if a context variable holding a heap-allocated data structure is meant to capture the identity or the value

I guess a clean way around that is to only allow context to refer to immutable values (we do have an immutable array type somewhere, don't we?). If we do that, copies can be made as necessary, transparently, without affecting semantics.

@c42f
Member

c42f commented Jul 10, 2020

I would assume that in the future, we'll have more and more automatic/transparent multi-threaded and also multi-process code execution. So the code that "declares" the context, and the code that "consumes" the context will often not be aware of the task/remote-call barrier in between, and may not share stack. But the code that does the parallelization (say, a multi-threaded broadcast implementation) will not know/care about what's in the context - except for the parts of the context that control parallelization.

Yes, this is exactly the reason that some portion of the context needs to be propagated automatically:

  1. The code which calls Distributed.@spawn exposes (distributed) parallelism
  2. The library which creates the context var knows whether it can be safely distributed (if anything does!)

In general (2) is not the end user's top level application code and there's a good chance that (1) might also not be. So they definitely need to be decoupled.

I think this is why @tkf was suggesting the Shared wrapper. I don't like the idea of manually wrapping and unwrapping, but I suppose the context var get/set interface itself could do that.

@oschulz
Contributor Author

oschulz commented Jul 10, 2020

I guess in most (sane) cases, the entries of the context would be fairly small and immutable structures anyhow. I guess we can be fairly rigorous and filter everything out that can't be propagated automatically. Maybe we should actually reject everything we don't "like" during context creation/assignment already, to avoid surprises to the user later on?

@tkf
Member

tkf commented Jul 11, 2020

@oschulz Thanks for the clarification. Indeed, it looks like we have a different notion of "automatic", "explicit", etc. To clarify, I've been using "explicit" to mean that the user does something beyond the standard context variable declaration. This is maybe some kind of declarative API to tell Distributed.jl to propagate certain context variables for every remote-call (opt-in). Or, maybe just using a low-level API to reset the remote context (manual). If what you mean by "automatic" is what I mean by "opt-in", yes, we are actually on the same page. Perhaps I should have mentioned "unconditional propagation is not reasonable" instead of "automatic propagation is not reasonable."

Concretely, I think it is reasonable to have an API something like

@contextvar PROCESS_LOCAL_CONTEXT = 1   # not forwarded to remote
@shared_contextvar GLOBAL_CONTEXT = 1   # automatically forwarded to remote

(I think it kind of makes sense to call them @contextobject and @contextvalue.)

I guess we can be fairly rigorous and filter everything out that can't be propagated automatically.

This is the point I'm still strongly against, as I explained in the last comment #35757 (comment). We should make the API explicit and easy to understand and manipulate. Something implicit will cause trouble.

but I suppose the context var get/set interface itself could do that.

@c42f Yeah, I think that's possible. It can be an option to @contextvar or another macro like @shared_contextvar. We can totally hide that there is a wrapper like Shared. Or, there can be two dictionaries as the backend. No one other than Distributed.jl and Base would notice this.
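
A sketch of the two-dictionary variant. Everything here is hypothetical: the type `SketchContext`, the function `set!` with its `shared` keyword, and `remote_snapshot` only illustrate how a declaration-time choice could decide which storage a variable lands in.

```julia
# Hypothetical context with separate storage for process-local and shared variables.
struct SketchContext
    local_vars::Dict{Symbol,Any}    # never leaves the process
    shared_vars::Dict{Symbol,Any}   # forwarded with remote calls
end
SketchContext() = SketchContext(Dict{Symbol,Any}(), Dict{Symbol,Any}())

# `shared` plays the role of choosing @shared_contextvar over @contextvar.
function set!(ctx::SketchContext, name::Symbol, value; shared::Bool=false)
    (shared ? ctx.shared_vars : ctx.local_vars)[name] = value
    return ctx
end

# Readers do not need to know which dictionary holds the value.
Base.getindex(ctx::SketchContext, name::Symbol) =
    haskey(ctx.local_vars, name) ? ctx.local_vars[name] : ctx.shared_vars[name]

# Only the shared half would be handed to Distributed.
remote_snapshot(ctx::SketchContext) = copy(ctx.shared_vars)

ctx = SketchContext()
set!(ctx, :PROCESS_LOCAL_CONTEXT, 1)
set!(ctx, :GLOBAL_CONTEXT, 2; shared=true)
snapshot = remote_snapshot(ctx)
```

Local tasks would inherit both dictionaries, while a remote call would receive only `snapshot`, so user code reading a variable never sees the split.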

@oschulz
Contributor Author

oschulz commented Jul 11, 2020

"unconditional propagation is not reasonable" instead of "automatic propagation is not reasonable."

I fully agree. I think it's perfectly fine to expect the user to declare a context variable that is to be propagated automatically in a certain way, and to restrict it to certain types of content. It's certainly good to give the user control over what should be restricted to the current process, and what should propagate beyond.

The "local" @contextvar would still be propagated to tasks, though, right? I guess in the future, tasks will be used so much under the hood that the user will often really not even be aware of it. And it's all the same shared memory, so propagation of (almost arbitrary but immutable) content wouldn't be a problem?

@tkf
Member

tkf commented Jul 11, 2020

Nice to know that we are on the same page!

The "local" @contextvar

I commented on this in the other issue #35833 (comment)

@c42f
Member

c42f commented Jul 13, 2020

It can be an option to @contextvar or another macro like @shared_contextvar. We can totally hide that there is a wrapper like Shared. Or, there can be two dictionaries as backends. No one other than Distributed.jl and Base would notice this.

Nice, I think this is the way to go.

My inclination is to have options to @contextvar rather than having @shared_contextvar, as I feel like there could be other fine grained properties which define how to propagate or otherwise manage/define context variables in the future. (For example, what about GPUs? What about context vars which need code to run when new tasks spawn? Generally I feel like we should minimize the differences between in-process vs out-of-process tasks where possible — this binary distinction isn't the only way to categorize available compute.)

@tkf
Member

tkf commented Jul 13, 2020

this binary distinction isn't the only way to categorize available compute

Right, it makes sense.

What about context vars which need code to run when new tasks spawn?

I'm glad that you are shooting for this! It'd make it possible to implement something like parallel RNG completely in the user space.

eg hashing and/or traversal in the Dict/HAMT/whatever storage backend.

BTW, regarding the lookup overhead, we can store hash(uuid) in ContextVar object so that it's computed only once. Traversal is still a bottleneck, though. Anyway, it's a micro-optimization we can try at some point.
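A rough sketch of that micro-optimization, assuming a hypothetical `ContextVar` that is identified by a UUID and stored as a key in a `Dict` backend (the struct layout and the `hashval` field are illustrative assumptions, not an existing API):

```julia
using UUIDs

# Hypothetical ContextVar that caches the hash of its UUID at construction.
struct ContextVar{T}
    uuid::UUID
    default::T
    hashval::UInt
end
ContextVar(uuid::UUID, default) = ContextVar(uuid, default, hash(uuid))

# Dict lookups call hash(key, h); reuse the cached value instead of rehashing the UUID.
Base.hash(v::ContextVar, h::UInt) = hash(v.hashval, h)
Base.:(==)(a::ContextVar, b::ContextVar) = a.uuid == b.uuid

var = ContextVar(UUID("123e4567-e89b-12d3-a456-426614174000"), 42)
storage = Dict{ContextVar,Any}(var => 1)
value = storage[var]
```

This only removes the UUID hashing step; the traversal of the storage backend is unchanged.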

@tkf
Member

tkf commented Jul 13, 2020

Speaking of parallel RNG, it'd require something like

@contextvar RNG_STATE = RNGState(...)

where RNGState is mutable, right? In general, "no-set!" direction can simply be defeated by

@contextvar IMMUTABLE_POINTING_TO_MUTABLE = Ref(thing)

How does it interact with the optimization you have in mind? Is it that it's important to make the default get-only?

@c42f
Member

c42f commented Jul 13, 2020

where RNGState is mutable [...] How does it interact with the optimization you have in mind?

I think the point here is that context should behave like implicit arguments to child functions from the point of view of the compiler. Then the user has a choice to make context immutable or not as necessary, and the compiler can reason about the values of these variables locally in the same way as normal function arguments.

For context like Ref{Int}(), the compiler still gets benefits in knowing that the address of the Int cannot change, even though it may need to emit a load of the Int after running any child functions.

@tkf
Member

tkf commented Jul 13, 2020

For context like Ref{Int}(), the compiler still gets benefits in knowing that the address of the Int cannot change, even though it may need to emit a load of the Int after running any child functions.

Thanks for the explanation. It makes sense.

@c42f
Member

c42f commented Jul 13, 2020

What about context vars which need code to run when new tasks spawn?

I'm glad that you are shooting for this! It'd make it possible to implement something like parallel RNG completely in the user space.

Yes exactly! I feel like whatever we come up with here should be able to support both #34852 and logger context in user space with excellent efficiency. If not, we won't really have solved two of the key use cases.

Regarding mutable context, it basically has to be cloned (at the very least) in the parent task prior to @spawn, otherwise we'll generate data races everywhere. This is a worry because it forces some code to run in @spawn even if the functions called in the new task never use that state.

@tkf
Member

tkf commented Jul 13, 2020

Right, Ref(thing) example does not really make sense.

I think it's also important to mutate the state in the current context upon @spawn. Otherwise, after

t1 = @spawn ...
t2 = @spawn ...

we have identical RNG state in t1 and t2.
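As a toy illustration of why the parent state has to change on each spawn, here is a sketch with a hypothetical `RNGState` and a made-up `split!` helper. It is not a real RNG and makes no claim about statistical quality; it only shows that advancing the parent yields distinct child states.

```julia
# Hypothetical counter-based state; `split!` is an illustrative name, not an existing API.
mutable struct RNGState
    seed::UInt64
    nchildren::UInt64   # number of child states derived so far
end

# Derive a child state and advance the parent, so the next child differs.
function split!(parent::RNGState)
    parent.nchildren += 1
    return RNGState(hash((parent.seed, parent.nchildren)), 0)
end

parent = RNGState(0x1234, 0)
s1 = split!(parent)   # state that would be handed to t1
s2 = split!(parent)   # state that would be handed to t2
```

Without the `parent.nchildren += 1` step, both calls would hash the same tuple and `s1` and `s2` would start from the same seed.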

@oschulz
Contributor Author

oschulz commented Jul 13, 2020

I'm glad that you are shooting for this! It'd make it possible to implement something like parallel RNG completely in the user space.

RNGs have actually been on my mind as a potentially very important use case for contexts. While we often forward RNG via an explicit parameter (and should), sometimes (e.g. for a likelihood function that happens to need an RNG internally, but just takes the model parameters as its input) it would be great to pass it on via context.

For parallel applications, I usually use a counter-based RNG, so that I can use a common seed and partition the random space in a hierarchical fashion - that does require semantic knowledge, but should be easy to do using the proposed with_context.

@oschulz
Contributor Author

oschulz commented Jul 13, 2020

@c42f I think the point here is that context should behave like implicit arguments to child functions from the point of view of the compiler.

I think that's the ideal way to define it, semantically.

@NHDaly
Member

NHDaly commented Feb 5, 2021

EDIT: Oh, except I've just realized that this thread is about communication across distributed tasks, not necessarily about communication across multithreaded tasks? Or is it covering both?

We've wanted something like this for a while too! :) Thanks for opening the issue and discussing it! 👍

Since nothing like this exists right now, we've been toying with the (dirty) idea of (ab)using the logger to get a context that passes to child tasks, given that we currently do pass the logger through to child tasks.

For example, we were considering a thread-aware tracing framework that does something like this, even though it's clearly terrible:

using Logging

struct TraceLogger <: AbstractLogger
    span_name::String
    parent_logger::AbstractLogger
end

Logging.min_enabled_level(tl::TraceLogger) = Logging.min_enabled_level(tl.parent_logger)
Logging.shouldlog(tl::TraceLogger,args...) = Logging.shouldlog(tl.parent_logger, args...)
Logging.handle_message(tl::TraceLogger,args...) = Logging.handle_message(tl.parent_logger, args...)

ancestor_trace(::Any) = ""
ancestor_trace(tl::TraceLogger) = "$(tl.span_name), $(ancestor_trace(tl.parent_logger))"


function with_span(f, name)
    with_logger(TraceLogger(name, Logging.current_logger())) do
        info = @timed f()
        @info "Finished $(ancestor_trace(Logging.current_logger())): $(info.time)"
        return info.value
    end
end
julia> with_span("a") do 
           @sync begin
               @async begin
                   with_span("b") do
                       @info "HI"
                   end
               end
               @async begin
                   with_span("C") do
                       @info "BYE"
                   end
               end
           end
       end
[ Info: HI
[ Info: BYE
[ Info: Finished b, a, : 0.052175791
[ Info: Finished C, a, : 0.000226425
[ Info: Finished a, : 0.101278193
Task (done) @0x00000001136d58d0

:)

Anyway, yeah, sign us up as another interested party!

@oschulz
Contributor Author

oschulz commented Feb 6, 2021

EDIT: Oh, except I've just realized that this thread is about communication across distributed tasks, not necessarily about communication across multithreaded tasks? Or is it covering both?

Definitely both, also across local tasks. And since my original proposal, @tkf, @c42f and @JeffBezanson have taken this idea even further than I had envisioned originally - something that could possibly also be used in time-critical code. I think this could potentially become an extremely powerful mechanism.

@vchuravy
Member

In #50958 I made the semi-intentional decision not to address the remote-call part of this proposal.
But a distributed framework could choose to propagate a subset of scoped variables across a remote-call interface.

I have started a small prototype for snapshotting in vchuravy/ScopedValues.jl#6, but I won't be including this into the Base proposal for now. The implementation thereof can live in a package.

(We might also be able to implement a "RemoteScope" on top of local scopes, but I haven't thought too hard about this.)

@oschulz
Contributor Author

oschulz commented Sep 8, 2023

Would be great to have "remote-enabled" scopes eventually, I think (if possible), to avoid hard-to-predict behavior in cases where remote operation is transparent to the user.

@vchuravy
Member

vchuravy commented Sep 8, 2023

The issue is that copying a scope is a very heavy operation. It's something I decidedly wouldn't want to do on every rpc.

Now a framework should be able to define that it wants to propagate scope with its rpc and snapshot only relevant pieces.
Right now for me the design uncertainty with Distributed.jl is large. It may be that we want something like a RemoteScopedValue akin to a RemoteRef,
but the cost must be local and not global across the program.

I certainly wouldn't want to send CuContext or CuDevice automatically across the wire.
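A minimal single-process sketch of the "snapshot only relevant pieces" idea, assuming the `Base.ScopedValues` API shipped in Julia 1.11+. The helpers `snapshot` and `run_with_snapshot`, and the variables `WORKER_POOL` and `DEVICE_HANDLE`, are hypothetical; no actual rpc is performed, and a real framework would additionally need a way to identify the same variable on the remote process, which this does not address.

```julia
using Base.ScopedValues   # assumes Julia 1.11+

const WORKER_POOL = ScopedValue([1])        # small value the framework chooses to forward
const DEVICE_HANDLE = ScopedValue("gpu0")   # stand-in for a device handle; stays local

# Capture only the values the framework has chosen to forward.
snapshot(vars...) = map(v -> v => v[], vars)

# Stand-in for the receiving side of an rpc: re-establish the scope, then run f.
run_with_snapshot(f, snap) = with(f, snap...)

snap = with(WORKER_POOL => [2, 3], DEVICE_HANDLE => "gpu1") do
    snapshot(WORKER_POOL)   # DEVICE_HANDLE is deliberately left out
end

result = run_with_snapshot(snap) do
    (WORKER_POOL[], DEVICE_HANDLE[])
end
```

Only `WORKER_POOL` carries over; `DEVICE_HANDLE` falls back to its default on the receiving side, and the cost is paid only by code that asks for a snapshot.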

@oschulz
Contributor Author

oschulz commented Sep 9, 2023

The issue is that copying a scope is a very heavy operation.

Hm, yes, there is that ... hard to control how much people will put in there.

@vtjnash vtjnash closed this as completed Oct 24, 2023
@vtjnash
Member

vtjnash commented Oct 24, 2023

Seems covered by #50958 for the main case above of @async and Threads.@spawn
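For reference, a small sketch of the local-task case, assuming #50958 corresponds to the `Base.ScopedValues` API available in Julia 1.11+ (the variable name `SPAN` is illustrative):

```julia
using Base.ScopedValues   # assumes Julia 1.11+

const SPAN = ScopedValue("root")

inner = with(SPAN => "a") do
    # The child task inherits the scope that was active when it was spawned.
    fetch(Threads.@spawn SPAN[])
end
outer = SPAN[]   # outside the `with` block the default is visible again
```

Here `inner` is `"a"` and `outer` is `"root"`; remote calls are not covered by this mechanism.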
