Client.gather not awaiting futures created with Client.run #4947
tbh, I didn't fully understand what you are trying to achieve. Are you deliberately using Client.run to schedule your tasks or did you intend to use Client.submit? The notable difference is that Client.run calls the function directly on the workers, outside of the task scheduling machinery, and returns plain results rather than Futures, whereas Client.submit schedules a task and gives you back a Future. Why this matters in this case is that Client.gather only knows how to wait on Futures produced by the scheduler.
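(As an illustration of that distinction, here is a minimal sketch using a throwaway local cluster; the function and values are placeholders, not code from this issue.)

```python
from dask.distributed import Client

client = Client()  # throwaway local cluster, just for illustration

def work(x):
    return x + 1

# Client.submit goes through the scheduler and returns a Future,
# which Client.gather knows how to wait on.
fut = client.submit(work, 1)
print(client.gather(fut))   # 2

# Client.run bypasses the scheduler: it calls the function directly on
# every worker and returns a plain dict of {worker_address: result}.
print(client.run(work, 1))  # {'tcp://127.0.0.1:...': 2, ...}
```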
@fjetter I will make my use case a bit more concrete. Hopefully this, in addition to the high level description I gave above, will make things sufficiently clear. Firstly though, I am deliberately using Client.run.

Now to the use case itself. A process graph is defined as a set of nodes which communicate over channels. An entire simulation is run to completion by setting all nodes to run until they receive a termination signal; a runner handles kicking off all of the nodes. Note that in the actual code there is extra logic to handle propagating a termination signal and shutting everything down, which is not shown here for simplicity. What I would like to do is to be able to schedule these nodes to run until completion on specific workers.

Each of these nodes can have arbitrary state and may include, for example, trained machine learning models. It's therefore not desirable for them to move around between workers at different compute steps. Each node should be located on a single worker and run to completion. Getting back distributed results is less important, as that is handled by the channels between nodes.

I'm exploring Dask distributed as it provides a nice interface for either running these simulations on a single host or using a Kubernetes cluster. I appreciate that my use case might not be what dask.distributed is really designed for and so it may not be the best fit.
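(To make the pinning part of this concrete: below is a rough sketch, not the author's code, of one way to keep each node's work on a single worker using client.submit with worker restrictions; all names and numbers are illustrative.)

```python
from dask.distributed import Client

def node_loop(node_id, n_iterations):
    # Stand-in for a node's run-to-completion loop; a real node would hold
    # state (e.g. a trained model) and read/write its channels instead.
    total = 0
    for i in range(n_iterations):
        total += i
    return node_id, total

client = Client()
addresses = list(client.scheduler_info()["workers"])

# Pin each node to one worker, round robin, so its state never migrates.
futures = [
    client.submit(node_loop, i, 1_000,
                  workers=[addresses[i % len(addresses)]],
                  allow_other_workers=False)
    for i in range(4)
]
print(client.gather(futures))  # e.g. [(0, 499500), (1, 499500), ...]
```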
I think your use case is not far off from typical usage patterns. However, scheduling coroutines is indeed something we do not directly support. At the very least the core scheduling machinery will be ignored if you are using this, e.g. you can not create dependencies between these tasks if you are using Client.run. Have you tried something like the following?

In [1]: from distributed import Client
In [2]: client = Client()
In [3]: def wrap_async_stuff():
   ...:     import asyncio
   ...:
   ...:     async def foo():
   ...:         print("do work")
   ...:         await asyncio.sleep(0.1)
   ...:         print("done")
   ...:         return "success"
   ...:
   ...:     loop = asyncio.get_event_loop()
   ...:     return loop.run_until_complete(foo())
   ...:
In [4]: fut = client.submit(wrap_async_stuff)
In [5]: do work
done
In [5]:
In [5]: fut.result()
Out[5]: 'success'
FYI, we also implement a publish-subscribe pattern, but I believe our docs in that area are missing.
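(Since those docs are sparse, here is a small, hedged sketch of the Pub/Sub primitives as I understand them; the topic name, timings, and message counts are made up for illustration.)

```python
import time
from dask.distributed import Client, Pub, Sub

def producer():
    pub = Pub("progress")            # publish to a named topic
    for i in range(100):
        pub.put(i)                   # delivered to whoever is subscribed at the time
        time.sleep(0.1)
    return "sent"

def consumer():
    sub = Sub("progress")            # subscribe to the same topic
    return [sub.get(timeout=10) for _ in range(3)]

client = Client()
p = client.submit(producer)
c = client.submit(consumer)
print(c.result())                    # e.g. [7, 8, 9], depending on timing
```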
@fjetter thanks for that suggestion with your example above. That approach hadn't occurred to me; I will try it out today. I'd like to try it with the k8s deployment of dask, but I'm finding that when I deploy the default helm chart on my machine the workers cannot discover the scheduler for some reason. I will open a separate issue on that front if I can't figure out the problem. In terms of the pubsub stuff, I did notice that somewhere. At the moment Queue fits nicely for us behind the abstract interface we've defined.
@chrisk314 we recently merged a change to support async tasks directly, see #5151. Is there anything else left to do in this issue?
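(With that change in place, on versions of distributed that include #5151, submitting a coroutine function directly should look roughly like this; the function below is just an illustration.)

```python
import asyncio
from dask.distributed import Client

async def work(x):
    # Coroutine functions submitted as tasks are awaited on the worker.
    await asyncio.sleep(0.1)
    return x * 2

client = Client()
fut = client.submit(work, 5)
print(fut.result())  # 10
```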
This should be closed via #5151. @chrisk314 feel free to re-open if that's not the case.
Hi, I'm struggling with running async functions concurrently using dask distributed. I'm attempting to use client.run to launch some tasks on dedicated workers, in conjunction with client.gather to retrieve the results. As far as I can tell from reading the docs, my approach should be correct, hence I am raising it as an issue here; however, I may be missing something, in which case the docs could potentially be improved.

For context, I'm building an application in which user defined classes represent nodes within a process graph (think manufacturing plant etc). The nodes execute bespoke code and communicate data via channels (e.g. dask.distributed.Queue). Nodes in the graph may have a large memory footprint (e.g. they contain trained machine learning models). Each node should execute all of its iterations on a single worker until it receives a termination signal. To satisfy this requirement I am using client.run and specifying a single worker, assigning workers to nodes in a round robin fashion. I realise this pattern may not be ideal and is perhaps a bit of a hack; I'm currently exploring how best to implement this.

I have created a minimal example which follows the same pattern as my actual application code and reproduces the same issue.
What happened:
I create a list of futures by calling client.run in a loop, passing different arguments to a function targeted to execute on specific workers. I subsequently call client.gather to get back the results from this set of futures. Instead of waiting for the functions to execute, control continues past the client.gather call and the application exits with the below exception.

If I add in a call to dask.distributed.wait(futures) before the call to Client.gather then exactly the same behaviour is observed.

What you expected to happen:
I expect that calling Client.gather will wait for all the futures to execute and return the results from the futures, rather than just returning the futures themselves. Additionally, I expect that if I call dask.distributed.wait on the list of futures, all the futures passed in will be awaited.

Minimal Complete Verifiable Example:
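(The original example code did not survive extraction. The sketch below is a guess at its shape, pieced together from the description above and from the note that asyncio.gather works: it assumes an asynchronous client, which the issue does not state explicitly, and the function bodies are placeholders; foo and bar are the names used in the issue text.)

```python
import asyncio
from dask.distributed import Client

async def foo(x):
    # async "node" as described in the issue
    await asyncio.sleep(1)
    return x * 2

def bar(x):
    # blocking equivalent; the issue notes it behaves the same way
    import time
    time.sleep(1)
    return x * 2

async def main():
    client = await Client(asynchronous=True)
    # One client.run call per node; the real example also passed
    # workers=[...] to pin each call to a specific worker.
    futures = [client.run(foo, i) for i in range(4)]

    results = await client.gather(futures)     # reported: returns without awaiting
    # results = await asyncio.gather(*futures)  # reported workaround
    print(results)
    await client.close()

asyncio.run(main())
```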
Anything else we should know:
If Client.gather is replaced with asyncio.gather then the expected behaviour is observed. Replacing foo with the blocking function bar gives the same results.

Environment: