Wait for workers to join before continuing #2138

Closed
jacobtomlinson opened this issue Jul 23, 2018 · 17 comments · Fixed by #2688

Comments

@jacobtomlinson
Member

Moved from dask/dask-kubernetes#87

It would be useful to have the cluster.scale() method block until the workers join. This was originally raised in dask-kubernetes but is applicable to dask-jobqueue and dask-yarn, so it would make more sense to implement here.

Suggestions from @mrocklin:

One solution here might be to use SchedulerPlugins to explicitly trigger events whenever a new worker reaches the scheduler. Calling wait on the cluster object might register a tornado.locks.Event or tornado.locks.Condition that gets triggered when the right number of workers have arrived.

@jhamman implemented a SchedulerPlugin for dask-jobqueue in order to improve adaptive operation (he wanted adaptive to be aware of pending jobs). This work happened in dask/dask-jobqueue#63 and might be the sort of thing we could think about generalizing. Any thoughts on this @jhamman?
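
A minimal sketch of how such a plugin might look (hypothetical code; only SchedulerPlugin and tornado.locks.Event are existing pieces, the class and attribute names here are made up):

from tornado.locks import Event
from distributed.diagnostics.plugin import SchedulerPlugin

class WorkerArrivalPlugin(SchedulerPlugin):
    """Set an event once the scheduler knows about at least `target` workers."""

    def __init__(self, scheduler, target):
        self.scheduler = scheduler
        self.target = target
        self.event = Event()

    def add_worker(self, scheduler=None, worker=None, **kwargs):
        # Called by the scheduler whenever a new worker registers.
        if len(self.scheduler.workers) >= self.target:
            self.event.set()

A cluster-level wait could then presumably register this via scheduler.add_plugin and await plugin.event.wait() until enough workers have arrived.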

@guillaumeeb
Member

Hi there. From the dask-jobqueue perspective, I don't think it would be a good thing to wait until all workers join. Maybe block until one worker is there, but with a job scheduler, waiting for all of them could take hours if not more. Maybe add a kwarg to scale()?

From what I remember, using the plugin as suggested by @mrocklin looks simple enough.

@jacobtomlinson
Member Author

You make a good point. I'm making a few assumptions here.

The first is that I'm using a cluster that doesn't make me wait very long for workers (the cloud). I'm talking about interactive jobs rather than batch jobs.

I'm also assuming that I'm doing something brittle which needs all the workers before starting. The primary example of this for me right now is doing a live demo: I don't want it to run on one worker, as that would be underwhelming. I want to scale and wait until I have a nice big cluster before plowing through my graph.

I appreciate that might not be a good enough reason to do this. I could write a little script that blocks until the worker count reaches n for live demos.

@mrocklin
Member

mrocklin commented Aug 3, 2018

I think it would make sense to have an explicit wait method or a wait= keyword to scale that would allow a user to opt in to blocking semantics. I would use this for benchmarking.

@jacobtomlinson
Member Author

Perhaps we could have wait=bool and/or wait_for=int so you could specify a minimum number of workers to wait for before starting. This may fit better with jobqueue.
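
For illustration, roughly those semantics written as a standalone helper on top of the current API (neither keyword exists yet; the helper name and polling interval are made up):

from time import sleep

def scale_and_wait(cluster, client, n, wait=False, wait_for=None):
    # Request n workers, then optionally block until enough have joined.
    cluster.scale(n)
    if wait or wait_for is not None:
        target = wait_for if wait_for is not None else n
        while len(client.scheduler_info()["workers"]) < target:
            sleep(1.0)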

@guillaumeeb
Member

Yep, I had something like this in mind. Being able to wait for a user-defined number of workers to be online is something that would definitely be useful!

@jakirkham
Member

FWIW, code like this has served this purpose quite well for us. It would be easy enough for someone to add if they wish.

from time import sleep
# Poll until the scheduler reports the desired number of workers.
while client.status == "running" and len(client.scheduler_info()["workers"]) < nworkers:
    sleep(1.0)

@guillaumeeb
Member

Could we use a non-active-wait solution for that?

Maybe using http://distributed.dask.org/en/latest/plugins.html#scheduler-plugins and some callback?

Do you have a suggestion on this, @mrocklin?

@mrocklin
Member

mrocklin commented Mar 4, 2019

Adding some constraints here:

Currently the scale method is fast so that it can be reliably used within the event loop. Typically scale starts some jobs and then returns immediately. If we decide that it should block, then it has to block in an async-friendly way. This means that scale has to become a coroutine if we're in the event loop (it has to return a future), and a blocking function if we're not. This will require some finesse.

We might solve this in a couple of ways:

  1. We could change the contract of scale across all of the projects. This might be necessary at some point anyway.

  2. We could skip the wait= keyword on scale and instead add a second method, .wait(), which would follow the coroutine semantics commonly used in the dask.distributed codebase:

    from tornado import gen

    async def _wait(self, n):
        # `condition` is a placeholder for "at least n workers have joined".
        while not condition():
            await gen.sleep(0.01)

    def wait(self, n):
        # Blocking when called synchronously, awaitable inside the event loop.
        return self.sync(self._wait, n)

@guillaumeeb
Member

Just for my understanding, if we do not modify scale and instead just add a wait method, why would we want it to follow the common coroutine semantics?

@mrocklin
Member

mrocklin commented Mar 4, 2019

In case someone wanted to use wait from within an asynchronous environment.

async with SGECluster(...) as cluster:
    cluster.scale(10)
    await cluster.wait()
    x.compute()
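
If such a method were added, the blocking counterpart outside the event loop would presumably look like this (hypothetical: cluster.wait() does not exist, and the SGECluster arguments are placeholders):

from dask_jobqueue import SGECluster
from distributed import Client

cluster = SGECluster(cores=1, memory="2GB")  # placeholder resources
cluster.scale(10)
cluster.wait(10)  # hypothetical: block until 10 workers have joined
client = Client(cluster)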

@elliottslaughter

elliottslaughter commented Apr 12, 2019

As another data point, this is impacting my use of Dask in an HPC environment (and some thoughts on CLI integration below).

My jobs are started by a job scheduler (SLURM), and at the start of the job I want to spin up a scheduler and set of workers. For reasons not worth mentioning here, I'm doing this by calling dask-scheduler and dask-worker rather than dask-ssh or dask-mpi, but I believe the same issue appears no matter which route I go. Conceptually the setup looks like:

dask-scheduler &               # start the scheduler in the background
srun ... dask-worker ... &     # launch the workers under SLURM
python my_dask_script.py       # the client script may start before all workers join

There is a race between the three steps. This is not a problem for correctness, since the worker knows to retry if the scheduler isn't up yet, and the scheduler knows to wait for at least one worker to arrive before allowing the client to start running. But in my case it is a performance issue, because I want to do accurate timing of execution, and if the client starts before all workers have arrived then the run isn't reflective of steady-state performance.

Currently I'm doing a fairly awful workaround based on parsing the log files from the scheduler and counting the number of times it reports workers. You can see my script below. I'm not sure how exactly the API mentioned in the original post would integrate with my workflow; presumably this would need to be exposed via the CLI somehow. I would prefer not to have to write custom scripts to start Dask, but I suppose I can do that if absolutely necessary.

https://github.com/elliottslaughter/task-bench/blob/dask/experiments/cori_metg_compute/metg_dask.sh#L18-L43

@martindurant
Member

Could you create the client in the script, and then poll client.ncores() until you see as many workers as you are happy with?
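
Something along these lines, where the scheduler address and expected worker count are placeholders for whatever the launch script already knows:

from time import sleep
from distributed import Client

client = Client("tcp://127.0.0.1:8786")  # placeholder scheduler address
expected_workers = 8                     # e.g. the number of nodes requested from SLURM
while len(client.ncores()) < expected_workers:
    sleep(1.0)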

@elliottslaughter

I think I could make that work. In my case I know how many nodes I'm planning to boot, so I can just tell that to the client.

Is there API documentation for this? I was trying to figure out if e.g. the num_workers keyword argument to compute was applicable to the distributed scheduler, but never found anything beyond this page (which only mentions multiprocessing and threaded schedulers): https://docs.dask.org/en/latest/scheduler-overview.html#configuring-the-schedulers

@martindurant
Member

https://distributed.readthedocs.io/en/latest/api.html#distributed.Client.ncores

The method just reports the scheduler's view of the cluster.

@mrocklin
Member

mrocklin commented Apr 12, 2019 via email

@mrocklin
Member

mrocklin commented Apr 12, 2019 via email

@guillaumeeb
Member

@elliottslaughter I guess you are using dask-jobqueue? Have you seen this PR: dask/dask-jobqueue#223 by @danpf?

It is not quite finished, but it may give you some insights.
