Workers only connect to scheduler when cluster started in IPython #293
`cluster.scale` does not block until the workers are started, so your script submits the SLURM jobs, prints the address of the (still empty) cluster, and exits.
On 21 July 2019 at 01:34:59, salotz <notifications@github.com> wrote:
… Perhaps I am missing something obvious here but I wrote a little script to get something up and running:
```
if __name__ == "__main__":
    import sys

    from dask_jobqueue import SLURMCluster

    num_workers = int(sys.argv[1])
    cluster = SLURMCluster(project='dicksonlab',
                           cores=1,
                           walltime="00:05:00",
                           memory='3 GB',
                           processes=1,
                           interface='ib0')
    cluster.scale(num_workers)
    print(cluster.address)
```
If I execute this from an IPython session (like in every demo I've seen) everything is okay and the logs of my worker jobs show that they have connected.
However, if I just execute this script (I also tried it without the `__name__` guard), then it all starts and runs (and suspiciously returns the prompt), but the workers never connect and eventually time out.
```
distributed.worker - INFO - Waiting to connect to: tcp://10.3.8.48:38990
```
After looking at the source, I noticed the remarks mentioning that this is a planned feature: https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/deploy/cluster_manager.py#L54
I still think this is a noteworthy consequence of that problem, as it points out that the tool is really tied to IPython and/or Jupyter notebooks, which I don't really use. At least a warning in the docs would help for now, that is, unless someone has a workaround.
Cheers, and thanks for all the hard work on this, really makes my life with SLURM et al. much easier.
~Sam
If you want to wait for workers to be available, you can use `Client.wait_for_workers`. More generally, when you run a script using dask you need to have something that blocks until you get the result you need (e.g. using `Client.gather` or `future.result()`).

To be honest, this is a caveat when you start with dask and run Python scripts. If you see a good way to add this to the documentation, feel free to open a pull request.

For the record, dask and its subprojects are not tied to IPython or Jupyter notebooks.

I am going to close the issue. @salotz, feel free to comment if you feel your question has not been answered to its full extent.
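For concreteness, here is one way the original script could be adapted to block until the workers are up, a sketch assuming the same SLURM environment and `dicksonlab` project as the original script (it will only run on a machine that can submit SLURM jobs):

```python
import sys

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

if __name__ == "__main__":
    num_workers = int(sys.argv[1])
    cluster = SLURMCluster(project='dicksonlab',
                           cores=1,
                           walltime="00:05:00",
                           memory='3 GB',
                           processes=1,
                           interface='ib0')
    cluster.scale(num_workers)

    # Connecting a client and calling wait_for_workers blocks until the
    # SLURM jobs have actually started and the workers have registered
    # with the scheduler, instead of exiting immediately.
    client = Client(cluster)
    client.wait_for_workers(num_workers)
    print(cluster.address)
```

Any blocking call that needs a result (e.g. `client.gather` on submitted futures) would serve the same purpose of keeping the script alive.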
Thanks for the suggestions! One of these will work for me.