Launch Scheduler remotely or on the local machine? #2
Comments
This is a good question! It's a personal preference, but in my mind it's entirely reasonable to keep it local, which would usually mean a login node in a traditional HPC environment. If you were to launch the scheduler as a separate task, it would have some advantages as you note, but would be restricted in other ways (maximum walltime, etc.). Additionally, even though you could get a hostname, it's not actually guaranteed that the hostname would give you the preferred network interface (for instance, $hostname vs. $hostname-ib0 or $hostname-10g or something like that).
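To make the interface point concrete, here is a minimal sketch (not from the thread) of how a launched task could report the address of a specific interface instead of relying on the bare hostname. It assumes psutil is installed and that the fast interface is named ib0; both vary by site.

```python
import socket
import psutil  # assumed available; it is commonly installed alongside distributed


def address_for_interface(ifname="ib0"):
    """Return the IPv4 address bound to `ifname`, or fall back to the hostname.

    "ib0" is only an illustrative name; clusters differ (ib0, eth0, a 10g NIC, ...).
    """
    for addr in psutil.net_if_addrs().get(ifname, []):
        if addr.family == socket.AF_INET:
            return addr.address
    # Fallback: whatever the hostname resolves to, which may be a slower network.
    return socket.gethostbyname(socket.gethostname())


print(address_for_interface("ib0"))
```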
On every HPC cluster I've ever used, users were restricted from running long-running background tasks on login nodes. On some clusters, firewalls prevent connections to outside machines.
In our research group (~20 people) we use a PBS cluster and all do the following: we use IPython Parallel, and I am currently trying to figure out how we can use dask-drmaa in the same way. The ideal for us would be something like (running on the remote machine):

```python
from dask_drmaa import DRMAACluster

cluster = DRMAACluster(ssh='sshserver')
```

I am very willing to help you test this :)
@basnijholt a simple way to use this would be the following:

```bash
ssh loginnode
dask-drmaa 20  # launch scheduler and 20 workers
```
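The Python-level equivalent of that workflow, run from a session on the login node, might look roughly like the sketch below. The `DRMAACluster`/`start_workers` names follow the project README; treat the exact API as an assumption about the installed version.

```python
# Run inside a Python session on the login node (e.g. after `ssh loginnode`).
from dask_drmaa import DRMAACluster
from distributed import Client

cluster = DRMAACluster()     # scheduler runs locally, on the login node
cluster.start_workers(20)    # submit 20 worker jobs through DRMAA
client = Client(cluster)     # client connects to the local scheduler
```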
I know how to start the scheduler and the workers. The issue is connecting the client to the remote cluster. I tried to tunnel the scheduler port, but that isn't working.
Generally I think that trying to help people get around local network policies is probably out of Dask's scope. I'm open to suggestions if you think there are general solutions to this, but I suspect that the right solution is "talk to your network administrator". |
I don't think our network needs anything special. I am able to tunnel ports and connect to an IPython Parallel cluster for example. Maybe I did something incorrectly, but I only tunneled the scheduler port to the machine on which I run a notebook. Do I need to do something more, like manually transfer files? |
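For reference, a minimal tunnelling setup might look like the sketch below. It assumes the scheduler listens on the default port 8786 on the login node and that SSH port forwarding is permitted; neither is guaranteed on every cluster.

```python
# On the local machine, forward the scheduler port first, for example:
#   ssh -L 8786:localhost:8786 loginnode
# (8786 is the default dask-scheduler port; adjust if your setup differs.)
from distributed import Client

# The client then talks to the forwarded port as if the scheduler were local.
client = Client("tcp://localhost:8786")
print(client)
```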
Workers will need to be able to run dask-worker. Dask-drmaa generally assumes that your worker machines have the same software environment as your login node. This conversation is drifting from the original topic of this issue; if you have further questions, I recommend opening a new issue.
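A quick way to confirm that the client, scheduler, and workers see compatible software is distributed's built-in version check. A minimal sketch, assuming the scheduler is reachable at the example address below:

```python
from distributed import Client

client = Client("tcp://localhost:8786")   # example address

# Collects Python and package versions from the scheduler, the workers, and
# this client; with check=True it raises an error if they are inconsistent.
client.get_versions(check=True)
```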
Figured I'd leave this note in case it helps someone, even though this issue seems dormant. FWIW, I have been successfully using dask-drmaa by doing the following.
Did the same thing with ipyparallel previously and that also worked quite well.
Do we want to launch the `dask-scheduler` process on the cluster, or do we want to keep it local? Keeping the scheduler local simplifies things, but makes it harder for multiple users to share the same scheduler.
Launching the scheduler on the cluster is quite doable, but we'll need to learn on which machine the scheduler launched. I know how to do this through the SGE interface, but it's not clear to me that it is exposed through DRMAA (see this Stack Overflow question). Alternatively, we could pass the location of the scheduler back through some other means: a file on NFS (do we want to assume the presence of a shared filesystem?), or a tiny TCP server to which the scheduler connects with its information.
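One concrete shape of the shared-filesystem option is the scheduler-file mechanism in distributed, assuming a release recent enough to support it. The paths below are hypothetical examples on a shared mount.

```python
# Launch on the cluster, pointing scheduler and workers at a shared file:
#   dask-scheduler --scheduler-file /shared/scheduler.json
#   dask-worker    --scheduler-file /shared/scheduler.json
# (/shared/scheduler.json is a hypothetical path on the shared filesystem.)
from distributed import Client

# The client reads the scheduler's address from the same file on NFS.
client = Client(scheduler_file="/shared/scheduler.json")
```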
The current implementation just launches the scheduler on the user's machine. Barring suggestions to the contrary my intention is to move forward with this approach until there is an obvious issue.