Launch Scheduler remotely or on the local machine? #2
Comments
This is a good question! It's a personal preference, but in my mind it's entirely reasonable to keep it local, which would usually mean a login node in a traditional HPC environment. If you were to launch the scheduler as a separate task, it would have some advantages as you note, but would be restricted in other ways (maximum walltime, etc.). Additionally, even though you could get a hostname, it's not actually guaranteed that the hostname would give you the preferred network interface (for instance, $hostname vs. $hostname-ib0 or $hostname-10g or something like that).
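To make the interface point concrete, here is a minimal sketch (not from the thread) of how a launched task could report the address of a specific interface instead of relying on the bare hostname. It assumes psutil is installed and that the fast interface is named ib0; both vary by site.

```python
import socket
import psutil  # assumed available; it is commonly installed alongside distributed


def address_for_interface(ifname="ib0"):
    """Return the IPv4 address bound to `ifname`, or fall back to the hostname.

    "ib0" is only an illustrative name; clusters differ (ib0, eth0, a 10g NIC, ...).
    """
    for addr in psutil.net_if_addrs().get(ifname, []):
        if addr.family == socket.AF_INET:
            return addr.address
    # Fallback: whatever the hostname resolves to, which may be a slower network.
    return socket.gethostbyname(socket.gethostname())


print(address_for_interface("ib0"))
```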
On every HPC cluster I've ever used, users were restricted from running long-running background tasks on login nodes. On some clusters, firewalls prevent connections to outside machines.
In our research group (~20 people) we use a PBS cluster and all do the following: we use IPython Parallel, and I am currently trying to figure out how we can use dask-drmaa in the same way. The ideal for us would be something like (running on the remote machine):

```python
from dask_drmaa import DRMAACluster

cluster = DRMAACluster(ssh='sshserver')
```

I am very willing to help you test this :)
@basnijholt a simple way to use this would be the following:

```bash
ssh loginnode
dask-drmaa 20  # launch scheduler and 20 workers
```
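The Python-level equivalent of that workflow, run from a session on the login node, might look roughly like the sketch below. The `DRMAACluster`/`start_workers` names follow the project README; treat the exact API as an assumption about the installed version.

```python
# Run inside a Python session on the login node (e.g. after `ssh loginnode`).
from dask_drmaa import DRMAACluster
from distributed import Client

cluster = DRMAACluster()     # scheduler runs locally, on the login node
cluster.start_workers(20)    # submit 20 worker jobs through DRMAA
client = Client(cluster)     # client connects to the local scheduler
```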
I know how to start the scheduler and the workers. The issue is connecting the client to the remote cluster. I tried to tunnel the scheduler port, but that isn't working.
Generally I think that trying to help people get around local network policies is probably out of Dask's scope. I'm open to suggestions if you think there are general solutions to this, but I suspect that the right solution is "talk to your network administrator". |
I don't think our network needs anything special. I am able to tunnel ports and connect to an IPython Parallel cluster for example. Maybe I did something incorrectly, but I only tunneled the scheduler port to the machine on which I run a notebook. Do I need to do something more, like manually transfer files? |
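For reference, a minimal tunnelling setup might look like the sketch below. It assumes the scheduler listens on the default port 8786 on the login node and that SSH port forwarding is permitted; neither is guaranteed on every cluster.

```python
# On the local machine, forward the scheduler port first, for example:
#   ssh -L 8786:localhost:8786 loginnode
# (8786 is the default dask-scheduler port; adjust if your setup differs.)
from distributed import Client

# The client then talks to the forwarded port as if the scheduler were local.
client = Client("tcp://localhost:8786")
print(client)
```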
Workers will need to be able to run dask-worker. Dask-drmaa generally assumes that your worker machines have the same software environment as your login node. This conversation is drifting from the original topic of this issue; if you have further questions, I recommend opening a new issue.
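A quick way to confirm that the client, scheduler, and workers see compatible software is distributed's built-in version check. A minimal sketch, assuming the scheduler is reachable at the example address below:

```python
from distributed import Client

client = Client("tcp://localhost:8786")   # example address

# Collects Python and package versions from the scheduler, the workers, and
# this client; with check=True it raises an error if they are inconsistent.
client.get_versions(check=True)
```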
Figured I'd leave this note in case it helps someone, even though this issue seems dormant. FWIW, I have been successfully using dask-drmaa by doing the following.
Did the same thing with ipyparallel previously and that also worked quite well.
Do we want to launch the `dask-scheduler` process on the cluster, or do we want to keep it local? Keeping the scheduler local simplifies things, but makes it harder for multiple users to share the same scheduler.
Launching the scheduler on the cluster is quite doable, but we'll need to learn on which machine the scheduler launched. I know how to do this through the SGE interface, but it's not clear to me that it is exposed through DRMAA (see this Stack Overflow question). Alternatively, we could pass the location of the scheduler back through some other means: a file on NFS (do we want to assume the presence of a shared filesystem?), or a tiny TCP server to which the scheduler connects with its information.
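One concrete shape of the shared-filesystem option is the scheduler-file mechanism in distributed, assuming a release recent enough to support it. The paths below are hypothetical examples on a shared mount.

```python
# Launch on the cluster, pointing scheduler and workers at a shared file:
#   dask-scheduler --scheduler-file /shared/scheduler.json
#   dask-worker    --scheduler-file /shared/scheduler.json
# (/shared/scheduler.json is a hypothetical path on the shared filesystem.)
from distributed import Client

# The client reads the scheduler's address from the same file on NFS.
client = Client(scheduler_file="/shared/scheduler.json")
```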
The current implementation just launches the scheduler on the user's machine. Barring suggestions to the contrary my intention is to move forward with this approach until there is an obvious issue.