
Allow to start a Scheduler in a batch job #186

Open
guillaumeeb opened this issue Oct 30, 2018 · 13 comments

Comments

@guillaumeeb
Member

guillaumeeb commented Oct 30, 2018

One of the goals of the ClusterManager object is to be able to launch a remote scheduler. In dask-jobqueue's scope, this probably means submitting a job which will start a Scheduler, and then connecting to it.

We probably still lack some remote interface between ClusterManager and the scheduler object for this to work, so it will probably mean extending APIs upstream.

Identified Scheduler method to provide:

  • retire_workers(n, minimum, key)
  • scheduler_info(), which already exists; see if it is sufficient,
  • add_diagnostic_plugin(), and mostly retrieving plugin information remotely

I suspect that adaptive will need to change significantly too; this may lead to a transitional adaptive logic in dask-jobqueue, and other remote functions to add to the scheduler.

This is in the scope of #170.
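To make the shape of that remote interface concrete, here is a purely illustrative sketch with an in-memory stand-in rather than a real RPC layer. The method names mirror the bullet list above, but none of this is dask's actual API:

```python
class RemoteSchedulerInterface:
    """Sketch of the methods a ClusterManager would call on a remote Scheduler."""

    def __init__(self):
        self._workers = {}   # worker address -> metadata
        self._plugins = []

    def retire_workers(self, n=None, minimum=0, key=None):
        """Retire up to ``n`` workers while keeping at least ``minimum`` alive."""
        addresses = sorted(self._workers, key=key) if key else list(self._workers)
        n_removable = max(len(addresses) - minimum, 0)
        retired = addresses[: min(n or 0, n_removable)]
        for addr in retired:
            del self._workers[addr]
        return retired

    def scheduler_info(self):
        """Summary of cluster state, in the spirit of ``Scheduler.scheduler_info``."""
        return {"type": "Scheduler", "workers": dict(self._workers)}

    def add_diagnostic_plugin(self, plugin):
        """Register a plugin and keep it retrievable remotely."""
        self._plugins.append(plugin)
        return len(self._plugins)
```

In a real implementation each method would be an RPC round-trip to the scheduler job rather than local bookkeeping.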

@guillaumeeb guillaumeeb changed the title ClusterManager: Reflexion on how to start a remote Scheduler Allow to start a Scheduler in a batch job Oct 11, 2019
@guillaumeeb
Member Author

With #306 we've got a lot of what is needed to start a Scheduler in a dedicated batch job.

I also believe we've got all that is needed on the Distributed side too, as dask-kubernetes doesn't use a LocalCluster anymore; see dask/dask-kubernetes#162.

@mrocklin
Member

mrocklin commented Oct 11, 2019 via email

@lesteve
Member

lesteve commented Oct 15, 2019

Just curious: could you elaborate a bit more on why this would be useful? I have some guesses, but I just want to make sure they are somewhat accurate.

  • I am guessing that the main advantage is that the scheduler no longer runs on the cluster login node (most users around me run their own Jupyter notebook server on the login node, so the scheduler is on the login node as well; this is probably different with a JupyterHub setup). I do understand that this would be considered best practice by HPC sys-admins. I'd be curious whether you have encountered cases where the scheduler goes wild and consumes a lot more than expected, or maybe there are a lot of people each running their own Dask scheduler on the login node and it starts to add up.
  • IMO, rather than the scheduler consuming more resources than it should, what is a lot more likely to happen is that the client script will, for example because once you have the results from your job you do some not-so-light post-processing and forget you are on the login node and not really supposed to do that.
  • I had some feedback from users who told me that the ideal workflow for them would be to have the client code (the one that calls client.submit) on their local machine (more flexible to install what you need, more familiar, nicer working conditions, e.g. the shared folders are very slow on our cluster, think IPython takes 20-30s to launch sometimes and this is not something that is likely to change, etc.) and be able to launch jobs on the cluster. Would such a workflow be made easier by the "remote scheduler" feature?

@mrocklin
Member

In practice I doubt that the scheduler will be expensive enough that system administrators will care. They all ask about this, but I don't think that it will be important in reality.

Another reason to support this is for networking rules. In some systems (10-20%?) compute nodes are unable to connect back to login nodes. So here placing the scheduler on a compute node, and then connecting to that node from the client/login node is nice.

It may be though that this is a frequently requested but not actually useful feature.

@lesteve
Member

lesteve commented Oct 16, 2019

Another reason to support this is for networking rules. In some systems (10-20%?) compute nodes are unable to connect back to login nodes.

In #354 @orbitfold seems to be in this particular case (at least for some clusters he has access to). @mrocklin, could you give a few pointers that would help getting started on implementing a remote scheduler? (My understanding is that using SpecCluster will help, but I never had the time to investigate why exactly ...)

@mrocklin
Member

We need to make an object that follows this interface:

https://github.com/dask/distributed/blob/e7a2e6d41e0b719866769713d8f41cb5fcfbf6e8/distributed/deploy/spec.py#L22-L30

The simplest example today is probably SSHCluster: https://github.com/dask/distributed/blob/master/distributed/deploy/ssh.py

But rather than start things with SSH, it would presumably submit a job.

@lesteve
Member

lesteve commented Oct 17, 2019

Thanks a lot!

FWIW, another report in #355 where @hawk-sf seems to have some limitations on the ports allowed between the login node and the workers.

@manuel-rhdt

In practice I doubt that the scheduler will be expensive enough that system administrators will care. They all ask about this, but I don't think that it will be important in reality.

Another reason to support this is for networking rules. In some systems (10-20%?) compute nodes are unable to connect back to login nodes. So here placing the scheduler on a compute node, and then connecting to that node from the client/login node is nice.

It may be though that this is a frequently requested but not actually useful feature.

I am currently using dask-joblib on a PBS cluster and running the scheduler on the login node. It is indeed a bit problematic because the login node has only 2 GB of memory, and it quickly runs out if I am not careful with the size of the computation graphs.

So I think I would definitely benefit from this feature.

@lesteve
Member

lesteve commented Dec 22, 2019

Interesting, thanks for this use case! 2 GB is certainly very small, even more so when shared between all the cluster users. Are there some other nodes you can ssh to for heavier work, e.g. compilation of C++ code? On some clusters I am familiar with they are called devel nodes, but the naming may well be different on your cluster.

One work-around in your use case is to start an interactive job in which you launch your Dask scheduler, i.e. you run the Python script creating your PBSCluster there. Beware: if you lose your Dask scheduler because your interactive job expires, you lose all your computations; see https://distributed.dask.org/en/latest/resilience.html#scheduler-failure for more details.

@manuel-rhdt

manuel-rhdt commented Dec 22, 2019

The cluster I am using is very small and has only one type of compute node. I am not sure what you mean by an "interactive" job. Maybe what you mean is that I should start the PBSCluster from a compute node. This does not work, however, since it is impossible to submit new PBS jobs from a compute node (it can only be done from the login node).

The current workaround for me is to use a scheduler file to set up the communication between scheduler and workers using the shared file system (without using dask-jobqueue).

Ideally I could run a Python script on the login node that launches PBS jobs for both dask-scheduler and dask-worker, and then connect to dask-scheduler from my local workstation to submit my computations (using an SSH tunnel).
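For reference, the scheduler-file workaround described here follows the workflow in the Dask docs; roughly (the paths are placeholders, and the exact CLI flags may differ between Dask versions):

```shell
# In one batch job (or on the login node): start the scheduler and have it
# write its contact information to a file on the shared filesystem.
dask-scheduler --scheduler-file /shared/scheduler.json

# In each worker job script: workers find the scheduler through the same file.
dask-worker --scheduler-file /shared/scheduler.json

# From a client (possibly through an SSH tunnel), connect using the same file:
#   from dask.distributed import Client
#   client = Client(scheduler_file="/shared/scheduler.json")
```

This sidesteps any need for the client to reach the job queue: all coordination goes through the shared filesystem.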

@lesteve
Member

lesteve commented Dec 22, 2019

The cluster I am using is very small and has only one type of compute node. I am not sure what you mean by an "interactive" job.

An interactive job means you submit a job through your job scheduler and end up in an interactive shell on a compute node. Look at this for an example on a cluster that uses PBS.
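For concreteness, requesting an interactive job on a PBS cluster typically looks something like this (the resource flags are placeholders and vary by site):

```shell
# Ask PBS for an interactive shell on a compute node; adjust resources to taste.
qsub -I -l select=1:ncpus=4:mem=16GB -l walltime=04:00:00
# Once the prompt appears you are on a compute node, and can run the Python
# script that creates the PBSCluster (and thus the Dask scheduler) there.
```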

This does not work however, since it is impossible to submit new PBS jobs from a compute node (it can only be done from the login node).

This is a mildly annoying restriction if you ask me; maybe talk to IT to see whether they would be ready to lift it. People can still do it if they want using ssh login-node qsub. There was an issue where modifying submit_cmd to do that was reported to work; see #257 (comment).
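As a tiny illustration of the ssh login-node qsub trick (the hostname is a placeholder, and swapping the submit command like this is a workaround, not dask-jobqueue's documented API):

```python
# Build the command a compute node would run instead of plain "qsub":
# submitting through SSH to the login node, where qsub is allowed.
def wrap_submit(submit_cmd: str, login_host: str = "login-node") -> str:
    """Prefix a job-queue command so it executes on the login node via SSH."""
    return f"ssh {login_host} {submit_cmd}"

print(wrap_submit("qsub"))  # -> ssh login-node qsub
```

This assumes passwordless SSH from compute nodes back to the login node, which is itself not available on every cluster.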

The current workaround for me is to use a scheduler file to set up the communication between scheduler and workers using the shared file system (without using dask-jobqueue).

I am guessing you mean https://docs.dask.org/en/latest/setup/hpc.html#using-a-shared-network-file-system-and-a-job-scheduler. This seems like a reasonable work-around.

I connect to dask-scheduler from my local workstation to submit my computations (using a ssh tunnel).

I have had other people tell me a similar thing (third bullet point of #186 (comment)). If you manage to make that work, please let us know (ideally in a separate issue).

@manuel-rhdt

An interactive job means you submit a job through your job scheduler and end up in an interactive shell on a compute node. Look at this for an example on a cluster that uses PBS.

Thank you! I didn't know this and it is very useful!

This is a mildly annoying restriction if you ask me; maybe talk to IT to see whether they would be ready to lift it. People can still do it if they want using ssh login-node qsub. There was an issue where modifying submit_cmd to do that was reported to work; see #257 (comment).

I'll see if I can talk to IT next year!

I am guessing you mean https://docs.dask.org/en/latest/setup/hpc.html#using-a-shared-network-file-system-and-a-job-scheduler. This seems like a reasonable work-around.

Yes, this is what I meant. It works well enough in my case because I do not need adaptive scaling of workers (yet). It is a little unfortunate to have to start the cluster separately (not from the notebook that I use for my computations).

@lesteve
Member

lesteve commented Mar 25, 2020

@muammar I see that you have commented in #390 (comment). Could you please explain the admin rules that are in place on your cluster, just to get an idea of what you are allowed to do?

You may be interested in my answer above: #186 (comment). Let me try to sum up:

  1. Launch the scheduler (i.e. create the Cluster object) in an interactive job: probably easier. If you like to work in a Jupyter environment, this is doable this way. There are a few hoops to jump through (mostly SSH port forwarding to open your Jupyter notebook in your local browser at localhost:<some-port>).
  2. Launch the scheduler (i.e. create the Cluster object) in a batch job: only Python scripts, not a Jupyter environment.
  3. Start the Dask scheduler and workers by launching Dask commands yourself: https://docs.dask.org/en/latest/setup/hpc.html#using-a-shared-network-file-system-and-a-job-scheduler

In both 1. and 2. you need to bear in mind that as soon as your scheduler job finishes, you will lose all your workers after ~60s. That may mean losing the results of lengthy computations.
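As a rough illustration of option 2, the batch job could simply run the Python driver script that creates the Cluster object (the PBS directives, resource limits, and filename below are all placeholders):

```shell
#!/bin/bash
#PBS -N dask-driver
#PBS -l select=1:ncpus=2:mem=8GB
#PBS -l walltime=24:00:00
# Hypothetical driver job: the script below would create the PBSCluster
# (and thus the scheduler) inside this job, submit worker jobs from it,
# and run the whole computation before the job's walltime expires.
python my_dask_driver.py
```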
