Using cuda cluster on HPC: Implementation question for a PBSCudaCluster/Job using LocalCUDACluster #653
Comments
I think it's a bit of work but definitely doable!
@andersy005 @guillaumeeb, please chime in if you have additional thoughts or corrections here.
Regarding the first point (starting the scheduler on a cluster node), I have this simple suggestion: ...
It sounds easy and without too much risk, as it uses classical techniques that already work on HPC and with Dask-MPI.
In option 1), the scheduler and workers could communicate in the usual way or continue using MPI primitives; in option 2), the workers and the scheduler would be ordinary system processes, outside the MPI script, and would no longer care about it.
Regarding the second point: I agree. If we can define a good way to run the full Dask cluster, including the scheduler, inside the HPC cluster and use it from a client node, I will be happy to give it a try and test it on every HPC system I can. Just to share my feelings: there are a lot of ways to start Dask (on a local machine, distributed manually over a network, on the cloud, on HPC with jobqueue, using MPI, ...). That's fine, and it is proof that Dask is able to run everywhere, but it becomes difficult to understand and maintain; I am only guessing, as I have not yet contributed a single line of it. Personally, I will soon need a good diagram to see clearly through all the implementation options and how they could be reused from one to another. Anyway, thank you for taking the time to answer me.
Dask on MPI systems is used, and quite a bit more than expected. You might be interested in the dask-mpi project. @jacobtomlinson, have you used it? The questions that you posed are great, and I think the dask-mpi docs answer quite a few of them. For example, the batch jobs page outlines how the scheduler is started within the MPI job on rank 0.
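A minimal sketch of that batch-job pattern, assuming an InfiniBand interface named `ib0` and eight MPI ranks (both are placeholders for whatever the site provides); details would follow the dask-mpi batch jobs docs:

```python
# Submitted inside a PBS/Slurm allocation, e.g.:
#   mpirun -np 8 python dask_mpi_job.py
# dask-mpi runs the scheduler on rank 0, this client code on rank 1,
# and a worker on every remaining rank.
from dask_mpi import initialize
from dask.distributed import Client

initialize(interface="ib0", nthreads=4)  # interface name is site-specific

client = Client()  # connects to the scheduler started on rank 0
print(client.submit(lambda x: x + 1, 41).result())
```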
@jacobtomlinson, is this something https://github.com/dask-contrib/dask-ctl could help with now or in the future?
@quasiben, is dask/dask-jobqueue#390 still worth pursuing?
Hi, thanks for the replies and links. I think it should be easy to allow running LocalCUDACluster from dask-mpi. Then I will go back to starting a PBSCluster on CUDA (or CPU) nodes with the scheduler running inside the HPC cluster, not on the login node, using dask-spec-cli. That sounds like a good workflow. One step at a time. I will start on Thursday.
@quasiben I haven't. It seems you can choose between ... The workflow would then be to submit a Python script via PBS which uses dask-mpi to start the cluster.
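As a hedged illustration of that workflow (the queue name, resource selection, module names, and the `dask_mpi_job.py` script name are assumptions, not anything prescribed by dask-mpi):

```bash
#!/bin/bash
#PBS -N dask-mpi-cuda
#PBS -q gpu                                   # placeholder queue name
#PBS -l select=2:ncpus=16:ngpus=4:mpiprocs=4  # placeholder resource request
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR
module load openmpi cuda                      # placeholder module names

# Rank 0 -> scheduler, rank 1 -> client code, ranks 2+ -> workers
mpirun -np 8 python dask_mpi_job.py
```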
@andersy005 yeah, this is something that it would support in the future. The ... Given that ...
Yeah, I really think it is. It would be necessary for ...
Hey all, it's quite nice to see this discussion, and I'm really in favor of all the improvements proposed here. I've still got a question though: is it really mandatory to have the scheduler running remotely with dask-jobqueue before implementing a solution to launch CUDAWorker with it (so probably implementing the dask-spec CLI, if I understood correctly)? I feel both improvements are somewhat independent (even if both might be required in some HPC centers...), and only the second is really needed to answer the original issue. But maybe I missed something. @MordicusEtCubitus, did you have time to work on this?
In our case, what we decided to do is to create a job script where the dask-scheduler and dask-cuda-workers are launched, and then launch the Python client. Note that the scheduler and workers run on compute nodes, while the Python client runs on a batch/service node. This way we feel we have more control over the mapping between workers and node resources. Find below a sample based on the LSF scheduler, but the pattern should be the same for PBS/Slurm.
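A hedged sketch of that pattern on an LSF/jsrun system (the project name, resource counts, module names, and the `client.py` driver script are all placeholders, not the original sample):

```bash
#!/bin/bash
#BSUB -P ABC123                # placeholder project
#BSUB -W 2:00
#BSUB -nnodes 2
#BSUB -J dask-cuda

module load cuda               # placeholder module name

SCHEDULER_FILE=$PWD/scheduler.json

# Scheduler and workers are placed on compute nodes via jsrun;
# the client script at the bottom runs here, on the batch/launch node.
jsrun --nrs 1 --tasks_per_rs 1 --cpu_per_rs 2 \
      dask-scheduler --scheduler-file $SCHEDULER_FILE --interface ib0 &

jsrun --nrs 2 --tasks_per_rs 1 --cpu_per_rs 16 --gpu_per_rs 6 \
      dask-cuda-worker --scheduler-file $SCHEDULER_FILE --interface ib0 &

# Give the cluster a moment to come up, then run the client on this node.
sleep 30
python client.py --scheduler-file $SCHEDULER_FILE   # hypothetical driver script

wait
```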
Dear all,
First, I would like to thank you for the great work you have done to help with using Dask and GPUs.
LocalCUDACluster is really great and makes things much simpler.
I have been able to run it quite easily on a single computer with many GPUs, or on a local network of several computers each having one GPU.
That's great!
I also work a bit on HPC using dask-jobqueue with PBS or Slurm for CPU computing, and that's fine too.
Now, as you may have guessed, I would like to run the CUDA cluster on an HPC system with many nodes, each having many GPUs.
I've seen that you have thought about many features along these lines in the scheduler/worker configuration, such as using InfiniBand.
I've been able to run LocalCUDACluster on a single HPC node with many GPUs by executing a simple Python script submitted in the usual PBS/Slurm way. So I can run a LocalCUDACluster on a single node; a first step.
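For reference, that single-node step amounts to a short script like the following, submitted as an ordinary PBS/Slurm job (a minimal sketch, nothing HPC-specific in it):

```python
# Run as a normal single-node PBS/Slurm job: one LocalCUDACluster
# spanning all GPUs visible on that node.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

if __name__ == "__main__":
    cluster = LocalCUDACluster()   # one worker per GPU visible on this node
    client = Client(cluster)
    print(client.submit(lambda x: x + 1, 41).result())
    client.close()
    cluster.close()
```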
But I don't clearly understand how I can start a PBS/Slurm job running CUDA workers across many nodes, as PBSCluster/SlurmCluster do for CPUs.
Can it be done, and, if yes, how?
After having a look inside the PBSCluster class, it appears that it only defines a static job attribute from the PBSJob class, and `python -m distributed.cli.dask_worker` is run on each node. So, if I want to run workers using LocalCUDACluster, is it enough to replace `distributed.cli.dask_worker` by `dask_cuda.cli.dask_cuda_worker`? It seems nearly simple. I do not have access to the HPC right now, and would appreciate your comments before investing in this attempt.
Another question regarding the setting of `distributed.cli.dask_worker` in the Job class: it is hard-coded in the Job super class and not stored in an attribute, but it can be retrieved from the Job instance's `_command_template` attribute. So in `PBSCudaJob.__init__` I could try to replace the string `distributed.cli.dask_worker` by `dask_cuda.cli.dask_cuda_worker` in `self._command_template`, but this is not a best practice. Do you have a better suggestion? Have I understood the work to implement, or have I missed a lot of steps?
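A hedged sketch of the subclassing idea described above; it relies on the private `_command_template` attribute, so treat it as an experiment rather than a supported dask-jobqueue API:

```python
from dask_jobqueue import PBSCluster
from dask_jobqueue.pbs import PBSJob


class PBSCudaJob(PBSJob):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Swap the hard-coded worker entry point for the dask-cuda one.
        self._command_template = self._command_template.replace(
            "distributed.cli.dask_worker", "dask_cuda.cli.dask_cuda_worker"
        )


class PBSCudaCluster(PBSCluster):
    job_cls = PBSCudaJob
```

Usage would then mirror PBSCluster (passing cores, memory, resource_spec, and so on), but whether the extra dask-cuda worker options map cleanly onto the generated job script is exactly the open question in this issue.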
Thanks for your help.
Gaël,