Running jobs in a Slurm cluster #16
Comments
This is very interesting and we've actually been digging into it for a few weeks. It seems doable pending a few questions about how the cluster is set up. Would you be willing to jump on a quick call to answer those questions and talk through our approach?
I would like to help out and can probably also help test it in our cluster. I can be available for a call at US-friendly times tomorrow and maybe Friday. Can you email me to coordinate?
I am also interested in getting Runhouse to interface with a Slurm cluster. Has there been any progress recently on this issue?
Hey Andre, we're still in the POC stage - we'd be happy to speak with you to hear more about your requirements and how that integration could work for your setup. Just sent you an email to coordinate.
Hello, I have a similar setup to those above and would like to try out Runhouse. Are there any updates to this issue? I have experience interfacing with Slurm clusters so would be happy to contribute if that would help get this past POC. |
Hi Eugene, thanks for reaching out! We'd love to support Slurm; it's on our roadmap along with other compute providers (e.g. k8s), and we hope to get Slurm support into the next release or the one after. In the meantime, we'd be happy to hear your thoughts and would welcome a contribution! Sent you an email to discuss further.
The feature
It would be interesting if Runhouse could also interface with an existing Slurm cluster.
Motivation
I am part of a team managing a Slurm (GPU) cluster, and I have users who are interested in running large language models via Runhouse (https://langchain.readthedocs.io/en/latest/modules/llms/integrations/self_hosted_examples.html). It would be excellent if I could bridge this gap between supply and demand with Runhouse. From what I have read in the documentation, Runhouse does not yet come with an interface to Slurm.
What the ideal solution looks like
I am completely new to Runhouse, so this may not be the ideal solution model, but I imagine this could be supported as a bring-your-own (BYO) cluster with a little extra interaction between Runhouse and Slurm: the necessary resources would be requested (perhaps from the Cluster factory method) as one or more Slurm jobs, probably through the Slurm REST API. Once the jobs are running, Runhouse could contact the allocated nodes as a BYO cluster. A rough sketch of that flow is below.
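For illustration only, here is a minimal sketch of that allocate-then-attach flow. It assumes it runs on (or can SSH to) a Slurm login node with `sbatch`/`squeue`/`scontrol` available, that the allocated compute nodes accept SSH from the user, and that the BYO cluster is created via Runhouse's documented static-cluster pattern (`rh.cluster(ips=..., ssh_creds=...)`). The node count, GPU request, user name, and key path are placeholders, and it shells out to Slurm commands rather than using the REST API, which a real integration would presumably prefer.

```python
# Hypothetical sketch: allocate nodes via Slurm, then hand them to Runhouse as a BYO cluster.
import subprocess
import time

import runhouse as rh

# 1. Request resources from Slurm as a long-running placeholder job.
#    "sleep infinity" keeps the allocation alive while Runhouse uses the nodes.
job_id = subprocess.run(
    ["sbatch", "--parsable", "--nodes=1", "--gres=gpu:1", "--wrap", "sleep infinity"],
    capture_output=True, text=True, check=True,
).stdout.strip().split(";")[0]

# 2. Wait until the job is running, then resolve the allocated hostnames.
while True:
    state = subprocess.run(
        ["squeue", "-j", job_id, "-h", "-o", "%T"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if state == "RUNNING":
        break
    time.sleep(5)

nodelist = subprocess.run(
    ["squeue", "-j", job_id, "-h", "-o", "%N"],
    capture_output=True, text=True, check=True,
).stdout.strip()
hosts = subprocess.run(
    ["scontrol", "show", "hostnames", nodelist],
    capture_output=True, text=True, check=True,
).stdout.split()

# 3. Point Runhouse at the allocated nodes as a bring-your-own cluster.
#    ssh_user and key path are placeholders for whatever the site uses.
cluster = rh.cluster(
    name="slurm-byo",
    ips=hosts,
    ssh_creds={"ssh_user": "myuser", "ssh_private_key": "~/.ssh/id_rsa"},
)
cluster.run(["nvidia-smi"])  # sanity check that Runhouse can reach the nodes
```

The key design point is that Slurm only hands out the allocation (step 1-2) and Runhouse treats the resulting hosts like any other BYO machines (step 3); a first-class integration could hide steps 1-2 behind the Cluster factory method and tear down the Slurm job when the cluster is released.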