-
Hi @jkitchin! Good to hear from you.
I agree that should be fairly doable. Out of curiosity, what's the motivation for using a Kubernetes cluster?
I haven't used K8s very much, so I can't say much there specifically. I know that redun has a Kubernetes executor (as does Parsl), so it could be worthwhile to see how they approach things, since I imagine that there might be ideas that carry over to what you are looking to do. In the context of quacc specifically, I recently added an interface to redun, so just using that is one option.

At least within the context of this repo, I've purposefully tried to avoid doing anything daemon-related, with the idea that one of the several supported workflow engines would handle that far more robustly than I would. That said, I think for the K8s approach you're thinking about, you probably would need some kind of long-running service. Happy to hear other thoughts.
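For what it's worth, here is a rough sketch of what the redun route could look like, assuming `WORKFLOW_ENGINE` is set to `redun` in the quacc settings (the EMT relaxation recipe is just a stand-in for whatever you'd actually run):

```python
from ase.build import bulk
from redun import Scheduler

from quacc.recipes.emt.core import relax_job

# With WORKFLOW_ENGINE set to "redun", quacc recipes behave as redun
# tasks, so calling one builds a lazy expression rather than running it.
atoms = bulk("Cu")
expression = relax_job(atoms)

# The redun scheduler resolves the expression. With a Kubernetes executor
# configured on the redun side, the task could run in a pod instead of
# locally.
scheduler = Scheduler()
result = scheduler.run(expression)
```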
-
The main motivation is that we have a Kubernetes cluster that we already use. How does it work if you submit a job to a queue system? What stays running to check on when a job is done?
-
That depends entirely on the workflow manager that you choose to use. Because everyone's needs are different, I specifically wanted to avoid enforcing any one approach. Most of the approaches involve some long-running server/daemon that will periodically poll the queuing system. But all of that logic is intentionally kept isolated from the details of quacc. The relevant details are summarized in the "Deploying Calculations" section of the documentation.
-
I think the gist of this is something like the following pseudocode, where `submit` and `is_done` are placeholders for however you talk to the queue:
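```python
import time

def submit(job) -> str:
    """Placeholder: launch the job (write inputs, run the shell command)
    and return some kind of job ID."""
    ...

def is_done(job_id: str) -> bool:
    """Placeholder: ask the queue/cluster whether the job has finished."""
    ...

def run_and_wait(job, interval: float = 30.0, timeout: float = 3600.0) -> str:
    """Submit a job, then block until it finishes or the timeout expires."""
    job_id = submit(job)
    start = time.time()
    while not is_done(job_id):
        if time.time() - start > timeout:
            raise TimeoutError(f"Job {job_id} did not finish in {timeout} s")
        time.sleep(interval)  # polling interval
    return job_id
```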
You have to decide how often to poll and whether there is a timeout. I still don't understand how else this could work.
-
We have a way to run VASP jobs by launching pods on a Kubernetes cluster. Ultimately, it comes down to writing out the input files, creating a YAML file, and then running a shell command to start it. That part, I think, would not be too hard to set up.
What I am not sure about is what happens after the job is launched. How do you check if the job is done so you can get results from it? Do you need some kind of long-running daemon to keep something from returning until the job is done?
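For concreteness, the launching part looks roughly like this (the manifest and job names here are made up):

```python
import subprocess

# After writing the VASP input files, create a Job manifest and launch it.
# "vasp-job.yaml" and "vasp-job" are placeholder names.
subprocess.run(["kubectl", "apply", "-f", "vasp-job.yaml"], check=True)

# The part I'm unsure about: what stays alive to notice completion?
# Polling the Job status, e.g.,
#   kubectl get job vasp-job -o jsonpath='{.status.succeeded}'
# or blocking in place with
#   kubectl wait --for=condition=complete job/vasp-job --timeout=3600s
# both seem to require *something* that keeps running until the job is done.
```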