Is checkpointing available? #88

Open
alansill opened this issue May 1, 2024 · 1 comment

alansill commented May 1, 2024

We have an account holder whose jobs are running out of wall-clock time, which is set to a generous 48 hours on our cluster, even when using a 128-core node with the multi-threading option turned on. Since the code does not appear to support MPI, and I assume it would not parallelize across multiple nodes, the next best option would be to use checkpoint-and-restore, so that a subsequent job picks up where the first one stopped when it ran out of wall-clock time.

Is this supported in this code? If not, are we correct that only OpenMP is available, and not OpenMPI or other MPI implementations? Are there any tips for lowering the run time for a given set of inputs?

Collaborator

igronau commented May 8, 2024

Sorry, but we don't support checkpoint-and-restore. It's been on my TODO list for a while, but it doesn't look like I'm going to get the time to implement it. As a result, there is no effective way to run G-PhoCS on a cluster with 48-hour time limits.
