Resilience: Add to the Task Manager #614

Closed
rstyd opened this issue Nov 22, 2022 · 2 comments
Labels
core (Core BEE functionality that must exist), High Priority

Comments

@rstyd
Collaborator

rstyd commented Nov 22, 2022

The task manager needs to be more resilient to unexpected service interruptions. We will accomplish this by using the task manager database to bring workflows back up after any downtime, expected or unexpected. For the task manager, this means rebuilding the submit and job queues, checking the status of any tasks that were running under Slurm, and resubmitting any failed tasks. The task manager must also confirm that the workflow manager is back up before restarting tasks.
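
The recovery sequence described above could look roughly like the following minimal Python sketch. It assumes a SQLite table named `tasks` with `task_id`, `state`, and `job_id` columns. The table layout, the state names, and the callables `wfm_is_up`, `query_job_state`, and `submit` are hypothetical stand-ins, not BEE's actual API. A real implementation would query Slurm itself (for example through `sacct` or `squeue`) rather than call an injected function. Only tasks that Slurm reports as failed are resubmitted here.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical schema: one row per task with its last known state and Slurm job ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id TEXT PRIMARY KEY,
    state   TEXT NOT NULL,  -- e.g. 'SUBMIT', 'RUNNING', 'COMPLETED'
    job_id  INTEGER         -- NULL until the task has been submitted to Slurm
)
"""


def wait_for_wfm(is_up: Callable[[], bool], retries: int = 5, delay: float = 1.0) -> None:
    """Block until the (hypothetical) workflow manager check passes, or give up."""
    for _ in range(retries):
        if is_up():
            return
        time.sleep(delay)
    raise RuntimeError("workflow manager did not come back up")


def recover(db: sqlite3.Connection,
            wfm_is_up: Callable[[], bool],
            query_job_state: Callable[[int], str],
            submit: Callable[[str], int]) -> tuple[list[str], list[tuple[str, int]]]:
    """Rebuild the submit and job queues from the database after a restart."""
    # 1. Restart nothing until the workflow manager is reachable again.
    wait_for_wfm(wfm_is_up)

    # 2. Tasks that never reached Slurm go back on the submit queue.
    submit_queue = [row[0] for row in db.execute(
        "SELECT task_id FROM tasks WHERE state = 'SUBMIT' AND job_id IS NULL "
        "ORDER BY task_id")]

    # 3. Tasks that were running: ask the scheduler what happened to each job.
    job_queue: list[tuple[str, int]] = []
    rows = db.execute(
        "SELECT task_id, job_id FROM tasks WHERE state = 'RUNNING' ORDER BY task_id"
    ).fetchall()
    for task_id, job_id in rows:
        state = query_job_state(job_id)
        if state == "FAILED":
            # 4. Resubmit the failed task and record its new job ID.
            new_id = submit(task_id)
            db.execute("UPDATE tasks SET job_id = ? WHERE task_id = ?", (new_id, task_id))
            job_queue.append((task_id, new_id))
        elif state == "COMPLETED":
            db.execute("UPDATE tasks SET state = 'COMPLETED' WHERE task_id = ?", (task_id,))
        else:
            # Still pending or running under Slurm: keep tracking the existing job.
            job_queue.append((task_id, job_id))
    db.commit()
    return submit_queue, job_queue
```

Passing the scheduler query and submit functions in as arguments keeps the sketch testable without a live Slurm cluster. It also makes the ordering explicit: the workflow manager check runs before any task is touched.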

This issue was originally part of #591, but we're breaking that issue into two.

@pagrubel
Collaborator

Do issues #674-676 cover this? If so, can we close this?

@aquan9
Collaborator

aquan9 commented Apr 9, 2024

Closing this task per the discussion on April 9th. All parts of this task are covered by other tasks.

aquan9 closed this as completed Apr 9, 2024
Projects
None yet
Development

No branches or pull requests

3 participants