platforms: fix unreachable hosts not reset on platform group failure #6109

Merged
MetRonnie merged 13 commits into cylc:8.2.x from fix.platform_selection_bug
Jun 12, 2024

@wxtim wxtim commented May 20, 2024

Closes: #6100

Bad hosts were only being reset if we ran out of hosts during job submission, not during the job preparation phase. Hosts added to the list of bad hosts during preparation were therefore left there until the main loop plugin removed them.
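A minimal sketch of the intended behaviour, for illustration only. This is not the cylc-flow implementation: `NoHostsError`, `select_host`, `prepare_job` and the dict-of-lists platform group are hypothetical names and shapes invented for this example. It assumes that when every host in a platform group is marked bad, the selection failure reports which hosts it consumed, so the caller can clear exactly those hosts straight away rather than waiting for a periodic clean-up.

```python
from typing import Dict, List, Optional, Set


class NoHostsError(Exception):
    """Hypothetical error: every host in the platform group is marked bad."""

    def __init__(self, hosts_consumed: Set[str]) -> None:
        super().__init__("no reachable hosts in platform group")
        # All hosts that were tried (and rejected) during selection.
        self.hosts_consumed = hosts_consumed


def select_host(group: Dict[str, List[str]], bad_hosts: Set[str]) -> str:
    """Return the first host in the group that is not marked bad."""
    consumed: Set[str] = set()
    for hosts in group.values():
        for host in hosts:
            consumed.add(host)
            if host not in bad_hosts:
                return host
    raise NoHostsError(consumed)


def prepare_job(group: Dict[str, List[str]], bad_hosts: Set[str]) -> Optional[str]:
    """Pick a host during job preparation, resetting bad hosts on failure."""
    try:
        return select_host(group, bad_hosts)
    except NoHostsError as exc:
        # Reset only the hosts belonging to this group, in place, so they
        # can be retried on the next attempt.
        bad_hosts.difference_update(exc.hosts_consumed)
        return None


group = {"platform_a": ["host1", "host2"], "platform_b": ["host3"]}
bad_hosts = {"host1", "host2", "host3", "unrelated_host"}
chosen = prepare_job(group, bad_hosts)
```

In this sketch the failed preparation clears `host1`, `host2` and `host3` from `bad_hosts` but leaves `unrelated_host` alone, since it is not part of the group that failed.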

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@wxtim wxtim self-assigned this May 20, 2024
@wxtim wxtim marked this pull request as draft May 20, 2024 14:11
@wxtim wxtim marked this pull request as ready for review May 21, 2024 09:25
@wxtim wxtim marked this pull request as draft May 21, 2024 09:25
@wxtim wxtim changed the base branch from master to 8.2.x May 21, 2024 09:26
@wxtim wxtim force-pushed the fix.platform_selection_bug branch from a476c09 to 25d6e96 Compare May 22, 2024 07:50
@oliver-sanders oliver-sanders changed the title from "wip" to "platforms: fix unreachable hosts not reset on platform group failure" May 22, 2024
@oliver-sanders oliver-sanders added this to the 8.2.x milestone May 22, 2024
@oliver-sanders oliver-sanders added the bug Something is wrong :( label May 22, 2024
@wxtim wxtim force-pushed the fix.platform_selection_bug branch from fc66081 to 13034d6 Compare May 23, 2024 14:26
@wxtim wxtim marked this pull request as ready for review May 23, 2024 14:26
added test which includes break

return list of all hosts consumed in a platform group.
@wxtim wxtim force-pushed the fix.platform_selection_bug branch from 13034d6 to d30ac43 Compare May 23, 2024 14:30
@wxtim wxtim force-pushed the fix.platform_selection_bug branch from e166e27 to 9976751 Compare May 28, 2024 08:49
@wxtim wxtim requested a review from MetRonnie May 28, 2024 09:01
@MetRonnie MetRonnie left a comment

Manually tested 👍

Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
@wxtim wxtim force-pushed the fix.platform_selection_bug branch from 09c0988 to 14ed16a Compare May 28, 2024 13:50
@oliver-sanders

(one unused var to clean up)

@wxtim wxtim force-pushed the fix.platform_selection_bug branch from d2a90e0 to b9b337f Compare May 30, 2024 13:50
@MetRonnie MetRonnie linked an issue May 30, 2024 that may be closed by this pull request
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>

wxtim commented Jun 6, 2024

@oliver-sanders poke?

wxtim and others added 4 commits June 11, 2024 15:01
Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>
Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>
Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>
@oliver-sanders

Tested as working.

Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>
@MetRonnie MetRonnie merged commit 9c3cd49 into cylc:8.2.x Jun 12, 2024
21 of 23 checks passed
@oliver-sanders oliver-sanders modified the milestones: 8.2.x, 8.3.0 Jun 18, 2024
@wxtim wxtim deleted the fix.platform_selection_bug branch June 28, 2024 10:13
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unreachable hosts not reset on platform group failure.
3 participants