Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add platform on reload for SUCCEEDED and FAILED tasks #5025

Merged
merged 1 commit into from
Aug 24, 2022

Conversation

wxtim
Copy link
Member

@wxtim wxtim commented Jul 29, 2022

These changes close #5063

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • No documentation update required.
  • Merged bugfix onto master

@wxtim wxtim marked this pull request as draft July 29, 2022 12:02
@wxtim wxtim changed the title attempt to fix add platform on reload for SUCCEEDED and FAILED tasks Jul 29, 2022
@wxtim wxtim marked this pull request as ready for review August 8, 2022 09:11
@wxtim wxtim added this to the cylc-8.0.1 milestone Aug 8, 2022
@wxtim wxtim linked an issue Aug 8, 2022 that may be closed by this pull request
@oliver-sanders
Copy link
Member

(happy omitting tests for this one, we need to review polling logic at some point and develop a complete set of unit tests)

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@oliver-sanders oliver-sanders changed the base branch from master to 8.0.x August 12, 2022 10:35
@oliver-sanders
Copy link
Member

oliver-sanders commented Aug 12, 2022

Changed base to 8.0.x to match the milestone.

Could we have a changelog entry for this and a more informative commit message than "attempt to fix".

@oliver-sanders
Copy link
Member

Tried this with Dave's example from #4513, I think I've managed to reproduce it on this branch. Will investigate and try again, could someone else have a crack.

@wxtim
Copy link
Member Author

wxtim commented Aug 12, 2022

Tried this with Dave's example from #4513, I think I've managed to reproduce it on this branch. Will investigate and try again, could someone else have a crack.

Actually closes #5063/

@MetRonnie
Copy link
Member

MetRonnie commented Aug 12, 2022

I think #5063 is a duplicate of #4513, which is the 8.0.1 version of #4516

Ah wait: #4516 (comment)

@MetRonnie
Copy link
Member

I tried reproducing #4516 on 8.0.x, but don't think I could reproduce using _remote_background_indep_tcp platform

[scheduling]
    [[graph]]
        R1 = a:fail? => restart

[runtime]
    [[a]]
        script = sleep 60
        execution time limit = PT1S
        platform = _remote_background_indep_tcp
    [[restart]]
        script = """
            cylc stop "${CYLC_WORKFLOW_ID}" --now --now
            sleep 5
            cylc play "${CYLC_WORKFLOW_ID}" --host=localhost
        """

@wxtim
Copy link
Member Author

wxtim commented Aug 12, 2022

I tried reproducing #4516 on 8.0.x, but don't think I could reproduce using _remote_background_indep_tcp platform

[scheduling]
    [[graph]]
        R1 = a:fail? => restart

[runtime]
    [[a]]
        script = sleep 60
        execution time limit = PT1S
        platform = _remote_background_indep_tcp
    [[restart]]
        script = """
            cylc stop "${CYLC_WORKFLOW_ID}" --now --now
            sleep 5
            cylc play "${CYLC_WORKFLOW_ID}" --host=localhost
        """

TBH I did the testing manually - I couldn't get this auto restarter working well and used @dpmatthews 's example.

@oliver-sanders oliver-sanders added the bug Something is wrong :( label Aug 15, 2022
@MetRonnie MetRonnie linked an issue Aug 15, 2022 that may be closed by this pull request
@oliver-sanders oliver-sanders modified the milestones: cylc-8.0.1, 8.0.2 Aug 16, 2022
@wxtim
Copy link
Member Author

wxtim commented Aug 16, 2022

Example I used

#!jinja2
[scheduling]
    [[graph]]
        R1 = "timeout-remote & timeout-local-slurm & timeout-local-bg"
[runtime]
    [[timeout-remote]]
        script = "sleep 600"
        platform = xce
        execution time limit = PT1S
    [[timeout-local-slurm]]
        script = "sleep 600"
        platform = spice
        execution time limit = PT1S
    [[timeout-local-bg]]
        script = "sleep 600"
        execution time limit = PT1S

Wait for all three to fail, then stop and restart the workflow.

@oliver-sanders
Copy link
Member

add platform on reload for SUCCEEDED and FAILED tasks

This issue is about "restart" rather than "reload" right?

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managed to reproduce with the timeout-remote platform in Tim's example on master, couldn't reproduce on this branch.

LGTM.

@oliver-sanders
Copy link
Member

@wxtim could you add a changelog.

@wxtim
Copy link
Member Author

wxtim commented Aug 23, 2022

I've rebased this on the wrong branch. I'm taking that as an indication of tiredness. Will fix tomorrow. 😞

@wxtim wxtim force-pushed the bugfix.c8.reload_polling_bug branch from 7fbff4b to 5f9072d Compare August 24, 2022 07:41
@wxtim wxtim force-pushed the bugfix.c8.reload_polling_bug branch from 5f9072d to 5fcd223 Compare August 24, 2022 07:45
@wxtim
Copy link
Member Author

wxtim commented Aug 24, 2022

@oliver-sanders @MetRonnie Please check that I haven't messed owt up with my foolish rebasing.

@oliver-sanders
Copy link
Member

(letting tests pass)

@oliver-sanders oliver-sanders merged commit 026ce5c into cylc:8.0.x Aug 24, 2022
@wxtim wxtim deleted the bugfix.c8.reload_polling_bug branch September 28, 2022 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

job polling broken for failed jobs after restart
3 participants