Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tasks reported as succeeded(runahead) #4868

Closed
dpmatthews opened this issue May 12, 2022 · 4 comments · Fixed by #4871
Closed

Tasks reported as succeeded(runahead) #4868

dpmatthews opened this issue May 12, 2022 · 4 comments · Fixed by #4871
Assignees
Labels
bug Something is wrong :(
Milestone

Comments

@dpmatthews
Copy link
Contributor

If I run the following workflow:

[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 20220101T0000Z
    final cycle point = 20220101T0000Z
    [[graph]]
        R1 = """
               install_cold => model_retrieve_amm15sst_cold => model_ants_amm15_to_model_cold
               install_cold => model_createbc_lbc_cold & model_install_startdata_cold
               model_install_startdata_cold => model_varbc_reset => housekeep_cold
               model_install_startdata_cold & model_ants_amm15_to_model_cold => model_recon_cold
               model_createbc_lbc_cold & model_recon_cold => model_fcst_cold => model_recon_ls_cold => housekeep_cold
               model_retrieve_lbc_cold => model_createbc_lbc_cold
            """
        T00 = "model_createbc_lbc_cold[^] => model_createbc_lbc"
[runtime]
    [[root]]
        script=true

then I get the following entries in the workflow log:

2022-05-12T10:38:25+01:00 INFO - [20220101T0000Z/model_createbc_lbc_cold waiting(runahead) job:01 flows:1] => succeeded(runahead)
2022-05-12T10:38:33+01:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:01 flows:1] => succeeded(runahead)

These are very confusing.
Given that succeeded(runahead) isn't an expected state, I think this can be regarded as a bug.

See also #3647.

@dpmatthews dpmatthews added the bug Something is wrong :( label May 12, 2022
@dpmatthews dpmatthews added this to the cylc-8.x milestone May 12, 2022
@oliver-sanders
Copy link
Member

Possibly caused by tasks drifting out of the n=0/1 window then drifting back in again at a later time?

@hjoliver
Copy link
Member

These tasks progress as normal, then the one weird log message appears some time later:

cylc cat-log dm | grep model_varbc_reset
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:00 flows:1] spawned
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:00 flows:1] => waiting
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting job:00 flows:1] => waiting(queued)
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(queued) job:00 flows:1] => waiting
2022-05-13T12:54:13+12:00 INFO - Queue released: 20220101T0000Z/model_varbc_reset
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting job:00 flows:1] => preparing
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] host=niwa-1007823l
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] (internal)submitted at 2022-05-13T12:54:14+12:00
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] submitted to localhost:background[26146]
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] => submitted
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] health: submission timeout=None, polling intervals=PT15M,...
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] (received)started at 2022-05-13T12:54:15+12:00
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] => running
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] health: execution timeout=None, polling intervals=PT15M,...
2022-05-13T12:54:26+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] (received)succeeded at 2022-05-13T12:54:25+12:00
2022-05-13T12:54:26+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] => succeeded
...
2022-05-13T12:55:08+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:01 flows:1] => succeeded(runahead)

(Note I put script = sleep 10 in root).

@hjoliver
Copy link
Member

Possibly caused by tasks drifting out of the n=0/1 window then drifting back in again at a later time?

Confirmed. The datastore is calling TaskProxy.state_reset(), which actions and logs state changes, on loading task history from the DB ... which isn't a "real" state change.

@hjoliver
Copy link
Member

(Note I took a look at this with some urgency, in case it indicated a problem with the scheduling algorithm (fortunately it didn't).

@hjoliver hjoliver self-assigned this May 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants