Tasks reported as succeeded(runahead) #4868

dpmatthews · 2022-05-12T10:31:57Z

If I run the following workflow:

[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 20220101T0000Z
    final cycle point = 20220101T0000Z
    [[graph]]
        R1 = """
               install_cold => model_retrieve_amm15sst_cold => model_ants_amm15_to_model_cold
               install_cold => model_createbc_lbc_cold & model_install_startdata_cold
               model_install_startdata_cold => model_varbc_reset => housekeep_cold
               model_install_startdata_cold & model_ants_amm15_to_model_cold => model_recon_cold
               model_createbc_lbc_cold & model_recon_cold => model_fcst_cold => model_recon_ls_cold => housekeep_cold
               model_retrieve_lbc_cold => model_createbc_lbc_cold
            """
        T00 = "model_createbc_lbc_cold[^] => model_createbc_lbc"
[runtime]
    [[root]]
        script=true

then I get the following entries in the workflow log:

2022-05-12T10:38:25+01:00 INFO - [20220101T0000Z/model_createbc_lbc_cold waiting(runahead) job:01 flows:1] => succeeded(runahead)
2022-05-12T10:38:33+01:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:01 flows:1] => succeeded(runahead)

These are very confusing.
Given that succeeded(runahead) isn't an expected state, I think this can be regarded as a bug.

See also #3647.

The text was updated successfully, but these errors were encountered:

oliver-sanders · 2022-05-12T10:35:35Z

Possibly caused by tasks drifting out of the n=0/1 window then drifting back in again at a later time?

hjoliver · 2022-05-13T01:04:04Z

These tasks progress as normal, then the one weird log message appears some time later:

cylc cat-log dm | grep model_varbc_reset
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:00 flows:1] spawned
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:00 flows:1] => waiting
2022-05-13T12:54:12+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting job:00 flows:1] => waiting(queued)
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(queued) job:00 flows:1] => waiting
2022-05-13T12:54:13+12:00 INFO - Queue released: 20220101T0000Z/model_varbc_reset
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting job:00 flows:1] => preparing
2022-05-13T12:54:13+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] host=niwa-1007823l
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] (internal)submitted at 2022-05-13T12:54:14+12:00
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] submitted to localhost:background[26146]
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset preparing job:01 flows:1] => submitted
2022-05-13T12:54:14+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] health: submission timeout=None, polling intervals=PT15M,...
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] (received)started at 2022-05-13T12:54:15+12:00
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset submitted job:01 flows:1] => running
2022-05-13T12:54:16+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] health: execution timeout=None, polling intervals=PT15M,...
2022-05-13T12:54:26+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] (received)succeeded at 2022-05-13T12:54:25+12:00
2022-05-13T12:54:26+12:00 INFO - [20220101T0000Z/model_varbc_reset running job:01 flows:1] => succeeded
...
2022-05-13T12:55:08+12:00 INFO - [20220101T0000Z/model_varbc_reset waiting(runahead) job:01 flows:1] => succeeded(runahead)

(Note I put script = sleep 10 in root).

hjoliver · 2022-05-13T04:18:20Z

Possibly caused by tasks drifting out of the n=0/1 window then drifting back in again at a later time?

Confirmed. The datastore is calling TaskProxy.state_reset(), which actions and logs state changes, on loading task history from the DB ... which isn't a "real" state change.

hjoliver · 2022-05-18T00:01:56Z

(Note I took a look at this with some urgency, in case it indicated a problem with the scheduling algorithm (fortunately it didn't).

dpmatthews added the bug Something is wrong :( label May 12, 2022

dpmatthews added this to the cylc-8.x milestone May 12, 2022

hjoliver mentioned this issue May 13, 2022

Don't log flow merge for same flow. #4870

Merged

8 tasks

hjoliver mentioned this issue May 13, 2022

Don't log fake task state changes (datastore) #4871

Merged

7 tasks

hjoliver closed this as completed in #4871 May 18, 2022

hjoliver modified the milestones: cylc-8.x, cylc-8.0rc3 May 18, 2022

hjoliver self-assigned this May 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tasks reported as succeeded(runahead) #4868

Tasks reported as succeeded(runahead) #4868

dpmatthews commented May 12, 2022

oliver-sanders commented May 12, 2022

hjoliver commented May 13, 2022

hjoliver commented May 13, 2022

hjoliver commented May 18, 2022

Tasks reported as succeeded(runahead) #4868

Tasks reported as succeeded(runahead) #4868

Comments

dpmatthews commented May 12, 2022

oliver-sanders commented May 12, 2022

hjoliver commented May 13, 2022

hjoliver commented May 13, 2022

hjoliver commented May 18, 2022