
Intercycle Dependencies not honoured at FCP change & Restart #6268

Closed
wxtim opened this issue Jul 30, 2024 · 6 comments
Labels
bug Something is wrong :(

Comments

@wxtim
Member

wxtim commented Jul 30, 2024

Description

After changing the final cycle point and restarting a workflow, inter-cycle dependencies on tasks that ran before the restart are ignored.

Reproducible Example

#!/bin/bash
TMP=$(mktemp -d)
echo "Files extracted to ${TMP}"
mkdir -p "${TMP}"

cat > "${TMP}/./flow.cylc" <<__ICI__
#!jinja2
[scheduler]
    allow implicit tasks = True
    cycle point format = %Y

[scheduling]
    initial cycle point = 1590
    final cycle point = 1591
    [[graph]]
        P1Y = """
            task => fin
            fin[-P1Y] => fin
        """

[runtime]
    [[root]]
        [[[simulation]]]
            default run length = PT0S
__ICI__

WORKFLOW=N"$(uuidgen | awk -F '-' '{print $1}')"

# Run workflow to completion:
cylc vip --no-detach "${TMP}/" --workflow-name "${WORKFLOW}" --mode=simulation

# Modify the fcp
sed -i 's/1591/1593/' "${TMP}/flow.cylc"

# Restart, trigger a task in a new cycle point:
cylc vr "${WORKFLOW}"
cylc trigger --flow=new "${WORKFLOW}//1592/task" --mode=simulation
cylc log "${WORKFLOW}" -m t

Expected Behaviour

The flow should continue from 1592/task all the way to the new FCP (1593).

@wxtim wxtim added the bug Something is wrong :( label Jul 30, 2024
@wxtim wxtim self-assigned this Jul 30, 2024
@wxtim wxtim added this to the 8.3.4 milestone Jul 30, 2024
@wxtim
Member Author

wxtim commented Jul 30, 2024

Brain dump -

Put a breakpoint at cylc/flow/task_state.py::483 and looked at the dependencies created for the new task generated by cylc trigger:

(Pdb) cpre.api_dump() 
expression: "c0"
conditions {
  task_proxy: "1590/fin"
  expr_alias: "c0"
  req_state: "succeeded"
  satisfied: false
  message: "unsatisfied"
}
cycle_points: "1590"
satisfied: false

Stack

  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/commands.py(448)force_trigger_tasks()
-> yield schd.pool.force_trigger_tasks(tasks, flow, flow_wait, flow_descr)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(2208)force_trigger_tasks()
-> self.add_to_pool(itask)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(228)add_to_pool()
-> self.create_data_store_elements(itask)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(239)create_data_store_elements()
-> self.data_store_mgr.increment_graph_window(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/data_store_mgr.py(955)increment_graph_window()
-> self.generate_ghost_task(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/data_store_mgr.py(1188)generate_ghost_task()
-> itask = TaskProxy(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_proxy.py(281)__init__()
-> self.state = TaskState(tdef, self.point, status, is_held)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_state.py(238)__init__()
-> self._add_prerequisites(point, tdef)
> /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_state.py(484)_add_prerequisites()
  • When restarting the workflow ...
  • When a trigger generates task prerequisites, it creates them unsatisfied (fair enough).
  • The unsatisfied prerequisites are never subsequently satisfied, even though the outputs they depend on were completed in the previous run (a way to confirm this from the workflow database is sketched below).
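
To make the last point concrete, here is a minimal sketch, not Cylc's actual API, of consulting the history that is already on disk. The database path (~/cylc-run/<workflow>/log/db) and the task_states table with name, cycle and status columns are assumptions about the on-disk layout, used for illustration only.

# Sketch: check whether the output a prerequisite is waiting on was already
# recorded by a previous run of the workflow.
import sqlite3
from pathlib import Path


def previously_succeeded(workflow: str, cycle: str, task: str) -> bool:
    """Return True if the workflow DB already records task as succeeded."""
    # Assumed location of the public workflow database.
    db = Path.home() / "cylc-run" / workflow / "log" / "db"
    conn = sqlite3.connect(str(db))
    try:
        row = conn.execute(
            "SELECT status FROM task_states WHERE name = ? AND cycle = ?",
            (task, cycle),
        ).fetchone()
    finally:
        conn.close()
    return row is not None and row[0] == "succeeded"


# e.g. previously_succeeded("<workflow>", "1590", "fin") returns True after the
# first run, yet the prerequisite "1590/fin succeeded" above stays unsatisfied.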

@oliver-sanders
Member

oliver-sanders commented Jul 31, 2024

I think this may be a duplicate of #5952

The behaviour is sort of "intended" under SoD logic; however, it's totally illogical from the graph perspective.

@wxtim
Member Author

wxtim commented Jul 31, 2024

I think this may be a duplicate of #5952

It is.

@wxtim wxtim closed this as completed Jul 31, 2024
@oliver-sanders oliver-sanders removed this from the 8.3.4 milestone Jul 31, 2024
@hjoliver
Member

hjoliver commented Aug 1, 2024

The behaviour is sort of "intended" under SoD logic; however, it's totally illogical from the graph perspective.

Meh, reading back through that issue, there is nothing illogical about it. The root of the problem (in that issue at least; I haven't read this one closely) is that you made a change on restart that affected the structure of the graph - by adding new dependencies on an old (previously succeeded) task.

What happens next depends entirely on whether we think graph execution should be purely event-driven (prerequisites get satisfied when outputs get completed) or not (prerequisites can also get satisfied by output events that happened any time in the past).

As it happens I also think we should try to implement the latter way, although it's not exactly easy (a lot more DB lookups...). But I suspect it can be argued either way, and neither way is "illogical".

@oliver-sanders
Member

oliver-sanders commented Aug 1, 2024

This behaviour flows on from our current implementation of the SoD model; however, this is an implementation choice, not a required (or desired) characteristic of the SoD model, so it can only be considered "logical" from the perspective of the implementation, not the model. Under the current implementation, Cylc can tell you that a task which previously succeeded has not run at all, ignoring its prior history, which is a bug.

The SoD implementation is indeed event driven; it's just that those events are not entirely held in memory. We are already looking up outputs every time a task spawns (to prevent "over-running"); we do not currently look up prerequisites (to prevent "under-running"), leading to this bug (which is inconsistent).

Looking up prerequisites (in addition to outputs) need not add any extra sqlite calls, but it will increase the "volume" of the calls we are already performing. The performance impact will need to be measured; however, since the bulk of the heavy lifting is performed at the C layer, I'm cautiously optimistic. To optimise, we can shift as much logic as possible into the SQL query; one optimisation might be to only return prerequisites which are satisfied.
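
For illustration, a rough sketch of that last optimisation: filter in the query so that only already-satisfied prerequisites come back to Python. The task_outputs table, its (cycle, name, outputs) columns, and the JSON encoding of the outputs column are assumptions about the workflow database schema, not a confirmed interface.

import json
import sqlite3


def satisfied_prerequisites(conn: sqlite3.Connection, prereqs):
    """Return the subset of (cycle, task, output) prerequisites already met.

    A single query covers every (cycle, name) pair of interest, so row
    filtering happens inside sqlite rather than in a Python loop over the
    whole table; only the output-membership check remains in Python.
    """
    if not prereqs:
        return []
    pairs = sorted({(cycle, task) for cycle, task, _ in prereqs})
    clause = " OR ".join(["(cycle = ? AND name = ?)"] * len(pairs))
    params = [value for pair in pairs for value in pair]
    rows = conn.execute(
        f"SELECT cycle, name, outputs FROM task_outputs WHERE {clause}",
        params,
    ).fetchall()
    recorded = {
        (cycle, name): set(json.loads(outputs))
        for cycle, name, outputs in rows
    }
    return [
        (cycle, task, output)
        for cycle, task, output in prereqs
        if output in recorded.get((cycle, task), set())
    ]


# e.g. satisfied_prerequisites(conn, [("1591", "fin", "succeeded")]) would let
# the scheduler mark that prerequisite satisfied at spawn time instead of
# waiting for an event that already happened.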

@hjoliver
Member

hjoliver commented Aug 4, 2024

We are already looking up outputs every time a task spawns (to prevent "over-running"); we do not currently look up prerequisites (to prevent "under-running"), leading to this bug (which is inconsistent).

The former is an absolute necessity, without which the model would spawn multiple flows at every graph join (a & b => c). The latter is at least arguable, and it only matters in relatively niche situations. If I add a new dependence on an old event, it's not entirely unreasonable to expect that I might have to fake the old event again to trigger the new dependence on it - because we missed the original event.

This is not merely implementation - the conceptual model itself can be event driven in the instantaneous sense, or with the addition of remembering and re-using past events.

When implementing SoD I did think about this, and settled on what we've got because (a) it is easier not to look up old outputs if we don't have to, and unlike for the "overrun" case it is not an absolute necessity; but also (b) it can be justified if you favour an instantaneous event-driven model.

That said, I'm being a bit bloody-minded just to object to characterizing the current situation as "illogical". I do agree that we should change it to remember and re-use past output events.
