
Intercycle Dependencies not honoured at FCP change & Restart #6268

Closed
wxtim opened this issue Jul 30, 2024 · 6 comments
Labels
bug Something is wrong :(

Comments

@wxtim
Member

wxtim commented Jul 30, 2024

Description

After changing the final cycle point and restarting a workflow, inter-cycle dependencies on tasks that ran before the restart are ignored.

Reproducible Example

#!/bin/bash
TMP=$(mktemp -d)
echo "Files extracted to ${TMP}"
mkdir -p "${TMP}"

cat > "${TMP}/./flow.cylc" <<__ICI__
#!jinja2
[scheduler]
    allow implicit tasks = True
    cycle point format = %Y

[scheduling]
    initial cycle point = 1590
    final cycle point = 1591
    [[graph]]
        P1Y = """
            task => fin
            fin[-P1Y] => fin
        """

[runtime]
    [[root]]
        [[[simulation]]]
            default run length = PT0S
__ICI__

WORKFLOW=N"$(uuidgen | awk -F '-' '{print $1}')"

# Run workflow to completion:
cylc vip --no-detach "${TMP}/" --workflow-name "${WORKFLOW}" --mode=simulation

# Modify the fcp
sed -i 's/1591/1593/' "${TMP}/flow.cylc"

# Restart, trigger a task in a new cycle point:
cylc vr "${WORKFLOW}"
cylc trigger --flow=new "${WORKFLOW}//1592/task" --mode=simulation
cylc log "${WORKFLOW}" -m t

Expected Behaviour

The flow should continue from 1592/task all the way to the new FCP (1593).

@wxtim wxtim added the bug Something is wrong :( label Jul 30, 2024
@wxtim wxtim self-assigned this Jul 30, 2024
@wxtim wxtim added this to the 8.3.4 milestone Jul 30, 2024
@wxtim
Member Author

wxtim commented Jul 30, 2024

Brain dump -

Put a breakpoint at cylc/flow/task_state.py::483 and looked at the dependencies created for the new task generated by cylc trigger:

(Pdb) cpre.api_dump() 
expression: "c0"
conditions {
  task_proxy: "1590/fin"
  expr_alias: "c0"
  req_state: "succeeded"
  satisfied: false
  message: "unsatisfied"
}
cycle_points: "1590"
satisfied: false

Stack

  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/commands.py(448)force_trigger_tasks()
-> yield schd.pool.force_trigger_tasks(tasks, flow, flow_wait, flow_descr)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(2208)force_trigger_tasks()
-> self.add_to_pool(itask)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(228)add_to_pool()
-> self.create_data_store_elements(itask)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(239)create_data_store_elements()
-> self.data_store_mgr.increment_graph_window(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/data_store_mgr.py(955)increment_graph_window()
-> self.generate_ghost_task(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/data_store_mgr.py(1188)generate_ghost_task()
-> itask = TaskProxy(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_proxy.py(281)__init__()
-> self.state = TaskState(tdef, self.point, status, is_held)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_state.py(238)__init__()
-> self._add_prerequisites(point, tdef)
> /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_state.py(484)_add_prerequisites()
  • When restarting the workflow ...
  • When a trigger generates task prerequisites, it creates them unsatisfied (fair enough).
  • The unsatisfied prerequisites are never subsequently satisfied, even though the outputs they depend on were completed in the previous run (a way to confirm this from the workflow database is sketched below).
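
To make the last point concrete, here is a minimal sketch, not Cylc's actual API, of consulting the history that is already on disk. The database path (~/cylc-run/<workflow>/log/db) and the task_states table with name, cycle and status columns are assumptions about the on-disk layout, used for illustration only.

# Sketch: check whether the output a prerequisite is waiting on was already
# recorded by a previous run of the workflow.
import sqlite3
from pathlib import Path


def previously_succeeded(workflow: str, cycle: str, task: str) -> bool:
    """Return True if the workflow DB already records task as succeeded."""
    # Assumed location of the public workflow database.
    db = Path.home() / "cylc-run" / workflow / "log" / "db"
    conn = sqlite3.connect(str(db))
    try:
        row = conn.execute(
            "SELECT status FROM task_states WHERE name = ? AND cycle = ?",
            (task, cycle),
        ).fetchone()
    finally:
        conn.close()
    return row is not None and row[0] == "succeeded"


# e.g. previously_succeeded("<workflow>", "1590", "fin") returns True after the
# first run, yet the prerequisite "1590/fin succeeded" above stays unsatisfied.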

@oliver-sanders
Member

oliver-sanders commented Jul 31, 2024

I think this may be a duplicate of #5952

The behaviour is sort of "intended" under SoD logic; however, it's totally illogical from the graph perspective.

@wxtim
Member Author

wxtim commented Jul 31, 2024

I think this may be a duplicate of #5952

It is.

@wxtim wxtim closed this as completed Jul 31, 2024
@oliver-sanders oliver-sanders removed this from the 8.3.4 milestone Jul 31, 2024
@hjoliver
Member

hjoliver commented Aug 1, 2024

The behaviour is sort of "intended" under SoD logic; however, it's totally illogical from the graph perspective.

Meh, reading back through that issue, there is nothing illogical about it. The root of the problem (in that issue at least; I haven't read this one closely) is that you made a change on restart that affected the structure of the graph - by adding new dependencies on an old (previously succeeded) task.

What happens next depends entirely on whether we think graph execution should be purely event-driven (prerequisites get satisfied when outputs get completed) or not (prerequisites can also get satisfied by output events that happened any time in the past).

As it happens I also think we should try to implement the latter way, although it's not exactly easy (a lot more DB lookups...). But I suspect it can be argued either way, and neither way is "illogical".

@oliver-sanders
Member

oliver-sanders commented Aug 1, 2024

This behaviour flows on from our current implementation of the SoD model; however, this is an implementation choice, not a required (or desired) characteristic of the SoD model, so it can only be considered "logical" from the perspective of the implementation, not the model. Under the current implementation, Cylc can tell you that a task which previously succeeded has not run at all, ignoring its prior history, which is a bug.

The SoD implementation is indeed event driven; it's just that those events are not entirely held in memory. We are already looking up outputs every time a task spawns (to prevent "over-running"); we do not currently look up prerequisites (to prevent "under-running"), leading to this bug (which is inconsistent).

Looking up prerequisites (in addition to outputs) need not add any extra sqlite calls, but it will increase the "volume" of the calls we are already performing. The performance impact will need to be measured; however, since the bulk of the heavy lifting is performed at the C layer, I'm cautiously optimistic. To optimise, we can shift as much logic as possible into the SQL query; one optimisation might be to only return prerequisites which are satisfied.
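
For illustration, a rough sketch of that last optimisation: filter in the query so that only already-satisfied prerequisites come back to Python. The task_outputs table, its (cycle, name, outputs) columns, and the JSON encoding of the outputs column are assumptions about the workflow database schema, not a confirmed interface.

import json
import sqlite3


def satisfied_prerequisites(conn: sqlite3.Connection, prereqs):
    """Return the subset of (cycle, task, output) prerequisites already met.

    A single query covers every (cycle, name) pair of interest, so row
    filtering happens inside sqlite rather than in a Python loop over the
    whole table; only the output-membership check remains in Python.
    """
    if not prereqs:
        return []
    pairs = sorted({(cycle, task) for cycle, task, _ in prereqs})
    clause = " OR ".join(["(cycle = ? AND name = ?)"] * len(pairs))
    params = [value for pair in pairs for value in pair]
    rows = conn.execute(
        f"SELECT cycle, name, outputs FROM task_outputs WHERE {clause}",
        params,
    ).fetchall()
    recorded = {
        (cycle, name): set(json.loads(outputs))
        for cycle, name, outputs in rows
    }
    return [
        (cycle, task, output)
        for cycle, task, output in prereqs
        if output in recorded.get((cycle, task), set())
    ]


# e.g. satisfied_prerequisites(conn, [("1591", "fin", "succeeded")]) would let
# the scheduler mark that prerequisite satisfied at spawn time instead of
# waiting for an event that already happened.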

@hjoliver
Member

hjoliver commented Aug 4, 2024

We are already looking up outputs every time a task spawns (to prevent "over-running"); we do not currently look up prerequisites (to prevent "under-running"), leading to this bug (which is inconsistent).

The former is an absolute necessity, without which the model would spawn multiple flows at every graph join (a & b => c). The latter is at least arguable, and it only matters in relatively niche situations. If I add a new dependence on an old event, it's not entirely unreasonable to expect that I might have to fake the old event again to trigger the new dependence on it - because we missed the original event.

This is not merely implementation - the conceptual model itself can be event driven in the instantaneous sense, or with the addition of remembering and re-using past events.

When implementing SoD I did think about this, and settled on what we've got because (a) it is easier not to look up old outputs if we don't have to, and unlike for the "overrun" case it is not an absolute necessity; but also (b) it can be justified if you favour an instantaneous event-driven model.

That said, I'm being a bit bloody-minded just to object to characterizing the current situation as "illogical". I do agree that we should change it to remember and re-use past output events.
