broadcast: don't auto-expire broadcasts #6308

ColemanTom · 2024-08-20T04:44:58Z

[edit] We have decided to change our approach to broadcast expiry in light of this issue. I have now changed the issue title to match the new scope.

For a summary of the proposed changes, skip to #6308 (comment)

Discussion took place in https://cylc.discourse.group/t/cylc-broadcast-is-being-cleared-automatically/991/3

I'm posting this here to hope it doesn't get lost/forgotten.

Two of us encountered this same issue today. When you do a cylc broadcast on a task which has already run and succeeded, the broadcast is automatically cleared.

I was using CYLC_VERSION=8.3.1 and my colleague using CYLC_VERSION=8.3.3. We both first noticed it via the WUI but we have shown it also happens via CLI. I don’t know the behaviour in 8.2.

From the scheduler log:

2024-07-31T02:00:40Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20240327T0000Z'], mode=put_broadcast, namespaces=['ct_to_simpler'], settings=[{'environment': {'FORCE_RESEND': 'no'}}])
2024-07-31T02:00:40Z INFO - Broadcast set:
    + [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
2024-07-31T02:00:40Z INFO - Broadcast cancelled:
    - [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no

Pausing the workflow, triggering the task, and then broadcasting allowed the broadcast to be kept.

2024-07-31T02:04:48Z INFO - Pausing the workflow
2024-07-31T02:04:48Z INFO - Command "pause" actioned. ID=0ea0b0b8-5b94-4d06-b90b-81b43962e8a9
2024-07-31T02:05:06Z INFO - Command "force_trigger_tasks" received. ID=8055b39c-10e3-4ccf-97ac-ed3a9bdd5d17
    force_trigger_tasks(flow=['all'], flow_wait=False, tasks=['20240327T0000Z/ct_to_simpler'])
2024-07-31T02:05:07Z INFO - [20240327T0000Z/ct_to_simpler:waiting(runahead)] => waiting
2024-07-31T02:05:07Z INFO - [20240327T0000Z/ct_to_simpler:waiting] => waiting(queued)
2024-07-31T02:05:07Z INFO - Command "force_trigger_tasks" actioned. ID=8055b39c-10e3-4ccf-97ac-ed3a9bdd5d17
2024-07-31T02:06:07Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20240327T0000Z'], mode=put_broadcast, namespaces=['ct_to_simpler'], settings=[{'environment': {'FORCE_RESEND': 'no'}}])
2024-07-31T02:06:07Z INFO - Broadcast set:
    + [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
2024-07-31T02:07:13Z INFO - Command "resume" received. ID=d74fcf91-6398-4317-b4e2-a251c290029c
    resume()
2024-07-31T02:07:14Z INFO - RESUMING the workflow now
2024-07-31T02:07:14Z INFO - Command "resume" actioned. ID=d74fcf91-6398-4317-b4e2-a251c290029c
2024-07-31T02:07:14Z INFO - [20240327T0000Z/ct_to_simpler:waiting(queued)] => waiting
2024-07-31T02:07:14Z INFO - [20240327T0000Z/ct_to_simpler:waiting] => preparing
2024-07-31T02:07:19Z INFO - [20240327T0000Z/ct_to_simpler/05:preparing] submitted to user-dm_calthunder_d:pbs[6195899]
2024-07-31T02:07:19Z INFO - [20240327T0000Z/ct_to_simpler/05:preparing] => submitted
2024-07-31T02:07:27Z INFO - [20240327T0000Z/ct_to_simpler/05:submitted] => running
2024-07-31T02:07:31Z INFO - [20240327T0000Z/ct_to_simpler/05:running] => succeeded
2024-07-31T02:07:31Z INFO - [20240327T0000Z/ct_archive:succeeded] already finished and completed (flows=1))
2024-07-31T02:07:32Z INFO - Broadcast cancelled:
    - [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no

What were we trying to do? Say we generated some data, and something was found wrong with it in a future task, or the disk got corrupted and we want to rerun the task again with a slightly different setting to delete the old data perhaps. We modify an environment variable in the runtime and run the task. As we can see the task in the WUI and broadcast to it via CLI, we were under the impression we could just do the broadcast and then trigger the task, but that does not work. We can’t see a way to do trigger new task with modified runtime, so our only option appears to be to pause the whole suite. We did not test hold -> trigger -> broadcast -> release.

Desired behaviour

We would like a way to modify the runtime of a succeeded task and trigger it without having to pause, trigger, edit runtime, unpause a workflow to do it.

The text was updated successfully, but these errors were encountered:

oliver-sanders · 2024-08-20T10:29:01Z

I don't think that a broadcast made to a succeeded task should be automatically cleared. This may be leftover Cylc 7 logic, it really doesn't play well with the Cylc 8 concept of new-flows (which may have the same broadcast requirements as the original flow).

Hopefully an easy fix.

We would like a way to modify the runtime of a succeeded task and trigger it without having to pause, trigger, edit runtime, unpause a workflow to do it.

Note, cylc vr can achieve this.

wxtim · 2024-09-05T12:37:59Z

Currently broadcasts are "expired" (removed) when the cycles to which they refer to have passed. Disabling this seems simple. But I don't think that we want to allow broadcasts to accumulate without housekeeping forever.

I'm going to propose that tasks can consume broadcasts, somewhere towards the run of a task.

Question - should the broadcast be considered cleanable on completion or success?

oliver-sanders · 2024-09-05T13:06:34Z

Question - should the broadcast be considered cleanable on completion or success?

That's the idea.

When the task is removed from the task-pool as completed, the broadcast can be cleared.

(note task output completion and task pool removal aren't strictly equivalent)

simonathompson · 2024-10-09T13:37:46Z

Just wanted to add, that I've just had a chat with Ronnie in the cylc surgery describing (completely independently) this issue. My context is a regular cycling workflow with no cycle-to-cycle dependencies, where I want to rerun a past cycle. I issue a broadcast at the beginning of a cycle to the family that over-arches all the other tasks in a cycle, that effectively tells it which HPC hall to point to. If I try this in cylc 8.3.3 the broadcast just vanishes as there are no active tasks, and because it's the first thing a cycle does. Fortunately I can get it to do what I want using a new cylc flow. Either way, you end up with either housekeeping broadcasts, or housekeeping flows! Thanks.

hjoliver · 2024-10-09T23:15:50Z

If I try this in cylc 8.3.3 the broadcast just vanishes as there are no active tasks,

Broadcast information is held in the scheduler, to be used for future tasks - which don't have to be in the active pool yet. So lack of active tasks isn't the problem here, it's that that you're targeting past tasks that you want to rerun, and the broadcast clearing mechanism assumes that past tasks are done with, so the broadcast can be cleared. (Which is indeed exactly what this issue is about ... just clarifying the logic in this case).

oliver-sanders · 2024-11-05T09:33:37Z

Meeting 2024-11-04

This issue was recently discussed at a Cylc developers meeting, conclusions:

Auto expiring broadcasts on cycle completion (i.e. housekeeping) doesn't make sense.

While this has always been the case, Cylc 8 makes the consequences more impactfull:

It is easier to re-run historical tasks and multi-flow functionality encourages this.

Edit runtime forms are likely to be inaccurate for n>0 tasks.

SoD makes it slightly harder for users to tell what cycles are "open" vs "closed" (need to change the window to n=0).

The intention of this housekeeping was likely memory management, not functionality.

We discussed and agreed an alternative approach where broadcasts are not housekept:

Do not auto-expire broadcasts when cycles "complete". Allow the broadcasts to pile up in the DB.

Only hold broadcasts that match cycles in the data-store window in memory.

When a new cycle is added to the data store (when a task is added for a cycle not present in the data store), retrieve any matching broadcasts from the DB.

When a cycle is remove from the data store (when a task is removed leaving no others in the data store), ditch any broadcasts that are no longer relevant.

Concerns for development:

Keep it efficient to large numbers of broadcasts. E.G, try to push pattern matching into SQL.

Ensure the data store is responsive to triggering, removal, reloads and restarts.

Leave the door open to broadcast: operate on broadcasts by ID #6402

Ensure the DB remains compatible or develop a DB upgrader.

We will probably want to be able to query the broadcasts that are held in the data-store window (i.e. like the cylc-review view active broadcasts feature), see also info view: display task configuration & broadcasts cylc-ui#1895 & Broadcasts View cylc-ui#463

oliver-sanders · 2024-11-06T10:10:27Z

From initial investigations:

The broadcast_events table in the DB is not housekept, so we are already preserving a record of all broadcasts.
At present the BroadcastManager state is assembled in memory and probably(?) cannot be reconstructed from the broadcast_events alone. Hence, we have the broadcast_states table.
The broadcast_states table in the DB (which backs up the BroadcastManager state) can be removed with this change (as the BroadcastMananger state can be reconstructed from the DB). This will have impacts on cylc-review.
Broadcasts already have unique IDs in the form of (high precision) timestamps (broadcast: operate on broadcasts by ID #6402).
The SQL BETWEEN operator should allow us to efficiently select matching entries from the broadcast_events table.
At present, an all-cycle broadcast can take one of three values *, all-cycles or all-cycle-points, all of which are equivalent. We should standardise on * with this change for simplicity.

hjoliver · 2024-11-07T01:52:12Z

At present, an all-cycle broadcast can take one of three values *, all-cycles or all-cycle-points, all of which are equivalent.

Huh, I wonder how we ended up with that?

oliver-sanders · 2024-11-26T16:26:16Z

At present, an all-cycle broadcast can take one of three values *, all-cycles or all-cycle-points, all of which are equivalent.

Huh, I wonder how we ended up with that?

No idea, but I intend to wipe it out!

Implementing the above requires a small/medium size code refactor. I've got a proof-of-concept branch working, but there's a lot in the pipes at the moment so I have tagged against 8.5.0.

Although the Cylc 8 behavior is unchanged in relation to Cylc 7, it's not very nice and adds caveats to interventions performed via broadcasts so I have added this to the roadmap.

oliver-sanders · 2024-11-29T13:57:45Z

My POC for anyone interested: https://github.com/oliver-sanders/cylc-flow/pull/new/broadcast-refactor

(This is a parallel implementation, swap out the existing broadcast manager in the Scheduler class to activate it. Note, it ain't gonna work just yet)

ColemanTom added the bug Something is wrong :( label Aug 20, 2024

MetRonnie added this to the 8.3.x milestone Aug 20, 2024

wxtim self-assigned this Aug 30, 2024

wxtim added the question Flag this as a question for the next Cylc project meeting. label Sep 5, 2024

wxtim mentioned this issue Sep 5, 2024

Housekeep broadcasts on task completion #6359

Closed

8 tasks

oliver-sanders changed the title ~~Modify runtime of a succeeded task so you can trigger it~~ broadcast: modify runtime of a succeeded task so you can trigger it Oct 7, 2024

ColemanTom mentioned this issue Oct 23, 2024

Allow manual trigger when workflow paused. #6192

Closed

8 tasks

wxtim removed the question Flag this as a question for the next Cylc project meeting. label Nov 5, 2024

oliver-sanders added the code refactor Large code refactors label Nov 5, 2024

oliver-sanders self-assigned this Nov 6, 2024

oliver-sanders modified the milestones: 8.3.x, 8.5.0 Nov 26, 2024

oliver-sanders changed the title ~~broadcast: modify runtime of a succeeded task so you can trigger it~~ broadcast: don't auto-expire broadcasts Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

broadcast: don't auto-expire broadcasts #6308

broadcast: don't auto-expire broadcasts #6308

ColemanTom commented Aug 20, 2024 •

edited by oliver-sanders

Loading

oliver-sanders commented Aug 20, 2024

wxtim commented Sep 5, 2024

oliver-sanders commented Sep 5, 2024

simonathompson commented Oct 9, 2024

hjoliver commented Oct 9, 2024 •

edited

Loading

oliver-sanders commented Nov 5, 2024

oliver-sanders commented Nov 6, 2024 •

edited

Loading

hjoliver commented Nov 7, 2024

oliver-sanders commented Nov 26, 2024 •

edited

Loading

oliver-sanders commented Nov 29, 2024

broadcast: don't auto-expire broadcasts #6308

broadcast: don't auto-expire broadcasts #6308

Comments

ColemanTom commented Aug 20, 2024 • edited by oliver-sanders Loading

Desired behaviour

oliver-sanders commented Aug 20, 2024

wxtim commented Sep 5, 2024

oliver-sanders commented Sep 5, 2024

simonathompson commented Oct 9, 2024

hjoliver commented Oct 9, 2024 • edited Loading

oliver-sanders commented Nov 5, 2024

oliver-sanders commented Nov 6, 2024 • edited Loading

hjoliver commented Nov 7, 2024

oliver-sanders commented Nov 26, 2024 • edited Loading

oliver-sanders commented Nov 29, 2024

ColemanTom commented Aug 20, 2024 •

edited by oliver-sanders

Loading

hjoliver commented Oct 9, 2024 •

edited

Loading

oliver-sanders commented Nov 6, 2024 •

edited

Loading

oliver-sanders commented Nov 26, 2024 •

edited

Loading