
Monthly task in an otherwise hourly workflow has unexpected scheduling runahead limit behavior #5705

Closed
jaworsks opened this issue Aug 25, 2023 · 7 comments
Labels
bug Something is wrong :(

Comments

@jaworsks

jaworsks commented Aug 25, 2023

Description

We have been working to move from Cylc 7 to Cylc 8, and are currently using Cylc 8.2.1.

Our workflow has multiple jobs occurring every hour. The same workflow also has a job that is scheduled to run only on the 10th of every month. There is a high volume of parallel tasks in each cycle, some of which go into a retry state until data is available, so to help manage performance we set [scheduling]runahead limit = PT6H. However, it seems that loading the 10T00 entry from the graph causes all hourly cycles/tasks from the initial cycle point up to the monthly task to be generated and tracked, which causes a significant delay when playing the workflow or pulling it up in the UI.

Reproducible Example

Relevant portion of a simplified flow.cylc to reproduce the problem:

[scheduling]
  initial cycle point = 20230812T000000Z
  runahead limit = PT4H

  [[xtriggers]]
    clock_1 = wall_clock():PT1H

  [[graph]]
    PT1H = """
        @clock_1 => TASK_A1 => TASK_A2 => TASK_A3
        @clock_1 => TASK_B1 => TASK_B2
        @clock_1 => TASK_C1
    """
    10T00 = TASK_A3 => MONTHLY_TASK

This results in all cycles between 2023-08-12 and 2023-09-10 being tracked, with all tasks for those cycles tracked in a waiting state. I've only included 3 sets of tasks here, but our production workflow can be much larger on the cardinal hours.

Expected Behaviour

I don't know exactly what the expected behavior should be, other than not having such an impact on performance. If this is expected behavior, I'm looking for recommendations for a workaround. The only solution I've thought of is to create a separate workflow for our monthly tasks and have one workflow signal the other.

@jaworsks jaworsks added the bug Something is wrong :( label Aug 25, 2023
@oliver-sanders oliver-sanders added this to the cylc-8.2.2 milestone Aug 25, 2023
@oliver-sanders
Member

oliver-sanders commented Aug 25, 2023

On startup, Cylc spawns tasks from 20230812T0000Z out to 20230910T0000Z.

This is definitely not the expected behaviour, thank you for reporting this issue.

The issue only occurs when both recurrences (PT1H and 10T00) are present; if you remove either one, the runahead limit is observed correctly.
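To put the start-up behaviour in numbers, here is a quick Python check of how many hourly cycle points lie between the initial point and the point Cylc spawned out to, compared with the PT4H runahead window from the reproducer. The helper name is hypothetical, and treating the window as inclusive of base + PT4H is an assumption (it matches the <= comparison used in the patch later in this thread).

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
HOUR = timedelta(hours=1)


def hourly_cycle_count(start: datetime, end: datetime) -> int:
    """Number of PT1H cycle points from start to end, inclusive (hypothetical helper)."""
    return (end - start) // HOUR + 1


initial = datetime(2023, 8, 12, tzinfo=UTC)
spawned_to = datetime(2023, 9, 10, tzinfo=UTC)   # observed start-up spawning limit
expected_to = initial + 4 * HOUR                 # runahead limit = PT4H (inclusive window assumed)

observed = hourly_cycle_count(initial, spawned_to)
expected = hourly_cycle_count(initial, expected_to)
print(observed, expected)  # 697 5
```

If, as the report describes, all six hourly tasks in the reproducer are tracked in every cycle, that is up to 4,182 tasks at start-up instead of 30.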

@jaworsks
Author

Thanks for the response. It also occurs if you list specific hours rather than just PT1H.

  [[graph]]
    T00,T06,T12,T18 = """
        @clock_1 => TASK_A1 => TASK_A2 => TASK_A3
        @clock_1 => TASK_B1 => TASK_B2
        @clock_1 => TASK_C1
    """
    10T00 = TASK_A3 => MONTHLY_TASK

@jaworsks
Author

Using an integer interval rather than a datetime interval (i.e. [scheduling]runahead limit = P6, which limits the number of active cycles rather than a span of time) appears to give the expected behavior. I need to test a bit further, but I think that will be our solution for now.
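A rough Python sketch of why counting cycles can sidestep the far-off monthly start point. It assumes a simplified model, not Cylc's actual implementation: an integer limit PN lets the scheduler run from the base point to the point N steps later in the merged, de-duplicated list of cycle points from all sequences. The generator and function names are hypothetical.

```python
import itertools
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, List

UTC = timezone.utc


def hourly(start: datetime) -> Iterator[datetime]:
    """Points on a PT1H sequence from start onwards."""
    point = start
    while True:
        yield point
        point += timedelta(hours=1)


def monthly_on_day(start: datetime, day: int = 10) -> Iterator[datetime]:
    """Points on a '10T00'-style sequence (T00 on the given day) at or after start."""
    year, month = start.year, start.month
    while True:
        point = datetime(year, month, day, tzinfo=UTC)
        if point >= start:
            yield point
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def integer_limit_point(base: datetime, sequences: Iterable[Iterator[datetime]], n: int) -> datetime:
    """Hypothetical model: the point n cycles after base, counting distinct points across sequences.

    Taking the first n + 1 points of each sequence is enough to cover the first
    n + 1 points of the merged list, so the infinite generators are never exhausted.
    """
    candidates = set()
    for seq in sequences:
        candidates.update(itertools.islice(seq, n + 1))
    ordered: List[datetime] = sorted(p for p in candidates if p >= base)
    return ordered[n]


base = datetime(2023, 8, 12, tzinfo=UTC)
print(integer_limit_point(base, [hourly(base), monthly_on_day(base)], 6))
# 2023-08-12 06:00:00+00:00
```

Under this model the monthly point can only affect the limit once it is among the next six cycles, and across an all-hourly stretch P6 and PT6H cover the same cycles.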

@oliver-sanders
Member

oliver-sanders commented Aug 25, 2023

This appears to be a bug in the runahead limit calculation for the start-up case (i.e. not self.main_pool, when the main pool is still empty).

This diff is enough to get it working correctly:

diff --git a/cylc/flow/task_pool.py b/cylc/flow/task_pool.py
index e7d85f669..057f3dfa8 100644
--- a/cylc/flow/task_pool.py
+++ b/cylc/flow/task_pool.py
@@ -371,6 +371,13 @@ class TaskPool:
         else:
             count_cycles = True
 
+        if not self.main_pool:
+            points = [
+                point
+                for point in points
+                if point <= base_point + limit
+            ]
+
         # Get all cycle points possible after the runahead base point.
         if (
             not force

Note that at this stage, points contains the first point on each sequence, but the list has not yet been filtered by the configured runahead limit. The first point on the second sequence (20230910T0000Z for 10T00) is beyond this limit, but Cylc chooses it as the limit point, resulting in the issue.
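A simplified Python model of the start-up calculation described above, to show why filtering the sequence start points fixes it. The function names are hypothetical, and the rule that the limit point is the later of base + limit and the latest remaining start point is an assumption made for illustration; it is not taken from task_pool.py.

```python
from datetime import datetime, timedelta, timezone
from typing import List

UTC = timezone.utc


def first_point_on_day(start: datetime, day: int = 10) -> datetime:
    """First T00 point on the given day of the month at or after start (models '10T00')."""
    candidate = start.replace(day=day, hour=0, minute=0, second=0, microsecond=0)
    if candidate < start:
        # Roll over to the same day next month.
        if candidate.month == 12:
            candidate = candidate.replace(year=candidate.year + 1, month=1)
        else:
            candidate = candidate.replace(month=candidate.month + 1)
    return candidate


def startup_limit_point(
    base: datetime, limit: timedelta, start_points: List[datetime], apply_fix: bool
) -> datetime:
    """Hypothetical model of the start-up (empty main pool) limit point."""
    points = list(start_points)
    if apply_fix:
        # Mirrors the patch: drop start points beyond base + limit.
        points = [p for p in points if p <= base + limit]
    # Assumed rule: the limit point is the latest of base + limit and the remaining start points.
    return max(points + [base + limit])


base = datetime(2023, 8, 12, tzinfo=UTC)
limit = timedelta(hours=4)                  # runahead limit = PT4H
starts = [base, first_point_on_day(base)]   # first points of PT1H and 10T00

print(startup_limit_point(base, limit, starts, apply_fix=False))  # 2023-09-10 00:00:00+00:00
print(startup_limit_point(base, limit, starts, apply_fix=True))   # 2023-08-12 04:00:00+00:00
```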

@hjoliver
Member

hjoliver commented Aug 26, 2023

A more minimal example:

[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 2023
    runahead limit = PT2H
    [[graph]]
        PT1H = "foo"
        R/^+P1D/P1D = "foo => bar"

This confirms it's the offset recurrence start point that screws up the limit, and only at start-up. It settles down to normal behaviour once the initially spawned tasks are done.

(I guess none of our tests have recurrence start points beyond the initial runahead limit!)
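A standalone, pytest-style sketch of the kind of test case described above, using the minimal example's values (initial cycle point 2023, treated as 2023-01-01T00 UTC here for simplicity; a PT2H limit; an R/^+P1D/P1D sequence). filter_start_points is a hypothetical stand-in that copies the patch's list comprehension; it is not a Cylc function, and a real test would exercise TaskPool itself.

```python
from datetime import datetime, timedelta, timezone
from typing import List

UTC = timezone.utc


def filter_start_points(
    points: List[datetime], base_point: datetime, limit: timedelta
) -> List[datetime]:
    """Same filter as the patch: keep sequence start points within the runahead limit."""
    return [point for point in points if point <= base_point + limit]


def test_offset_start_point_beyond_runahead_limit() -> None:
    base_point = datetime(2023, 1, 1, tzinfo=UTC)   # initial cycle point = 2023
    limit = timedelta(hours=2)                      # runahead limit = PT2H
    hourly_start = base_point                       # PT1H starts at ^
    offset_start = base_point + timedelta(days=1)   # R/^+P1D/P1D starts at ^+P1D

    # The offset start point lies beyond the initial runahead window...
    assert offset_start > base_point + limit
    # ...so it must not be kept as a candidate for the start-up limit point.
    assert filter_start_points([hourly_start, offset_start], base_point, limit) == [hourly_start]


if __name__ == "__main__":
    test_offset_start_point_beyond_runahead_limit()
    print("ok")
```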

@hjoliver
Copy link
Member

@jaworsks the fix will be in the upcoming 8.2.2 release. In the meantime you could patch your local installation as above.

@wxtim
Copy link
Member

wxtim commented Aug 29, 2023

The PR didn't auto-close the issue.

@wxtim wxtim closed this as completed Aug 29, 2023

4 participants