efficiency: increment_graph_window #5435
Comments
I left this example running for ~20 mins, then ctrl+c'ed it. Naively, I would guess that this should be ok?

```diff
diff --git a/cylc/flow/data_store_mgr.py b/cylc/flow/data_store_mgr.py
index f47cfcbbb..8d13cf598 100644
--- a/cylc/flow/data_store_mgr.py
+++ b/cylc/flow/data_store_mgr.py
@@ -741,6 +741,15 @@ class DataStoreMgr:
                 source_tokens.id
             )
 
+        if self.n_window_nodes.get(active_id):
+            # we've already walked this node
+            if edge_distance == 0:
+                if active_id in self.prune_trigger_nodes:
+                    self.prune_flagged_nodes.update(
+                        self.prune_trigger_nodes[active_id])
+                    del self.prune_trigger_nodes[active_id]
+            return
+
         # Setup and check if active node is another's boundary node
         # to flag its paths for pruning.
         if edge_distance == 0:
```

With this change the tasks spawned much faster, with 14002 calls. The GUI seems ok, but there is a housekeeping problem with some nodes not getting removed, presumably because skipping the re-walk also skips the boundary-node registration that pruning relies on. In this screenshot, the graph isolates are erroneous:

[Screenshot: GUI graph view showing erroneous isolate nodes.]
Here's another failed attempt, which attempts to merge the task-pool registration into the graph-window walk itself:

```diff
diff --git a/cylc/flow/data_store_mgr.py b/cylc/flow/data_store_mgr.py
index f47cfcbbb..da462e193 100644
--- a/cylc/flow/data_store_mgr.py
+++ b/cylc/flow/data_store_mgr.py
@@ -496,7 +496,7 @@ class DataStoreMgr:
         self.delta_queues = {self.workflow_id: {}}
         self.publish_deltas = []
         # internal n-window
-        self.all_task_pool = set()
+        self.all_task_pool = {}
         self.n_window_nodes = {}
         self.n_window_edges = {}
         self.n_window_boundary_nodes = {}
@@ -735,6 +735,14 @@ class DataStoreMgr:
             source_tokens = self.id_.duplicate(source_tokens)
         active_id = source_tokens.id
 
+        visited = False
+        if edge_distance == 0 and active_id not in self.all_task_pool:
+            self.all_task_pool[active_id] = False
+        elif self.all_task_pool[active_id]:
+            visited = True
+
+        self.all_task_pool[active_id] = True
+
         # flag manual triggers for pruning on deletion.
         if is_manual_submit:
             self.prune_trigger_nodes.setdefault(active_id, set()).add(
@@ -761,6 +769,9 @@ class DataStoreMgr:
                 ].setdefault(edge_distance, set()).add(source_tokens.id)
             return
 
+        if visited:
+            return
+
         # Generate task proxy node
         is_orphan, graph_children = self.generate_ghost_task(
             source_tokens,
@@ -904,7 +915,7 @@ class DataStoreMgr:
                 task=name,
             ).id
         if tp_id in self.all_task_pool:
-            self.all_task_pool.remove(tp_id)
+            self.all_task_pool.pop(tp_id)
             self.updates_pending = True
         # flagged isolates/end-of-branch nodes for pruning on removal
         if (
@@ -921,14 +932,6 @@ class DataStoreMgr:
             self.prune_flagged_nodes.add(tp_id)
             self.updates_pending = True
 
-    def add_pool_node(self, name, point):
-        """Add external ID reference for internal task pool node."""
-        tp_id = self.id_.duplicate(
-            cycle=str(point),
-            task=name,
-        ).id
-        self.all_task_pool.add(tp_id)
-
     def generate_ghost_task(
         self,
         tokens: Tokens,
diff --git a/cylc/flow/task_pool.py b/cylc/flow/task_pool.py
index 8bfa1d923..0ad9c90fd 100644
--- a/cylc/flow/task_pool.py
+++ b/cylc/flow/task_pool.py
@@ -239,8 +239,6 @@ class TaskPool:
 
     def create_data_store_elements(self, itask):
        """Create the node window elements about given task proxy."""
-        # Register pool node reference
-        self.data_store_mgr.add_pool_node(itask.tdef.name, itask.point)
         # Create new data-store n-distance graph window about this task
         self.data_store_mgr.increment_graph_window(
             itask.tokens,
```

It almost works, but not quite.

@dwsutherland I'm just hacking here, I'm not quite sure how this works. We need to stop this code from re-walking the same bit of graph over and over somehow. Any pointers?
Makes sense that it would get that large, as the window walks back n+1 edges. I'll have a look at your solution now; however (and it might be similar), there are a couple of ways we can reduce the computation.

Anyway, I'll have a look at what you've done first.
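As a back-of-envelope illustration of that scaling (my own rough model, not a measured breakdown): every time one of the `x` `b` tasks becomes active, the walk passes back through the shared upstream task `a` and out to all `x` siblings, so total work grows quadratically in `x`:

```python
# Rough model (illustration only): each of the x activations of b<i>
# re-walks through the shared hub "a" to all x siblings, so each
# activation visits O(x) nodes and the total is O(x**2).
x = 700
node_visits = x * (x + 1)  # ~490,000 node visits for x = 700
print(f"{node_visits:,}")
# The observed ~3,000,000 calls is a small constant multiple of this
# (the walk recurses over both parents and children of each node).
```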
The main reason why we do so much walking is to have a set of nodes associated with each active node, so we know what to prune when that active node drops out of the window.

Also, we walk one edge further than the window in order to register boundary nodes for pruning.

What we should do (in order not to re-walk) is to register all the associations of skipped sections as if we had walked them. And if we need to walk further, pick up from where we left off. This means we need to store the parents and children (IDs) of every node/edge that has been walked, until they are pruned.

I'll put up a PR. With both the above changes, this should nail it.
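A minimal sketch of that idea (hypothetical names and structure, not the actual PR): record the parents and children of every node the first time it is walked, keep the records until the node is pruned, and have later window walks splice in the recorded edges instead of re-querying the same section of graph.

```python
from collections import defaultdict, deque

class WalkCache:
    """Sketch only: illustrative structure, not the real DataStoreMgr."""

    def __init__(self):
        self.children = defaultdict(set)  # node ID -> child IDs
        self.parents = defaultdict(set)   # node ID -> parent IDs
        self.walked = set()               # nodes whose edges are recorded

    def window(self, root, get_children, get_parents, n):
        """BFS out to n edges from root, reusing previously recorded edges."""
        seen = {root}
        queue = deque([(root, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == n:
                continue
            if node not in self.walked:
                # First visit: query the graph once and record the edges.
                self.children[node] |= set(get_children(node))
                self.parents[node] |= set(get_parents(node))
                self.walked.add(node)
            # Already-walked sections are spliced in from the cache.
            for nbr in self.children[node] | self.parents[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, depth + 1))
        return seen

    def prune(self, node):
        """Forget a node's associations once it is pruned from the store."""
        self.walked.discard(node)
        self.children.pop(node, None)
        self.parents.pop(node, None)
```

With something like this, recentring the window on a neighbouring node costs only cheap set lookups over the shared section, which is the "pick up from where we left off" part.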
14002/7001 🎉
Closed by #5475
Re-issue of: #5315
The scaling of `increment_graph_window` has improved, but is still severe enough to prevent some Cylc 7 workflows from being able to run with Cylc 8. See Element for discussion:
https://matrix.to/#/!gMuCeOJBhYqgWDGwnR:matrix.org/$1679999387176196Xelev:matrix.org?via=matrix.org
Description
For the workflow `a => b<x>` there are `x + 1` tasks and `x` dependencies. However, for `x=700` the `increment_graph_window` function is called in excess of 3,000,000 times.
Reproducible Example
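The workflow source itself isn't included above; a minimal flow.cylc matching the `a => b<x>` description (a sketch, assuming `x = 1..700` and no-op task scripts) would be:

```
[task parameters]
    x = 1..700
[scheduling]
    [[graph]]
        R1 = "a => b<x>"
[runtime]
    [[a, b<x>]]
        script = true
```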
```console
$ cylc play <id> --profile --no-detach
```
Run this for a few mins; when you get bored, ctrl+c it, then:
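The inspection command that followed isn't preserved above. As one option (a sketch: the dump file name `profile.prof` is hypothetical, standing in for wherever `--profile` writes its cProfile output), Python's built-in pstats browser can rank the calls:

```console
$ python -m pstats profile.prof
profile.prof% sort cumulative
profile.prof% stats increment_graph_window
```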
Expected Behaviour
It should take seconds, not minutes, to spawn the 700 tasks in this workflow.