Consistent node execution order by sorting nodes with SequentialRunner
#1604
Changes from 6 commits
6205d47
5319d5c
b83b57c
7769907
0531ba2
8a21f37
f7010b4
36527e2
4cc5f85
120b40d
90126e6
@@ -1,10 +1,13 @@
+import random
 import re
 from itertools import chain
+from typing import List

 import pytest

 import kedro
 from kedro.pipeline import Pipeline, node
+from kedro.pipeline.node import Node
 from kedro.pipeline.pipeline import (
     CircularDependencyError,
     ConfirmNotUniqueError,
@@ -253,8 +256,9 @@ def test_grouped_nodes(self, input_data):
         grouped = pipeline.grouped_nodes
         # Flatten a list of grouped nodes
         assert pipeline.nodes == list(chain.from_iterable(grouped))
-        # Check each grouped node matches with expected group
-        assert all(g == e for g, e in zip(grouped, expected))
+        # Check each grouped node matches the expected group. The order is
+        # non-deterministic, so we only check that they contain the same set of nodes.
+        assert all(set(g) == e for g, e in zip(grouped, expected))

     def test_free_input(self, input_data):
         nodes = input_data["nodes"]
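The set-based assertion in this hunk can be illustrated on its own. The sketch below is a hedged illustration, not code from the PR: `groups_match` and the node names are hypothetical, and plain strings stand in for Kedro `Node` objects. It also adds a length check, since `zip` silently stops at the shorter input.

```python
from typing import Hashable, List, Sequence, Set


def groups_match(
    grouped: Sequence[Sequence[Hashable]], expected: Sequence[Set[Hashable]]
) -> bool:
    """Return True when every level holds the expected members, in any order."""
    # zip() truncates to the shorter sequence, so compare lengths explicitly.
    if len(grouped) != len(expected):
        return False
    return all(set(group) == want for group, want in zip(grouped, expected))


# Hypothetical data: each inner list is one level of a topologically grouped pipeline.
grouped: List[List[str]] = [["node_b", "node_a"], ["node_c"]]
expected: List[Set[str]] = [{"node_a", "node_b"}, {"node_c"}]
print(groups_match(grouped, expected))
```

Comparing sets ignores the order inside a level but still fails when a node is missing or lands in the wrong level.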
@@ -588,6 +592,52 @@ def test_connected_pipeline(self, disjoint_pipeline):
         assert len(pipeline.inputs()) == 1
         assert len(pipeline.outputs()) == 1

+    def test_pipeline_consistent_nodes_order(self, mocker):
+        """
+        Pipelines that have multiple possible execution orders should have consistent
+        solutions.
+        Possible Solutions:
+        1. A -> B -> C -> D -> E -> F
+        2. B -> A -> C -> D -> E -> F
+        3. ... Any permutation as long as F is executed last.
+
+        Although we cannot be sure which permutation it is, it should always be
+        the same permutation.
+
+        A-- \
+        B--- \
+        C---- F
+        D--- /
+        E-- /
+        """
Review comment: This is nice 👍
+
+        def multiconcat(*args):
+            return sum(args)
noklam marked this conversation as resolved.
+
+        mock_hash = mocker.patch(f"{__name__}.Node.__hash__")
+        expected_sorted_nodes: List[Node] = None
+
+        # Repeat 10 times so we can be sure it is not purely by chance
+        for _ in range(10):
+            mock_hash.return_value = random.randint(1, 10**20)
Review comment: I don't think this is doing what you want it to do. This is currently fixing the hash of every
... but this is still not a great test because the current code in
After spending a looooong time playing around with this, I think it might just not be worth writing a test for it at all... So long as it works as it should in manual testing, I think we're fine. Happy to explain more about what I discovered while playing around with the testing here if you want to hear. It's certainly a tricky one.
Reply: Let's chat tomorrow!
+
+            inverted_fork_dags = Pipeline(
+                [
+                    node(constant_output, None, "A"),
+                    node(constant_output, None, "B"),
+                    node(constant_output, None, "C"),
+                    node(constant_output, None, "D"),
+                    node(constant_output, None, "E"),
+                    node(multiconcat, ["A", "B", "C", "D", "E"], "F"),
+                ]
+            )
+            if not expected_sorted_nodes:
+                expected_sorted_nodes = inverted_fork_dags.nodes
+
+            else:
+
+                assert expected_sorted_nodes == inverted_fork_dags.nodes
+

 class TestPipelineDescribe:
     def test_names_only(self, str_node_inputs_list):
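The review note about the mocked `__hash__` is cut off in this extract, so what follows is only one plausible reading, not a reconstruction of what was said. Patching `__hash__` on the class with a single `return_value` gives every instance the same hash within an iteration, rather than a distinct random hash per node. `FakeNode` and `hashes_under_patch` are hypothetical names used for illustration, and the sketch uses `unittest.mock.patch.object` in place of the `mocker` fixture so that it runs standalone.

```python
from typing import List
from unittest.mock import patch


class FakeNode:
    """Hypothetical stand-in for a pipeline node; hashes by name by default."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        return hash(self.name)


def hashes_under_patch(fixed_value: int) -> List[int]:
    """Return the hashes of three distinct nodes while __hash__ is patched."""
    nodes = [FakeNode("A"), FakeNode("B"), FakeNode("C")]
    # The patch replaces the class attribute, so it applies to all instances.
    with patch.object(FakeNode, "__hash__", return_value=fixed_value):
        return [hash(n) for n in nodes]


print(hashes_under_patch(7))
```

If the goal is to vary the hashes between nodes, a per-instance value (for example via `side_effect`) would be needed. Whether that makes the test meaningful is the open question raised in the review.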
Review comment: I still don't think this is a very clear explanation though. Maybe what you have is better 🤔
Review comment: Maybe @MerelTheisenQB has a better idea.
Review comment: I don't know if this is any better to be honest 😅 "Added sorting of nodes for the SequentialRunner to facilitate consistent execution order across multiple runs."
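To make the idea of a consistent execution order concrete, here is a minimal sketch. It is not Kedro's implementation; it assumes Kahn's algorithm with a min-heap so that ties between ready nodes are always broken by name. The function name `sorted_topological_order` and the dependency-map input format are hypothetical.

```python
import heapq
from typing import Dict, List, Set


def sorted_topological_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Topologically sort nodes, breaking ties alphabetically.

    `dependencies` maps each node name to the set of names it depends on.
    """
    # Collect every node, including those that appear only as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes |= deps

    remaining = {n: set(dependencies.get(n, ())) for n in nodes}
    dependents: Dict[str, List[str]] = {n: [] for n in nodes}
    for name, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(name)

    # The heap always yields the smallest ready name, independent of hashing.
    ready = [n for n, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        current = heapq.heappop(ready)
        order.append(current)
        for child in dependents[current]:
            remaining[child].discard(current)
            if not remaining[child]:
                heapq.heappush(ready, child)

    if len(order) != len(nodes):
        raise ValueError("circular dependency detected")
    return order


# The inverted-fork shape from the test: F depends on A..E.
print(sorted_topological_order({"F": {"A", "B", "C", "D", "E"}}))
```

Because the ready set is drained in sorted order, repeated runs give the same sequence even when set iteration order or object hashes differ between runs.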