[KED-2419] Make pipeline and Pipeline consistent, take 2 #1147
Changes from 13 commits
e34ded2
85fccd0
29085ad
38ac519
4612b59
275a3b9
bd9de67
76dbddc
0b93f18
020799e
9cb85a2
0bdeedf
a96d0b8
ea4e090
9351cb9
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
"""Helper to integrate modular pipelines into a master pipeline.""" | ||
import copy | ||
from typing import AbstractSet, Dict, List, Set, Union | ||
from typing import AbstractSet, Dict, Iterable, List, Set, Union | ||
|
||
from kedro.pipeline.node import Node | ||
from kedro.pipeline.pipeline import ( | ||
|
@@ -69,34 +69,40 @@ def _validate_datasets_exist( | |
|
||
|
||
def pipeline( | ||
pipe: Pipeline, | ||
pipe: Union[Iterable[Union[Node, Pipeline]], Pipeline], | ||
*, | ||
inputs: Union[str, Set[str], Dict[str, str]] = None, | ||
outputs: Union[str, Set[str], Dict[str, str]] = None, | ||
parameters: Dict[str, str] = None, | ||
tags: Union[str, Iterable[str]] = None, | ||
namespace: str = None, | ||
) -> Pipeline: | ||
"""Create a copy of the pipeline and its nodes, | ||
with some dataset names and node names modified. | ||
"""Create a ``Pipeline`` from a collection of nodes and/or ``Pipeline``s. | ||
|
||
Args: | ||
pipe: Original modular pipeline to integrate | ||
pipe: The nodes the ``Pipeline`` will be made of. If you | ||
provide pipelines among the list of nodes, those pipelines will | ||
be expanded and all their nodes will become part of this | ||
new pipeline. | ||
inputs: A name or collection of input names to be exposed as connection points | ||
to other pipelines upstream. | ||
to other pipelines upstream. This is optional; if not provided, the | ||
Review comment: 👍 |
||
pipeline inputs are automatically inferred from the pipeline structure. | ||
When str or Set[str] is provided, the listed input names will stay | ||
the same as they are named in the provided pipeline. | ||
When Dict[str, str] is provided, current input names will be | ||
mapped to new names. | ||
Must only refer to the pipeline's free inputs. | ||
outputs: A name or collection of names to be exposed as connection points | ||
to other pipelines downstream. | ||
to other pipelines downstream. This is optional; if not provided, the | ||
pipeline outputs are automatically inferred from the pipeline structure. | ||
When str or Set[str] is provided, the listed output names will stay | ||
the same as they are named in the provided pipeline. | ||
When Dict[str, str] is provided, current output names will be | ||
mapped to new names. | ||
Can refer to both the pipeline's free outputs, as well as | ||
intermediate results that need to be exposed. | ||
parameters: A map of existing parameter names to new ones. | ||
Review comment: Should we also explicitly say that
Review comment: I think this is ok without because, unlike the others, |
||
tags: Optional set of tags to be applied to all the pipeline nodes. | ||
namespace: A prefix to give to all dataset names, | ||
except those explicitly named with the `inputs`/`outputs` | ||
arguments, and parameter references (`params:` and `parameters`). | ||
|
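The renaming rules described in the docstring above can be sketched in isolation. This is a minimal illustration, not Kedro's implementation: `rename_dataset` is a hypothetical helper, the `.` separator between namespace and dataset name is an assumption, and remapping via the `parameters` argument is left out.

```python
from typing import Dict, Optional


def rename_dataset(name: str, mapping: Dict[str, str], namespace: Optional[str]) -> str:
    """Hypothetical helper: apply the docstring's renaming rules to one dataset name."""
    if name in mapping:
        # Explicitly listed in inputs/outputs: use the mapped name, never the prefix.
        return mapping[name]
    if name == "parameters" or name.startswith("params:"):
        # Parameter references are exempt from namespacing.
        return name
    if namespace:
        # Assumption: namespace and name are joined with a dot.
        return f"{namespace}.{name}"
    return name


explicit = {"raw": "raw_2020"}
print(rename_dataset("raw", explicit, "etl"))
print(rename_dataset("clean", explicit, "etl"))
print(rename_dataset("params:alpha", explicit, "etl"))
```

Here only `clean` picks up the prefix; `raw` follows the explicit mapping and `params:alpha` is left untouched.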
@@ -108,8 +114,17 @@ def pipeline( | |
any of the expected types (str, dict, list, or None). | ||
|
||
Returns: | ||
A new ``Pipeline`` object with the new nodes, modified as requested. | ||
A new ``Pipeline`` object. | ||
""" | ||
if isinstance(pipe, Pipeline): | ||
# To ensure that we are always dealing with a *copy* of pipe. | ||
pipe = Pipeline([pipe], tags=tags) | ||
|
||
else: | ||
pipe = Pipeline(pipe, tags=tags) | ||
|
||
if not any([inputs, outputs, parameters, namespace]): | ||
return pipe | ||
|
||
# pylint: disable=protected-access | ||
inputs = _to_dict(inputs) | ||
outputs = _to_dict(outputs) | ||
|
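The wrapping step in this hunk (re-wrapping an existing `Pipeline` so the caller always gets a copy, and expanding nested pipelines into their nodes) can be illustrated with toy stand-ins. `ToyNode`, `ToyPipeline`, and `toy_pipeline` are hypothetical classes written for this sketch; they only mimic the behaviour the docstring and the code comment describe, not Kedro's real classes.

```python
from typing import Iterable, List, Union


class ToyNode:
    """Hypothetical stand-in for a node; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyPipeline:
    """Hypothetical stand-in that flattens nested pipelines into a node list."""

    def __init__(self, items: Iterable[Union["ToyNode", "ToyPipeline"]]) -> None:
        nodes: List[ToyNode] = []
        for item in items:
            if isinstance(item, ToyPipeline):
                nodes.extend(item.nodes)  # expand a nested pipeline
            else:
                nodes.append(item)
        self.nodes = nodes


def toy_pipeline(pipe: Union[Iterable[Union[ToyNode, ToyPipeline]], ToyPipeline]) -> ToyPipeline:
    if isinstance(pipe, ToyPipeline):
        # Wrap in a list so a new object is built instead of returning the original.
        return ToyPipeline([pipe])
    return ToyPipeline(pipe)


inner = ToyPipeline([ToyNode("a"), ToyNode("b")])
wrapped = toy_pipeline(inner)
mixed = toy_pipeline([inner, ToyNode("c")])
print(wrapped is inner, [n.name for n in mixed.nodes])
```

In this sketch the copy is shallow: the new pipeline object is distinct, but the node objects themselves are shared with the original.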
@@ -181,7 +196,7 @@ def _copy_node(node: Node) -> Node: | |
|
||
new_nodes = [_copy_node(n) for n in pipe.nodes] | ||
|
||
return Pipeline(new_nodes) | ||
return Pipeline(new_nodes, tags=tags) | ||
|
||
|
||
def _to_dict(element: Union[None, str, Set[str], Dict[str, str]]) -> Dict[str, str]: | ||
|
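The body of `_to_dict` is not shown in this diff. Going only by its signature and by the docstring of `pipeline` (a `str` or `Set[str]` keeps names unchanged, a `Dict[str, str]` maps old names to new ones), a normalisation helper along these lines would be consistent with it. `to_dict_sketch` is a hypothetical name, and the body is an assumption rather than the project's code.

```python
import copy
from typing import Dict, Optional, Set, Union


def to_dict_sketch(element: Optional[Union[str, Set[str], Dict[str, str]]]) -> Dict[str, str]:
    """Hypothetical sketch: normalise every accepted form into an old-name -> new-name dict."""
    if element is None:
        return {}
    if isinstance(element, str):
        # A single name maps to itself.
        return {element: element}
    if isinstance(element, dict):
        # Copy so the caller's mapping is never mutated.
        return copy.copy(element)
    # Any other collection of names: each name maps to itself.
    return {name: name for name in element}


print(to_dict_sketch("raw"))
print(to_dict_sketch({"raw": "raw_2020"}))
```

Normalising once at the top of the function means the rest of the renaming logic only has to handle a single type.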
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -599,7 +599,7 @@ def node( | |
outputs: Union[None, str, List[str], Dict[str, str]], | ||
*, | ||
name: str = None, | ||
tags: Iterable[str] = None, | ||
tags: Union[str, Iterable[str]] = None, | ||
Review comment: This should always have been this way so that |
||
confirms: Union[str, List[str]] = None, | ||
namespace: str = None, | ||
) -> Node: | ||
|
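Widening `tags` to `Union[str, Iterable[str]]` matters because a bare string is itself an iterable of characters. The sketch below shows the pitfall and one way to guard against it; `normalize_tags` is a hypothetical helper for illustration and is not claimed to be how Kedro handles tags internally.

```python
from typing import Iterable, Optional, Set, Union


def normalize_tags(tags: Optional[Union[str, Iterable[str]]]) -> Set[str]:
    """Hypothetical helper: turn None, a single tag, or many tags into a set of tags."""
    if tags is None:
        return set()
    if isinstance(tags, str):
        # Treat a bare string as one tag instead of iterating over its characters.
        return {tags}
    return set(tags)


# Without the str check, a single tag would be split into characters.
print(set("etl") == {"e", "t", "l"})
print(normalize_tags("etl") == {"etl"})
```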
Review comment: Will this exist? 🤔
Review comment: Yes, I think it makes sense to make a small release before 0.18.0.