-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KED-2419] Make pipeline and Pipeline consistent, take 2 #1147
[KED-2419] Make pipeline and Pipeline consistent, take 2 #1147
Conversation
@@ -599,7 +599,7 @@ def node( | |||
outputs: Union[None, str, List[str], Dict[str, str]], | |||
*, | |||
name: str = None, | |||
tags: Iterable[str] = None, | |||
tags: Union[str, Iterable[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should always have been this way so that node
and Node
match identically.
I think this change makes sense, and after our chat I can say that I also like this change 🙂 I don't necessarily think that this change makes the notion of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, think the more I look at it the more it grows on me. Would be good to add some tests.
kedro/pipeline/modular_pipeline.py
Outdated
if not isinstance(pipe, Pipeline): | ||
pipe = Pipeline(pipe, tags=tags) | ||
|
||
if inputs is None and outputs is None and parameters is None and namespace is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: We can maybe simplify this condition with an all
, sth like
all(x is None for x in (inputs, outputs...))
…ked-2419-2' into feature/consistent-pipeline-api#ked-2419-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've finished updating the docstrings, project template and added some tests, so please do re-review and let me know what you think. I'll update docs and starters in separate PRs.
I wasn't sure exactly how many tests to add for this or how many existing ones to change from Pipeline
to pipeline
so let me know if you think I should do some more there. I did manually do a find and replace for all instances of Pipeline()
and replace them with pipeline()
and all tests still passed, which is a good sign that people can now always use pipeline
.
@@ -1,3 +1,14 @@ | |||
# Release 0.17.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this exist? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it makes sense to make a small release before 0.18.0
""" | ||
if isinstance(pipe, Pipeline): | ||
# To ensure that we are always dealing with a *copy* of pipe. | ||
pipe = Pipeline([pipe], tags=tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pipeline
only takes an iterable of pipelines/nodes, hence wrapping this in []
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 👍 Just some small comments from my side.
@@ -1,3 +1,14 @@ | |||
# Release 0.17.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it makes sense to make a small release before 0.18.0
inputs: A name or collection of input names to be exposed as connection points | ||
to other pipelines upstream. | ||
to other pipelines upstream. This is optional; if not provided, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
When str or Set[str] is provided, the listed input names will stay | ||
the same as they are named in the provided pipeline. | ||
When Dict[str, str] is provided, current input names will be | ||
mapped to new names. | ||
Must only refer to the pipeline's free inputs. | ||
outputs: A name or collection of names to be exposed as connection points | ||
to other pipelines downstream. | ||
to other pipelines downstream. This is optional; if not provided, the | ||
pipeline inputs are automatically inferred from the pipeline structure. | ||
When str or Set[str] is provided, the listed output names will stay | ||
the same as they are named in the provided pipeline. | ||
When Dict[str, str] is provided, current output names will be | ||
mapped to new names. | ||
Can refer to both the pipeline's free outputs, as well as | ||
intermediate results that need to be exposed. | ||
parameters: A map of existing parameter to the new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also explicitly say that parameters
are optional, now it's mentioned for inputs
, outputs
and tags
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok without because, unlike the others, parameters
is not a required argument to node
so there's no real possible confusion here 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌟
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome 🎉 I love the additional changes which clean up some docs/typing debt ❤️
Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
Description
Currently we have
node
andNode
, which are exactly equivalent. We encourage users to donode
, which just instantiatesNode
.We also have
pipeline
andPipeline
, which are quite different.Pipeline
is the underlying class used to create a pipeline; thenpipeline
is then used to transform that according to a namespace and dataset input/output mapping. The current use would be:(Of course you don't have to transform the pipeline if the
initial_pipeline
is all you need.)To be more consistent with
node/Node
and reduce confusion we want to make all the options available inPipeline
also available topipeline
. We would then encourage users to just usepipeline
. This would mean no need for thepipeline(Pipeline())
construction as users would instead just dopipeline([node1, node2, node3], tag=..., input=..., outputs=..., namespace=...)
.Note that everything here is actually a non-breaking change, since the only arguments added to
pipeline/Pipeline
are keyword only and optional.Question
Do you like this change? @lorenabalan was concerned that it conflates the notion of pipeline and modular pipeline, and the result will be more rather than less confusing to users.
Looking at the arguments that
pipeline
now takes, it does seem a bit confusing to me also. As a user, when I create my first pipeline it makes sense to me that I can specify thenamespace
andtags.
But why do I have the option of specifying theinputs
andoutputs
? Shouldn't those be taken from the structure of DAG built up from the nodes provided? (Yes they are -inputs
/outputs
/parameters
are only needed if you want to transform those to something else... but somehow these being arguments to the same function that takestags
as an argument doesn't feel quite right to me, like we're mixing two things up).But also I do think it's nicer to be able to do
pipeline()
rather thanpipeline(Pipeline())
. Overall I think this is an improvement but I'm not totally convinced yet.TODO
This PR
This ticket, new PR(s)
Separate tickets
pipeline
rather thanPipeline
pipeline
frommodular_pipeline.py
topipeline.py
and deletemodular_pipeline.py
. This would break any importsfrom kedro.pipeline.modular_pipeline import pipeline
but notfrom kedro.pipeline import pipeline
Checklist
RELEASE.md
file