Revise modular pipelines docs #3948
Conversation
Rendered docs for anyone else reviewing: https://kedro--3948.org.readthedocs.build/en/3948/
Thanks so much for taking on this work @DimedS, you've clearly put a lot of thought into restructuring the docs and I really appreciate the effort. I've left comments in the specific pages, but I think on a higher level the flow isn't quite right.
I found that the new structure seems to disrupt the logical flow of information. Sections don't seamlessly transition into one another, making it challenging to follow along. As a result, I'm concerned that the overall clarity and effectiveness of the documentation may have been compromised.
To address this, I propose revisiting the organisation of the content to ensure that each section logically leads into the next, creating a smoother reading experience.
For example, the sections on how to create a pipeline and how to create a new blank pipeline with the `kedro pipeline create` command seem very separate, and then in the section on pipeline creation structure you jump back again to the example used in "how to create a simple pipeline".
Additionally, we may need to reconsider the level of detail provided in certain sections to strike the right balance between comprehensive coverage and concise readability.
I really like the rephrasing of "modular pipelines" to "reusable pipelines", but from just reading the doc, I don't feel like I'd be able to apply the information in practice. I have always hated that cooking pipeline example and if we change anything I think it has to be that example. In the ticket for this task Revise the modular pipelines documentation to improve clarity Jo mentioned "I'd like to see it more like a tutorial that is based on the spaceflights starter (extends it)." and I totally agree with that.
I'm aware I haven't been involved much in this task, so if any of this goes against what you and @astrojuanlu have discussed, feel free to dismiss it!
Thank you very much for the comprehensive review, @merelcht! I agree that the current flow is not ideal. Let's brainstorm how we can restructure it. My main idea was to split the content into two docs:
I also agree that the cooking example isn't great. I'd be happy to replace it with a more realistic scenario, like a spaceflights extension.
### Custom templates
Pipelines are shareable between Kedro codebases via [micro-packaging](micro_packaging.md), but you must follow a couple of rules to ensure portability:
Micropackaging is going to be deprecated so perhaps it's worth adding alternate instructions on how to do this?
@merelcht, @ankatiyar, thank you for your reviews! I have addressed most of your comments. I would appreciate any advice on micropackaging and providing a clearer explanation of namespaces. Additionally, I did not modify the cooking examples for modular pipelines; let's confirm any changes regarding that with @astrojuanlu and Jo.
Gave this a first pass. Fantastic job @DimedS 👏🏼
Some high level thoughts:
- I really like all the "how-tos" in https://kedro--3948.org.readthedocs.build/en/3948/nodes_and_pipelines/pipeline_introduction.html
- However, from https://kedro--3948.org.readthedocs.build/en/3948/nodes_and_pipelines/pipeline_introduction.html#how-to-create-a-new-blank-pipeline-using-the-kedro-pipeline-create-command onwards, we're talking about Modular Pipelines, because it describes the directory structure that wraps `Pipeline` objects (and here the use of "Modular" is correct!).
- And then, the document called "Reuse pipelines with namespaces" goes on to explain mostly namespaces, which I think is good 👍🏼 but it's still named `modular_pipelines.md`.

So I only have 1 major suggestion about the structure, which is having "Modular Pipelines" still have a page on its own. Something like:
I left some minor comments but I thought it's best to review the high-level structure first before going into minor details.
Even our own docs are confusing: we have a page titled "Reuse pipeline with namespace" while the URL is "modular_pipelines.html". They are really different things.
https://kedro--3948.org.readthedocs.build/en/3948/nodes_and_pipelines/modular_pipelines.html
Question:
- With micro-packaging deprecating, do we fade out the "modular pipeline" concept too? My answer is no, since `kedro pipeline create` will still use it, and in general it is a nice idea to keep your pipeline "modular" in terms of configuration, or even dependencies (so it's easier to run in an orchestrator).
With that in mind, I think we should explain clearly "What is a modular pipeline": it is mainly about the structure and organisation of config/dependencies, what the benefits are, etc.
"Modular pipeline" is a concept, while "Namespace" is a Kedro feature that helps you implement a pipeline in a modular way.
<details>
<summary><b>Click to expand</b></summary>

Kedro's dependency resolution ensures that each node runs only after its required inputs are available from the outputs of previous nodes. This way, the nodes are executed in the correct order automatically, based on the defined dependencies.
Maybe it helps to print `pipeline.nodes`, which is toposorted already?
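For anyone following along, here is a minimal sketch of what that suggestion looks like; the node functions and dataset names are made up for illustration:

```python
from kedro.pipeline import node, pipeline


def defrost(frozen_food):
    return "thawed food"


def eat(thawed_food):
    return "empty plate"


# The nodes are deliberately declared out of execution order.
demo = pipeline(
    [
        node(eat, inputs="thawed_food", outputs="plate", name="eat"),
        node(defrost, inputs="frozen_food", outputs="thawed_food", name="defrost"),
    ]
)

# .nodes returns the nodes topologically sorted, so "defrost" prints before "eat"
print(demo.nodes)
```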
It's the section below titled "How to receive information about the nodes in a pipeline" that shows that output, so I don't think it's a good idea to duplicate it on the same page.
Agreed, we should keep the term
Hey! I read through this a bit and, if I may, would leave a couple of comments.
I definitely don't have as deep knowledge on namespaces and modular pipelines as maintainers, but I'd say I leverage them comfortably, and e.g. making dynamic namespaced pipelines (like described here) and nesting namespaces 2-3 levels deep is something I operate with very frequently. I even found a bug in `kedro viz` at some point related to this :)
Overall I like the current docs a lot more. It is subjective and personal opinion indeed, but a few things it's based on:
- See multiple comments I left.
- Modular pipelines and namespaces are IMO very interconnected and at least for me it's easy to follow if they're on the same page.
- It doesn't describe the `pipeline()` wrapper like the current docs do, which to me is the key for users to understand how it works.
```python
def create_new_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [existing_pipeline],  # Name of the existing pipeline
```
Why is this `[existing_pipeline]` instead of `existing_pipeline`? Also the comment is a bit inaccurate: it's not `# Name of the existing pipeline` but rather a `Pipeline` object itself.
I think the example would be more realistic if `existing_pipeline` had a simple but realistic job and a clear name, e.g. like `cook_pipeline` in the old example.
I fixed the comment and removed the brackets, but I believe `existing_pipeline` is a sufficiently descriptive name to define reusability. A more detailed example is provided a bit further down, based on the spaceflights starter.
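For readers of this thread, a self-contained sketch of the pattern being discussed; `fit_model` and the dataset names here are illustrative stand-ins, not part of the actual docs example:

```python
from kedro.pipeline import Pipeline, node, pipeline


def fit_model(df):
    return "model"  # placeholder body for illustration


# A stand-in for the docs' existing_pipeline
existing_pipeline = pipeline(
    [node(fit_model, inputs="old_input_df_name", outputs="old_output_df_name", name="fit_model")]
)


def create_new_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        existing_pipeline,  # the Pipeline object itself, not its name
        inputs={"old_input_df_name": "new_input_df_name"},
        outputs={"old_output_df_name": "new_output_df_name"},
    )
```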
inputs = {"old_input_df_name" : "new_input_df_name"}, # Mapping old input to new input | ||
outputs = {"old_output_df_name" : "new_output_df_name"}, # Mapping old output to new output | ||
parameters = {"params: model_options": "params: new_model_options"}, # Updating parameters |
I think the `old` and `new` dataset names do not reflect what modular pipelines really do.
A mental model while working with modular pipelines and namespaces is not that you're replacing something old with something new. It is that you're using something specific instead of something abstract.
The previous example with cooking food reflects it better, I think, because it shows how `"grilled_veg"`, which is an output of `cook_pipeline`, is in one case `"breakfast_food"` and in the other `"lunch_food"`.
Also the block inside `pipeline()` should be indented, I believe.
I fixed the indentation and updated the comments. I believe the idea is now also clearly explained within the example, showing how to create different data science pipelines based on the `base_data_science` pipeline.
If you want to create a new pipeline that performs similar tasks with different inputs/outputs/parameters as your `existing_pipeline`, you can use the same `pipeline()` creation function as described in [How to structure your pipeline creation](modular_pipelines.md#how-to-structure-your-pipeline-creation). This function allows you to overwrite inputs, outputs, and parameters. Your new pipeline creation code should look like this:
```python
def create_new_pipeline(**kwargs) -> Pipeline:
```
I think this PR overall uses less specific names than the current docs, which at least to me is an impact on docs quality. Compare `create_new_pipeline` to:

```python
final_pipeline = (
    cook_breakfast_pipeline
    + eat_breakfast_pipeline
    + cook_lunch_pipeline
    + eat_lunch_pipeline
)
```
I updated it to `create_pipeline()`. If I understood correctly, the name `create_pipeline()` is important (although it is not specific) because otherwise the pipeline will not be autodiscovered and executed during `kedro run` (see the Kedro documentation). If you see any other examples with less specific names, please let me know and I would be happy to fix them as well.
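For context, the autodiscovery convention referred to above is that `find_pipelines()` looks for a `create_pipeline()` function in each pipeline package. A sketch of a hypothetical layout (the project name, function body, and dataset names are made up):

```python
# src/my_project/pipelines/data_science/pipeline.py
from kedro.pipeline import Pipeline, node, pipeline


def train(model_input_table):
    return "model"  # placeholder body for illustration


def create_pipeline(**kwargs) -> Pipeline:  # the name create_pipeline is what gets discovered
    return pipeline(
        [node(train, inputs="model_input_table", outputs="regressor", name="train_node")]
    )
```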
```python
func=split_data,
inputs=["model_input_table", "params:model_options"],
outputs=["X_train", "X_test", "y_train", "y_test"],
name="split_data_node",
```
And similarly in other nodes, because in all the places this `name` is used, it is clear that it's a `node`.

```diff
- name="split_data_node",
+ name="split_data",
```
I would prefer to leave this as it is because it's currently how it's written in our starters, and I don't want to confuse users with different naming since our example is based on that starter.
## Example: Combining disconnected pipelines

Sometimes two pipelines must be connected, but do not share any catalog dependencies. In this example, there is a `lunch_pipeline`, which makes us lunch. The 'verbs', `defrost` and `eat`, are Python functions and the inputs/outputs are food at different points of the process (`frozen`, `thawed` and `food`).
`thawed` is not present in the example at all.
Decided to remove the cooking examples.
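For the record, what that section documents is plain pipeline addition, which works even when the pipelines share no datasets. A neutral sketch with made-up names, independent of the cooking example:

```python
from kedro.pipeline import node, pipeline


def ingest_orders(raw_orders):
    return raw_orders


def publish_report(kpis):
    return None


ingest = pipeline([node(ingest_orders, inputs="raw_orders", outputs="orders", name="ingest_orders")])
report = pipeline([node(publish_report, inputs="kpis", outputs=None, name="publish_report")])

# The two pipelines share no catalog entries, yet they can be combined with `+`;
# Kedro resolves execution order independently for each disconnected branch.
combined = ingest + report
print(combined.describe())
```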
```{note}
The set of overriding inputs and outputs must be a subset of the reused pipeline's "free" inputs and outputs, respectively. A free input is an input that isn't generated by a node in the pipeline, while a free output is an output that isn't consumed by a node in the pipeline. {py:meth}`Pipeline.inputs() <kedro.pipeline.Pipeline.inputs>` and {py:meth}`Pipeline.outputs() <kedro.pipeline.Pipeline.outputs>` can be used to list a pipeline's free inputs and outputs, respectively.
```
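A quick illustration of the "free" terminology in that note, using made-up functions and dataset names:

```python
from kedro.pipeline import node, pipeline


def clean(raw_data):
    return raw_data


def aggregate(cleaned_data):
    return cleaned_data


p = pipeline(
    [
        node(clean, inputs="raw_data", outputs="cleaned_data", name="clean"),
        node(aggregate, inputs="cleaned_data", outputs="summary", name="aggregate"),
    ]
)

print(p.inputs())   # {'raw_data'}: produced by no node, so it is a free input
print(p.outputs())  # {'summary'}: consumed by no node, so it is a free output
```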
Why do we need both the cooking and the data science example to demonstrate how namespaces work?
I agree with that. We want to move away from the cooking example and shift to something more practical, based on our spaceflights-pandas starter. I previously kept the cooking example, but let's go ahead and remove it.
Hi @yury-fedotov, thanks a lot for your review! 🙏🏼 I'd like to add clarification on two bits though:
I'll let @DimedS act on the rest of your review comments as he sees fit.
To be on the same page in terminology, by modular pipeline I mean a pipeline that is:
Yeah but I'd assume most use them in a non-modular fashion, i.e. not leveraging
I'm curious to understand why. IMO there are 2 reasons in Kedro to use modular pipelines:
In either case, can you give an example of when you'd leverage a modular pipeline without a
Sure, I didn't mean my definition is correct, that's just what I have in mind currently. Would be happy to change my mind if it's inaccurate. But overall, when you say that modular pipelines and namespaces aren't connected, is there an example of a pipeline that's modular but not using namespaces to act as modular?
According to my current understanding of what modular pipelines are:

**Modular with no namespaces**

That one is easy:
```python
...

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=preprocess_companies,
                inputs="companies",
                outputs="preprocessed_companies",
                name="preprocess_companies_node",
            ),
            node(
                func=preprocess_shuttles,
                inputs="shuttles",
                outputs="preprocessed_shuttles",
                name="preprocess_shuttles_node",
            ),
            node(
                func=create_model_input_table,
                inputs=["preprocessed_shuttles", "preprocessed_companies", "reviews"],
                outputs="model_input_table",
                name="create_model_input_table_node",
            ),
        ]
    )
```

**Not modular with namespaces**

In:

```python
def register_pipelines() -> Dict[str, Pipeline]:
    base_pipeline = pipeline(
        [
            node(
                func=split_data,
                inputs=["model_input_table", "params:model_options"],
                outputs=["X_train", "X_test", "y_train", "y_test"],
                name="split_data_node",
            ),
            node(
                func=train_model,
                inputs=["X_train", "y_train"],
                outputs="regressor",
                name="train_model_node",
            ),
            node(
                func=evaluate_model,
                inputs=["regressor", "X_test", "y_test"],
                outputs=None,
                name="evaluate_model_node",
            ),
        ]
    )
    ds_pipeline_1 = pipeline(
        pipe=base_pipeline,
        inputs="model_input_table",
        namespace="active_modelling_pipeline",
    )
    ds_pipeline_2 = pipeline(
        pipe=base_pipeline,
        inputs="model_input_table",
        namespace="candidate_modelling_pipeline",
    )
    return {"__default__": ds_pipeline_1 + ds_pipeline_2}
```

So the pipeline is defined directly in the
Also note that our docs already say that the way to create & delete modular pipelines is with
@astrojuanlu thanks for providing those examples! Yeah I think we just define modular pipelines differently. To simplify context a bit, let's assume that
```python
base_pipeline = pipeline(
    [
        node(
            func=split_data,
            inputs=["model_input_table", "params:model_options"],
            outputs=["X_train", "X_test", "y_train", "y_test"],
            name="split_data_node",
        ),
        node(
            func=train_model,
            inputs=["X_train", "y_train"],
            outputs="regressor",
            name="train_model_node",
        ),
        node(
            func=evaluate_model,
            inputs=["regressor", "X_test", "y_test"],
            outputs=None,
            name="evaluate_model_node",
        ),
    ]
)

ds_pipeline_1 = pipeline(
    pipe=base_pipeline,
    inputs="model_input_table",
    namespace="active_modelling_pipeline",
)

ds_pipeline_2 = pipeline(
    pipe=base_pipeline,
    inputs="model_input_table",
    namespace="candidate_modelling_pipeline",
)
```

Move it to manually created
I'm happy to do a final read through when further requested changes are complete. Please just ping me @DimedS when you're ready!
Many thanks for your review, @yury-fedotov, and your comments, @astrojuanlu. I mostly agree with @astrojuanlu's understanding that modular pipelines are currently a more virtual concept, just a folder structure. For me, clarifying exactly what that means is not as important. I actually wanted to remove the term "modular" entirely so as not to confuse users. What's more important to me is to clarify for users how to effectively use Kedro to solve their different tasks. I like the current docs structure from that perspective because it's more task-oriented now:
Some small final suggestions, but nothing blocking.
Thanks for the many revisions of these docs @DimedS ⭐ From my point of view this is now a lot better than what we had before.
I would like us to put some effort into creating a more realistic example of re-using the pipeline, so instead of `data_science_1` and `data_science_2` we can refer to a proper use case that requires re-use, but let's leave that for another time.
```
- company_rating
```
> In Kedro, you cannot run pipelines with the same node names. In this example, both pipelines have nodes with the same names, so it's impossible to execute them together. However, `base_data_science` is not registered and will not be executed with the `kedro run` command. The `data_science` pipeline, on the other hand, will be executed during `kedro run` because it will be autodiscovered by Kedro, as it was created inside the `create_pipeline()` function.
I find this confusing because in the code snippet above:

```python
def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [base_data_science],  # Creating a new data_science pipeline based on base_data_science pipeline
        parameters={"params:model_options": "params:model_options_1"},  # Using a new set of parameters to train model
    )
```

`base_data_science` is created inside `create_pipeline()`...
Other than in the explanation, there's no actual reference to `data_science` anywhere in the code examples on this page.
I fixed it by adding the comment `# data_science pipeline creation function` before the `create_pipeline()` function. However, as I understand it, I have to use the `create_pipeline()` name to ensure the pipeline will be autodiscovered. Do you have other ideas on how to highlight the `data_science` name inside the code?
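On the node-name clash described in the quoted note above, a small sketch may help; the pipeline and names here are illustrative, not the docs example:

```python
from kedro.pipeline import node, pipeline


def train(model_input_table):
    return "model"  # placeholder body for illustration


base = pipeline(
    [node(train, inputs="model_input_table", outputs="regressor", name="train_node")]
)

# `base + base` would fail: both copies share the node name "train_node" (and the
# output "regressor"). Namespacing renames nodes and datasets, so this works:
combined = pipeline(base, namespace="one", inputs="model_input_table") + pipeline(
    base, namespace="two", inputs="model_input_table"
)
print([n.name for n in combined.nodes])  # e.g. ['one.train_node', 'two.train_node']
```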
A namespace is a way to isolate nodes, inputs, outputs, and parameters inside your pipeline. If you pass the `namespace="namespace_name"` argument to the `pipeline()` creation function, it will add the `namespace_name.` prefix to all nodes, inputs, outputs, and parameters inside your new pipeline.

Let's extend our previous example and try to reuse the `base_data_science` pipeline one more time by creating another pipeline based on it. First, we should use the `kedro pipeline create` command to create a new blank pipeline named `data_science_2`:
From my point of view it doesn't matter how you create the pipelines; what's important is how you re-use them in the end. It's up to the user if they want the pipelines to be in the same folder or separated out (which is what happens when you do `kedro pipeline create`).
Oh, you merged it. I can still see quite a few Vale issues that I would have preferred to have been resolved, but I guess they can be addressed separately as needed.
Adding that as a follow-up task in #2723 (comment)
I'm sorry @stichbury, I forgot to ping you before merging. There were many comments in the PR, and we agreed to continue addressing part of them in another PR. Thanks for the follow-up task description, @astrojuanlu.
Description
Revised the pipeline and modular-pipelines documentation sections and introduced a namespaces docs section.
Main Changes:
- **Pipeline creation structure** section, which describes how to create a pipeline after using the `kedro pipeline create` command, what to include in `nodes.py` and `pipeline.py`, and why, as well as other available options.

Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a `Signed-off-by` line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist

- Added a description of this change in the `RELEASE.md` file