The pipeline registry is difficult to understand #3233

astrojuanlu · 2023-10-26T11:06:53Z

The current code for pipeline_registry.py in the default template is as follows:

kedro/kedro/templates/project/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipeline_registry.py

Lines 1 to 16 in df9f174

"""Project pipelines."""
from __future__ import annotations

from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines

Apart from #2526, this is fine and works well. The magic is in kedro.framework.project.find_pipelines, which scans different directories searching for a create_pipeline function:

kedro/kedro/framework/project/__init__.py

Line 292 in df9f174

obj = getattr(pipeline_module, "create_pipeline")()

This is so magical, though, that the moment users want to register pipelines manually, they get lost. For example, one user was trying something like kedro run --pipeline=data_science+evaluation (a beautiful syntax, by the way): https://linen-slack.kedro.org/t/15697047/i-have-a-quick-question-on-running-selected-pipelines-only-i#b93fe172-d54f-4f51-a8a6-b85f9dbcec32

To which I replied: how would I subtract a pipeline?

def register_pipelines() -> dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    pipelines["except-train"] = ???
    return pipelines

In the end I did this:

from .pipelines.model_training import create_pipeline as create_model_training_pipeline

...
pipelines["all"] = sum(pipelines.values())
pipelines["all_except_eval"] = pipelines["all"] - create_model_training_pipeline()

But @noklam suggested this instead:

pipelines["all_except_eval"] = pipelines["all"] - pipelines["eval"]
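Both variants rely on pipeline objects supporting arithmetic: sum() over the registry values and - between two entries. Here is a minimal sketch of why that works, using a toy stand-in class rather than Kedro's actual Pipeline. ToyPipeline, the data_processing name, and the node names are assumptions made for illustration only.

```python
from __future__ import annotations


class ToyPipeline:
    """Toy stand-in for a pipeline: it only tracks a set of node names."""

    def __init__(self, nodes=()):
        self.nodes = frozenset(nodes)

    def __add__(self, other: ToyPipeline) -> ToyPipeline:
        # Union of nodes: adding the same node twice does not duplicate it.
        return ToyPipeline(self.nodes | other.nodes)

    def __radd__(self, other):
        # sum() starts from the integer 0, so "0 + pipeline" must return the pipeline.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented

    def __sub__(self, other: ToyPipeline) -> ToyPipeline:
        # Difference: keep only the nodes that are not in the other pipeline.
        return ToyPipeline(self.nodes - other.nodes)


# Hypothetical registry contents, standing in for what find_pipelines() returns.
pipelines = {
    "data_processing": ToyPipeline({"clean", "merge"}),
    "model_training": ToyPipeline({"split", "train"}),
    "eval": ToyPipeline({"score"}),
}
pipelines["all"] = sum(pipelines.values())
pipelines["all_except_eval"] = pipelines["all"] - pipelines["eval"]

print(sorted(pipelines["all_except_eval"].nodes))
# ['clean', 'merge', 'split', 'train']
```

The point of subtracting pipelines["eval"] instead of calling create_pipeline() again is that every pipeline is built once and then referred to by its registry name.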

This week I saw a user do something similar, but they renamed the functions instead:

https://github.com/pablovdcf/TFM_HADO_Cares/blob/28d5a024b915169a039a5a84996b9ee11ee1f3ee/hado/src/hado/pipeline_registry.py#L5-L7

and since their pipeline creation functions were not named create_pipeline but something else, this completely broke the automagic find_pipelines for them.
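The failure mode can be reproduced with a small sketch of convention-based discovery. This is a simplification, not Kedro's implementation: discover_pipelines is a hypothetical helper, the modules are built in memory instead of being imported from a pipelines directory, and plain lists stand in for Pipeline objects. Only the lookup by the fixed name create_pipeline mirrors the line quoted from kedro/framework/project/__init__.py.

```python
import types
import warnings


def discover_pipelines(modules):
    """Collect the result of ``create_pipeline()`` from each module that defines it."""
    found = {}
    for name, module in modules.items():
        if not hasattr(module, "create_pipeline"):
            # Discovery by naming convention cannot see a factory under any other name.
            warnings.warn(f"'{name}' has no 'create_pipeline' function; skipping")
            continue
        found[name] = getattr(module, "create_pipeline")()
    return found


# Hypothetical pipeline modules, created in memory for the demonstration.
data_science = types.ModuleType("data_science")
data_science.create_pipeline = lambda: ["train_model"]

evaluation = types.ModuleType("evaluation")
evaluation.create_evaluation_pipeline = lambda: ["score_model"]  # renamed factory

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    found = discover_pipelines(
        {"data_science": data_science, "evaluation": evaluation}
    )

print(found)
# {'data_science': ['train_model']}
```

The evaluation module is skipped even though it defines a perfectly good factory, which matches what the user above ran into after renaming their functions.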


astrojuanlu · 2023-11-14T17:37:51Z

Another pattern: repeatedly using create_pipeline https://linen-slack.kedro.org/t/16062967/i-think-this-might-be-a-versioning-question-i-created-a-kedr#e92a5668-7e06-41e7-8825-3ec18fff1c0c

from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline

from network_anomaly_detection.pipelines import (
    data_collection as dc,
    data_engineering as de,
    ...
)

def register_pipelines() -> Dict[str, Pipeline]:
    ...
    data_collection_pipeline = dc.create_pipeline()
    data_engineering_pipeline = de.create_pipeline()
    ...

    return {
        "dc": data_collection_pipeline,
        "de": data_engineering_pipeline,
        ...
        "__default__": data_collection_pipeline + data_engineering_pipeline + data_science_pipeline + plot_pipeline
    }
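One way to cut the repetition in a hand-written registry like this is to call each factory exactly once and derive __default__ from the results. The sketch below is only an illustration: build_registry is a hypothetical helper, not part of Kedro, and it assumes nothing more than that the returned objects support +. Tuples of node names stand in for Pipeline objects so the example runs without Kedro installed.

```python
import operator
from functools import reduce


def build_registry(factories):
    """Call every factory once and add a combined ``__default__`` entry."""
    registry = {alias: factory() for alias, factory in factories.items()}
    # Combine all registered objects with "+", in insertion order.
    registry["__default__"] = reduce(operator.add, registry.values())
    return registry


# Stand-in factories; the aliases follow the snippet above, the node names are made up.
registry = build_registry(
    {
        "dc": lambda: ("collect",),
        "de": lambda: ("clean", "join"),
    }
)

print(registry["__default__"])
# ('collect', 'clean', 'join')
```

In a project shaped like the one above, the mapping would presumably be {"dc": dc.create_pipeline, "de": de.create_pipeline, ...}, so adding a pipeline means adding one entry rather than editing three places.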

astrojuanlu added this to Kedro Framework Oct 26, 2023

github-actions bot mentioned this issue Nov 1, 2023

Monthly issue metrics report #3256

Closed

This was referenced Jan 22, 2024

[Parent] Improve documentation section that covers "Extend Kedro" #3202

Open

[Parent] Improve documentation section that covers "Nodes & pipelines" #3200

Open

astrojuanlu mentioned this issue Apr 5, 2024

Document best practices for writing tests for nodes and pipelines #3782

Merged

7 tasks

astrojuanlu mentioned this issue Jun 5, 2024

Define modular pipelines in config #3904

Closed

7 tasks
