Allow for specifying extra node dependencies #3988

lvijnck · 2024-07-04T13:35:18Z

Description

I've always felt that Kedro lacks the ability to specify additional dependencies between nodes that are not dataset-related.

Context

For instance, consider the problem of filling a knowledge graph through Kedro. Obviously, there are two main nodes:

Write nodes
Write edges

However, the edges cannot be written before the nodes have been pushed. There is therefore no dataset dependency between the two Kedro nodes, only an execution dependency.
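
To make the ordering constraint concrete, here is a toy in-memory graph store. `GraphStore` and the two helper functions are hypothetical stand-ins, not part of Kedro or any graph database client, and the rule that an edge may only reference existing nodes is an assumption chosen to mirror the scenario above. Nothing in the data passed between the two functions tells a scheduler which must run first; only the side effect on the store does.

```python
from __future__ import annotations


class GraphStore:
    """Hypothetical in-memory stand-in for a graph database."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.edges: list[tuple[str, str]] = []

    def add_node(self, node_id: str) -> None:
        self.nodes.add(node_id)

    def add_edge(self, src: str, dst: str) -> None:
        # Assumption: an edge may only reference nodes that already exist.
        missing = {src, dst} - self.nodes
        if missing:
            raise ValueError(f"unknown node(s): {sorted(missing)}")
        self.edges.append((src, dst))


def write_nodes(store: GraphStore, nodes: list[str]) -> None:
    """Hypothetical: push graph nodes to the store."""
    for node_id in nodes:
        store.add_node(node_id)


def write_edges(store: GraphStore, edges: list[tuple[str, str]]) -> None:
    """Hypothetical: push graph edges to the store."""
    for src, dst in edges:
        store.add_edge(src, dst)


if __name__ == "__main__":
    store = GraphStore()
    try:
        write_edges(store, [("a", "b")])  # wrong order: nodes not pushed yet
    except ValueError as exc:
        print("failed:", exc)
    write_nodes(store, ["a", "b"])
    write_edges(store, [("a", "b")])  # correct order succeeds
    print(store.edges)
```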

Possible Implementation

Adding this to Kedro would involve 1) an addition to the node system and 2) an update to the topological execution mechanism. With respect to the nodes, dependencies could be specified as follows:

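from kedro.pipeline import Pipeline, node, pipeline

# write_nodes and write_edges are the project's own node functions.


def create_pipeline(**kwargs) -> Pipeline:
    """Create knowledge graph pipeline."""
    return pipeline(
        [
            node(
                func=write_nodes,
                inputs=["int.nodes"],
                outputs="prm.nodes",
                name="write_nodes",
            ),
            node(
                func=write_edges,
                inputs=["int.edges"],
                outputs="prm.edges",
                name="write_edges",
                # Proposed argument (not part of Kedro's current `node` API):
                # run only after the node named "write_nodes" has finished.
                dependencies=["write_nodes"],
            ),
        ]
    )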
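
On the execution side (point 2), one way to picture the change is a scheduler that builds the DAG from two kinds of edges: the usual producer-to-consumer edges derived from dataset names, plus the extra edges declared through the proposed `dependencies` argument. The following is a minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); `NodeSpec` and `execution_order` are hypothetical names and do not reflect Kedro's actual runner internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class NodeSpec:
    """Hypothetical, simplified stand-in for a Kedro node."""

    name: str
    inputs: list[str]
    outputs: list[str]
    # Proposed: names of nodes that must finish before this one starts.
    dependencies: list[str] = field(default_factory=list)


def execution_order(nodes: list[NodeSpec]) -> list[str]:
    """Return node names in an order respecting both kinds of edges."""
    producers = {out: n.name for n in nodes for out in n.outputs}
    known = {n.name for n in nodes}
    graph: dict[str, set[str]] = {}
    for n in nodes:
        # Dataset edges: depend on whichever node produces each input.
        # Inputs that no node produces (e.g. catalog sources) add no edge.
        preds = {producers[i] for i in n.inputs if i in producers}
        # Execution-only edges from the proposed `dependencies` argument.
        unknown = set(n.dependencies) - known
        if unknown:
            raise ValueError(f"{n.name} depends on unknown node(s): {sorted(unknown)}")
        graph[n.name] = preds | set(n.dependencies)
    # static_order() raises graphlib.CycleError (a ValueError subclass) on cycles.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    pipeline_nodes = [
        NodeSpec("write_edges", ["int.edges"], ["prm.edges"], dependencies=["write_nodes"]),
        NodeSpec("write_nodes", ["int.nodes"], ["prm.nodes"]),
    ]
    print(execution_order(pipeline_nodes))  # ['write_nodes', 'write_edges']
```

Keeping the two edge types in one graph means cycle detection and ordering work the same way for both, so an explicit dependency that contradicts the data flow is caught rather than silently ignored.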

Possible Alternatives

The current workaround is to add "artificial" dataset dependencies between the nodes. This has the drawback that the function signatures of those nodes are polluted with inputs and outputs they never actually use.
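
For comparison, a minimal plain-Python sketch of that workaround: `write_nodes` returns an extra sentinel value and `write_edges_polluted` accepts a parameter it never reads, purely so that a dataset edge exists between the two nodes. The function bodies and the `ignore_trailing` helper are hypothetical; the helper shows one possible way to keep the core function's signature clean by moving the unused parameter into a wrapper.

```python
from __future__ import annotations

from typing import Any, Callable


def write_nodes(nodes: list[str]) -> tuple[list[str], bool]:
    # ... push `nodes` to the graph database here (omitted) ...
    # The bool is the artificial "dataset": it carries no data and exists
    # only so that a downstream node can declare it as an input.
    return nodes, True


def write_edges_polluted(
    edges: list[tuple[str, str]], nodes_written: bool
) -> list[tuple[str, str]]:
    # `nodes_written` is never read; it only encodes execution order.
    # ... push `edges` to the graph database here (omitted) ...
    return edges


def write_edges(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Clean version: the signature lists only the data actually needed.
    return edges


def ignore_trailing(func: Callable[..., Any], n: int = 1) -> Callable[..., Any]:
    """Hypothetical helper: accept `n` extra trailing positional args and drop them."""

    def wrapper(*args: Any) -> Any:
        return func(*args[: len(args) - n])

    # Keep the original name in case a framework derives a label from it.
    wrapper.__name__ = func.__name__
    return wrapper
```

With such a wrapper, the artificial input could live in the pipeline definition instead, e.g. `node(ignore_trailing(write_edges), inputs=["int.edges", "nodes_written"], outputs="prm.edges")`, paired with `outputs=["prm.nodes", "nodes_written"]` on the `write_nodes` node. Whether this interacts cleanly with Kedro's own input validation is an assumption worth checking.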

datajoely · 2024-07-04T14:23:12Z

Hey @lvijnck good to see you pop up here 👀 congrats on the new role!

The current way to do this is to pass a dummy dataset between the nodes to coerce the DAG into the right shape.

There are some open proposals for a more explicit mechanism for defining the DAG order (#1156). I'm 99% sure @noklam has a concrete design somewhere, but I can't find it.

datajoely · 2024-07-04T14:30:23Z

This was the issue (now a discussion) I was looking for. @lvijnck, if you have any further thoughts, please add them there, as it really helps prioritise things.

#3758

lvijnck added the "Issue: Feature Request" label on Jul 4, 2024

github-actions bot mentioned this issue in Monthly issue metrics report #4049 (open) on Aug 1, 2024
