A different approach to Dynamic Pipelines #2663

desmond-dsouza · 2023-06-08T20:04:14Z

Description

I will quote from this blog post on Haskell's Shake build system:

The most important thing Shake got right was adding monadic/dynamic dependencies. Most build systems start with a static graph, and then, realizing that can't express the real world, start hacking in an unprincipled manner. The resulting system becomes a bunch of special cases. Shake embraced dynamic dependencies. That makes some things harder (no static cycle detection, less obvious parallelism, must store dependency edges), but all those make Shake itself harder to write, while dynamic dependencies make Shake easier to use. I hope that eventually all build systems gain dynamic dependencies.

Context

This relates to the current discussions on dynamic pipelines in Kedro. A more principled approach may end up being more flexible, more understandable, and more usable.

Shake has been used, among other things, for a BioInformatics pipeline management tool called BioShake. That domain has a lot in common with Kedro's domain.

Possible Implementation

Emulate Shake in Python, which has more than enough dynamism for this. It would need to accommodate today's static configuration styles, which should be doable.

Possible Alternatives

noklam · 2023-06-08T21:42:43Z

Thanks for creating this discussion. This is very interesting, though I am not familiar with Haskell or Shake.

Could you describe how it would work for Kedro to embrace it, and how Shake solves this problem differently?

desmond-dsouza · 2023-06-09T20:21:12Z

I'm new to kedro and just happened to hit this limitation early. Short reply here, will try to expand on this later.

The key is it uses a DSL (embedded in Haskell) to write the dependencies in code. In particular, some dependencies may be static (of course), but others can be determined after examining interim results of previous dependencies.
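To make the idea concrete, here is a minimal Python sketch of dependencies written in code. It is not Shake's API and not anything Kedro provides: the `Build` class, its `rule` decorator, and `need` are hypothetical names invented for illustration. The point it shows is that the `total` rule only learns which parts it depends on after it has read the result of `manifest`.

```python
from typing import Callable, Dict, List, Tuple


class Build:
    """Hypothetical Shake-like runner: rules are functions that may call need()."""

    def __init__(self) -> None:
        self.rules: Dict[str, Callable[["Build"], object]] = {}
        self.results: Dict[str, object] = {}
        # (target, dependency) pairs, recorded while rules run.
        self.edges: List[Tuple[str, str]] = []
        self._stack: List[str] = []

    def rule(self, target: str):
        """Register a function as the rule that produces `target`."""
        def register(func):
            self.rules[target] = func
            return func
        return register

    def need(self, target: str) -> object:
        """Build `target` if necessary and return its result."""
        if target in self._stack:
            # Cycles can only be caught at run time, not from a static graph.
            raise RuntimeError(f"cycle: {self._stack + [target]}")
        if self._stack:
            self.edges.append((self._stack[-1], target))
        if target not in self.results:
            self._stack.append(target)
            try:
                self.results[target] = self.rules[target](self)
            finally:
                self._stack.pop()
        return self.results[target]


build = Build()


@build.rule("manifest")
def manifest(b):
    return ["a", "b"]  # an interim result that decides later dependencies


@build.rule("part:a")
def part_a(b):
    return 1


@build.rule("part:b")
def part_b(b):
    return 2


@build.rule("part:c")
def part_c(b):
    return 100  # registered, but never requested below


@build.rule("total")
def total(b):
    names = b.need("manifest")                      # static dependency
    return sum(b.need(f"part:{n}") for n in names)  # dynamic dependencies


result = build.need("total")
```

The trade-offs from the quoted blog post show up directly: the runner has to store the edges it discovers, and it can only detect a cycle while running.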

With apologies for posting links for now: paper including skeletal code here.
https://ndmitchell.com/downloads/paper-shake_before_building-10_sep_2012.pdf

The code is fairly readable, as most of it maps to Python @dataclasses and functions (with generics in signatures, and no parentheses needed to call a function). The $ is just function application, and do notation is syntactic sugar for implicitly threading the result of one step into the next step. [EDIT: the do is needed because Haskell is pure; you can mostly ignore it for Python]

short talk (15 min) here: https://www.youtube.com/watch?v=xYCPpXVlqFM

stichbury · 2023-06-12T11:59:25Z

Just adding #2627 pointer here so the two issues are linked

astrojuanlu · 2023-11-25T09:54:04Z

Example of said DSL:

So in essence, the idea would be to declare (in our case in Python) that the dependencies of one node would be dynamically generated from another node.
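As a rough sketch of what such a declaration could look like in Python: the `Node` dataclass, its `inputs_from` field, and the `run` function below are hypothetical and are not part of Kedro's API. Static inputs are listed up front, as they are today, while `inputs_from` names another node whose output supplies the remaining dependencies at run time.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical node declaration, not Kedro's `node`."""
    name: str
    func: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)  # static, known up front
    inputs_from: str = ""  # node whose output lists extra dependencies


def run(nodes: Dict[str, Node], target: str, cache: Dict[str, Any]) -> Any:
    """Resolve `target`, discovering dynamic inputs along the way."""
    if target in cache:
        return cache[target]
    node = nodes[target]
    deps = list(node.inputs)
    if node.inputs_from:
        # The upstream node's result is a list of further node names.
        deps += list(run(nodes, node.inputs_from, cache))
    args = [run(nodes, dep, cache) for dep in deps]
    cache[target] = node.func(*args)
    return cache[target]


nodes = {
    n.name: n
    for n in [
        Node("regions", lambda: ["eu", "us"]),
        Node("eu", lambda: 10),
        Node("us", lambda: 20),
        Node("apac", lambda: 30),  # declared, but not selected by "regions"
        Node("report", lambda *vals: sum(vals), inputs_from="regions"),
    ]
}

cache: Dict[str, Any] = {}
report = run(nodes, "report", cache)
```

This keeps a declarative shape close to a static pipeline definition, at the cost that the full graph is only known after `regions` has run. The sketch omits cycle detection for brevity.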

This is a potential solution to a subset of the problems expressed in #2627. I'd say let's continue the discussion there.

desmond-dsouza added the "Issue: Feature Request" label (new feature or improvement to existing feature) Jun 8, 2023

Lasica mentioned this issue Sep 28, 2023

Make config loading consistently happen before pipelines are registered to allow for dynamic pipelines with OmegaConf #3093

Open

astrojuanlu closed this as not planned Nov 25, 2023
