A different approach to Dynamic Pipelines #2663
Thanks for creating this discussion. This is very interesting; I am not familiar with Haskell or Shake. Could you describe how Kedro could embrace it, and how Shake solves this problem differently?
I'm new to Kedro and just happened to hit this limitation early. Short reply here; I'll try to expand on this later. The key is that Shake uses a DSL (embedded in Haskell) to write the dependencies in code. In particular, some dependencies may be static (of course), but others can be determined after examining the interim results of previous dependencies. With apologies for posting links for now: paper including skeletal code here. The code is fairly readable, as most of it maps to Python @dataclasses and functions (with generics in signatures, and no parentheses needed to call a function). The short talk (15 min) is here: https://www.youtube.com/watch?v=xYCPpXVlqFM
Just adding a pointer to #2627 here so the two issues are linked.
Example of said DSL:
So, in essence, the idea would be to declare (in our case, in Python) that the dependencies of one node are generated dynamically by another node. This is a potential solution to a subset of the problems described in #2627. I'd say let's continue the discussion there.
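To make "dependencies generated dynamically by another node" concrete, here is a minimal Python sketch of the idea, not Shake's or Kedro's API. It assumes a hypothetical convention in which each task is a function that receives a `fetch` callback and may call it with any key, including keys computed from values it has already fetched. The names `build`, `fetch`, and the task keys are all illustrative. Dependency edges are recorded as they are discovered, since they cannot be known up front.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a task gets a `fetch` callback and asks for inputs at run time.
Fetch = Callable[[str], object]
Task = Callable[[Fetch], object]


def build(tasks: Dict[str, Task], target: str) -> Tuple[Dict[str, object], Dict[str, List[str]]]:
    """Compute `target`, discovering dependencies while tasks run.

    Returns the computed values and the dependency edges observed along the way.
    """
    values: Dict[str, object] = {}
    edges: Dict[str, List[str]] = {}

    def run(key: str) -> object:
        if key in values:  # already built: reuse the stored result
            return values[key]
        deps: List[str] = []
        edges[key] = deps

        def fetch(dep: str) -> object:
            deps.append(dep)  # record the edge at the moment it is discovered
            return run(dep)

        values[key] = tasks[key](fetch)
        return values[key]

    run(target)
    return values, edges


# "total" depends on whichever partitions the "partitions" task reports,
# so its edges only exist after "partitions" has been computed.
tasks: Dict[str, Task] = {
    "partitions": lambda fetch: ["a", "b"],
    "part:a": lambda fetch: 10,
    "part:b": lambda fetch: 32,
    "total": lambda fetch: sum(fetch(f"part:{p}") for p in fetch("partitions")),
}

values, edges = build(tasks, "total")
print(values["total"])  # 42
print(edges["total"])   # ['partitions', 'part:a', 'part:b']
```

If "partitions" returned a different list, "total" would pull in a different set of nodes without any change to its declaration, which is the property a static graph cannot express.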
Description
I will quote from this blog post on Haskell's Shake build system:
The most important thing Shake got right was adding monadic/dynamic dependencies. Most build systems start with a static graph, and then, realizing that can't express the real world, start hacking in an unprincipled manner. The resulting system becomes a bunch of special cases. Shake embraced dynamic dependencies. That makes some things harder (no static cycle detection, less obvious parallelism, must store dependency edges), but all those make Shake itself harder to write, while dynamic dependencies make Shake easier to use. I hope that eventually all build systems gain dynamic dependencies.
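The quote mentions that dynamic dependencies give up static cycle detection. A minimal sketch of why, under the same hypothetical "task receives a `fetch` callback" convention (again not Shake's or Kedro's API; `build_checked` and `CycleError` are illustrative names): whether "x" depends on "y" is decided inside the task, so a cycle can only be detected while tasks are executing, by tracking which keys are currently in progress.

```python
from typing import Callable, Dict, List

Fetch = Callable[[str], object]
Task = Callable[[Fetch], object]


class CycleError(Exception):
    """Raised when a task (transitively) fetches a key that is still being built."""


def build_checked(tasks: Dict[str, Task], target: str) -> Dict[str, object]:
    values: Dict[str, object] = {}
    in_progress: List[str] = []  # stack of keys currently being built

    def run(key: str) -> object:
        if key in values:
            return values[key]
        if key in in_progress:
            # The cycle only becomes visible now, at run time.
            cycle = in_progress[in_progress.index(key):] + [key]
            raise CycleError(" -> ".join(cycle))
        in_progress.append(key)
        try:
            values[key] = tasks[key](run)
        finally:
            in_progress.pop()
        return values[key]

    run(target)
    return values


# Whether "x" needs "y" depends on data, so no static check can decide it.
config = {"x_needs_y": True}
tasks: Dict[str, Task] = {
    "x": lambda fetch: fetch("y") if config["x_needs_y"] else 0,
    "y": lambda fetch: fetch("x") + 1,
}

try:
    build_checked(tasks, "x")
except CycleError as err:
    print("cycle:", err)  # cycle: x -> y -> x

config["x_needs_y"] = False
print(build_checked(tasks, "y"))  # {'x': 0, 'y': 1}
```

The same two task definitions are cyclic or acyclic depending on `config`, which illustrates the trade-off the quote describes.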
Context
There are ongoing discussions on dynamic pipelines in Kedro. A more principled approach may end up being more flexible, more understandable, and more usable.
Shake has been used, among other things, for a bioinformatics pipeline management tool called BioShake. That domain has a lot in common with Kedro's domain.
Possible Implementation
Emulate Shake in Python, which has more than enough dynamism for this. It would need to accommodate today's static configuration styles, which should be doable.
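One way static configuration could be accommodated: a statically declared node is just a dynamic task that always fetches the same fixed inputs. The sketch below assumes a hypothetical `StaticNode` dataclass (it is not Kedro's `node`) with a function, a list of input names, and one output name, and adapts it to the same hypothetical `fetch`-callback task convention, so static and dynamic tasks can live in one graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Fetch = Callable[[str], object]
Task = Callable[[Fetch], object]


@dataclass
class StaticNode:
    """Hypothetical static declaration: fixed inputs, one output."""
    func: Callable[..., object]
    inputs: List[str]
    output: str


def to_task(node: StaticNode) -> Task:
    # A static node is a dynamic task whose fetches never depend on data.
    def task(fetch: Fetch) -> object:
        return node.func(*(fetch(name) for name in node.inputs))
    return task


def build(tasks: Dict[str, Task], target: str) -> object:
    values: Dict[str, object] = {}

    def fetch(key: str) -> object:
        if key not in values:  # build each key at most once
            values[key] = tasks[key](fetch)
        return values[key]

    return fetch(target)


static_nodes = [
    StaticNode(lambda: [3, 1, 2], [], "raw"),
    StaticNode(sorted, ["raw"], "clean"),
]
tasks: Dict[str, Task] = {n.output: to_task(n) for n in static_nodes}
# A dynamic task mixed in alongside the adapted static ones.
tasks["first"] = lambda fetch: fetch("clean")[0]

print(build(tasks, "first"))  # 1
```

The design choice here is that the dynamic form is the general one and static declarations are a special case, so existing configuration would not need to change.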
Possible Alternatives