Is cloning Kedro Pipelines worth it #2903

Closed
DebanjanBanerjeeQB opened this issue Aug 7, 2023 · 7 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@DebanjanBanerjeeQB

Description

What if there were a way to clone Kedro pipelines so that a pipeline can be moved in and out of different environments? Say a pipeline is deployed in a Docker environment where we have limited permission to install new things: we could clone a non-PII version of it and bring it to a technically friendlier environment to run our experiments.

Context

In a fintech context, where installing new packages or opening new ports (for Jupyter, for example) is restricted, experimentation becomes tough and trying new solutions becomes a huge challenge. People often end up experimenting locally on their laptops and then replicating the solutions on the banking systems. Even something as simple as running kedro-viz needs a new port to be opened, and many setups won't allow that. If there were a way to clone the pipelines, we could either create a folder and move it, or create a JSON for the whole pipeline and move that to a new environment.

Possible Implementation

I can see two ways to do this:

  1. Quick implementation: handle this explicitly

kedro --clone

This will:

  • create a new folder with the same kedro pipeline
  • remove all the paths from catalog.yml
  • remove credentials.yml file
  • retain src/requirements.txt
  • retain all pipelines as is
  • retain all nodes as is
  • retain the registry
  2. Relatively harder but cleaner implementation: handle this the way kedro-viz JSONs are handled (if possible)

Write: pack everything into a JSON file
kedro --save-json mypipeline.json

Read:
kedro new --from-json mypipeline.json

This reads all the config, including an empty catalog (maybe just with placeholders) and empty nodes (maybe just with the function names), and creates a pipeline structure that you can take to any environment. Moreover, if another team wants to build pipelines in the same structure as your team does, today they need to go through the whole git shebang, clean up your conf/ directory, tweak functions and so on, whereas cloning would give them a ready-made starter.
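To make the JSON idea concrete, here is a minimal sketch of the "write" side. It is not an existing Kedro command or API: `pipeline_to_skeleton` and `save_skeleton` are hypothetical helpers that only assume each node object exposes `name`, `func`, `inputs` and `outputs` attributes. Wiring this up to a real Kedro pipeline is untested here.

```python
import json
from typing import Any, Iterable


def pipeline_to_skeleton(nodes: Iterable[Any]) -> dict:
    """Describe each node by name, function name, inputs and outputs only.

    No function bodies, data or config values are included.
    """
    return {
        "nodes": [
            {
                "name": node.name,
                # Fall back to repr() for callables without __name__ (e.g. partials).
                "func": getattr(node.func, "__name__", repr(node.func)),
                "inputs": list(node.inputs),
                "outputs": list(node.outputs),
            }
            for node in nodes
        ]
    }


def save_skeleton(nodes: Iterable[Any], path: str) -> None:
    """Write the skeleton to a JSON file (hypothetical stand-in for --save-json)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(pipeline_to_skeleton(nodes), handle, indent=2)
```

The resulting file holds only structure, so it could be moved between environments without carrying dataset paths or credentials. The "read" side would still need to generate stub functions and catalog placeholders from it.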

Both of the above operations help us avoid PII leaks and would be great for replicating experiments across different environments.
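For the quick implementation (option 1), a rough sketch of what the clone step might do is below. `kedro --clone` does not exist, so `clone_project` is a hypothetical helper. The file patterns (`credentials*.yml`, `conf/**/catalog*.yml`), the choice to skip the top-level `data` folder, and the `filepath`/`path` key names are assumptions about a typical project layout. The path stripping is a crude line-based substitution rather than real YAML handling.

```python
import fnmatch
import re
import shutil
from pathlib import Path

# Matches lines such as "  filepath: data/01_raw/cars.csv"; group 1 keeps the key.
PATH_LINE = re.compile(r"^([ \t]*(?:filepath|path):).*$", re.MULTILINE)


def clone_project(src: Path, dst: Path) -> None:
    """Copy a project folder without credentials, data or catalog paths."""

    def ignore(directory: str, names: list) -> set:
        # Skip credentials files and git metadata wherever they appear.
        skipped = {
            name
            for name in names
            if name == ".git" or fnmatch.fnmatch(name, "credentials*.yml")
        }
        # Assumption: only the top-level data/ folder holds datasets.
        if Path(directory) == src and "data" in names:
            skipped.add("data")
        return skipped

    shutil.copytree(src, dst, ignore=ignore)

    # Blank out path values in every copied catalog file, keeping the keys.
    for catalog_file in dst.glob("conf/**/catalog*.yml"):
        text = catalog_file.read_text(encoding="utf-8")
        catalog_file.write_text(PATH_LINE.sub(r"\1", text), encoding="utf-8")
```

Calling `clone_project(Path("my_project"), Path("my_project_clone"))` would leave pipelines, nodes, the registry and `src/requirements.txt` untouched, since everything not explicitly skipped is copied as-is.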

Possible Alternatives

I can see how we could do this with git too, but that involves manually handling and cleaning some of the files and/or managing access for people outside your org (at times).

@DebanjanBanerjeeQB DebanjanBanerjeeQB added the Issue: Feature Request New feature or improvement to existing feature label Aug 7, 2023
@astrojuanlu
Member

Hey @DebanjanBanerjeeQB have you seen https://docs.kedro.org/en/stable/nodes_and_pipelines/micro_packaging.html ? Would this suit your needs?

@DebanjanBanerjeeQB
Author

Hey @astrojuanlu, does this package catalogs as well? How can we make that happen?

@astrojuanlu
Member

It does not:

Kedro will not package the catalog config files even if those are present in conf/<env>/catalog/<micropkg_name>.yml.

Alternatively you could use kedro package, which does ship the config (separately). However, this would package all the pipelines of your project. https://docs.kedro.org/en/stable/deployment/single_machine.html#package-based

@DebanjanBanerjeeQB
Author

So, given that neither of these options packs the conf/ directory, how do we solve that? Say we want to run this in a new environment or create a kedro-viz in that new env. What's the best way to go about it?

@noklam
Contributor

noklam commented Aug 14, 2023

As far as I understand this is more of a niche constrained environment problem.

@astrojuanlu
Member

> So, given that neither of these options packs the conf/ directory, how do we solve that? Say we want to run this in a new environment or create a kedro-viz in that new env. What's the best way to go about it?

To clarify, kedro package will produce

  1. A wheel with the installable source code
  2. A .tar.gz with the project configuration

https://docs.kedro.org/en/stable/deployment/single_machine.html#package-based

So, while kedro micropkg won't package the config at all, kedro package will package the config, just in a separate artifact. @DebanjanBanerjeeQB Does this help?
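As an illustration of using that second artifact in a new environment, the sketch below unpacks a configuration archive next to wherever the wheel gets installed. `unpack_conf` is a hypothetical helper, and the archive name used in the usage note is an assumption, so check the actual file name in your `dist/` folder.

```python
import tarfile
from pathlib import Path


def unpack_conf(archive: Path, target_dir: Path) -> list:
    """Extract a .tar.gz configuration archive and return its member names.

    Only use this on archives you built yourself or otherwise trust.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(target_dir)
    return names
```

For example, `unpack_conf(Path("dist/conf-my_project.tar.gz"), Path("deploy"))` would place the packaged configuration under `deploy/`, where `my_project` is a placeholder project name.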

@SajidAlamQB
Contributor

SajidAlamQB commented Dec 18, 2023

We're closing this issue as we won't be pursuing it, but we'll use it as a reference for future research on micropkgs.
