Is cloning Kedro Pipelines worth it #2903
Comments
Hey @DebanjanBanerjeeQB, have you seen https://docs.kedro.org/en/stable/nodes_and_pipelines/micro_packaging.html ? Would this suit your needs?
Hey @astrojuanlu, does this package catalogs as well? If not, how can we make that happen?
It does not:
Alternatively you could use |
So, given that neither of these options packs the /conf directory, how do we solve that? Say we want to run this in a new environment, or want to create a kedro-viz in that new env. What’s the best way to go about it?
As far as I understand, this is more of a niche, constrained-environment problem.
To clarify,
https://docs.kedro.org/en/stable/deployment/single_machine.html#package-based So, while |
We're closing this issue as we won't be pursuing it, but we'll use it as a reference for future research on micropkgs.
Description
What if there was a way to clone Kedro pipelines so they can be moved in and out of different environments? Say a pipeline is deployed in a Docker environment with limited access to install new things: we could clone a non-PII version of it and bring it to a technically friendlier environment to do our experiments.
Context
In a fintech context, where installing new packages or opening up new ports (for Jupyter, etc.) is an issue, experimentation becomes tough and trying new solutions becomes a huge challenge. People often end up experimenting locally on their laptops and then replicating solutions on the banking systems. For something as simple as getting a kedro-viz you need a new port to be opened, and many setups won't allow you to do that. If there was a way to clone the pipelines, we could either create a folder and move it, or create a JSON for the whole pipeline and move it to a new env.
Possible Implementation
I can see two ways to do this:
kedro --clone
This will :
catalog.yml
credentials.yml
src/requirements.txt
Write: pack everything into a JSON file
kedro --save-json mypipeline.json
Read:
kedro new --from-json mypipeline.json
This reads all the config, including an empty catalog (maybe just with placeholders) and empty nodes (maybe just with the function names), and creates a pipeline structure that you can take to any environment. Moreover, if some other team wants to build a pipeline with the same structure as your team's, they currently need to do the whole git shebang, clean up your conf/ directory, tweak functions, etc., whereas cloning would give them a ready-made starter.
Both of the above operations would help us avoid PII leaks and would be great for replicating experiments across different environments.
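To make the JSON idea a bit more concrete, here is a minimal sketch of what a write/read round trip could look like. It is plain Python and does not use any Kedro API: `NodeSpec`, `export_skeleton`, `import_skeleton`, and the JSON layout are all hypothetical names made up for illustration, not something Kedro provides.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class NodeSpec:
    """Hypothetical description of one node: names only, no code or data."""
    name: str
    func_name: str
    inputs: List[str]
    outputs: List[str]


def export_skeleton(nodes: List[NodeSpec]) -> str:
    """Serialise node names plus a placeholder-only catalog to JSON."""
    # Every dataset mentioned by any node gets an empty placeholder entry.
    dataset_names = sorted({d for n in nodes for d in n.inputs + n.outputs})
    skeleton = {
        "nodes": [asdict(n) for n in nodes],
        "catalog": {name: {"type": "<placeholder>"} for name in dataset_names},
    }
    return json.dumps(skeleton, indent=2, sort_keys=True)


def import_skeleton(payload: str) -> List[NodeSpec]:
    """Rebuild the node descriptions from a JSON skeleton."""
    data = json.loads(payload)
    return [NodeSpec(**n) for n in data["nodes"]]
```

The skeleton carries only node names, function names, and dataset names, so function bodies, file paths, and credentials never leave the original environment. Dataset and function names can still be revealing, so they would need a review before sharing.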
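As a rough illustration of the "avoid PII leak" part, the sketch below blanks out everything in a catalog-like dictionary except an allow-list of keys. The function name `redact_catalog`, the `SAFE_KEYS` allow-list, and the `"<placeholder>"` marker are assumptions for this example, not existing Kedro behaviour.

```python
from typing import Any, Dict

# Assumption: only the dataset type is safe to carry across environments.
SAFE_KEYS = {"type"}


def redact_catalog(catalog: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a copy of the catalog with non-allow-listed values replaced."""
    redacted = {}
    for name, entry in catalog.items():
        redacted[name] = {
            key: (value if key in SAFE_KEYS else "<placeholder>")
            for key, value in entry.items()
        }
    return redacted
```

An allow-list is used rather than a block-list so that any key nobody thought about (a custom path, a connection string) is redacted by default.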
Possible Alternatives
I can see how we could do this with git too, but that involves manually handling and cleaning some of the files and/or (at times) managing access for people outside your org.