How can I run my pipeline programmatically after I packaged my project? #370
Comments
@limdauto looks like one for you. I believe there are improvements/fixes for this coming up soon in the next release.
Hi @f-istvan, since
Then you can run your package with In the upcoming 0.16 release, we have removed the requirement for the |
Hi @limdauto, Adding the files and directories you described fixed my issue. I can run my pipeline with
It's like cli (python -m my_project.run) vs script (python my_external_runner_script.py). Thank you! |
@limdauto Can you explain the reasoning behind having |
@f-istvan Oh this is a very interesting use case. Interestingly enough, I thought calling
@WaylonWalker That's a good question. We touched on this in our newly updated documentation about packaging: https://kedro.readthedocs.io/en/latest/03_tutorial/05_package_a_project.html?highlight=package#package-your-project
This mental model unfortunately breaks when it comes to |
Would it be possible to package up the catalog? I have some folks who do a lot of hypothesis testing and want fast access to data. They have very little interest in building or running pipelines, and they want the catalog for certain projects made easily portable. They would really like to be able to
Personally, I have a couple of pipelines that reuse the same sections of pipeline over and over. I would really like to be able to make those sections their own package, import them into new projects, and just append the pipeline and catalog to the new pipeline with no more than a small bulk change to the path or S3 bucket.
@WaylonWalker Yes, you can. It's a bit more involved. In The problem, however, is that your package data need to be on the same level as your
Then you should be able to bundle your configuration with your binary distribution. It's not what we would recommend as the default setup, but it should work. |
Thanks @limdauto, I got everything fully working from a pip installed wheel. I moved conf/ and .kedro.yml into the library, included them in the Manifest, modified some path variables and it is working much easier than I anticipated. This is a big move for us. I do not really see as many use cases for using different catalogs as I do for folks who want to be able to pip install project and get access to all of the project's data. For the most part, I think I can take care of dev/prod easier with a transformer than walking everyone through how to get the catalog working for hypothesis testing. |
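For anyone landing here with the same question, here is a minimal sketch of the setup described above: resolve the directory the package was installed into and use that as the project path, instead of relying on Path.cwd(). It assumes Kedro 0.15.x, where load_context is importable from kedro.context, and it assumes conf/ and .kedro.yml were moved inside the package as in the comment above. The helper names installed_package_dir and run_installed_project are hypothetical, not part of Kedro.

```python
import importlib.util
from pathlib import Path


def installed_package_dir(package_name: str) -> Path:
    """Return the directory a regular (non-namespace) package is installed in."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {package_name!r}")
    # For a regular package, spec.origin points at its __init__.py.
    return Path(spec.origin).resolve().parent


def run_installed_project(package_name: str):
    """Hypothetical helper: run a packaged Kedro project from any directory."""
    # Assumption: Kedro 0.15.x layout; the import location differs in later releases.
    from kedro.context import load_context

    # Assumption: .kedro.yml and conf/ were bundled inside the package itself,
    # so the package directory can serve as the Kedro project path.
    project_path = installed_package_dir(package_name)
    context = load_context(project_path)
    return context.run()
```

From an external script this would be called as run_installed_project("my_project"). The point of the design is only that the project path comes from the installed package location, so the result does not depend on where the script is launched.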
@f-istvan I'm going to be closing this but please feel free to re-open it if you still have problems. |
@f-istvan did you ever manage to find a solution to this, or did you revert to using something like |
I ran kedro package and distributed my kedro project. Now, when I do a pip install my_project, I get the package, but the question is how I can run my pipeline from an external .py script. What should I import into my external Python file, and how can I run my pipeline?
Based on the generated kedro_cli.py, I tried this (my_external_runner_script.py):
The Path.cwd() is obviously wrong in this case. I could not find any info about this in the documentation.
Could you please give me a hint how to do this?
Thank you,
Stefan