Deployment support/Kedro deployer #2058
So I love this, and I think we should really emphasise the power of modular pipelines here as a standout feature of Kedro. Choosing the granularity of what to translate is critical.
Couler looks v cool btw.
Something like this has been done in some plugins; we should come up with a flexible way that
I guess #143 is tangentially related?
It is! Although I am thinking more about deployment to a platform/orchestrator here, there are definitely cases of users deploying Kedro pipelines to an endpoint.
Not sure where this should go, so I'm putting it here as an amateur thought. Inspired by deepyaman, it seems that it's fairly easy to convert a Kedro pipeline to Metaflow: it's just a variable assignment between steps (nodes, in Kedro's terms). The benefit of using Metaflow is that it allows you to abstract infrastructure with a decorator. This is not to say that Kedro is going to integrate with Metaflow, but it shows the possibility of doing it.
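To make this concrete, here is a minimal, hypothetical sketch of a Kedro-like two-node pipeline expressed as a Metaflow flow. The flow name, node bodies, and the resource figures on `@batch` are all invented for illustration; the point is that data hand-off between steps is plain attribute assignment, and infrastructure is abstracted by a decorator:

```python
# Hypothetical sketch: each Kedro node becomes a Metaflow step, and the
# upstream node's output is handed over by assigning to self.<attr>.
from metaflow import FlowSpec, batch, step


class KedroLikeFlow(FlowSpec):
    @step
    def start(self):
        # stand-in for a Kedro "preprocess" node
        self.preprocessed = [x * 2 for x in (1, 2, 3)]
        self.next(self.train)

    @batch(cpu=4, memory=16000)  # infrastructure abstracted by a decorator
    @step
    def train(self):
        # stand-in for a Kedro "train" node consuming the upstream output
        self.model = sum(self.preprocessed)
        self.next(self.end)

    @step
    def end(self):
        print("model:", self.model)


if __name__ == "__main__":
    KedroLikeFlow()
```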
Related: #3094. That issue contains a research synthesis; we can use this issue to collect the plan.
(found after listening to @ankatiyar's Tech Design on |
As we learn more about how to deploy to Airflow (see @DimedS's #3860), it becomes more apparent that the list of steps can get quite involved. This is not new: the Databricks workflows also require some care. We have plenty of evidence that this is an important pain point affecting lots of users. The main questions are:
We should really treat Kedro as a pipeline DSL. Most orchestrators work with DAGs, so these are all generic features. For example:
So there is definitely a theme around DAG manipulation:
- Is the current Pipeline API flexible enough? We have to implement a separate
- Where would be the best place to hold this metadata? Maybe tags?

There is some early vision in #770 and #3094 from @datajoely last year.
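For illustration, here is a minimal sketch using the existing Pipeline API, treating tags as one possible home for that metadata. The node functions, dataset names, and tags are hypothetical:

```python
from kedro.pipeline import Pipeline, node


def preprocess(raw):
    return raw


def train(data):
    return data


pipe = Pipeline(
    [
        node(preprocess, inputs="raw", outputs="data", tags=["cpu"]),
        node(train, inputs="data", outputs="model", tags=["gpu"]),
    ]
)

# Slice the DAG per deployment target, e.g. one orchestrator task per tag
gpu_part = pipe.only_nodes_with_tags("gpu")
```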
Yeah - my push here is not to focus too much on Airflow, but really to address the fundamental problems which make Kedro slightly awkward in these situations.
Possible inspiration: Apache Beam https://beam.apache.org/get-started/beam-overview/

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK",
])

with beam.Pipeline(options=options) as p:
    ...
```

There's a difference, though, between deploying and running; this is more similar to our Dask runner https://docs.kedro.org/en/stable/deployment/dask.html. But maybe it means that we need a clearer mental model too. In my head, when we deploy a Kedro project, Kedro is no longer responsible for the execution, whereas writing a runner implies that Kedro is in fact driving the execution and acting as a poor man's orchestrator.
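To make that distinction concrete, here is a hedged sketch of the "deploy = translate" side of the mental model: Kedro walks its own DAG and emits task definitions for an external orchestrator instead of executing anything. The output format and the `translate_to_tasks` helper are invented for illustration:

```python
from kedro.pipeline import Pipeline


def translate_to_tasks(pipeline: Pipeline) -> list[str]:
    """Render each Kedro node as a task definition for some orchestrator."""
    tasks = []
    for group in pipeline.grouped_nodes:  # topologically sorted groups
        for n in group:
            # a real deployer would emit Airflow/Argo/etc. syntax here;
            # nothing is executed by Kedro itself
            tasks.append(f"task {n.name} consumes {sorted(n.inputs)}")
    return tasks
```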
I guess there is a philosophical question (very much related to granularity) about how you express which chunks of pipeline(s) get executed on different distributed systems. The execution plan of a Kedro DAG is resolved at runtime; we do not have a standardised way of:
I'm not against the context manager approach for doing this in principle - but I think it speaks to the more fundamental problem: some implicit elements of Kedro, which increase development velocity, leave a fair amount of unmanaged complexity by the time you get to this point in the process.
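One existing lever for expressing those chunks is modular-pipeline namespaces. A hedged sketch, assuming the `only_nodes_with_namespace` filter on `Pipeline` and with all names invented:

```python
from kedro.pipeline import node, pipeline


def preprocess(raw):
    return raw


base = pipeline([node(preprocess, inputs="raw", outputs="data")])

# Wrapping the pipeline under a namespace turns it into one addressable chunk;
# "raw" is mapped so it stays a shared, un-namespaced input
chunk = pipeline(base, namespace="feature_engineering", inputs={"raw"})

# A deployer could then map each namespace to a single task in the target system
fe_only = chunk.only_nodes_with_namespace("feature_engineering")
```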
Description
We want to offer users better support for deploying Kedro.
Implementation ideas
Questions