Logical Intermediate Pipeline Representation #3703
Comments
@talebzeghmi this is something we have been discussing with the KFP team; we hope to reach a conclusion on the state of the IR in the very near future and to move the effort into the community soon. |
Hello all, Ketan |
The benefit of an intermediate representation (IR), especially one residing in a neutral repo outside of KFP, is that disparate ML SDKs can compile to the same IR, and the IR can be executed by disparate engines. Possible SDKs:
Possible execution engines:
|
@talebzeghmi at the moment Flyte has an intermediate representation, specified in protobuf - https://github.com/lyft/flyteidl/blob/master/protos/flyteidl/core/workflow.proto#L147 This allows FlyteAdmin (the control plane) to JIT-compile from this representation to the executable representation, in our case FlytePropeller's Workflow CRD. We could potentially target the Argo CRD or Tekton as well; I am sure it is not 1:1, and therein lies the challenge. |
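To make the compile chain above concrete, here is a deliberately simplified, hypothetical sketch of lowering one neutral pipeline IR to an engine-specific spec. None of these class or field names come from Flyte or KFP, and the Argo fields are heavily trimmed; it only illustrates the idea of one IR with several backend compilers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TaskIR:
    """One node of a neutral pipeline graph (hypothetical structure)."""
    name: str
    image: str
    command: List[str]
    depends_on: List[str] = field(default_factory=list)

@dataclass
class PipelineIR:
    """Engine-agnostic pipeline representation (hypothetical structure)."""
    name: str
    tasks: Dict[str, TaskIR] = field(default_factory=dict)

def compile_to_argo(ir: PipelineIR) -> dict:
    """Lower the neutral IR into an Argo-Workflow-shaped dict (fields simplified)."""
    templates = [
        {"name": t.name, "container": {"image": t.image, "command": t.command}}
        for t in ir.tasks.values()
    ]
    dag_tasks = [
        {"name": t.name, "template": t.name, "dependencies": t.depends_on}
        for t in ir.tasks.values()
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "spec": {
            "entrypoint": "main",
            "templates": templates + [{"name": "main", "dag": {"tasks": dag_tasks}}],
        },
    }
```

A second function with the same signature could target a Tekton PipelineRun or Flyte's Workflow CRD; the point of the discussion is that authoring SDKs would only ever need to emit the neutral representation.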
@kumare3 I work with quite a few teams who are currently thinking of moving their workflows to Kubeflow, but they are also interested in initiatives like Flyte (and also Metaflow), and at the moment it is hard to concretely show them how these could work together. Metaflow currently comes across as very much its own thing. (Although they claim architectural independence in their design, the fact is that currently it only supports AWS managed services in any meaningful way.) The fact that Flyte runs on Kubernetes seems to make it a natural fit for Kubeflow. Anyway, I don't want to hijack the specific discussion here; I just wanted to point out that having Flyte play nicely with Kubeflow (Pipelines in particular) would likely be greeted with enthusiasm by a lot of users. |
@karlschriek I am absolutely open to all conversations; we would love to serve the Kubeflow community in general. We are planning to have an experimental "Kubeflow Pipelines" to "Flyte" compiler soon, which should allow using Flyte to run Kubeflow pipelines. This is not the ideal solution because it will not use Flyte's native plugin system and will not use KFP's UI (as there are no hooks today), but we could start pushing for the right integration. Also, @karlschriek please join the Flyte Slack and ping me if you want to discuss ideas |
@kumare3 Would you be interested in presenting flyte at an upcoming Kubeflow community meeting? Just leave a comment in the Google doc to let us know which date you are interested in presenting on. |
@karlschriek Although most of our (Metaflow) integrations are with AWS managed services, there is work underway to integrate with GCP and K8s. We also have a PR out for compiling Metaflow workflows into the AWS Step Functions specification, and there is significant interest within the community to have a similar integration with KFP. |
@jlewi I would definitely be interested in presenting Flyte at the community meeting. Do you think a general presentation about Flyte and our design decisions would be good? |
@kumare3 yes. Could you email the mailing list at kubeflow-discuss@googlegroups.com to coordinate a presentation at an upcoming community meeting. |
Mitar Milutinovic suggested [1] looking at https://github.com/openml/flow2 |
Please take a look at the TFX IR for a logical pipeline representation. You can author a pipeline using the TFX SDK and then submit it for execution on Kubeflow. The TFX IR is the suggested way forward for future development. Only if for some reason you cannot use the TFX SDK and TFX IR, the KFP SDK already has a structure for pipeline persistence.

Sharing and persistence: For component sharing, the KFP SDK already has a portable, platform-independent structure called ComponentSpec (the component.yaml file format).

Graph components: While most components are backed by a container, the KFP's Python SDK also provides a way to save a whole pipeline as a graph component. A pipeline saved as a graph component can be easily sent for execution:

    my_pipeline_op = kfp.components.load_component_from_url(...)
    kfp.Client().create_run_from_pipeline_func(my_pipeline_op, arguments={})

The advice from the KFP team is: if you want to persist a pipeline, the Python code is the most supported way for now, but if you want to persist the pipeline in a non-Python format, then use the graph component file format. We'd like the graph component format to be the KFP's logical intermediate pipeline representation.

Open questions:

Adding backend support for IR/graph components: This has been considered, but not all stakeholders were on board.

Making the frontend IR-native: The frontend directly works with the workflow status object updated by the orchestrator. Since the orchestrator is Argo, the frontend is currently tied to the Argo WorkflowStatus structure (in addition to the WorkflowSpec structure).

Refactoring the DSL to be based on the graph component structures: This might be a good idea that can improve the capabilities of the create_graph_component_from_pipeline_func function, which essentially compiles a Python pipeline to the IR. Implementing those major refactorings does not seem to bring immediate improvements for the KFP users, and the KFP team might lack the bandwidth to implement the above feature. We're seeking feedback for improving the |
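To make the sharing/persistence flow above concrete, here is a minimal sketch assuming the KFP v1 SDK. The component URLs, parameter names, and output names are placeholders, and the exact signature of create_graph_component_from_pipeline_func may differ between SDK versions.

```python
import kfp
from kfp import components

# Load two existing container components (placeholder URLs).
download_op = components.load_component_from_url('https://example.com/download/component.yaml')
train_op = components.load_component_from_url('https://example.com/train/component.yaml')

# Compose them into a pipeline in Python (parameter and output names are assumptions).
def my_pipeline(dataset_url: str):
    download_task = download_op(url=dataset_url)
    train_op(training_data=download_task.outputs['data'])

# Compile the Python pipeline into a graph component, the format proposed as the IR.
my_pipeline_op = components.create_graph_component_from_pipeline_func(my_pipeline)

# The graph component can then be submitted for execution, as shown above.
kfp.Client().create_run_from_pipeline_func(
    my_pipeline_op, arguments={'dataset_url': 'https://example.com/data.csv'})
```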
Thanks @Ark-kun. If the answer to the above question is yes, then this makes sense. Now, apart from the issues you mentioned, a few other things are needed
|
Some more details are in the slides I used for the pipeline community meeting |
No, that goal is not dropped. I guess my answer was ambiguous, so I've reworded it. The TFX IR is the road forward for future development. Please try using it.
I'm not sure I fully understand what you want to say with this item. The reason the sample pipelines use Python is that they are easier for users to understand (Python DSL vs YAML). We do not have good editing tools for YAML-based pipelines. You can upload graph component.yaml files to AI Hub. You can even have zip files with both Argo's pipeline.yaml and component.yaml.
I think most of the limitations are not really limitations of the graph ComponentSpec format, but rather limitations of the particular SDK and how it consumes/produces ComponentSpec. If you change the pipeline persistence format, these missing feature implementations won't magically fix themselves. We're practicing "demand-driven development" where we implement features when there are requests for them. Please file the issues so we can learn about the demand. For example, we have an open PR for making use of the Kubernetes options from loaded graph component tasks (passing kubernetes_options through to ContainerOp), but the PR is not going forward due to lack of demand.
I'd consider these to be pretty Argo-specific. Argo implements those as containers that just run kubectl. I think it would make the SDK more portable if we changed ResourceOp to use an explicit container. I've already created a PR last week to do that for ResourceOp.delete(): https://github.com/kubeflow/pipelines/pull/3841/files ResourceOp.create() will follow.
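A minimal sketch of that "explicit container" approach could look like the following (the image tag is an assumption, and the run's service account still needs RBAC permission to delete the target resource):

```python
import kfp.dsl as dsl

def delete_pvc_op(pvc_name: str) -> dsl.ContainerOp:
    # Instead of a ResourceOp, run kubectl in a plain container step, which keeps
    # the compiled pipeline free of Argo-specific resource templates.
    return dsl.ContainerOp(
        name='delete-pvc',
        image='bitnami/kubectl:1.17',
        command=['kubectl', 'delete', 'pvc', pvc_name],
    )
```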
Per-pipeline ExitOp probably belongs to the PipelineRunSpec, not ComponentSpec.
Conditionals are supported by the format (although not applied during loading). See TaskSpec.is_enabled.
? The graph ComponentSpec supports that. Every
This is the only real limitation of the ComponentSpec as of now. Loops are not easy to design (especially to be portable). Loops also usually require advanced capabilities for output aggregation, which is even harder to design. I have plans to add a for-style loop, with foreach-style loops (like Argo's withItems) implemented by compilers as syntactic sugar.
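For context, this is roughly what the DSL-level foreach loop looks like today (a sketch assuming the KFP v1 SDK): dsl.ParallelFor compiles to Argo's withItems, but is not yet expressible in the graph ComponentSpec format, which is the limitation discussed above.

```python
import kfp.dsl as dsl

@dsl.pipeline(name='loop-example')
def loop_pipeline():
    # Fan out over a static list; each iteration becomes its own task instance.
    with dsl.ParallelFor(['a', 'b', 'c']) as item:
        dsl.ContainerOp(
            name='echo',
            image='alpine',
            command=['echo'],
            arguments=[item],
        )
```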
This is a missing SDK feature which was moving slowly due to lack of demand. Please create/upvote issues, so that features can be prioritized. See #3447 and #3448
Can you explain the feature and the limitation? I think
Please create feature request issues. The whole available design is bigger than the implemented parts, since we do not want to implement features prematurely until there is demand. When features are implemented prematurely, without first collecting demand feedback, the design can be suboptimal and require breaking changes in the future. We're trying to avoid that by keeping the design minimal. P.S. For some of the features you've listed there are sizeable gaps in P.P.S. One reason for some of the |
P.S. I really liked your presentation and the visual pipeline editing tool. |
@Ark-kun can you please share what the TFX IR is? I ask because the Metaflow folks are asking for the KFP IR to interface with KFP here: Netflix/metaflow#16 (comment). Thanks! |
A clarification: My initial answer might have been ambiguous. The TFX IR is the recommended way forward for future development. The information about the KFP's graph |
Thanks @Ark-kun is the TFX IR in development or ready to share? Are you able to share links? |
@zhitaoli on TFX IR. We are actively discussing this topic now and should be able to provide an initial update here around the middle of next week. It's possible we may define a layered IR (core and different extensions). |
The first check-in for the IR is here; @zhitaoli to correct me if needed. As for how the KFP side changes correspondingly, I'm still evaluating more details. It's a big change and may be worth a version number 2.0. My current thoughts / design goals are
The IR is expected to abstract away the K8s concepts (e.g. PVC, ConfigMap, etc.).
Currently our FE exposes many K8s/Argo concepts, while much of the data can be fetched from the file system and is normally indexed by MLMD. Alexey also mentioned the same in a previous reply: "Making Frontend IR-native". I may extend it to "Making Frontend IR & MLMD native". So if the Tekton solution proposed by Animesh can generate the same data, the new visualizer should work. More input is welcome. |
@rmgogogo or @zhitaoli is there a corresponding RFC/doc that describes the thinking behind the IR in more detail? @rmgogogo @talebzeghmi What are the implications of using proto as opposed to, say, OpenAPI/Swagger as the IDL?
"The IR is expected to abstract the K8s concepts (e.g. PVC, ConfigMap, etc.)." What does this mean in terms of how K8s concepts get surfaced? E.g. if people want to be able to attach PVCs, do we first need to define a suitable abstraction in the IR? /cc @animeshsingh |
We are working towards publishing a doc about the IR. Because it also includes semantics for async pipelines which might be foreign to batch-based pipelines, we intend to take a gradual approach to first discuss those semantics, then present an IR proposal which can be used to model that.
|
Thanks @rmgogogo and @zhitaoli - great news! Will dive deeper into the IDL. Vis-à-vis the IDL: OpenAPI/Swagger would be perfect, but proto is not a deal breaker here. What may be an issue is if we are not able to surface K8s constructs (PVC, ConfigMap, and there are quite a few more in the KFP DSL) either through the core IDL or an IDL extension for folks running on Kube, which is what more than half of the enterprises are using in production right now. Our efforts below to map the Kubeflow Pipelines DSL to a Tekton backend list all the DSL functionality we have to implement for Tekton, and we would expect the IDL to be able to capture those from the DSL so that they can be relayed back to Argo or Tekton. https://github.com/kubeflow/kfp-tekton/blob/master/sdk/FEATURES.md And again, the more collaborative we can be on this effort, the better for the project, so moving an IR proposal forward as quickly as possible so the community can align on it and help out will work to everyone's mutual advantage here. |
+1, it's important from the runtime perspective. From the ML perspective, here is one worth checking to understand more of @zhitaoli's previous replies. |
@hongye-sun (didn't find Ruoyu's Github account) Some high level info around how we plan to provide an IR and support it.
|
As for the Tekton runner, one big difference from Argo is that it can run multiple steps in one pod so that they can share common setup, e.g. secrets, volumes. Any others I missed? I'm thinking about whether to bring this concept into the IR, but would like to learn more about Tekton's benefits. In an ML pipeline, putting multiple steps into one pod may actually have disadvantages, e.g. one step requires a lot of CPU/MEM/GPU while others don't, so it's better not to put them in one pod. In Tekton they would then have to be in two pods, which loses the benefit of Tekton's differentiating feature. (BTW, it's also possible we implement another orchestration engine, not based on Argo or Tekton but more MLMD-friendly, as MLMD itself is highly related to orchestration. That's more of a long-term plan. In the first step (Q3/Q4), we may still make delta changes based on Argo.) |
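The per-step resource point above is visible in the KFP DSL today, where resource requests attach to individual steps (a minimal sketch with placeholder images); fusing such steps into a single pod would force the whole pod to reserve the largest step's resources.

```python
import kfp.dsl as dsl

@dsl.pipeline(name='resource-example')
def resource_pipeline():
    # A lightweight preprocessing step with default resources.
    prep = dsl.ContainerOp(name='prep', image='alpine', command=['echo', 'prep'])

    # A training step that needs far more memory plus a GPU.
    train = dsl.ContainerOp(
        name='train', image='my-train-image', command=['python', 'train.py'])
    train.set_memory_limit('16G')
    train.set_gpu_limit('1')
    train.after(prep)
```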
@rmgogogo For the execution spec, is there a proto or OpenAPI spec somewhere that indicates what this might look like? |
@rmgogogo ping? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hongye's PR: Pipeline IR #4371 contains the detailed info. |
/lifecycle frozen |
Is there still an ongoing effort to get the IR YAML format supported by different SDKs and execution engines? And if so, where could I find the status? We are now compiling Fondant pipelines to IR YAML, partially relying on the KfP SDK, which allows us to run them on both KfP and Vertex AI. We would be interested in the ability to execute Fondant pipelines on more execution engines leveraging IR YAML. We have also implemented a simple |
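For anyone landing here now: producing the IR YAML from the current (v2) KFP SDK looks roughly like this (a minimal sketch; component and pipeline names are placeholders).

```python
from kfp import compiler, dsl

@dsl.component
def say_hello(name: str) -> str:
    return f'Hello, {name}!'

@dsl.pipeline(name='hello-pipeline')
def hello_pipeline(name: str = 'world'):
    say_hello(name=name)

# Writes the platform-neutral IR YAML that engines such as KFP and Vertex AI consume.
compiler.Compiler().compile(hello_pipeline, package_path='pipeline.yaml')
```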
@Ark-kun could you tell me where I can find the latest status on this? See my message above. |
There are currently two planned Kubeflow Pipelines (KFP) efforts to compile to an intermediate representation.
I'm creating this issue to coordinate and single out an intermediate representation (IR). The IR would ideally be a neutral project outside of both KFP and TFX, and so should be the Python SDK that produces such an IR, as @animeshsingh suggested.
Possible intermediate representations (in no particular order):
Common Workflow Language: https://en.wikipedia.org/wiki/Common_Workflow_Language
https://metadata.datadrivendiscovery.org/schemas/v0/pipeline.json
See also:
MLGraph or a subset of it.
PFA (successor to PMML).
https://github.com/openml/flow2 from https://www.openml.org/