
Logical Intermediate Pipeline Representation #3703

Closed
talebzeghmi opened this issue May 6, 2020 · 36 comments
Assignees: Ark-kun, rmgogogo
Labels: kind/discussion, lifecycle/frozen, status/triaged

Comments

@talebzeghmi

talebzeghmi commented May 6, 2020

There are currently two planned Kubeflow pipeline (KFP) efforts to compile to an intermediate representation.

  1. Kubeflow Pipelines and Tekton #3647: design doc
  2. Merged TFX and KFP SDK: design doc

I'm creating this issue to coordinate and single out an intermediate representation (IR). The IR would ideally be a neutral project outside of both KFP and TFX, and so should the Python SDK to produce such an IR as @animeshsingh suggested.

Possible immediate representations (no order):

  1. Common Workflow Language: https://en.wikipedia.org/wiki/Common_Workflow_Language

  2. https://metadata.datadrivendiscovery.org/schemas/v0/pipeline.json

See also:

  1. MLGraph or subset of it.

  2. PFA (successor to PMML).

  3. https://github.com/openml/flow2 from https://www.openml.org/

Ark-kun self-assigned this May 6, 2020
@animeshsingh
Contributor

@talebzeghmi this is something we have been discussing with the KFP team, and we hope to reach a conclusion in the very near future around the state of the IR and get the effort moved into the community soon.

@paveldournov @neuromage @jessiezcc @Ark-kun

@kumare3

kumare3 commented May 7, 2020

Hello all,
I lead an effort called Flyte. We could see whether Flyte could be a potential target for KFP. We have an experimental compiler (incomplete), but would love to hear your thoughts and opinions. We would love to collaborate and explore options.

Ketan

Bobgy added the status/triaged label May 7, 2020
@talebzeghmi
Author

The benefit of an intermediate representation (IR), especially one residing in a neutral repo outside of KFP, is that disparate ML SDKs can compile to the same IR, and the IR can be executed by disparate engines.

Possible SDKs:

  • KF Pipelines
  • Metaflow
  • Flyte

Possible execution engines:

  • Argo
  • Tekton
  • Flyte
  • A lightweight local implementation

@kumare3

kumare3 commented May 8, 2020

@talebzeghmi at the moment Flyte has an intermediate representation, specified in protobuf - https://github.com/lyft/flyteidl/blob/master/protos/flyteidl/core/workflow.proto#L147

This allows FlyteAdmin (the control plane) to JIT-compile from this representation to an executable representation, in our case FlytePropeller's Workflow CRD. We could potentially target the Argo or Tekton CRDs as well; I am sure the mapping is not 1:1, and therein lies the challenge.

@karlschriek

@kumare3 I work with quite a few teams who are currently thinking of moving their workflows to Kubeflow, but they are also interested in initiatives like Flyte (and also Metaflow), and at the moment it is hard to concretely show them how these could work together.

Metaflow currently comes across as very much its own thing. (Although they claim architectural independence in their design, the fact is that currently it only supports AWS managed services in any meaningful way.) The fact that Flyte runs on Kubernetes makes it seem like a natural fit for Kubeflow. Anyway, I don't want to hijack the specific discussion here; I just wanted to point out that having Flyte play nicely with Kubeflow (Pipelines in particular) would likely be greeted with enthusiasm by a lot of users.

@kumare3

kumare3 commented May 12, 2020

@karlschriek I am absolutely open to all conversations; we would love to serve the Kubeflow community in general. We are planning to have an experimental "Kubeflow Pipelines"-to-"Flyte" compiler soon, which should allow using Flyte to run Kubeflow pipelines.

This is not the ideal solution because it will not use Flyte's native plugin system, and it will not use KFP's UI (as there are no hooks for that today). But we could start pushing for the right integration?

In the meantime, @karlschriek, please join the Flyte Slack and ping me if you want to discuss ideas.

@jlewi
Contributor

jlewi commented May 13, 2020

@kumare3 Would you be interested in presenting Flyte at an upcoming Kubeflow community meeting?
You can find the calendar here

Just leave a comment in the Google doc to let us know which date you are interested in presenting on.

@savingoyal

savingoyal commented May 16, 2020

@karlschriek Although most of our (Metaflow) integrations are with AWS managed services, there is work underway to integrate with GCP and K8s.

We also have a PR out for compiling Metaflow workflows into the AWS Step Functions specification, and there is significant interest within the community in a similar integration with KFP.

@kumare3

kumare3 commented May 16, 2020

@jlewi I would definitely be interested in presenting Flyte at the community meeting. Do you think a general presentation about Flyte and our design decisions would be good?

@jlewi
Contributor

jlewi commented May 16, 2020

@kumare3 yes. Could you email the mailing list at kubeflow-discuss@googlegroups.com to coordinate a presentation at an upcoming community meeting?

@talebzeghmi
Author

@Ark-kun
Contributor

Ark-kun commented May 27, 2020

Please take a look at the TFX IR for a Logical pipeline representation. You can author a pipeline using the TFX SDK and then submit it for execution on Kubeflow.

The TFX IR is the suggested way forward for future development.

If for some reason you cannot use the TFX SDK and TFX IR, the KFP SDK already has a structure for pipeline persistence.

Sharing and persistence

For component sharing the KFP SDK already has a portable, platform-independent structure called ComponentSpec (schema, outline, approximate proto). This structure is usually serialized into component.yaml definition files, and we have built a big ecosystem of those components (hundreds of components). This format has existed in a stable form since the first release of KFP. We are pledging support for components authored and persisted in this format. See an example of component yaml.
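To make the format concrete, here is a minimal sketch of a container-backed component written in the ComponentSpec (component.yaml) format and loaded with the KFP SDK. The component itself (an "add two numbers" container) is hypothetical and only for illustration:

import kfp.components as comp

# A minimal, hypothetical component.yaml in the ComponentSpec format.
# The inputValue/outputPath placeholders are resolved at run time.
add_op = comp.load_component_from_text("""
name: Add two numbers
inputs:
- {name: a, type: Integer}
- {name: b, type: Integer}
outputs:
- {name: sum, type: Integer}
implementation:
  container:
    image: alpine:3.12
    command: [sh, -c, 'mkdir -p "$(dirname "$2")"; expr "$0" + "$1" > "$2"']
    args: [{inputValue: a}, {inputValue: b}, {outputPath: sum}]
""")

The returned add_op is a task factory that can be used inside a pipeline function like any other component.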

Graph components

While most components are backed by a container, the ComponentSpec structure also allows having a graph implementation. (schema, outline, approximate proto)
This essentially enables 'pipeline-as-component' feature. See an example of an end-to-end pipeline saved as a graph component yaml.

KFP's Python SDK provides a way to compile a Python pipeline function into a graph component (create_graph_component_from_pipeline_func, discussed below).

A pipeline saved as a graph component can be easily sent for execution:

import kfp
# Load a pipeline saved as a graph component and submit it for execution.
my_pipeline_op = kfp.components.load_component_from_url(...)
kfp.Client().create_run_from_pipeline_func(my_pipeline_op, arguments={})

Being part of the ComponentSpec format, graph components provide a portable, platform-independent pipeline persistence format. The format is simple, portable, and minimalistic, so it is fairly easy to convert to other workflow formats for execution: Argo, Tekton, Airflow, etc.

The advice from the KFP team is:

If you want to persist a pipeline, Python code is the most supported way for now; but if you want to persist the pipeline in a non-Python format, use the graph component file format.

We'd like the graph component format to be KFP's logical intermediate pipeline representation.
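As a rough sketch of that advice, a pipeline function could be persisted as a graph component and then reloaded. This assumes the create_graph_component_from_pipeline_func helper mentioned below; its exact import path and signature may differ between SDK versions:

import kfp
from kfp import components, dsl

def add(a: float, b: float) -> float:
    return a + b

add_op = components.create_component_from_func(add)

@dsl.pipeline(name='add-pipeline')
def add_pipeline(a: float = 1.0, b: float = 2.0):
    add_op(a, b)

# Persist the pipeline in the graph component (ComponentSpec) format.
# Assumption: the helper and its output_component_file argument exist as shown;
# check your SDK version for the exact location and signature.
components.create_graph_component_from_pipeline_func(
    add_pipeline, output_component_file='add_pipeline.component.yaml')

# The saved file can then be loaded and submitted like any other component:
add_pipeline_op = components.load_component_from_file('add_pipeline.component.yaml')
kfp.Client().create_run_from_pipeline_func(add_pipeline_op, arguments={'a': 1, 'b': 2})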

Open questions:

Adding Backend support for IR/graph-components

This has been considered, but not all stakeholders were on board.
Implementation is doable.

Making Frontend IR-native

The frontend works directly with the workflow status object updated by the orchestrator. Since the orchestrator is Argo, the frontend is currently tied to the Argo WorkflowStatus structure (in addition to the WorkflowSpec structure).

Refactoring the DSL to be based on the Graph Component structures.

This might be a good idea that could improve the capabilities of the create_graph_component_from_pipeline_func function, which essentially compiles a Python pipeline to the IR.

Implementing those major refactorings does not seem to bring immediate improvements for KFP users, and the KFP team might lack the bandwidth to implement them.

We're seeking feedback for improving the pipeline --> graph component transformation code so that more features are available. We're also encouraging compiler authors to consider the graph component format as the intermediate representation they can work with.

@animeshsingh
Contributor

animeshsingh commented May 28, 2020

Thanks @Ark-kun
Is this the final word on the IR - using GraphComponentSpec? We were being told there is another IR in consideration with the TFX team - has that goal been dropped?

If the answer to the above question is yes, then this makes sense. Now, apart from the issues you mentioned

  1. Adding Backend support for IR/graph-components
  2. Making Frontend IR-native
  3. Refactoring the DSL to be based on the Graph Component structures.

a few other things are needed:

  1. Pipelines are persisted either in Python or using this IR, and that's how they are shared, e.g. on Google's AI Hub
  2. Additionally these are the limitations in the IR
  • Only works on containerOps
  • Will not work on ResourceOp, VolumeOp, VolumeSnapShotOp, ExitOp
  • Conditionals, Loops, and nested pipeline are not yet supported for KFP 0.5
  • executionOptions for adding Kubernetes spec doesn’t seem to work.
  • Features such as input artifacts, runAfter, and timeout are also not part of the Component.yaml spec.

cc @neuromage @paveldournov @jessiezcc

@animeshsingh
Contributor

Some more details are in the slides I used for the pipeline community meeting:
https://www.slideshare.net/AnimeshSingh/kubeflow-pipelines-with-tekton

@Ark-kun
Contributor

Ark-kun commented May 28, 2020

Is this the final word on the IR - using GraphComponentSpec? We were being told there is another IR in consideration with the TFX team - has that goal been dropped?

No, that goal is not dropped. I guess my answer was ambiguous. I've reworded it.
The information about graph ComponentSpec was only applicable to people who only use KFP SDK and want to persist their pipelines right now. Graph ComponentSpec is only an alternative to inventing a new IR, not an alternative to TFX IR.

The TFX IR is the road forward for future development. Please try using it.

Pipelines are persisted either in Python or using this IR, and that's how they are shared, e.g. on Google's AI Hub

I'm not sure I fully understand what you want to say with this item.

The reason that the sample pipelines use Python is so that they are easier to understand for the users (Python DSL vs YAML). We do not have good editing tools for YAML-based pipelines.

You can upload graph component.yaml files to AI Hub. You can even have zip files with both Argo's pipeline.yaml and component.yaml. load_component_from_url has supported loading them since AI-Hub launch.

Additionally these are the limitations in the IR

I think most of the limitations are not really limitations of the graph ComponentSpec format, but rather limitations of the particular SDK and how it consumes/produces ComponentSpec. If you change the pipeline persistence format, these missing feature implementations won't magically fix themselves.

We're practicing "demand-driven development", where we implement features when there are requests for them. Please file issues so we can learn about the demand. For example, we have an open PR for making use of the Kubernetes options from loaded graph component tasks (passing kubernetes_options through to ContainerOp). But the PR is not moving forward due to lack of demand.

Will not work on ResourceOp, VolumeOp, VolumeSnapShotOp

I'd consider these to be pretty Argo-specific.

Argo implements those as containers that just run kubectl. I think it would make the SDK more portable if we changed ResourceOp to use an explicit container. I already created a PR last week to do that for ResourceOp.delete(): https://github.com/kubeflow/pipelines/pull/3841/files ResourceOp.create() will follow.

ExitOp

Per-pipeline ExitOp probably belongs to the PipelineRunSpec, not ComponentSpec.

Conditionals

Conditionals are supported by the format (although not applied during loading). See TaskSpec.is_enabled.

nested pipeline

? The graph ComponentSpec supports that. Every TaskSpec has component_ref which can point to any component - container or graph. The loading should also work (I need to check and add a test for that).

Loops

This is the only real limitation of the ComponentSpec as of now. Loops are not easy to design (especially to be portable). Loops also usually require advanced capabilities for output aggregation, which is even harder to design. I have plans to add a for-style loop, with foreach-style loops (like Argo's withItems) implemented by compilers as syntactic sugar.
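To illustrate the syntactic-sugar idea (a hypothetical sketch, not current SDK behaviour): a compiler could unroll a foreach over a static item list into ordinary tasks before emitting the graph, so the persisted format itself would not need a loop construct:

# Hypothetical illustration only: expand a foreach (like Argo's withItems)
# into one ordinary task per item at compile time.
def unroll_foreach(make_task, items):
    return [make_task(item) for item in items]

# Stand-in "task factory" that just records what would be created.
tasks = unroll_foreach(
    lambda item: {'name': f'process-{item}', 'arguments': {'item': item}},
    ['a', 'b', 'c'])
# tasks now contains three independent task specs, with no loop left in the graph.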

executionOptions for adding Kubernetes spec doesn’t seem to work.

This is a missing SDK feature which was moving slowly due to lack of demand. Please create/upvote issues, so that features can be prioritized. See #3447 and #3448

Features such as input artifacts

Can you explain the feature and the limitation? I think ComponentSpec has the same or better support for artifacts than even ContainerOp.

runAfter, and timeout

Please create feature request issues. The whole available design is bigger than the implemented parts since we do not want to implement features prematurely until there is demand.

When features are implemented prematurely, without first collecting demand feedback, the design can be suboptimal and require breaking changes in the future. We're trying to avoid that by keeping the design minimal.

P.S. For some of the features you've listed there are sizeable gaps in the DSL -> graph ComponentSpec and graph ComponentSpec -> ContainerOp conversions, but these are not issues in the format itself. Whatever is used as the IR, these gaps will need to be filled.

P.P.S. One reason for some of the DSL -> graph ComponentSpec gaps is that the component library part of KFP (kfp.components) is explicitly independent from the DSL and compiler.

@Ark-kun
Contributor

Ark-kun commented May 28, 2020

Some more details are in the slides I used for the pipeline community meeting:
https://www.slideshare.net/AnimeshSingh/kubeflow-pipelines-with-tekton

P.S. I really liked your presentation and the visual pipeline editing tool.

@talebzeghmi
Author

talebzeghmi commented May 28, 2020

The TFX IR is the road forward for future development. Please try using it.

@Ark-kun can you please share what the TFX IR is?

I ask because the Metaflow team is asking for the KFP IR to interface with KFP here: Netflix/metaflow#16 (comment)

thanks!

@Ark-kun
Contributor

Ark-kun commented May 28, 2020

A clarification:

My initial answer might have been ambiguous. The TFX IR is the recommended way forward for future development. The information about KFP's graph ComponentSpec was only applicable to people who only use the KFP SDK and want to persist their pipelines right now. It was suggested only as an alternative to inventing a new IR, not as an alternative to the TFX IR.

@talebzeghmi
Author

Thanks @Ark-kun. Is the TFX IR still in development, or is it ready to share? Are you able to share links?
I ask because I'm interested in having Metaflow compile to the IR and run on KFP.
thank you!

rmgogogo self-assigned this May 29, 2020
@rmgogogo
Contributor

Thanks @Ark-kun. Is the TFX IR still in development, or is it ready to share? Are you able to share links?
I ask because I'm interested in having Metaflow compile to the IR and run on KFP.
thank you!

@zhitaoli on TFX IR.

We are actively discussing this topic now and should be able to provide an initial update here around the middle of next week. It's possible we may define a layered IR (a core plus different extensions).

@animeshsingh
Contributor

Thanks @Ark-kun for the clarification. So it's established that the TFX IR is the future.

@rmgogogo looking forward to the update

@rmgogogo
Contributor

rmgogogo commented Jun 3, 2020

The initial check-in for the IR is here.
https://github.com/tensorflow/tfx/blob/master/tfx/proto/orchestration/pipeline.proto#L318

@zhitaoli to correct me.

As for how the KFP side changes correspondingly, I'm still evaluating the details. It's a big change and may be worth a 2.0 version number.

My current thoughts / design goals are:

  1. Decouple the orchestrator (e.g. Argo) from pipeline authoring (the SDK) via the IR.

The IR is expected to abstract away the K8s concepts (e.g. PVC, ConfigMap, etc.)
so that it can be quickly tested in other environments (e.g. a local/dev environment without a K8s cluster).

  2. Decouple the orchestrator (e.g. Argo) from the front-end / visualizer.

Currently our FE relies on many K8s/Argo concepts, while much of the data can be fetched from the file system and is normally indexed by MLMD. Alexey also mentioned the same in a previous reply: "Making Frontend IR-native". I may extend it to "Making Frontend IR & MLMD native". So if the Tekton solution proposed by Animesh can generate the same data, the new visualizer should work.

Welcome more inputs.

@jlewi
Contributor

jlewi commented Jun 3, 2020

@rmgogogo or @zhitaoli is there a corresponding RFC/doc that describes the thinking behind the IR in more detail?

@rmgogogo @talebzeghmi What are the implications of using proto as opposed to say OpenAPI/Swagger as the IDL?

The IR is expected to abstract the K8s concepts (e.x. PVC, ConfigMap etc.).

What does this mean in terms of how K8s concepts get surfaced? e.g. if people want to be able to attach PVCs, do we first need to define a suitable abstraction in the IR?

/cc @animeshsingh

@zhitaoli

zhitaoli commented Jun 3, 2020 via email

@animeshsingh
Contributor

animeshsingh commented Jun 3, 2020

Thanks @rmgogogo and @zhitaoli - great news! Will dive deeper into the IDL.

Vis-à-vis OpenAPI/Swagger would be perfect, but proto is not a deal breaker here. What may be an issue is if we are not able to surface K8s constructs (PVC, ConfigMap, and there are quite a few more in the KFP DSL) either through the core IDL or an IDL extension for folks running on Kube, which is what more than half of the enterprises are using in production right now.

Our efforts below to map the Kubeflow Pipelines DSL to a Tekton backend list all the DSL functionality we have to implement for Tekton, and we would expect the IDL to be able to capture those from the DSL so that they can be relayed back to Argo or Tekton.

https://github.com/kubeflow/kfp-tekton/blob/master/sdk/FEATURES.md

And again, the more collaborative we can be on this effort, the better for the project, so moving an IR proposal forward as quickly as possible, so that the community can align on it and help out, will work to everyone's mutual advantage here.

@rmgogogo
Contributor

rmgogogo commented Jun 5, 2020

"which is what more than half of the enterprises are using in production right now."

+1, it's important from a runtime perspective.

From an ML perspective, here is one worth checking to understand more of @zhitaoli's previous replies:
tensorflow/community#253

@animeshsingh
Contributor

@rmgogogo @zhitaoli any update on the IR RFC/doc?

@rmgogogo
Contributor

rmgogogo commented Jun 29, 2020

@rmgogogo @zhitaoli any update on the IR RFC/doc?

@hongye-sun (didn't find Ruoyu's Github account)

Some high-level info on how we plan to provide an IR and support it (a rough illustrative sketch follows the list):

  • each step in a pipeline will have an execution spec
  • the execution spec can be
    (a) container based (OK to run in Docker without K8s)
    (b) K8s based (has K8s-specific fields compared with the container-based spec)
    (c) Python class based (for local / special-base-container-image use)
    (d) customized (e.g. calls GCP APIs)
  • frontend compiler: DSL code => IR
  • backend compiler: IR => a specific environment's spec (e.g. Argo YAML, Tekton YAML, a special CRD's YAML, or special API calls)
  • runners are suggested to rely on MLMD for input/output data tracking, which is also good for common visualization, etc.
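Here is the rough illustrative sketch referenced above. The names are hypothetical (not an actual KFP or TFX API); it only shows the layering of a container-based execution spec, a K8s extension of it, and a backend compiler mapping IR steps onto an engine-specific spec:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ContainerExecutionSpec:
    # (a) container based: runnable with plain Docker, no K8s required
    image: str
    command: List[str]

@dataclass
class K8sExecutionSpec(ContainerExecutionSpec):
    # (b) K8s based: adds K8s-specific fields on top of the container spec
    node_selector: Dict[str, str] = field(default_factory=dict)
    volumes: List[str] = field(default_factory=list)

@dataclass
class Step:
    name: str
    executor: ContainerExecutionSpec

def compile_to_argo(steps: List[Step]) -> dict:
    # backend compiler: IR -> a specific engine's spec (greatly simplified)
    return {
        'apiVersion': 'argoproj.io/v1alpha1',
        'kind': 'Workflow',
        'spec': {'templates': [
            {'name': s.name,
             'container': {'image': s.executor.image, 'command': s.executor.command}}
            for s in steps]},
    }

A Tekton or local-Docker backend compiler would simply be another function consuming the same Step list.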

@rmgogogo
Contributor

As for a Tekton runner, one big difference from Argo is that it can run multiple steps in one pod so that they can share common setup, e.g. secrets and volumes. Any others I missed? I'm thinking about whether to bring this concept into the IR, but I would like to learn more about Tekton's benefits.

In an ML pipeline, putting multiple steps in one pod may actually have disadvantages, e.g. one step requires a lot of CPU/MEM/GPU while others don't, so it's better not to put them in one pod. In Tekton they would then have to be in two pods, but that also means losing the benefit of Tekton's differentiating feature.

(BTW, it's also possible we implement another orchestration engine, not based on Argo or Tekton but more MLMD-friendly, as MLMD itself is highly related to orchestration. That is more of a long-term plan. In the first step (Q3/Q4), we may still make delta changes based on Argo.)

@jlewi
Contributor

jlewi commented Jun 29, 2020

@rmgogogo For the execution spec, is there a proto or OpenAPI spec somewhere that indicates what this might look like?

@jlewi
Contributor

jlewi commented Jul 21, 2020

@rmgogogo ping?

@stale

stale bot commented Oct 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label Oct 19, 2020
@rmgogogo
Contributor

Hongye's PR, Pipeline IR #4371, contains the detailed info.

stale bot removed the lifecycle/stale label Oct 21, 2020
@Bobgy
Contributor

Bobgy commented Oct 22, 2020

/lifecycle frozen

@RobbeSneyders

Is there still an ongoing effort to get the IR YAML format supported by different SDKs and execution engines? And if so, where could I find the status?

We are now compiling Fondant pipelines to IR YAML, partially relying on the KFP SDK, which allows us to run them on both KFP and Vertex AI. We would be interested in the ability to execute Fondant pipelines on more execution engines leveraging IR YAML.

We have also implemented a simple LocalRunner based on Docker Compose. We currently compile to Docker Compose directly, but if there's wider support for IR YAML, we would be interested in using it as an intermediate representation and then compiling to Docker Compose from there. This could lead to a Fondant SDK and a simple Docker Compose execution engine for IR YAML.
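For reference, a very rough sketch of the kind of IR-to-Compose translation we mean; the task fields here are hypothetical and simplified, not the actual KFP IR YAML or Fondant schema:

import yaml  # PyYAML

# Hypothetical, simplified IR: a list of container tasks with dependencies.
ir_tasks = [
    {'name': 'load-data', 'image': 'example/load:latest',
     'command': ['python', 'load.py']},
    {'name': 'transform', 'image': 'example/transform:latest',
     'command': ['python', 'transform.py'], 'after': ['load-data']},
]

def ir_to_compose(tasks):
    # Map each task to a Compose service; encode dependencies with depends_on.
    services = {}
    for t in tasks:
        service = {'image': t['image'], 'command': t['command']}
        if t.get('after'):
            service['depends_on'] = {
                dep: {'condition': 'service_completed_successfully'}
                for dep in t['after']}
        services[t['name']] = service
    return {'services': services}

print(yaml.safe_dump(ir_to_compose(ir_tasks), sort_keys=False))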

@RobbeSneyders

@Ark-kun could you tell me where I can find the latest status on this? See my message above.
