
Logical Intermediate Pipeline Representation #3703

Closed
talebzeghmi opened this issue May 6, 2020 · 36 comments
Assignees: Ark-kun, rmgogogo
Labels: kind/discussion, lifecycle/frozen, status/triaged

Comments

@talebzeghmi

talebzeghmi commented May 6, 2020

There are currently two planned Kubeflow pipeline (KFP) efforts to compile to an intermediate representation.

  1. Kubeflow Pipelines and Tekton #3647: design doc
  2. Merged TFX and KFP SDK: design doc

I'm creating this issue to coordinate and single out an intermediate representation (IR). The IR would ideally be a neutral project outside of both KFP and TFX, and so should the Python SDK to produce such an IR as @animeshsingh suggested.

Possible immediate representations (no order):

  1. Common Workflow Language: https://en.wikipedia.org/wiki/Common_Workflow_Language

  2. https://metadata.datadrivendiscovery.org/schemas/v0/pipeline.json

See also:

  1. MLGraph or subset of it.

  2. PFA (successor to PMML).

  3. https://github.com/openml/flow2 from https://www.openml.org/

Ark-kun self-assigned this May 6, 2020
@animeshsingh
Contributor

@talebzeghmi this is something we have been discussing with the KFP team, and we hope to reach a conclusion in the very near future around the state of the IR and get the effort moved into the community soon.

@paveldournov @neuromage @jessiezcc @Ark-kun

@kumare3

kumare3 commented May 7, 2020

Hello all,
I lead an effort called Flyte. We could see whether Flyte could be a potential target for KFP. We have an experimental compiler (incomplete), but would love to hear your thoughts and opinions. We would love to collaborate and explore options.

Ketan

Bobgy added the status/triaged label May 7, 2020
@talebzeghmi
Author

The benefit of an intermediate representation (IR), especially one residing in a neutral repo outside of KFP, is that disparate ML SDKs can compile to the same IR, and the IR can be executed by disparate engines.

Possible SDKs:

  • KF Pipelines
  • Metaflow
  • Flyte

Possible execution engines:

  • Argo
  • Tekton
  • Flyte
  • A lightweight local implementation

@kumare3

kumare3 commented May 8, 2020

@talebzeghmi at the moment Flyte has an intermediate representation, specified in protobuf - https://github.com/lyft/flyteidl/blob/master/protos/flyteidl/core/workflow.proto#L147

This allows FlyteAdmin (the control plane) to JIT-compile from this representation to an executable representation, in our case FlytePropeller's Workflow CRD. We could potentially target the Argo or Tekton CRDs as well; I am sure the mapping is not 1:1, and therein lies the challenge.

@karlschriek

@kumare3 I work with quite a few teams who are currently thinking of moving their workflows to Kubeflow, but they are also interested in initiatives like Flyte (and also Metaflow), and at the moment it is hard to concretely show them how these could work together.

Metaflow currently comes across as very much its own thing. (Although they claim architectural independence in their design, the fact is that currently it only supports AWS managed services in any meaningful way.) The fact that Flyte runs on Kubernetes makes it seem like a natural fit for Kubeflow. Anyway, I don't want to hijack the specific discussion here; I just wanted to point out that having Flyte play nicely with Kubeflow (Pipelines in particular) would likely be greeted with enthusiasm by a lot of users.

@kumare3

kumare3 commented May 12, 2020

@karlschriek I am absolutely open to all conversations; we would love to serve the Kubeflow community in general. We are planning to have an experimental "Kubeflow Pipelines"-to-"Flyte" compiler soon, which should allow using Flyte to run Kubeflow pipelines.

This is not the ideal solution because it will not use Flyte's native plugin system, and it will not use KFP's UI (as there are no hooks for that today). But we could start pushing for the right integration?

In the meantime, @karlschriek, please join the Flyte Slack and ping me if you want to discuss ideas.

@jlewi
Contributor

jlewi commented May 13, 2020

@kumare3 Would you be interested in presenting Flyte at an upcoming Kubeflow community meeting?
You can find the calendar here

Just leave a comment in the Google doc to let us know which date you are interested in presenting on.

@savingoyal

savingoyal commented May 16, 2020

@karlschriek Although most of our (Metaflow) integrations are with AWS managed services, there is work underway to integrate with GCP and K8s.

We also have a PR out for compiling Metaflow workflows into the AWS Step Functions specification, and there is significant interest within the community in a similar integration with KFP.

@kumare3

kumare3 commented May 16, 2020

@jlewi I would definitely be interested in presenting Flyte at the community meeting. Do you think a general presentation about Flyte and our design decisions would be good?

@jlewi
Contributor

jlewi commented May 16, 2020

@kumare3 yes. Could you email the mailing list at kubeflow-discuss@googlegroups.com to coordinate a presentation at an upcoming community meeting?

@talebzeghmi
Author

@Ark-kun
Contributor

Ark-kun commented May 27, 2020

Please take a look at the TFX IR for a Logical pipeline representation. You can author a pipeline using the TFX SDK and then submit it for execution on Kubeflow.

The TFX IR is the suggested way forward for future development.

If for some reason you cannot use the TFX SDK and TFX IR, the KFP SDK already has a structure for pipeline persistence.

Sharing and persistence

For component sharing the KFP SDK already has a portable, platform-independent structure called ComponentSpec (schema, outline, approximate proto). This structure is usually serialized into component.yaml definition files, and we have built a big ecosystem of those components (hundreds of components). This format has existed in a stable form since the first release of KFP. We are pledging support for components authored and persisted in this format. See an example of component yaml.
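To make the format concrete, here is a minimal sketch of a container-backed component written in the ComponentSpec (component.yaml) format and loaded with the KFP SDK. The component itself (an "add two numbers" container) is hypothetical and only for illustration:

import kfp.components as comp

# A minimal, hypothetical component.yaml in the ComponentSpec format.
# The inputValue/outputPath placeholders are resolved at run time.
add_op = comp.load_component_from_text("""
name: Add two numbers
inputs:
- {name: a, type: Integer}
- {name: b, type: Integer}
outputs:
- {name: sum, type: Integer}
implementation:
  container:
    image: alpine:3.12
    command: [sh, -c, 'mkdir -p "$(dirname "$2")"; expr "$0" + "$1" > "$2"']
    args: [{inputValue: a}, {inputValue: b}, {outputPath: sum}]
""")

The returned add_op is a task factory that can be used inside a pipeline function like any other component.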

Graph components

While most components are backed by a container, the ComponentSpec structure also allows having a graph implementation. (schema, outline, approximate proto)
This essentially enables 'pipeline-as-component' feature. See an example of an end-to-end pipeline saved as a graph component yaml.

KFP's Python SDK provides a way to compile a Python pipeline function into a graph component (create_graph_component_from_pipeline_func, discussed below).

A pipeline saved as a graph component can be easily sent for execution:

import kfp
# Load a pipeline saved as a graph component and submit it for execution.
my_pipeline_op = kfp.components.load_component_from_url(...)
kfp.Client().create_run_from_pipeline_func(my_pipeline_op, arguments={})

Being part of the ComponentSpec format, graph components provide a portable, platform-independent pipeline persistence format. The format is simple, portable, and minimalistic, so it is fairly easy to convert to other workflow formats for execution: Argo, Tekton, Airflow, etc.

The advice from the KFP team is:

If you want to persist a pipeline, Python code is the most supported way for now; but if you want to persist the pipeline in a non-Python format, use the graph component file format.

We'd like the graph component format to be KFP's logical intermediate pipeline representation.
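As a rough sketch of that advice, a pipeline function could be persisted as a graph component and then reloaded. This assumes the create_graph_component_from_pipeline_func helper mentioned below; its exact import path and signature may differ between SDK versions:

import kfp
from kfp import components, dsl

def add(a: float, b: float) -> float:
    return a + b

add_op = components.create_component_from_func(add)

@dsl.pipeline(name='add-pipeline')
def add_pipeline(a: float = 1.0, b: float = 2.0):
    add_op(a, b)

# Persist the pipeline in the graph component (ComponentSpec) format.
# Assumption: the helper and its output_component_file argument exist as shown;
# check your SDK version for the exact location and signature.
components.create_graph_component_from_pipeline_func(
    add_pipeline, output_component_file='add_pipeline.component.yaml')

# The saved file can then be loaded and submitted like any other component:
add_pipeline_op = components.load_component_from_file('add_pipeline.component.yaml')
kfp.Client().create_run_from_pipeline_func(add_pipeline_op, arguments={'a': 1, 'b': 2})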

Open questions:

Adding Backend support for IR/graph-components

This has been considered, but not all stakeholders were on board.
Implementation is doable.

Making Frontend IR-native

The frontend works directly with the workflow status object updated by the orchestrator. Since the orchestrator is Argo, the frontend is currently tied to the Argo WorkflowStatus structure (in addition to the WorkflowSpec structure).

Refactoring the DSL to be based on the Graph Component structures.

This might be a good idea that could improve the capabilities of the create_graph_component_from_pipeline_func function, which essentially compiles a Python pipeline to the IR.

Implementing those major refactorings does not seem to bring immediate improvements for KFP users, and the KFP team might lack the bandwidth to implement them.

We're seeking feedback for improving the pipeline --> graph component transformation code so that more features are available. We're also encouraging compiler authors to consider the graph component format as the intermediate representation they can work with.

@animeshsingh
Contributor

animeshsingh commented May 28, 2020

Thanks @Ark-kun
Is this the final word on the IR - using GraphComponentSpec? We were being told there is another IR in consideration with the TFX team - has that goal been dropped?

If the answer to the above question is yes, then this makes sense. Now, apart from the issues you mentioned

  1. Adding Backend support for IR/graph-components
  2. Making Frontend IR-native
  3. Refactoring the DSL to be based on the Graph Component structures.

a few other things are needed:

  1. Pipelines are persisted either in Python or using this IR, and that's how they are shared, e.g. on Google's AI Hub
  2. Additionally these are the limitations in the IR
  • Only works on containerOps
  • Will not work on ResourceOp, VolumeOp, VolumeSnapShotOp, ExitOp
  • Conditionals, Loops, and nested pipeline are not yet supported for KFP 0.5
  • executionOptions for adding Kubernetes spec doesn’t seem to work.
  • Features such as input artifacts, runAfter, and timeout are also not part of the Component.yaml spec.

cc @neuromage @paveldournov @jessiezcc

@animeshsingh
Contributor

Some more details are in the slides I used for the pipeline community meeting:
https://www.slideshare.net/AnimeshSingh/kubeflow-pipelines-with-tekton

@Ark-kun
Contributor

Ark-kun commented May 28, 2020

Is this the final word on the IR - using GraphComponentSpec? We were being told there is another IR in consideration with the TFX team - has that goal been dropped?

No, that goal is not dropped. I guess my answer was ambiguous. I've reworded it.
The information about graph ComponentSpec was only applicable to people who only use KFP SDK and want to persist their pipelines right now. Graph ComponentSpec is only an alternative to inventing a new IR, not an alternative to TFX IR.

The TFX IR is the road forward for future development. Please try using it.

Pipelines are persisted either in Python or using this IR, and that's how they are shared, e.g. on Google's AI Hub

I'm not sure I fully understand what you want to say with this item.

The reason that the sample pipelines use Python is so that they are easier to understand for the users (Python DSL vs YAML). We do not have good editing tools for YAML-based pipelines.

You can upload graph component.yaml files to AI Hub. You can even have zip files with both Argo's pipeline.yaml and component.yaml. load_component_from_url has supported loading them since AI-Hub launch.

Additionally these are the limitations in the IR

I think most of the limitations are not really limitations of the graph ComponentSpec format, but rather limitations of the particular SDK and how it consumes/produces ComponentSpec. If you change the pipeline persistence format, these missing feature implementations won't magically fix themselves.

We're practicing "demand-driven development", where we implement features when there are requests for them. Please file issues so we can learn about the demand. For example, we have an open PR for making use of the Kubernetes options from loaded graph component tasks (passing kubernetes_options through to ContainerOp). But the PR is not moving forward due to lack of demand.

Will not work on ResourceOp, VolumeOp, VolumeSnapShotOp

I'd consider these to be pretty Argo-specific.

Argo implements those as containers that just run kubectl. I think it would make the SDK more portable if we changed ResourceOp to use an explicit container. I already created a PR last week to do that for ResourceOp.delete(): https://github.com/kubeflow/pipelines/pull/3841/files ResourceOp.create() will follow.

ExitOp

Per-pipeline ExitOp probably belongs to the PipelineRunSpec, not ComponentSpec.

Conditionals

Conditionals are supported by the format (although not applied during loading). See TaskSpec.is_enabled.

nested pipeline

? The graph ComponentSpec supports that. Every TaskSpec has component_ref which can point to any component - container or graph. The loading should also work (I need to check and add a test for that).

Loops

This is the only real limitation of the ComponentSpec as of now. Loops are not easy to design (especially to be portable). Loops also usually require advanced capabilities for output aggregation, which is even harder to design. I have plans to add a for-style loop, with foreach-style loops (like Argo's withItems) implemented by compilers as syntactic sugar.
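To illustrate the syntactic-sugar idea (a hypothetical sketch, not current SDK behaviour): a compiler could unroll a foreach over a static item list into ordinary tasks before emitting the graph, so the persisted format itself would not need a loop construct:

# Hypothetical illustration only: expand a foreach (like Argo's withItems)
# into one ordinary task per item at compile time.
def unroll_foreach(make_task, items):
    return [make_task(item) for item in items]

# Stand-in "task factory" that just records what would be created.
tasks = unroll_foreach(
    lambda item: {'name': f'process-{item}', 'arguments': {'item': item}},
    ['a', 'b', 'c'])
# tasks now contains three independent task specs, with no loop left in the graph.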

executionOptions for adding Kubernetes spec doesn’t seem to work.

This is a missing SDK feature which was moving slowly due to lack of demand. Please create/upvote issues, so that features can be prioritized. See #3447 and #3448

Features such as input artifacts

Can you explain the feature and the limitation? I think ComponentSpec has the same or better support for artifacts than even ContainerOp.

runAfter, and timeout

Please create feature request issues. The whole available design is bigger than the implemented parts since we do not want to implement features prematurely until there is demand.

When features are implemented prematurely, without first collecting demand feedback, the design can be suboptimal and require breaking changes in the future. We're trying to avoid that by keeping the design minimal.

P.S. For some of the features you've listed there are sizeable gaps in the DSL -> graph ComponentSpec and graph ComponentSpec -> ContainerOp conversions, but these are not issues in the format itself. Whatever is used as the IR, these gaps will need to be filled.

P.P.S. One reason for some of the DSL -> graph ComponentSpec gaps is that the component library part of KFP (kfp.components) is explicitly independent from the DSL and compiler.

@Ark-kun
Contributor

Ark-kun commented May 28, 2020

Some more details are in the slides I used for the pipeline community meeting:
https://www.slideshare.net/AnimeshSingh/kubeflow-pipelines-with-tekton

P.S. I really liked your presentation and the visual pipeline editing tool.

@talebzeghmi
Author

talebzeghmi commented May 28, 2020

The TFX IR is the road forward for future development. Please try using it.

@Ark-kun can you please share what the TFX IR is?

I ask because the Metaflow team is asking for the KFP IR to interface with KFP here: Netflix/metaflow#16 (comment)

thanks!

@Ark-kun
Contributor

Ark-kun commented May 28, 2020

A clarification:

My initial answer might have been ambiguous. The TFX IR is the recommended way forward for future development. The information about KFP's graph ComponentSpec was only applicable to people who only use the KFP SDK and want to persist their pipelines right now. It was suggested only as an alternative to inventing a new IR, not as an alternative to the TFX IR.

@talebzeghmi
Author

Thanks @Ark-kun. Is the TFX IR still in development, or is it ready to share? Are you able to share links?
I ask because I'm interested in having Metaflow compile to the IR and run on KFP.
thank you!

rmgogogo self-assigned this May 29, 2020
@rmgogogo
Contributor

Thanks @Ark-kun. Is the TFX IR still in development, or is it ready to share? Are you able to share links?
I ask because I'm interested in having Metaflow compile to the IR and run on KFP.
thank you!

@zhitaoli on TFX IR.

We are actively discussing this topic now and should be able to provide an initial update here around the middle of next week. It's possible we may define a layered IR (a core plus different extensions).

@animeshsingh
Contributor

Thanks @Ark-kun for the clarification. So it's established that the TFX IR is the future.

@rmgogogo looking forward to the update

@rmgogogo
Contributor

rmgogogo commented Jun 3, 2020

The initial check-in for the IR is here.
https://github.com/tensorflow/tfx/blob/master/tfx/proto/orchestration/pipeline.proto#L318

@zhitaoli to correct me.

As for how the KFP side changes correspondingly, I'm still evaluating the details. It's a big change and may be worth a 2.0 version number.

My current thoughts / design goals are:

  1. Decouple the orchestrator (e.g. Argo) from pipeline authoring (the SDK) via the IR.

The IR is expected to abstract away the K8s concepts (e.g. PVC, ConfigMap, etc.)
so that it can be quickly tested in other environments (e.g. a local/dev environment without a K8s cluster).

  2. Decouple the orchestrator (e.g. Argo) from the front-end / visualizer.

Currently our FE relies on many K8s/Argo concepts, while much of the data can be fetched from the file system and is normally indexed by MLMD. Alexey also mentioned the same in a previous reply: "Making Frontend IR-native". I may extend it to "Making Frontend IR & MLMD native". So if the Tekton solution proposed by Animesh can generate the same data, the new visualizer should work.

Welcome more inputs.

@jlewi
Contributor

jlewi commented Jun 3, 2020

@rmgogogo or @zhitaoli is there a corresponding RFC/doc that describes the thinking behind the IR in more detail?

@rmgogogo @talebzeghmi What are the implications of using proto as opposed to say OpenAPI/Swagger as the IDL?

The IR is expected to abstract the K8s concepts (e.x. PVC, ConfigMap etc.).

What does this mean in terms of how K8s concepts get surfaced? e.g. if people want to be able to attach PVCs, do we first need to define a suitable abstraction in the IR?

/cc @animeshsingh

@zhitaoli

zhitaoli commented Jun 3, 2020 via email

@animeshsingh
Contributor

animeshsingh commented Jun 3, 2020

Thanks @rmgogogo and @zhitaoli - great news! Will dive deeper into the IDL.

Vis-à-vis OpenAPI/Swagger would be perfect, but proto is not a deal breaker here. What may be an issue is if we are not able to surface K8s constructs (PVC, ConfigMap, and there are quite a few more in the KFP DSL) either through the core IDL or an IDL extension for folks running on Kube, which is what more than half of the enterprises are using in production right now.

Our efforts below to map the Kubeflow Pipelines DSL to a Tekton backend list all the DSL functionality we have to implement for Tekton, and we would expect the IDL to be able to capture those from the DSL so that they can be relayed back to Argo or Tekton.

https://github.com/kubeflow/kfp-tekton/blob/master/sdk/FEATURES.md

And again, the more collaborative we can be on this effort, the better for the project, so moving an IR proposal forward as quickly as possible, so that the community can align on it and help out, will work to everyone's mutual advantage here.

@rmgogogo
Contributor

rmgogogo commented Jun 5, 2020

"which is what more than half of the enterprises are using in production right now."

+1, it's important from a runtime perspective.

From an ML perspective, here is one worth checking to understand more of @zhitaoli's previous replies:
tensorflow/community#253

@animeshsingh
Contributor

@rmgogogo @zhitaoli any update on the IR RFC/doc?

@rmgogogo
Contributor

rmgogogo commented Jun 29, 2020

@rmgogogo @zhitaoli any update on the IR RFC/doc?

@hongye-sun (didn't find Ruoyu's Github account)

Some high-level info on how we plan to provide an IR and support it (a rough illustrative sketch follows the list):

  • each step in a pipeline will have an execution spec
  • the execution spec can be
    (a) container based (OK to run in Docker without K8s)
    (b) K8s based (has K8s-specific fields compared with the container-based spec)
    (c) Python class based (for local / special-base-container-image use)
    (d) customized (e.g. calls GCP APIs)
  • frontend compiler: DSL code => IR
  • backend compiler: IR => a specific environment's spec (e.g. Argo YAML, Tekton YAML, a special CRD's YAML, or special API calls)
  • runners are suggested to rely on MLMD for input/output data tracking, which is also good for common visualization, etc.
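Here is the rough illustrative sketch referenced above. The names are hypothetical (not an actual KFP or TFX API); it only shows the layering of a container-based execution spec, a K8s extension of it, and a backend compiler mapping IR steps onto an engine-specific spec:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ContainerExecutionSpec:
    # (a) container based: runnable with plain Docker, no K8s required
    image: str
    command: List[str]

@dataclass
class K8sExecutionSpec(ContainerExecutionSpec):
    # (b) K8s based: adds K8s-specific fields on top of the container spec
    node_selector: Dict[str, str] = field(default_factory=dict)
    volumes: List[str] = field(default_factory=list)

@dataclass
class Step:
    name: str
    executor: ContainerExecutionSpec

def compile_to_argo(steps: List[Step]) -> dict:
    # backend compiler: IR -> a specific engine's spec (greatly simplified)
    return {
        'apiVersion': 'argoproj.io/v1alpha1',
        'kind': 'Workflow',
        'spec': {'templates': [
            {'name': s.name,
             'container': {'image': s.executor.image, 'command': s.executor.command}}
            for s in steps]},
    }

A Tekton or local-Docker backend compiler would simply be another function consuming the same Step list.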

@rmgogogo
Contributor

As for a Tekton runner, one big difference from Argo is that it can run multiple steps in one pod so that they can share common setup, e.g. secrets and volumes. Any others I missed? I'm thinking about whether to bring this concept into the IR, but I would like to learn more about Tekton's benefits.

In an ML pipeline, putting multiple steps in one pod may actually have disadvantages, e.g. one step requires a lot of CPU/MEM/GPU while others don't, so it's better not to put them in one pod. In Tekton they would then have to be in two pods, but that also means losing the benefit of Tekton's differentiating feature.

(BTW, it's also possible we implement another orchestration engine, not based on Argo or Tekton but more MLMD-friendly, as MLMD itself is highly related to orchestration. That is more of a long-term plan. In the first step (Q3/Q4), we may still make delta changes based on Argo.)

@jlewi
Contributor

jlewi commented Jun 29, 2020

@rmgogogo For the execution spec, is there a proto or OpenAPI spec somewhere that indicates what this might look like?

@jlewi
Contributor

jlewi commented Jul 21, 2020

@rmgogogo ping?

@stale

stale bot commented Oct 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label Oct 19, 2020
@rmgogogo
Contributor

Hongye's PR, Pipeline IR #4371, contains the detailed info.

stale bot removed the lifecycle/stale label Oct 21, 2020
@Bobgy
Contributor

Bobgy commented Oct 22, 2020

/lifecycle frozen

@RobbeSneyders

Is there still an ongoing effort to get the IR YAML format supported by different SDKs and execution engines? And if so, where could I find the status?

We are now compiling Fondant pipelines to IR YAML, partially relying on the KFP SDK, which allows us to run them on both KFP and Vertex AI. We would be interested in the ability to execute Fondant pipelines on more execution engines leveraging IR YAML.

We have also implemented a simple LocalRunner based on Docker Compose. We currently compile to Docker Compose directly, but if there's wider support for IR YAML, we would be interested in using it as an intermediate representation and then compiling to Docker Compose from there. This could lead to a Fondant SDK and a simple Docker Compose execution engine for IR YAML.
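For reference, a very rough sketch of the kind of IR-to-Compose translation we mean; the task fields here are hypothetical and simplified, not the actual KFP IR YAML or Fondant schema:

import yaml  # PyYAML

# Hypothetical, simplified IR: a list of container tasks with dependencies.
ir_tasks = [
    {'name': 'load-data', 'image': 'example/load:latest',
     'command': ['python', 'load.py']},
    {'name': 'transform', 'image': 'example/transform:latest',
     'command': ['python', 'transform.py'], 'after': ['load-data']},
]

def ir_to_compose(tasks):
    # Map each task to a Compose service; encode dependencies with depends_on.
    services = {}
    for t in tasks:
        service = {'image': t['image'], 'command': t['command']}
        if t.get('after'):
            service['depends_on'] = {
                dep: {'condition': 'service_completed_successfully'}
                for dep in t['after']}
        services[t['name']] = service
    return {'services': services}

print(yaml.safe_dump(ir_to_compose(ir_tasks), sort_keys=False))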

@RobbeSneyders

@Ark-kun could you tell me where I can find the latest status on this? See my message above.
