Support disabling node output caching for custom KFP components #2905

Merged 7 commits on Aug 26, 2022

Member

@ptitzler ptitzler commented Aug 24, 2022

Pipeline nodes produce output, such as files. Some runtime environments support caching of these outputs, eliminating the need to re-execute nodes, which can improve performance and reduce resource usage. If a node does not produce output deterministically (that is, the same inputs can yield different output), re-using the output from previous executions might lead to unexpected results.

This PR enables users to optionally disable caching for custom component nodes in pipelines that run on Kubeflow Pipelines. Caching can be disabled by defining a pipeline default or by explicitly disabling it for individual nodes.
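
For context, here is a minimal sketch of how caching can be turned off for a single task with the Kubeflow Pipelines SDK v1, which documents setting the task's maximum cache staleness to zero days (`P0D`). Whether this PR uses exactly this mechanism is an assumption. The helper `disable_task_caching` and the stand-in task object are hypothetical and exist only so the snippet runs without a KFP installation.

```python
from types import SimpleNamespace

# ISO 8601 duration for "zero days": any cached result counts as stale.
NO_CACHE_STALENESS = "P0D"


def disable_task_caching(task):
    """Force re-execution of a task by disallowing stale cache entries.

    `task` is expected to expose `execution_options.caching_strategy`,
    as `kfp.dsl.ContainerOp` does in the KFP SDK v1.
    (Hypothetical helper, not code from this PR.)
    """
    task.execution_options.caching_strategy.max_cache_staleness = NO_CACHE_STALENESS
    return task


# Stand-in for a kfp.dsl.ContainerOp so the sketch runs without KFP installed.
fake_task = SimpleNamespace(
    execution_options=SimpleNamespace(
        caching_strategy=SimpleNamespace(max_cache_staleness=None)
    )
)
disable_task_caching(fake_task)
```

In a real pipeline function the same assignment would be applied to the task object returned by a component factory.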

Example: Kubeflow Central Dashboard: cached output was re-used
image

Example: Kubeflow Central Dashboard: cached output was not re-used
image

Closes #2894

What changes were proposed in this pull request?

  • Add new pipeline default property in new section 'custom node defaults'

    image
  • Add new node property for custom KFP components

  • Update relevant documentation in the pipelines documentation topic

Known limitations:

  • Users cannot explicitly choose "use runtime environment default behavior". Once a user checks the 'disable' box, the property can only be switched between enable and disable and can no longer be reset to the runtime default (Add support for three-valued BooleanControl #2906)
  • Pipeline default property is displayed in generic pipeline editor and airflow pipeline editor even though it is not supported in those runtime environments

How was this pull request tested?

  • Manual testing:
    • caching not disabled (pipeline default and node property)
    • caching disabled (pipeline default) and not overridden for individual nodes
    • caching disabled (pipeline default) but overridden for individual nodes
    • caching not disabled (pipeline default) but disabled for individual nodes
  • As part of each test the pod metadata and the task status was reviewed
  • Updated existing tests
  • Reviewed output of make docs (user_guide/pipelines.html)
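
The four manual test scenarios above can be read as a small precedence rule: a node-level setting, when present, wins over the pipeline default. The sketch below illustrates that rule only. The function name, parameter names, and the use of `None` for "not set, defer to the runtime environment" are assumptions for illustration, not Elyra's actual implementation.

```python
from typing import Optional


def effective_disable_caching(
    pipeline_default: Optional[bool], node_value: Optional[bool]
) -> Optional[bool]:
    """Resolve whether caching is disabled for a node.

    True  -> caching is disabled for the node
    False -> caching is explicitly enabled for the node
    None  -> nothing was configured; the runtime environment decides
    (Hypothetical helper illustrating the precedence, not code from this PR.)
    """
    # A value set on the node itself overrides the pipeline default.
    if node_value is not None:
        return node_value
    return pipeline_default


# The four scenarios from the manual test list, in order.
scenarios = [
    (None, None),   # caching not disabled anywhere
    (True, None),   # disabled by pipeline default, node does not override
    (True, False),  # disabled by pipeline default, node overrides
    (None, True),   # not disabled by default, disabled for the node
]
results = [effective_disable_caching(p, n) for p, n in scenarios]
```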

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@ptitzler ptitzler added kind:enhancement New feature or request component:pipeline-editor pipeline editor platform: pipeline-Kubeflow Related to usage of Kubeflow Pipelines as pipeline runtime labels Aug 24, 2022
@ptitzler ptitzler added this to the 3.12.0 milestone Aug 24, 2022

elyra-bot bot commented Aug 24, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@ptitzler
Member Author

The property labels and descriptions still require editing for clarity.

@akchinSTC akchinSTC self-requested a review August 24, 2022 15:33
Member

@akchinSTC akchinSTC left a comment

verified cache and annotations set correctly, some small nits =)

image
image

docs/source/user_guide/pipelines.md (outdated, resolved)
docs/source/user_guide/pipelines.md (outdated, resolved)
elyra/pipeline/kfp/processor_kfp.py (resolved)
Member

@kiersten-stokes kiersten-stokes left a comment

LGTM! All scenarios testing as expected (with the known limitation re: the Boolean handling in mind)

> Pipeline default property is displayed in generic pipeline editor and airflow pipeline editor even though it is not supported in those runtime environments

An upcoming PR of mine should address that - this will be a good test of that PR's functionality 🙂

@ptitzler
Member Author

Final doc updates are done.

@akchinSTC akchinSTC merged commit ea16958 into elyra-ai:main Aug 26, 2022
@ptitzler ptitzler deleted the configure-kfp-cache branch August 27, 2022 04:30
Successfully merging this pull request may close these issues.

Pipeline editor: allow for enabling/disabling of custom component node caching for Kubeflow Pipelines