Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support data exchange between custom & generic components via volumes #2799

Merged
merged 27 commits into from
Jul 6, 2022

Conversation

kiersten-stokes
Copy link
Member

@kiersten-stokes kiersten-stokes commented Jun 24, 2022

Fixes #2760

What changes were proposed in this pull request?

While working through #2785, an issue was encountered that complicated things to a degree that made that approach almost useless. Long story short, the elyra_ prefix that we use to denote which properties are 'component parameters' or 'pipeline default' properties was affecting both how the component properties appeared in the GUI and how they were processed. In this PR, a different approach is taken, as described below.

The mounted_volumes property is added to the canvas properties for custom components. It will exist as a child within the component_parameters stanza of the pipeline JSON, which is the same as how this property is handled for generic components. Processing proceeds the same for both generic and custom components in the case where the custom component does not define a property of the same id (mounted_volumes) in its component definition.

A few additional safeguards are required to handle the case where a property id collision does occur (e.g. the component defines its own property called mounted_volumes). First, a new keyword arg is passed to the properties template for custom components that tells the template not to render any Elyra-owned properties that have the same id as any component-defined properties that are parsed from the definition. Similar logic is added to pipeline_definition.py that checks for any property id collisions and does not perform certain logic (property propagation from pipeline defaults, conversion to relevant dataclass objects, etc.) if the property is determined to be component-defined. Any value entered for a component-defined property of the same id should not be touched, but rather passed directly to the component constructor (via the DAG or the ContainerOp object).

Much of the above paragraph could be simplified or removed should we decide that we want to do a pipeline migration, in which case we could change the id for the built-in mounted_volumes property to something that we could be nearly sure would not result in a collision of property ids (though I suppose this could never be guaranteed so perhaps we wouldn't want to remove the safeguards?).

A new mounted_volumes property is also added to the Operation class. A kubernetes_secret property is added to the GenericOperation class as well just to make processing more straightforward.

Finally, logic was added to both processor_kfp.py and processor_airflow.py to handle the addition of volumes for custom components. I also refactored how the Airflow processor handles volumes and secrets in order to simplify the DAG template. Now, all volumes and secrets are rendered by functions defined in processor_airflow.py and passed to the template env. (I think we could also further re-factor this in a future PR to make the DAG template even more straightforward).

Documentation has been updated to reflect the availability of volumes on custom components.

How was this pull request tested?

Existing server tests have been updated. Tests have been added to cover the following scenarios:

  1. Ensure that a component that defines its own property called mounted_volumes results in the appropriate properties JSON (i.e., skips the system-defined mounted_volumes property and only defines its own)
  2. Ensure that a pipeline with custom components and mounted volumes defined has its properties propagated appropriately (propagation works the same as with generic nodes)
  3. Ensure that a custom component that defines its own property called mounted_volumes does not go through the processing steps that the system-defined property of the same name does (propagation, conversion to dataclass, etc.).

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@kiersten-stokes kiersten-stokes added this to the 3.10.0 milestone Jun 24, 2022
@elyra-bot
Copy link

elyra-bot bot commented Jun 24, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@kevin-bates
Copy link
Member

kevin-bates commented Jun 24, 2022

@kiersten-stokes - the description is excellent and extremely helpful - thank you.

Much of the above paragraph could be simplified or removed should we decide that we want to do a pipeline migration, in which case we could change the id for the built-in mounted_volumes property to something that we could be nearly sure would not result in a collision of property ids (though I suppose this could never be guaranteed so perhaps we wouldn't want to remove the safeguards?).

I view things like mounted_volumes, kubernetes_secrets, env_vars, etc., as system-owned properties. If we were to reserve the elyra_ prefix as an indicator of system-owned properties and not only apply the prefix in a temporal manner (as we currently do) but also make it permanent, would that be sufficient to move forward, albeit at the cost of a migration? (I'm assuming our system-owned properties have corresponding display names so the elyra_ wouldn't appear in various UI dialogs.)

Seems like we could then easily catch custom component properties that are prefixed by elyra_ and either reject the component or tolerate its existence by temporarily prefixing with another (system-defined) prefix so that that property doesn't collide.

@kiersten-stokes
Copy link
Member Author

kiersten-stokes commented Jun 24, 2022

I view things like mounted_volumes, kubernetes_secrets, env_vars, etc., as system-owned properties. If we were to reserve the elyra_ prefix as an indicator of system-owned properties and not only apply the prefix in a temporal manner (as we currently do) and make it permanent, would that be sufficient to move forward, albeit at the cost of a migration?

Just to clarify, do you mean adding the elyra_ prefix to all component_parameters (including system owned) and then another elyra_ prefix to those system-owned properties? E.g., for the purposes of the canvas template, mounted_volumes would become elyra_elyra_mounted_volumes and then for the purposes of pipeline processing, the MOUNTED_VOLUMESconstant would become elyra_mounted_volumes?

I think that would be all that's needed! And I would imagine it would be fairly easy migration code too (I may even be able to do it myself 😄 ).

Of course, it sounds like we can get rid of that "first" elyra_ prefix pending some RJSF changes in the future.

(I'm assuming our system-owned properties have corresponding display names so the elyra_ wouldn't appear in various UI dialogs.)

Correct! The property id (called the ref in the canvas properties) is never displayed.

@kevin-bates
Copy link
Member

Just to clarify, do you mean adding the elyra_ prefix to all component_parameters (including system owned) and then another elyra_ prefix to those system-owned properties?

No, only prefix system-owned properties. This way, elyra_ or whatever we choose as the prefix is the indicator that we control that property.

@kiersten-stokes kiersten-stokes marked this pull request as ready for review June 29, 2022 15:36
@akchinSTC akchinSTC self-requested a review June 30, 2022 20:21
@ptitzler ptitzler self-requested a review July 1, 2022 14:07
@ptitzler ptitzler added kind:enhancement New feature or request component:pipeline-runtime issues related to pipeline runtimes e.g. kubeflow pipelines labels Jul 1, 2022
@kiersten-stokes kiersten-stokes changed the title Support volumes for custom component (attempt 2) Support data exchange between custom & generic components via volumes Jul 1, 2022
@ptitzler ptitzler self-requested a review July 1, 2022 19:30
@ptitzler
Copy link
Member

ptitzler commented Jul 1, 2022

  • Documentation is looking good
  • Kubeflow Pipelines test completed (including CLI validation)
  • Apache Airflow test in progress ...

@kiersten-stokes kiersten-stokes force-pushed the custom-component-volumes-2 branch from a1ff622 to 808f0c4 Compare July 5, 2022 23:56
Copy link
Member

@ptitzler ptitzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@akchinSTC akchinSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, kfp portion. thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:pipeline-runtime issues related to pipeline runtimes e.g. kubeflow pipelines kind:enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mount "data volumes" from "generic nodes defaults" on KFP components
4 participants