Add backend support for Kubernetes secrets environment variables #2715

kiersten-stokes · 2022-05-06T17:38:57Z

Provides backend support for #2674

Frontend piece will be delivered when RJSF support is available (#2464)

What changes were proposed in this pull request?l

These changes add support for Kubernetes-secret-backed environment variables. This PR is currently working off the assumption that secrets will be input as a string in a strict format shown below:

ENV_VAR=secret_name:secret_key

Some backend changes will be required when the frontend piece is delivered.

First, a "kubernetes_secret" list (StringArrayControl) property is added to both the pipeline and generic node properties. Behavior works similarly to environment variables and mounted volumes: the property is treated as a KeyValueList type after pipeline submission, propagating the pipeline default values when applicable.

One additional consideration is necessary for Kubernetes secrets. A pipeline may need to be executed both locally and remotely. In the case of a local submission, any Kubernetes secrets-based variables will never be accessed. Instead, the corresponding variable in the Environment Variable property should have a value assigned. In this case of remote execution, however, any environment variable that has a value in both the Environment Variables property and the Kubernetes Secrets property will defer to using the Kubernetes secret. The corresponding Environment Variable key/value pair will be stripped from the list. This avoids sending any sensitive information for remote execution, but at the same time allows for local execution of the same files if desired (as long as the variable is set appropriately in both locations). This occurs during propagate_pipeline_default_properties.

After the relevant properties are converted to KeyValueList types and propagated from pipeline-level to node-level, each secret and volume is then further converted to their well-defined types: KubernetesSecret and VolumeMount. This is done in convert_data_class_properties on the Node object of a PipelineDefinition. No validation is performed in this function; the attributes are simply propagated to their associated types. (This function can perhaps be made less explicit when the pipeline JSON structure changes as a result of the new RJSF editor. E.g. maybe we can add metadata to the properties templates that indicates which data class to use and properties can be fed directly to their constructor without the need to parse). Doing this conversion in the construction of the PipelineDefinition object will allow for them to be accessed from the pipeline CLI (specifically the describe command) in a follow-up PR.

During processing, attributes of these new data classes are now accessed directly. The KubernetesSecret data class object has properties of with env_var, name, and key attributes. These attributes are converted to the appropriate form for each runtime type (in kfp/operator.py init for KFP and in the DAG template for Airflow). The VolumeMount data class object has properties path and pvc_name attributes is returned and accessed appropriately during processing.

Finally, validation checks have been added to the validation service that check for valid Kubernetes resource names (for PVC names in volumes and secret names in secrets) and valid Kubernetes secret key format. This has the benefit that we are catching these invalid values early on before any more expensive processing occurs.

Documentation has been updated to reflect the new property.

How was this pull request tested?

Add test to validation service to check volumes and secrets for valid values
Remove test_processor_base anymore?

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

elyra-bot · 2022-05-06T17:38:59Z

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link:

ptitzler · 2022-05-06T18:12:00Z

elyra/kfp/operator.py

@@ -106,6 +109,7 @@ def __init__(
          gpu_limit: maximum number of GPUs allowed for the operation
          workflow_engine: Kubeflow workflow engine, defaults to 'argo'
          volume_mounts: data volumes to be mounted
+          kubernetes_secrets: secrets to be made available as environment variables


Kubernetes secrets can also be mounted as volumes. In case we also support that in a future release we should probably pick a more discriminating variable name now that makes it possible to distinguish between secrets that are exposed as an env variable and secrets that are mounted as a volume.

When "mounted as a volume" are those particular, secret-based, PV definitions differentiated from regular PV definitions? That is, do we need to have an additional "type" of mount or secret?

I agree that a distinguishing name for the env-based field is probably warranted. Just curious about how distinguishing things are.

Yeah totally agree about the distinguishing names - I was thinking that during the discussion yesterday.

When "mounted as a volume" are those particular, secret-based, PV definitions differentiated from regular PV definitions? That is, do we need to have an additional "type" of mount or secret?

I believe the two different types are consumed/accessed in completely different ways, making an additional type unnecessary. But I may be misunderstanding, so I'd defer to @ptitzler expertise

In the running container this is exposed like other volumes are. afaik the consumer does not have any awareness what the source is, just like the consumer doesn't know wether an env variable value was derived from a constant or a kubernetes resource/secret. I haven't tested what happens if one tries to mount a volume and a secret to the same mount point in the container, but would expect it to fail.

elyra/pipeline/processor.py

kevin-bates · 2022-05-06T18:48:18Z

This PR is currently working off the assumption that secrets will be input as a string in a strict format shown below:
ENV_VAR_NAME=secret_name:secret_key
Since this is not a long-term solution, some backend changes will be required when the frontend piece is delivered.

Could we say that this format is the "contract" that when the front-end finalizes its changes, it is responsible for producing a list of strings in this format? That way, no back-end changes should be required.

elyra/pipeline/pipeline.py

kiersten-stokes · 2022-05-06T19:10:11Z

Could we say that this format is the "contract" that when the front-end finalizes its changes, it is responsible for producing a list of strings in this format? That way, no back-end changes should be required.

I remember when this came up in discussion that there was some pushback on this, but I can't remember why exactly. One thing that we were worried about is the case where a colon would appear in the secret name or key, but I believe Kubernetes requires that both of those values are completely alphanumeric, so I don't think that would be an issue.

kevin-bates · 2022-05-06T19:19:05Z

Yeah, all I'm saying is that how these are displayed - whether they be 3 separate fields (preferred) or not, is independent of the format in which they are sent to the server and that the server should dictate that format. If we dictate the format to be an "encoded" string, then we should certainly pick "separators" that can't appear in the sub-strings - which seems to be the case. I think we're good.

elyra/pipeline/processor.py

elyra/templates/components/generic_properties_template.jinja2

elyra/templates/pipeline/pipeline_properties_template.jinja2

elyra/pipeline/pipeline_definition.py

docs/source/user_guide/pipelines.md

ptitzler · 2022-05-24T17:59:16Z

We should also add a short motivation to https://elyra.readthedocs.io/en/latest/user_guide/best-practices-file-based-nodes.html, indicating that secrets should be used to make sensitive information available to nodes because their values are not exposed in the pipeline file, the pipeline editor, or the runtime environment.

elyra/pipeline/pipeline.py

kevin-bates

Looks good - thanks @kiersten-stokes

ptitzler

LGTM

kiersten-stokes added 2 commits May 6, 2022 11:23

Add backend support for Kubernetes secrets-based env vars

b723a1c

Fix server tests

7b66723

kiersten-stokes added this to the 3.9.0 milestone May 6, 2022

kiersten-stokes added the impact:needs doc updates This PR requires follow up PR(s) for documentation and/or examples label May 6, 2022

ptitzler reviewed May 6, 2022

View reviewed changes

elyra/pipeline/processor.py Outdated Show resolved Hide resolved

kevin-bates reviewed May 6, 2022

View reviewed changes

elyra/pipeline/pipeline.py Outdated Show resolved Hide resolved

ptitzler added the kind:enhancement New feature or request label May 6, 2022

kiersten-stokes added 3 commits May 6, 2022 14:31

Address review: docstring and resource name-checking

7e0592e

Update snapshots

2a36791

Fix lint

6568f5c

ptitzler reviewed May 6, 2022

View reviewed changes

elyra/pipeline/processor.py Outdated Show resolved Hide resolved

ptitzler mentioned this pull request May 6, 2022

Extend 'elyra-pipeline describe' to identify kubernetes secrets as a dependency #2716

Closed

kiersten-stokes changed the title ~~Add backend support for Kubernetes secrets environment variables~~ Add support for Kubernetes secrets environment variables May 9, 2022

kiersten-stokes marked this pull request as draft May 10, 2022 16:07

ptitzler mentioned this pull request May 11, 2022

Elyra pipeline describe enhancements #2722

Merged

kiersten-stokes changed the title ~~Add support for Kubernetes secrets environment variables~~ Add backend support for Kubernetes secrets environment variables May 12, 2022

kiersten-stokes added 4 commits May 17, 2022 18:21

Define data classes for secrets and mounted volumes

e2fcc3b

Update tests

b954d47

Add test for secret formatting

bc09462

Refactor calls to get mounts/secrets into static methods

029d2d6

kiersten-stokes marked this pull request as ready for review May 18, 2022 22:27

ptitzler requested review from ptitzler and kevin-bates May 19, 2022 16:53

ptitzler reviewed May 19, 2022

View reviewed changes

elyra/templates/components/generic_properties_template.jinja2 Outdated Show resolved Hide resolved

ptitzler reviewed May 19, 2022

View reviewed changes

elyra/templates/pipeline/pipeline_properties_template.jinja2 Outdated Show resolved Hide resolved

Coerce volume and secret kv-lists into lists of objects early

8daa10a

kevin-bates reviewed May 24, 2022

View reviewed changes

elyra/pipeline/pipeline_definition.py Outdated Show resolved Hide resolved

kiersten-stokes added 2 commits May 24, 2022 11:19

Refactor convert_kv_properties to return if already converted

e658423

Update property template descriptions

a320e16

kevin-bates reviewed May 24, 2022

View reviewed changes

elyra/pipeline/pipeline_definition.py Outdated Show resolved Hide resolved

Update docs with info on secrets

5ac3f6e

kevin-bates reviewed May 24, 2022

View reviewed changes

elyra/pipeline/pipeline_definition.py Outdated Show resolved Hide resolved

kiersten-stokes added 2 commits May 24, 2022 11:39

Add info about validation of secrets and volumes to docs

7d28e11

Update properties json resource to reflect template update

121cd85

kiersten-stokes removed the impact:needs doc updates This PR requires follow up PR(s) for documentation and/or examples label May 24, 2022

ptitzler reviewed May 24, 2022

View reviewed changes

docs/source/user_guide/pipelines.md Outdated Show resolved Hide resolved

ptitzler reviewed May 24, 2022

View reviewed changes

docs/source/user_guide/pipelines.md Outdated Show resolved Hide resolved

ptitzler reviewed May 24, 2022

View reviewed changes

docs/source/user_guide/pipelines.md Outdated Show resolved Hide resolved

kiersten-stokes added 3 commits May 24, 2022 15:41

Address code review

798a419

Address documentation review

f24ca9b

Fix lint

50a228f

kiersten-stokes requested a review from ptitzler May 24, 2022 21:24

Remove test_processor_base

c809f58

ptitzler mentioned this pull request May 25, 2022

Syntax for mounting PVC Volumes is confusing #2738

Closed

kiersten-stokes added 2 commits May 25, 2022 09:57

Add test for env vars and secret duplicate keys

dec4a8b

Make documentation topic more action-oriented

2e6924b

kevin-bates reviewed May 25, 2022

View reviewed changes

elyra/pipeline/pipeline.py Outdated Show resolved Hide resolved

Adjust import statements for dataclasses

9ad4fff

kevin-bates approved these changes May 25, 2022

View reviewed changes

ptitzler approved these changes May 25, 2022

View reviewed changes

ptitzler merged commit 51176f4 into elyra-ai:master May 25, 2022

kiersten-stokes mentioned this pull request May 26, 2022

Fix processing of node properties for pipeline app commands #2740

Merged

ptitzler mentioned this pull request May 26, 2022

Confirm that env_vars contains no keys before writing pipelines #2674

Closed

ptitzler linked an issue Jun 6, 2022 that may be closed by this pull request

Confirm that env_vars contains no keys before writing pipelines #2674

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add backend support for Kubernetes secrets environment variables #2715

Add backend support for Kubernetes secrets environment variables #2715

kiersten-stokes commented May 6, 2022 •

edited

Loading

elyra-bot bot commented May 6, 2022

ptitzler May 6, 2022

kevin-bates May 6, 2022

kiersten-stokes May 6, 2022

ptitzler May 6, 2022

kevin-bates commented May 6, 2022

kiersten-stokes commented May 6, 2022

kevin-bates commented May 6, 2022

ptitzler commented May 24, 2022

kevin-bates left a comment

ptitzler left a comment

Add backend support for Kubernetes secrets environment variables #2715

Add backend support for Kubernetes secrets environment variables #2715

Conversation

kiersten-stokes commented May 6, 2022 • edited Loading

What changes were proposed in this pull request?l

How was this pull request tested?

elyra-bot bot commented May 6, 2022

ptitzler May 6, 2022

Choose a reason for hiding this comment

kevin-bates May 6, 2022

Choose a reason for hiding this comment

kiersten-stokes May 6, 2022

Choose a reason for hiding this comment

ptitzler May 6, 2022

Choose a reason for hiding this comment

kevin-bates commented May 6, 2022

kiersten-stokes commented May 6, 2022

kevin-bates commented May 6, 2022

ptitzler commented May 24, 2022

kevin-bates left a comment

Choose a reason for hiding this comment

ptitzler left a comment

Choose a reason for hiding this comment

kiersten-stokes commented May 6, 2022 •

edited

Loading