Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backend support for Kubernetes secrets environment variables #2715

Merged
merged 27 commits into from
May 25, 2022

Conversation

kiersten-stokes
Copy link
Member

@kiersten-stokes kiersten-stokes commented May 6, 2022

Provides backend support for #2674

Frontend piece will be delivered when RJSF support is available (#2464)

What changes were proposed in this pull request?l

These changes add support for Kubernetes-secret-backed environment variables. This PR is currently working off the assumption that secrets will be input as a string in a strict format shown below:

ENV_VAR=secret_name:secret_key

Some backend changes will be required when the frontend piece is delivered.

First, a "kubernetes_secret" list (StringArrayControl) property is added to both the pipeline and generic node properties. Behavior works similarly to environment variables and mounted volumes: the property is treated as a KeyValueList type after pipeline submission, propagating the pipeline default values when applicable.

One additional consideration is necessary for Kubernetes secrets. A pipeline may need to be executed both locally and remotely. In the case of a local submission, any Kubernetes secrets-based variables will never be accessed. Instead, the corresponding variable in the Environment Variable property should have a value assigned. In this case of remote execution, however, any environment variable that has a value in both the Environment Variables property and the Kubernetes Secrets property will defer to using the Kubernetes secret. The corresponding Environment Variable key/value pair will be stripped from the list. This avoids sending any sensitive information for remote execution, but at the same time allows for local execution of the same files if desired (as long as the variable is set appropriately in both locations). This occurs during propagate_pipeline_default_properties.

After the relevant properties are converted to KeyValueList types and propagated from pipeline-level to node-level, each secret and volume is then further converted to their well-defined types: KubernetesSecret and VolumeMount. This is done in convert_data_class_properties on the Node object of a PipelineDefinition. No validation is performed in this function; the attributes are simply propagated to their associated types. (This function can perhaps be made less explicit when the pipeline JSON structure changes as a result of the new RJSF editor. E.g. maybe we can add metadata to the properties templates that indicates which data class to use and properties can be fed directly to their constructor without the need to parse). Doing this conversion in the construction of the PipelineDefinition object will allow for them to be accessed from the pipeline CLI (specifically the describe command) in a follow-up PR.

During processing, attributes of these new data classes are now accessed directly. The KubernetesSecret data class object has properties of with env_var, name, and key attributes. These attributes are converted to the appropriate form for each runtime type (in kfp/operator.py init for KFP and in the DAG template for Airflow). The VolumeMount data class object has properties path and pvc_name attributes is returned and accessed appropriately during processing.

Finally, validation checks have been added to the validation service that check for valid Kubernetes resource names (for PVC names in volumes and secret names in secrets) and valid Kubernetes secret key format. This has the benefit that we are catching these invalid values early on before any more expensive processing occurs.

Documentation has been updated to reflect the new property.

How was this pull request tested?

  • Add test to validation service to check volumes and secrets for valid values
  • Remove test_processor_base anymore?

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@kiersten-stokes kiersten-stokes added this to the 3.9.0 milestone May 6, 2022
@elyra-bot
Copy link

elyra-bot bot commented May 6, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@kiersten-stokes kiersten-stokes added the impact:needs doc updates This PR requires follow up PR(s) for documentation and/or examples label May 6, 2022
@@ -106,6 +109,7 @@ def __init__(
gpu_limit: maximum number of GPUs allowed for the operation
workflow_engine: Kubeflow workflow engine, defaults to 'argo'
volume_mounts: data volumes to be mounted
kubernetes_secrets: secrets to be made available as environment variables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes secrets can also be mounted as volumes. In case we also support that in a future release we should probably pick a more discriminating variable name now that makes it possible to distinguish between secrets that are exposed as an env variable and secrets that are mounted as a volume.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When "mounted as a volume" are those particular, secret-based, PV definitions differentiated from regular PV definitions? That is, do we need to have an additional "type" of mount or secret?

I agree that a distinguishing name for the env-based field is probably warranted. Just curious about how distinguishing things are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah totally agree about the distinguishing names - I was thinking that during the discussion yesterday.

When "mounted as a volume" are those particular, secret-based, PV definitions differentiated from regular PV definitions? That is, do we need to have an additional "type" of mount or secret?

I believe the two different types are consumed/accessed in completely different ways, making an additional type unnecessary. But I may be misunderstanding, so I'd defer to @ptitzler expertise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the running container this is exposed like other volumes are. afaik the consumer does not have any awareness what the source is, just like the consumer doesn't know wether an env variable value was derived from a constant or a kubernetes resource/secret. I haven't tested what happens if one tries to mount a volume and a secret to the same mount point in the container, but would expect it to fail.

@kevin-bates
Copy link
Member

This PR is currently working off the assumption that secrets will be input as a string in a strict format shown below:

ENV_VAR_NAME=secret_name:secret_key

Since this is not a long-term solution, some backend changes will be required when the frontend piece is delivered.

Could we say that this format is the "contract" that when the front-end finalizes its changes, it is responsible for producing a list of strings in this format? That way, no back-end changes should be required.

@kiersten-stokes
Copy link
Member Author

Could we say that this format is the "contract" that when the front-end finalizes its changes, it is responsible for producing a list of strings in this format? That way, no back-end changes should be required.

I remember when this came up in discussion that there was some pushback on this, but I can't remember why exactly. One thing that we were worried about is the case where a colon would appear in the secret name or key, but I believe Kubernetes requires that both of those values are completely alphanumeric, so I don't think that would be an issue.

@ptitzler ptitzler added the kind:enhancement New feature or request label May 6, 2022
@kevin-bates
Copy link
Member

Yeah, all I'm saying is that how these are displayed - whether they be 3 separate fields (preferred) or not, is independent of the format in which they are sent to the server and that the server should dictate that format. If we dictate the format to be an "encoded" string, then we should certainly pick "separators" that can't appear in the sub-strings - which seems to be the case. I think we're good.

@kiersten-stokes kiersten-stokes changed the title Add backend support for Kubernetes secrets environment variables Add support for Kubernetes secrets environment variables May 9, 2022
@kiersten-stokes kiersten-stokes marked this pull request as draft May 10, 2022 16:07
@kiersten-stokes kiersten-stokes changed the title Add support for Kubernetes secrets environment variables Add backend support for Kubernetes secrets environment variables May 12, 2022
@kiersten-stokes kiersten-stokes marked this pull request as ready for review May 18, 2022 22:27
@kiersten-stokes kiersten-stokes removed the impact:needs doc updates This PR requires follow up PR(s) for documentation and/or examples label May 24, 2022
@ptitzler
Copy link
Member

We should also add a short motivation to https://elyra.readthedocs.io/en/latest/user_guide/best-practices-file-based-nodes.html, indicating that secrets should be used to make sensitive information available to nodes because their values are not exposed in the pipeline file, the pipeline editor, or the runtime environment.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - thanks @kiersten-stokes

Copy link
Member

@ptitzler ptitzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confirm that env_vars contains no keys before writing pipelines
3 participants