Add backend support for Kubernetes secrets environment variables #2715

Merged 27 commits on May 25, 2022

Commits (27)
b723a1c
Add backend support for Kubernetes secrets-based env vars
kiersten-stokes May 6, 2022
7b66723
Fix server tests
kiersten-stokes May 6, 2022
7e0592e
Address review: docstring and resource name-checking
kiersten-stokes May 6, 2022
2a36791
Update snapshots
kiersten-stokes May 6, 2022
6568f5c
Fix lint
kiersten-stokes May 6, 2022
e2fcc3b
Define data classes for secrets and mounted volumes
kiersten-stokes May 17, 2022
b954d47
Update tests
kiersten-stokes May 17, 2022
bc09462
Add test for secret formatting
kiersten-stokes May 18, 2022
029d2d6
Refactor calls to get mounts/secrets into static methods
kiersten-stokes May 18, 2022
ed5eb35
Standardize env_var as the placeholder for env var names
kiersten-stokes May 19, 2022
4de9bd6
Add new function to check for the format of secret key
kiersten-stokes May 19, 2022
c49a939
Fix lint
kiersten-stokes May 19, 2022
dc5b06a
Move certain secret and volume logic to validation service
kiersten-stokes May 23, 2022
8fc3645
Update and add tests
kiersten-stokes May 23, 2022
8daa10a
Coerce volume and secret kv-lists into lists of objects early
kiersten-stokes May 23, 2022
e658423
Refactor convert_kv_properties to return if already converted
kiersten-stokes May 24, 2022
a320e16
Update property template descriptions
kiersten-stokes May 24, 2022
5ac3f6e
Update docs with info on secrets
kiersten-stokes May 24, 2022
7d28e11
Add info about validation of secrets and volumes to docs
kiersten-stokes May 24, 2022
121cd85
Update properties json resource to reflect template update
kiersten-stokes May 24, 2022
798a419
Address code review
kiersten-stokes May 24, 2022
f24ca9b
Address documentation review
kiersten-stokes May 24, 2022
50a228f
Fix lint
kiersten-stokes May 24, 2022
c809f58
Remove test_processor_base
kiersten-stokes May 24, 2022
dec4a8b
Add test for env vars and secret duplicate keys
kiersten-stokes May 25, 2022
2e6924b
Make documentation topic more action-oriented
kiersten-stokes May 25, 2022
9ad4fff
Adjust import statements for dataclasses
kiersten-stokes May 25, 2022
7 changes: 7 additions & 0 deletions docs/source/user_guide/best-practices-file-based-nodes.md
@@ -85,6 +85,8 @@ The Visual Pipeline Editor can detect which environment variables notebooks/scri

Refer to the next section for a list of proprietary environment variables that cannot be modified using the node properties settings.

If using environment variables containing sensitive information, refer to the [Handling sensitive information topic](#handling-sensitive-information) for details.

### Proprietary environment variables

Elyra makes a set of proprietary environment variables available to notebooks and scripts during execution. Unless indicated otherwise, these variables are defined in all runtime environments.
Expand All @@ -104,3 +106,8 @@ notebook or script is executed in:
- `local` - JupyterLab
- `kfp` - Kubeflow Pipelines
- `airflow` - Apache Airflow

### Handling sensitive information
The Kubernetes Secrets property can be used to associate environment variable names with secrets, preventing sensitive information from being exposed in the pipeline file, the pipeline editor, and the runtime environment. As with static environment variables, secret-based environment variable values can be set on an individual node and/or defined as pipeline default values and shared across nodes belonging to the same pipeline. A default value can also be overridden for a particular node by redefining the secret for a given variable name in the node properties.

Secrets are ignored when the pipeline is executed locally. For remote execution, if an environment variable was assigned both a static value (via the 'Environment Variables' property) and a Kubernetes secret value, the secret's value is used.
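The precedence rule above, where a secret-backed value wins over a static value for the same variable name, can be sketched as follows. The helper name and data shapes are illustrative assumptions, not Elyra's actual internals:

```python
# Illustrative sketch (not Elyra's actual code): secret-backed entries
# override static values for the same environment variable name.
def resolve_env(static_envs: dict, secret_envs: dict) -> dict:
    """Merge env var sources; the secret reference wins on name clashes."""
    resolved = dict(static_envs)
    resolved.update(secret_envs)  # later update takes precedence
    return resolved

static = {"API_URL": "https://example.com", "TOKEN": "plaintext-value"}
secrets = {"TOKEN": ("api-secret", "token-key")}  # (secret-name, secret-key)
print(resolve_env(static, secrets))
# {'API_URL': 'https://example.com', 'TOKEN': ('api-secret', 'token-key')}
```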
24 changes: 17 additions & 7 deletions docs/source/user_guide/pipelines.md
@@ -87,10 +87,15 @@ The [tutorials](/getting_started/tutorials.md) provide comprehensive step-by-ste
- The value is ignored when the pipeline is executed locally.
- **Environment variables**
- A list of environment variables to be set in the container that executes the Jupyter notebook or script. Format: `ENV_VAR_NAME=value`. Entries that are empty (`ENV_VAR_NAME=`) or malformed are ignored.
- **Data volume**
- A list of [Persistent Volume Claims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) to be mounted into the container that executes the Jupyter notebook or script. Format: `/mnt/path=existing-pvc-name`. Entries that are empty (`/mnt/path=`) or malformed are ignored.
- **Data volumes**
- A list of [Persistent Volume Claims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) to be mounted into the container that executes the Jupyter notebook or script. Format: `/mnt/path=existing-pvc-name`. Entries that are empty (`/mnt/path=`) or malformed are ignored. Entries with a PVC name considered to be an [invalid Kubernetes resource name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names) will raise a validation error after pipeline submission or export.
- The referenced PVCs must exist in the Kubernetes namespace where the generic pipeline nodes are executed.
- Data volumes are not mounted when the pipeline is executed locally.
- **Kubernetes secrets**
- A list of [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) to be accessed as environment variables during Jupyter notebook or script execution. Format: `ENV_VAR=secret-name:secret-key`. Entries that are empty (`ENV_VAR=`) or malformed are ignored. Entries with a secret name considered to be an [invalid Kubernetes resource name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names) or with [an invalid secret key](https://kubernetes.io/docs/concepts/configuration/secret/#restriction-names-data) will raise a validation error after pipeline submission or export.
- The referenced secrets must exist in the Kubernetes namespace where the generic pipeline nodes are executed.
- Secrets are ignored when the pipeline is executed locally. For remote execution, if an environment variable was assigned both a static value (via the 'Environment Variables' property) and a Kubernetes secret value, the secret's value is used.


#### Adding nodes

@@ -102,11 +107,11 @@ The [tutorials](/getting_started/tutorials.md) provide comprehensive step-by-ste

![Add generic components from file browser](../images/user_guide/pipelines/add-components-from-file-browser.gif)

1. Define the dependencies between nodes by connecting them, essentially creating an execution graph.
2. Define the dependencies between nodes by connecting them, essentially creating an execution graph.

![Connect nodes](../images/user_guide/pipelines/connect-nodes.gif)

1. Define the runtime properties for each node. Highlight a node, right click, and select `Open Properties`. Runtime properties configure a component and govern its execution behavior.
3. Define the runtime properties for each node. Highlight a node, right click, and select `Open Properties`. Runtime properties configure a component and govern its execution behavior.

![Configure node](../images/user_guide/pipelines/configure-node.gif)

@@ -136,15 +141,20 @@ The [tutorials](/getting_started/tutorials.md) provide comprehensive step-by-ste
- Example: `data/*.csv`

**Data Volumes**
- Optional. A list of [Persistent Volume Claims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) to be mounted into the container that executes the Jupyter notebook or script. Format: `/mnt/path=existing-pvc-name`. Entries that are empty (`/mnt/path=`) or malformed are ignored. The referenced PVCs must exist in the Kubernetes namespace where the node is executed.
- Optional. A list of [Persistent Volume Claims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) to be mounted into the container that executes the Jupyter notebook or script. Format: `/mnt/path=existing-pvc-name`. Entries that are empty (`/mnt/path=`) or malformed are ignored. Entries with a PVC name considered to be an [invalid Kubernetes resource name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names) will raise a validation error after pipeline submission or export. The referenced PVCs must exist in the Kubernetes namespace where the node is executed.
- Data volumes are not mounted when the pipeline is executed locally.
- Example: `/mnt/vol1=data-pvc`

1. Associate each node with a comment to document its purpose.
**Kubernetes Secrets**
- Optional. A list of [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) to be accessed as environment variables during Jupyter notebook or script execution. Format: `ENV_VAR=secret-name:secret-key`. Entries that are empty (`ENV_VAR=`) or malformed are ignored. Entries with a secret name considered to be an [invalid Kubernetes resource name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names) or with [an invalid secret key](https://kubernetes.io/docs/concepts/configuration/secret/#restriction-names-data) will raise a validation error after pipeline submission or export. The referenced secrets must exist in the Kubernetes namespace where the generic pipeline nodes are executed.
- Secrets are ignored when the pipeline is executed locally. For remote execution, if an environment variable was assigned both a static value (via the 'Environment Variables' property) and a Kubernetes secret value, the secret's value is used.
- Example: `ENV_VAR=secret-name:secret-key`

5. Associate each node with a comment to document its purpose.

![Add comment to node](../images/user_guide/pipelines/add-comment-to-node.gif)

1. Save the pipeline file.
6. Save the pipeline file.

Note: You can rename the pipeline file in the JupyterLab _File Browser_.

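The `ENV_VAR=secret-name:secret-key` entry format described in pipelines.md above can be sketched as a small parser. This illustrates the documented format only and is not Elyra's actual validation code; the real checks follow the Kubernetes resource-name and secret-key rules linked in the docs:

```python
import re

def parse_secret_entry(entry: str):
    """Parse 'ENV_VAR=secret-name:secret-key'; return None if empty or malformed."""
    match = re.fullmatch(r"([A-Za-z_][A-Za-z0-9_]*)=([^:]+):(.+)", entry.strip())
    return match.groups() if match else None

print(parse_secret_entry("ENV_VAR=secret-name:secret-key"))
# ('ENV_VAR', 'secret-name', 'secret-key')
print(parse_secret_entry("ENV_VAR="))  # None: empty entries are ignored
```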
31 changes: 24 additions & 7 deletions elyra/kfp/operator.py
@@ -27,10 +27,13 @@
from kubernetes.client.models import V1EnvVarSource
from kubernetes.client.models import V1ObjectFieldSelector
from kubernetes.client.models import V1PersistentVolumeClaimVolumeSource
from kubernetes.client.models import V1SecretKeySelector
from kubernetes.client.models import V1Volume
from kubernetes.client.models import V1VolumeMount

from elyra._version import __version__
from elyra.pipeline.pipeline import KubernetesSecret
from elyra.pipeline.pipeline import VolumeMount

"""
The ExecuteFileOp uses a python script to bootstrap the user supplied image with the required dependencies.
@@ -81,7 +84,8 @@ def __init__(
mem_request: Optional[str] = None,
gpu_limit: Optional[str] = None,
workflow_engine: Optional[str] = "argo",
volume_mounts: Optional[Dict[str, str]] = None,
volume_mounts: Optional[List[VolumeMount]] = None,
kubernetes_secrets: Optional[List[KubernetesSecret]] = None,
**kwargs,
):
"""Create a new instance of ContainerOp.
@@ -106,6 +110,7 @@ def __init__(
gpu_limit: maximum number of GPUs allowed for the operation
workflow_engine: Kubeflow workflow engine, defaults to 'argo'
volume_mounts: data volumes to be mounted
kubernetes_secrets: secrets to be made available as environment variables
A review thread is attached to this line:

Member: Kubernetes secrets can also be mounted as volumes. In case we also support that in a future release, we should probably pick a more discriminating variable name now that makes it possible to distinguish between secrets that are exposed as an env variable and secrets that are mounted as a volume.

Member: When "mounted as a volume", are those particular, secret-based, PV definitions differentiated from regular PV definitions? That is, do we need to have an additional "type" of mount or secret? I agree that a distinguishing name for the env-based field is probably warranted; just curious about how distinguishing things are.

Member Author: Yeah, totally agree about the distinguishing names; I was thinking that during the discussion yesterday.

> When "mounted as a volume", are those particular, secret-based, PV definitions differentiated from regular PV definitions? That is, do we need to have an additional "type" of mount or secret?

I believe the two different types are consumed/accessed in completely different ways, making an additional type unnecessary. But I may be misunderstanding, so I'd defer to @ptitzler's expertise.

Member: In the running container this is exposed like other volumes are. AFAIK the consumer does not have any awareness of what the source is, just like the consumer doesn't know whether an env variable value was derived from a constant or a Kubernetes resource/secret. I haven't tested what happens if one tries to mount a volume and a secret to the same mount point in the container, but I would expect it to fail.

kwargs: additional key value pairs to pass e.g. name, image, sidecars & is_exit_handler.
See Kubeflow pipelines ContainerOp definition for more parameters or how to use
https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.ContainerOp
@@ -133,6 +138,7 @@ def __init__(
self.mem_request = mem_request
self.gpu_limit = gpu_limit
self.volume_mounts = volume_mounts # optional data volumes to be mounted to the pod
self.kubernetes_secrets = kubernetes_secrets # optional secrets to be made available as env vars

argument_list = []

@@ -227,23 +233,34 @@ def __init__(
# or this generic operation will fail
if self.volume_mounts:
unique_pvcs = []
for mount_path, pvc_name in self.volume_mounts.items():
if pvc_name not in unique_pvcs:
for volume_mount in self.volume_mounts:
if volume_mount.pvc_name not in unique_pvcs:
self.add_volume(
V1Volume(
name=pvc_name,
persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(claim_name=pvc_name),
name=volume_mount.pvc_name,
persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(
claim_name=volume_mount.pvc_name
),
)
)
unique_pvcs.append(pvc_name)
self.container.add_volume_mount(V1VolumeMount(mount_path=mount_path, name=pvc_name))
unique_pvcs.append(volume_mount.pvc_name)
self.container.add_volume_mount(V1VolumeMount(mount_path=volume_mount.path, name=volume_mount.pvc_name))

# We must deal with the envs after the superclass initialization since these amend the
# container attribute that isn't available until now.
if self.pipeline_envs:
for key, value in self.pipeline_envs.items(): # Convert dict entries to format kfp needs
self.container.add_env_variable(V1EnvVar(name=key, value=value))

if self.kubernetes_secrets:
for secret in self.kubernetes_secrets: # Convert tuple entries to format kfp needs
self.container.add_env_variable(
V1EnvVar(
name=secret.env_var,
value_from=V1EnvVarSource(secret_key_ref=V1SecretKeySelector(name=secret.name, key=secret.key)),
)
)

# If crio volume size is found then assume kubeflow pipelines environment is using CRI-o as
# its container runtime
if self.emptydir_volume_size:
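The volume-handling change in this diff deduplicates PVC-backed volumes so that a claim mounted at several paths is added to the pod only once. A minimal sketch of that bookkeeping, using a stand-in for the `VolumeMount` dataclass introduced in `elyra/pipeline/pipeline.py`:

```python
from dataclasses import dataclass

@dataclass
class VolumeMount:
    path: str
    pvc_name: str

def unique_pvc_names(volume_mounts):
    """Collect PVC names in first-seen order, each exactly once,
    mirroring the unique_pvcs bookkeeping in ExecuteFileOp."""
    unique_pvcs = []
    for volume_mount in volume_mounts:
        if volume_mount.pvc_name not in unique_pvcs:
            unique_pvcs.append(volume_mount.pvc_name)
    return unique_pvcs

mounts = [VolumeMount("/mnt/a", "data-pvc"), VolumeMount("/mnt/b", "data-pvc")]
print(unique_pvc_names(mounts))  # ['data-pvc']
```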
9 changes: 5 additions & 4 deletions elyra/pipeline/airflow/processor_airflow.py
@@ -41,6 +41,8 @@
from elyra.pipeline.pipeline import Operation
from elyra.pipeline.pipeline import Pipeline
from elyra.pipeline.pipeline_constants import COS_OBJECT_PREFIX
from elyra.pipeline.pipeline_constants import KUBERNETES_SECRETS
from elyra.pipeline.pipeline_constants import MOUNTED_VOLUMES
from elyra.pipeline.processor import PipelineProcessor
from elyra.pipeline.processor import PipelineProcessorResponse
from elyra.pipeline.processor import RuntimePipelineProcessor
@@ -317,8 +319,6 @@ def _cc_pipeline(self, pipeline: Pipeline, pipeline_name: str, pipeline_instance
outputs=operation.outputs,
)

volume_mounts = self._get_volume_mounts(operation=operation)

target_op = {
"notebook": operation.name,
"id": operation.id,
Expand All @@ -333,7 +333,8 @@ def _cc_pipeline(self, pipeline: Pipeline, pipeline_name: str, pipeline_instance
"operator_source": operation.component_params["filename"],
"is_generic_operator": True,
"doc": operation.doc,
"volume_mounts": volume_mounts,
"volume_mounts": operation.component_params.get(MOUNTED_VOLUMES, []),
"kubernetes_secrets": operation.component_params.get(KUBERNETES_SECRETS, []),
}

if runtime_image_pull_secret is not None:
@@ -507,7 +508,7 @@ def create_pipeline_file(
python_output = template.render(
operations_list=target_ops,
pipeline_name=pipeline_instance_id,
namespace=user_namespace,
user_namespace=user_namespace,
cos_secret=cos_secret,
kube_config_path=None,
is_paused_upon_creation="False",
7 changes: 4 additions & 3 deletions elyra/pipeline/kfp/processor_kfp.py
@@ -47,6 +47,8 @@
from elyra.pipeline.pipeline import Operation
from elyra.pipeline.pipeline import Pipeline
from elyra.pipeline.pipeline_constants import COS_OBJECT_PREFIX
from elyra.pipeline.pipeline_constants import KUBERNETES_SECRETS
from elyra.pipeline.pipeline_constants import MOUNTED_VOLUMES
from elyra.pipeline.processor import PipelineProcessor
from elyra.pipeline.processor import PipelineProcessorResponse
from elyra.pipeline.processor import RuntimePipelineProcessor
@@ -520,8 +522,6 @@ def _cc_pipeline(
f"Creating pipeline component archive '{operation_artifact_archive}' for operation '{operation}'"
)

volume_mounts = self._get_volume_mounts(operation=operation)

target_ops[operation.id] = ExecuteFileOp(
name=sanitized_operation_name,
pipeline_name=pipeline_name,
@@ -546,7 +546,8 @@
"mlpipeline-metrics": f"{pipeline_envs['ELYRA_WRITABLE_CONTAINER_DIR']}/mlpipeline-metrics.json", # noqa
"mlpipeline-ui-metadata": f"{pipeline_envs['ELYRA_WRITABLE_CONTAINER_DIR']}/mlpipeline-ui-metadata.json", # noqa
},
volume_mounts=volume_mounts,
volume_mounts=operation.component_params.get(MOUNTED_VOLUMES, []),
kubernetes_secrets=operation.component_params.get(KUBERNETES_SECRETS, []),
)

if operation.doc:
72 changes: 62 additions & 10 deletions elyra/pipeline/pipeline.py
@@ -13,6 +13,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import dataclasses
from dataclasses import dataclass
import json
import logging
from logging import Logger
import os
@@ -194,6 +197,16 @@ def _scrub_list(dirty: Optional[List[Optional[str]]]) -> List[str]:
def is_generic_operation(operation_classifier) -> bool:
return operation_classifier in Operation.generic_node_types

@staticmethod
def log_message(msg: str, logger: Optional[Logger] = None, level: Optional[int] = logging.DEBUG):
"""
Log a message with the given logger at the given level or simply print.
"""
if logger:
logger.log(level, msg)
else:
print(msg)


class GenericOperation(Operation):
"""
@@ -454,26 +467,30 @@ def to_dict(self, logger: Optional[Logger] = None) -> Dict[str, str]:

key = key.strip()
if not key:
KeyValueList.log_message(f"Skipping inclusion of property '{kv}': no key found", logger, logging.WARN)
Operation.log_message(f"Skipping inclusion of property '{kv}': no key found", logger, logging.WARN)
continue
if isinstance(value, str):
value = value.strip()
if not value:
KeyValueList.log_message(
Operation.log_message(
f"Skipping inclusion of property '{key}': no value specified", logger, logging.DEBUG
)
continue

kv_dict[key] = value
return kv_dict

@classmethod
def to_str(cls, key: str, value: str) -> str:
return f"{key}{cls._key_value_separator}{value}"

@classmethod
def from_dict(cls, kv_dict: Dict) -> "KeyValueList":
"""
Convert a set of key-value pairs stored in a dictionary to
a KeyValueList of strings with the defined separator.
"""
str_list = [f"{key}{cls._key_value_separator}{value}" for key, value in kv_dict.items()]
str_list = [KeyValueList.to_str(key, value) for key, value in kv_dict.items()]
return KeyValueList(str_list)

@classmethod
@@ -487,12 +504,47 @@ def merge(cls, primary: "KeyValueList", secondary: "KeyValueList") -> "KeyValueL

return KeyValueList.from_dict({**secondary_dict, **primary_dict})

@staticmethod
def log_message(msg: str, logger: Optional[Logger] = None, level: Optional[int] = logging.DEBUG):
@classmethod
def difference(cls, minuend: "KeyValueList", subtrahend: "KeyValueList") -> "KeyValueList":
"""
Log a message with the given logger at the given level or simply print.
Given KeyValueLists, convert to dictionaries and remove any keys found in the
second (subtrahend) from the first (minuend), if present.

:param minuend: list to be subtracted from
:param subtrahend: list whose keys will be removed from the minuend

:returns: the difference of the two lists
"""
if logger:
logger.log(level, msg)
else:
print(msg)
subtract_dict = minuend.to_dict()
for key in subtrahend.to_dict().keys():
if key in subtract_dict:
subtract_dict.pop(key)

return KeyValueList.from_dict(subtract_dict)
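The `difference` class method above removes from the minuend any key also present in the subtrahend. Sketched with plain dicts (since `KeyValueList` round-trips through `to_dict`/`from_dict`, the dict-level behavior is the essential part):

```python
def kv_difference(minuend: dict, subtrahend: dict) -> dict:
    """Keep only the minuend entries whose key is absent from the subtrahend."""
    return {key: value for key, value in minuend.items() if key not in subtrahend}

print(kv_difference({"VAR_A": "1", "VAR_B": "2"}, {"VAR_B": "anything"}))
# {'VAR_A': '1'}
```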


@dataclass
class VolumeMount:
path: str
pvc_name: str


@dataclass
class KubernetesSecret:
env_var: str
name: str
key: str


class DataClassJSONEncoder(json.JSONEncoder):
"""
A JSON Encoder class to prevent errors during serialization of dataclasses.
"""

def default(self, o):
"""
Render dataclass content as dict
"""
if dataclasses.is_dataclass(o):
return dataclasses.asdict(o)
return super().default(o)
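A usage sketch for the `DataClassJSONEncoder` above; the `KubernetesSecret` dataclass from this PR is reproduced so the snippet is self-contained:

```python
import dataclasses
import json

@dataclasses.dataclass
class KubernetesSecret:
    env_var: str
    name: str
    key: str

class DataClassJSONEncoder(json.JSONEncoder):
    """Render dataclass instances as plain dicts during serialization."""
    def default(self, o):
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return super().default(o)

secret = KubernetesSecret(env_var="TOKEN", name="api-secret", key="token-key")
print(json.dumps(secret, cls=DataClassJSONEncoder))
# {"env_var": "TOKEN", "name": "api-secret", "key": "token-key"}
```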
1 change: 1 addition & 0 deletions elyra/pipeline/pipeline_constants.py
@@ -18,6 +18,7 @@
RUNTIME_IMAGE = "runtime_image"
ENV_VARIABLES = "env_vars"
MOUNTED_VOLUMES = "mounted_volumes"
KUBERNETES_SECRETS = "kubernetes_secrets"
PIPELINE_META_PROPERTIES = ["name", "description", "runtime"]
# optional static prefix to be used when generating an object name for object storage
COS_OBJECT_PREFIX = "cos_object_prefix"