Handling Secrets in KF Pipeline #23

chritter · 2020-04-29T15:35:35Z

Hello,

It would be great to be able to manage secrets and credentials in KF Pipeline components. This would help avoid accidental leakage of sensitive information.

Christian

blairdrummond · 2020-04-29T15:44:41Z

This does not handle dynamic secrets the way Vault does, but this is how I handle passwords (with getpass).
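A minimal sketch of the getpass pattern: prompt for the password at runtime instead of writing it into a notebook or pipeline definition. The `get_password` helper and its `MY_SERVICE_PASSWORD` environment-variable fallback are hypothetical additions so the same code can also run unattended.

```python
import getpass
import os


def get_password(env_var: str = "MY_SERVICE_PASSWORD") -> str:
    """Hypothetical helper: return a password without hardcoding it.

    Uses the environment variable if it is set; otherwise prompts
    interactively (getpass does not echo the typed characters).
    """
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass.getpass(f"Password ({env_var}): ")
```

In a notebook, `password = get_password()` prompts once; as long as the value is never printed, it stays in memory and does not end up in the saved notebook or in a pipeline yaml.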

zachomedia · 2020-04-29T15:46:02Z

We will leverage HashiCorp Vault for this, which can automatically inject secrets into a running container. We can start with static secrets (stored as key-value pairs) and, longer term, look at switching to dynamic secrets for different components (like what we're doing with MinIO).
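A sketch of how static key-value injection could look with the Vault Agent Injector: the pod is annotated, the injector's sidecar authenticates and renders the secret into a file under `/vault/secrets/`, and code in the container reads that file. The role name, KV path and key are supplied by the caller and are hypothetical here; the template assumes a KV version 1 mount (a KV v2 mount would need `.Data.data.<key>`).

```python
from pathlib import Path

# Default directory the Vault Agent Injector renders secrets into.
VAULT_SECRETS_DIR = Path("/vault/secrets")


def injector_annotations(role: str, name: str, kv_path: str, key: str) -> dict:
    """Hypothetical helper: pod annotations asking the injector to render one key.

    `role` is a Vault Kubernetes-auth role; the rendered file is
    /vault/secrets/<name>. Assumes a KV v1 mount (value at .Data.<key>).
    """
    template = ('{{- with secret "' + kv_path + '" -}}'
                + "{{ .Data." + key + " }}"
                + "{{- end }}")
    return {
        "vault.hashicorp.com/agent-inject": "true",
        "vault.hashicorp.com/role": role,
        f"vault.hashicorp.com/agent-inject-secret-{name}": kv_path,
        f"vault.hashicorp.com/agent-inject-template-{name}": template,
    }


def read_injected_secret(name: str, base: Path = VAULT_SECRETS_DIR) -> str:
    """Read a secret the injector rendered into the shared volume."""
    return (base / name).read_text().strip()
```

The custom template makes the rendered file contain only the value, so the reading side stays a one-line file read instead of parsing the injector's default output format.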

zachomedia · 2020-05-05T14:47:29Z

Hey @justbert, if you have time, could you look at whether we can apply Vault policies to users based on their preferred_username attribute from Azure Active Directory?

justbert · 2020-05-05T14:48:12Z

Yeah, on it!

justbert · 2020-05-05T23:59:05Z

Proof of concept is built! This is how we can operationalize it:

- Create a Vault JWT auth backend for role="kubeflow" that maps AAD users to <preferred_username>.
- Create a Vault policy that allows access to a user's key-value store where name="kv_profile-<kubeflow-profile-name>".
- (Pull Request) Modify the policy name="profile-configurator" to allow ["create", "read"] on identity/entity and ["create", "read"] on identity/entity-alias.
- Assign the policy name="profile-configurator" to a kubernetes_role name="profile-configurator" for the ServiceAccount name="profile-configurator" and Namespace name="daaas" in which the kubeflow-controller is running.
- (Pull Request) Update the kubeflow-controller to create a Vault Entity where name="<preferred_username>", policy="kubeflow-profile", and metadata={kubeflow_profile="<kubeflow-profile-name>"}.
- (Pull Request) Update the kubeflow-controller to create a Vault entity-alias linking the AAD <preferred_username> to the Vault Entity named <preferred_username>, using the OIDC auth accessor. (This requires the previously created entity's canonical_id.)
- Update the kubeflow-controller pod to have an OIDC_AUTH_ACCESSOR environment variable.
- Use the vault.hashicorp.com/agent-inject-token annotation to inject the Vault token into a Jupyter notebook (needs testing) to allow HTTP calls to Vault.

<kubeflow-profile-name> is compliant with DNS Subdomain Names (RFC 1123) and is a sanitized version of <preferred_username>.
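A minimal sketch of the controller side of these steps, written against the Vault HTTP API using only the Python standard library. Assumptions: the templated policy body and its capability list, `sanitize_profile_name` (the thread does not spell out the real sanitization rules) and `provision_user` are illustrative, not the controller's actual code; the caller supplies a Vault token carrying the profile-configurator policy and the accessor from `OIDC_AUTH_ACCESSOR`.

```python
import json
import re
import urllib.request

# Hypothetical templated ACL policy registered as "kubeflow-profile". Vault
# substitutes the entity's metadata per request, so each user only reaches
# the kv_profile-<kubeflow-profile-name> mount recorded on their own entity.
KUBEFLOW_PROFILE_POLICY = """\
path "kv_profile-{{identity.entity.metadata.kubeflow_profile}}/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}
"""

# RFC 1123 subdomain: dot-separated labels of [a-z0-9-], each starting and
# ending with an alphanumeric character; at most 253 characters overall.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")


def sanitize_profile_name(preferred_username: str) -> str:
    """Hypothetical mapping from an AAD preferred_username to a profile name."""
    labels = []
    for part in preferred_username.lower().split("."):
        part = re.sub(r"[^a-z0-9-]+", "-", part).strip("-")
        if part:
            labels.append(part)
    # Truncation can leave a trailing '-' or '.', so trim again afterwards.
    name = ".".join(labels)[:253].rstrip(".-")
    if not DNS1123_SUBDOMAIN.fullmatch(name):
        raise ValueError(f"cannot derive a profile name from {preferred_username!r}")
    return name


def entity_payload(preferred_username: str, profile_name: str) -> dict:
    """Body for POST /v1/identity/entity."""
    return {
        "name": preferred_username,
        "policies": ["kubeflow-profile"],
        "metadata": {"kubeflow_profile": profile_name},
    }


def entity_alias_payload(preferred_username: str, canonical_id: str,
                         oidc_accessor: str) -> dict:
    """Body for POST /v1/identity/entity-alias (links the AAD login to the entity)."""
    return {
        "name": preferred_username,
        "canonical_id": canonical_id,
        "mount_accessor": oidc_accessor,
    }


def vault_post(addr: str, token: str, path: str, body: dict) -> dict:
    """POST JSON to the Vault HTTP API; returns {} for empty (204) responses."""
    request = urllib.request.Request(
        f"{addr.rstrip('/')}/v1/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def provision_user(addr: str, token: str, preferred_username: str,
                   oidc_accessor: str) -> str:
    """Hypothetical helper: create the entity and its alias, return the entity id.

    Assumes the entity does not exist yet, so Vault returns its id in data.id.
    """
    profile = sanitize_profile_name(preferred_username)
    created = vault_post(addr, token, "identity/entity",
                         entity_payload(preferred_username, profile))
    canonical_id = created["data"]["id"]
    vault_post(addr, token, "identity/entity-alias",
               entity_alias_payload(preferred_username, canonical_id, oidc_accessor))
    return canonical_id
```

For example, `provision_user(os.environ["VAULT_ADDR"], token, "Jane.Doe@example.ca", os.environ["OIDC_AUTH_ACCESSOR"])` would create an entity whose `kubeflow_profile` metadata is `jane.doe-example.ca`. Keeping the per-user path in entity metadata means a single templated policy serves every profile, rather than one policy per user.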

justbert · 2020-05-08T16:38:14Z

Just a written update:
As a cornerstone of secure access to secrets in the platform, it is imperative to minimize the possibility that someone could access Vault maliciously and with elevated privileges.

Because the process for granting access to the stores could create security pitfalls if it isn't well structured, we've refactored the access and policy model to put a sound access model around it. I'm making sure that accounts and services in the system follow the principle of least privilege (Least Required Access) and am refactoring the system structure to reinforce that further.

justbert · 2020-05-15T12:22:31Z

Here's an update:
We've hit another snag in Vault while trying to reduce the attack surface of this automated process. It turns out that building a very specific policy for our automated process using allowed_parameters and other restrictions doesn't work for complex data types, as noted in a Vault issue. This and other problems wouldn't arise in Vault Enterprise, which extends its policy capabilities through Sentinel.
I'm going to spend some time on the weekend coming up with alternative architectures to mitigate these shortcomings.

ca-scribner · 2020-05-28T20:40:43Z

After researching this issue, and unfortunately diffusing the discussion of it by opening #62, I wanted to summarize where we are and what's left to do.

Concrete use cases embodying this issue

Users need to pass secrets (either their own or those we provide, such as MinIO credentials) safely to Kubeflow pipelines. Use cases include:

- Single-use pipeline
  - Pipeline will be authored and run immediately
  - Secret is needed for a limited time
- Reusable pipeline
  - Pipeline will be used over time, not just when first authored
  - Secret must persist for the life cycle of the pipeline definition, not a single run of the pipeline
- Shared, reusable pipeline
  - Pipeline authored by UserA but might be shared with or run by UserB, UserC, ...
  - Pipelines shared through(?):
    - Users run from a shared namespace
    - Users share the yaml and run from their own workspace (more like a template)
  - Not sure what should happen with secret persistence here...
- Repeatedly submitting a pipeline (single-use or reusable)
  - Example: running a pipeline twice, once using MinIO minimal and once using MinIO premium (or some other example where the same pipeline needs to run with two different secret values)

Lit review of examples of secrets being passed

Secure examples

The Kubeflow Pipelines SDK includes an example of secret sharing for cloud credentials.

These handle credentials in two stages:

- Before the pipeline runs (done implicitly, outside the pipeline):
  - put credentials into Kubernetes Secret(s) in the pipeline runner's namespace
- During the pipeline's runtime:
  - pull credentials from the Kubernetes Secret and expose them as environment variables
  - the pipeline script then reads the secrets from those environment variables

This results in secrets that appear as environment variables during runtime, but whose values do not appear in the yaml the way a plain environment variable's would. For example:

    env:
      - name: ENV_VAR_OF_MY_SECRET
        valueFrom:
          secretKeyRef:
            name: KUBEFLOW_SECRET_NAME
            key: KUBEFLOW_SECRET_KEY

This is the only approach I found for passing secrets without showing them in the yaml. The limitation is that the secret needs to be in the pipeline runner's namespace (kubeflow) to be accessible, but the author of the pipeline does not have access to that namespace. To solve this, we need a way of putting the secret into the pipeline runner's namespace. Secrets cannot be shared across namespaces, but they can be copied by someone who has access to both.
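Where someone does have access to both namespaces, that copy can be scripted. A minimal sketch, assuming the Secret has been fetched as JSON (for example with `kubectl get secret <name> -n <source-namespace> -o json`) and that the result is applied with `kubectl apply -f -`; `copy_secret_to_namespace` is a hypothetical helper.

```python
import copy

# Metadata managed by the API server; it must not be carried into the new object.
SERVER_MANAGED_FIELDS = ("uid", "resourceVersion", "creationTimestamp",
                         "managedFields", "selfLink", "ownerReferences")


def copy_secret_to_namespace(secret: dict, target_namespace: str) -> dict:
    """Hypothetical helper: return a manifest re-creating `secret` elsewhere.

    The input object is left untouched; the copy keeps name, type and data.
    """
    new = copy.deepcopy(secret)
    meta = new.setdefault("metadata", {})
    for field in SERVER_MANAGED_FIELDS:
        meta.pop(field, None)
    annotations = meta.get("annotations")
    if annotations:
        # kubectl's bookkeeping annotation still names the old namespace.
        annotations.pop("kubectl.kubernetes.io/last-applied-configuration", None)
    meta["namespace"] = target_namespace
    return new
```

The copy is a snapshot: rotating the original Secret does not update it, so whoever copies it also takes on keeping the two in sync.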

Less secure examples...

There are also examples where people pass secrets as bare environment variables, but then they're exposed in the yaml like this:

    env:
      - name: ENV_VAR_OF_MY_SECRET
        value: VALUE_OF_MY_SECRET

Authors who pass secrets as plain environment variables presumably either aren't worried about the security of those secrets, or expect the wider namespace security to cover their needs (e.g. nobody who shouldn't see the secret ever gets access to the yaml).
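Since #62 shows that such values end up in the compiled pipeline yaml, one mitigation is to scan a compiled spec before sharing it. A rough sketch, assuming the yaml has already been parsed into Python dicts and lists (for example with PyYAML's `yaml.safe_load`); the name-based heuristic and `find_plaintext_secrets` are hypothetical, and secrets with innocuous names will slip through.

```python
import re
from typing import Iterator, Tuple

# Hypothetical heuristic: variable names that suggest a credential.
SUSPICIOUS = re.compile(r"(SECRET|PASSWORD|PASSWD|TOKEN|KEY|CREDENTIAL)", re.IGNORECASE)


def find_plaintext_secrets(node, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (location, variable name) for env entries that carry a literal
    `value` and whose name looks like a secret. Entries using `valueFrom`
    (e.g. secretKeyRef) are not reported."""
    if isinstance(node, dict):
        env = node.get("env")
        if isinstance(env, list):
            for i, entry in enumerate(env):
                if (isinstance(entry, dict) and "value" in entry
                        and SUSPICIOUS.search(str(entry.get("name", "")))):
                    yield f"{path}/env/{i}", entry["name"]
        for key, value in node.items():
            yield from find_plaintext_secrets(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_plaintext_secrets(item, f"{path}/{i}")
```

Walking the whole structure rather than a fixed path keeps the check independent of where a given compiler version places container specs.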

Work completed so far

#62 documents how environment variables passed to pipelines are visible in pipeline yaml files and thus are not suitable for secrets, especially for shared yaml files.

Work on enabling Vault for personal secrets is nearly complete (this issue; see notes above from @justbert). This will allow:

- users to define secrets in their own personal vault at vault.covid...
- users to access those secrets from a notebook in their namespace via injection through the Vault sidecar

This is great for secrets users interact with in their notebook servers.

...but if we wanted these secrets available to Kubeflow during pipeline runtime, we would have to grant the entire Vault (or maybe just an entire namespace in Vault? either way...) to all running Kubeflow pipelines via their ServiceAccounts and namespace. That means Vault secrets would be exposed to anyone inside the pipeline runner's namespace (kubeflow). Right now we have no way of saying "this pod in the pipeline runner gets Vault.A, this pod gets Vault.B".

Still to do:

While we have ways of storing secrets, we still do not have a way of passing them from a user to a running pipeline.

Ideas

- (from @justbert) We need to devise a trust model that only injects the secrets a given user has access to, either through OPA or by giving users specific ServiceAccounts (maybe) that could be used for authentication?
- (me) The external examples of using secrets all revolve around the secrets being available in the pipeline runner's namespace. Is there some pattern we're missing, e.g. one where users in typical KFP deployments have access to their runner's namespace? Can everyone have their own personal runner?
- ...?

blairdrummond · 2020-12-28T16:50:50Z

@chritter @ca-scribner , now that we have namespaced pipelines, you can use regular ol' Kubernetes secrets for this.

I think we should document this and put an example somewhere.
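A minimal sketch of such an example, assuming the pipeline runs in the user's own profile namespace: create a Secret there, reference each key from the step's container `env` with `secretKeyRef` (the same shape as the yaml earlier in this thread), and read the value inside the step. The helper names and the key-to-variable naming rule are hypothetical; the manifest can be applied with `kubectl apply`.

```python
import os
import re

# Hypothetical helpers; names and the key-to-variable naming rule are illustrative.


def secret_manifest(name: str, namespace: str, values: dict) -> dict:
    """A Secret in the user's own namespace.

    stringData accepts plain strings; the API server stores them
    base64-encoded under .data, so nothing needs encoding here.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "stringData": dict(values),
    }


def secret_env(secret_name: str, keys) -> list:
    """One container env entry per key, each pointing at the Secret via secretKeyRef."""
    return [
        {
            "name": re.sub(r"[^A-Za-z0-9_]", "_", key).upper(),
            "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
        }
        for key in keys
    ]


def require_secret(var_name: str) -> str:
    """Inside the pipeline step: read the injected value without ever logging it."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; check that the Secret exists in the namespace "
            "the pipeline runs in and that the container references it")
    return value
```

Only the Secret's name and keys appear in the compiled pipeline yaml, never the values. Kubernetes' `envFrom` with a `secretRef` is an alternative that imports every key of the Secret at once.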

blairdrummond · 2020-12-28T16:51:23Z

Going to close in favour of #91

brendangadd added the component/kubeflow (Kubeflow Related), kind/feature (New feature or request), priority/soon, and size/L (4-5 days) labels Apr 29, 2020

Colette-G added the current-sprint label Apr 29, 2020

zachomedia self-assigned this Apr 30, 2020

justbert self-assigned this May 5, 2020

justbert mentioned this issue May 7, 2020 in Allow users to request shared namespaces for projects #2 (Closed)

ca-scribner mentioned this issue May 19, 2020 in Secrets passed by env var to Kubeflow Pipeline available in plaintext yaml #62 (Closed)

ca-scribner mentioned this issue May 28, 2020 in Document that users should not pass secrets to kubeflow pipelines via environment variables #91 (Closed)

blairdrummond closed this as completed Dec 28, 2020

wg102 mentioned this issue Jul 12, 2022 in Upgrade 1.6: JWA #1242 (Closed, 14 tasks)
