Handling Secrets in KF Pipeline #23

Closed
chritter opened this issue Apr 29, 2020 · 10 comments
Labels: component/kubeflow (Kubeflow Related), kind/feature (New feature or request), priority/soon, size/L (4-5 days)

Comments

@chritter

Hello,

It would be great to be able to manage the secrets and credentials in the KF Pipeline components. This avoids accidental leakage of sensitive information.

Christian

@brendangadd brendangadd added component/kubeflow Kubeflow Related kind/feature New feature or request priority/soon size/L 4-5 days labels Apr 29, 2020
@blairdrummond
Contributor

This does not handle dynamic secrets the way Vault does, but this is how I handle passwords (with getpass).

[Screenshot: sreenshot-2020-04-29_11-04-1588174852]
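
A minimal sketch of the getpass approach mentioned above, not a copy of the screenshot: the helper name `read_password` and the `MY_SERVICE_PASSWORD` variable are hypothetical. The idea is that the password is typed at a no-echo prompt, so it never lands in the notebook source, with an optional environment-variable fallback for non-interactive runs.

```python
import getpass
import os


def read_password(env_var="MY_SERVICE_PASSWORD", prompt="Password: "):
    """Return a password without hard-coding it in the notebook.

    env_var is a hypothetical variable name used only for this sketch.
    """
    value = os.environ.get(env_var)
    if value:
        # Non-interactive case: the value was provided by the environment.
        return value
    # Interactive case: prompt without echoing the typed characters.
    return getpass.getpass(prompt)


# Typical use in a notebook cell (prompts only if the variable is unset):
#   password = read_password()
```

The returned value should still be kept out of printed output and saved cells, since getpass only protects the moment of entry.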

@zachomedia

We will leverage HashiCorp Vault for this, which can automatically inject secrets into a running container. We can start with static secrets (stored as key-value pairs) and, longer term, look at switching to more dynamic secrets for different components (like what we're doing with MinIO).

@zachomedia

Hey @justbert , if you have time, would you be able to look at whether we can apply Vault policies to users based on their preferred_username attribute from Azure Active Directory?

@justbert

justbert commented May 5, 2020

Yeah, on it!

@justbert justbert self-assigned this May 5, 2020
@justbert

justbert commented May 5, 2020

Proof of concept is built! This is how we can operationalize it:

  • Create a Vault JWT backend for role="kubeflow" that maps AAD users to <preferred_username>

  • Create Vault Policy which allows access to users' key-value stores where name="kv_profile-<kubeflow-profile-name>"

  • (Pull request) Modify the policy (name="profile-configurator") to allow ["create", "read"] on identity/entity and ["create", "read"] on identity/entity-alias.

  • Assign the policy (name="profile-configurator") to a kubernetes_role (name="profile-configurator") for the ServiceAccount (name="profile-configurator") and Namespace (name="daaas") in which the kubeflow-controller is running.

  • (Pull request) Update kubeflow-controller to create a Vault Entity where name="<preferred_username>", policy="kubeflow-profile", and metadata={kubeflow_profile="<kubeflow-profile-name>"}

  • (Pull request) Update kubeflow-controller to create a Vault entity-alias linking the AAD <preferred_username> to the Vault Entity whose name is <preferred_username>, using the OIDC token accessor. (This requires the previously created entity's canonical_id.)

  • Update kubeflow-controller pod to have OIDC_AUTH_ACCESSOR environment variable

  • Use vault.hashicorp.com/agent-inject-token annotation to inject the vault token in a Jupyter notebook (needs testing) to allow HTTP calls to Vault

<kubeflow-profile-name> is compliant with DNS Subdomain Names (RFC 1123) and is a sanitized version of <preferred_username>.
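
For illustration only, the naming and the two request bodies described in the steps above might be sketched as follows. The sanitization rule here is an assumption (the controller's actual rule is not shown in this thread), and the function names are hypothetical. The payload fields (`name`, `policies`, `metadata` for an entity; `name`, `canonical_id`, `mount_accessor` for an entity-alias) follow Vault's identity API. The sketch builds the bodies only and makes no HTTP calls.

```python
import re


def sanitize_profile_name(preferred_username: str) -> str:
    """Turn an AAD preferred_username into an RFC 1123 style name.

    Assumed rule for this sketch: lowercase, replace every run of
    characters outside [a-z0-9] with "-", trim dashes, cap at 253 chars.
    """
    name = re.sub(r"[^a-z0-9]+", "-", preferred_username.lower())
    name = name.strip("-")[:253].strip("-")
    if not name:
        raise ValueError("username has no usable characters")
    return name


def entity_payload(preferred_username: str) -> dict:
    """Body for creating the Vault Entity described above."""
    return {
        "name": preferred_username,
        "policies": ["kubeflow-profile"],
        "metadata": {
            "kubeflow_profile": sanitize_profile_name(preferred_username)
        },
    }


def entity_alias_payload(preferred_username: str, canonical_id: str,
                         mount_accessor: str) -> dict:
    """Body for the entity-alias linking the AAD user to the Entity.

    mount_accessor would come from the OIDC_AUTH_ACCESSOR variable,
    canonical_id from the response to the entity creation call.
    """
    return {
        "name": preferred_username,
        "canonical_id": canonical_id,
        "mount_accessor": mount_accessor,
    }


print(sanitize_profile_name("Jane.Doe@example.com"))  # jane-doe-example-com
```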

@justbert

justbert commented May 8, 2020

Just a written update:
As a cornerstone of secure access to secrets in the platform, it is imperative to minimize the chance that someone can access Vault maliciously and with elevated privileges.

Because a poorly structured process for granting access to the stores could create security pitfalls, we've refactored the access and policy model and now have a good access model around it. I'm making sure that accounts and services in the system follow the Least Required Access principle, and I'm refactoring the system structure to reinforce that even further.

@justbert

Here's an update:
We've hit another snag in Vault while trying to reduce the attack surface of automating this process. It turns out that building a very specific policy for our automated process through allowed_parameters and other restrictions doesn't work for complex data types, as noted in this Vault issue. This and other problems wouldn't be an issue in Vault Enterprise, which extends its policy capabilities through Sentinel.
I'm going to spend some time on the weekend to come up with some extra architectures to try to mitigate these shortcomings.

@ca-scribner
Contributor

After researching this issue, and unfortunately diffusing the discussion of it by opening #62, I wanted to summarize where we are and what's left to do.

Concrete use cases embodying this issue

Users need to pass secrets (either their own or those we provide, such as minio credentials) safely to kubeflow pipelines. Use cases include:

  • Single-use pipeline
    • Pipeline will be authored and run immediately
    • Secret is needed for a limited time
  • Reusable pipeline
    • Pipeline will be used over time, not just when first authored
    • Secret must persist for the life cycle of the pipeline definition, not a single run of the pipeline
  • Shared, reusable pipeline
    • Pipeline authored by UserA but might be shared with or run by UserB, UserC, ...
    • Pipelines shared through(?)
      • Users run from a shared namespace
      • Users share the yaml and run from their own workspace (more like a template)
    • Not sure what should happen with secret persistence here...
  • Repeatedly submitting a pipeline (single or reusable)
    • Example: Running a pipeline twice, once using minio minimal and once using minio premium (or some other example where the same pipeline needs to run with two different secret values)

Lit review of examples of secrets being passed

Secure examples

An example of secret sharing is shown in the Kubeflow pipelines sdk for cloud credentials:

These handle credentials by:

  • (implicitly done sometime before pipeline runs):
    • put credentials into Kubeflow Secret(s) in the pipeline runner's namespace
  • (during pipeline's runtime)
    • pull credentials from Kubeflow Secret and put them into environment variables
    • pipeline script then imports secrets from the environment variables
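
The last step above, where the pipeline script picks the secret up from its environment, could look roughly like this. It is a sketch under assumptions: `require_env_secret` is a hypothetical helper, and `ENV_VAR_OF_MY_SECRET` is a placeholder variable name.

```python
import os


def require_env_secret(name: str) -> str:
    """Read a secret injected as an environment variable.

    Fails with a clear message (without printing any secret value)
    if the variable was not injected into the container.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "was the Secret mounted into this pipeline step?"
        )
    return value


# Inside a pipeline component:
#   secret = require_env_secret("ENV_VAR_OF_MY_SECRET")
```

Failing early keeps a missing Secret from surfacing later as a confusing authentication error.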

This results in secrets that appear as environment variables during runtime, but that do not appear in the yaml like a plain environment variable. For example:

    env:
      - name: ENV_VAR_OF_MY_SECRET
        valueFrom:
          secretKeyRef:
            name: KUBEFLOW_SECRET_NAME
            key: KUBEFLOW_SECRET_KEY

This is the only approach I found for passing secrets without showing them in the yaml. The limitation here is that the secret needs to be in the pipeline runner's namespace (namespace kubeflow) to be accessible, but the author of the pipeline never has access to that namespace. To solve this we need a way of putting the secret in the pipeline runner's namespace. Secrets cannot be shared across namespaces, but they can be copied by someone who has access to both.
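
As a rough sketch of what "copying" a Secret between namespaces involves: given the Secret's manifest as a dict (for example, parsed from `kubectl get secret ... -o json`), the copy needs a new namespace and must drop the server-generated metadata. The function name is hypothetical, and this only builds the manifest; applying it still requires write access to the target namespace.

```python
import copy

# Metadata fields the API server populates; they must not be carried
# over into a manifest that is applied to a different namespace.
_SERVER_FIELDS = (
    "resourceVersion",
    "uid",
    "creationTimestamp",
    "selfLink",
    "ownerReferences",
    "managedFields",
)


def copy_secret_manifest(secret: dict, target_namespace: str) -> dict:
    """Return a copy of a Secret manifest retargeted at another namespace."""
    clone = copy.deepcopy(secret)  # leave the caller's dict untouched
    metadata = clone.setdefault("metadata", {})
    for field in _SERVER_FIELDS:
        metadata.pop(field, None)
    metadata["namespace"] = target_namespace
    return clone
```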

Less secure examples...

There are also examples where people pass secrets as bare environment variables, but then in the yaml they're exposed like:

    env:
      - name: ENV_VAR_OF_MY_SECRET
        value: VALUE_OF_MY_SECRET

In cases that pass secrets as plain environment variables, the authors presumably either are not worried about the security of those secrets or expect the wider namespace security to cover their needs (e.g., nobody who shouldn't see the secret ever gets access to the yaml).
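
One way to catch the exposed form before a yaml is shared is a small check over the parsed pipeline spec. The sketch below is hypothetical (not part of the kfp SDK) and assumes the yaml has already been loaded into plain dicts and lists. It reports every `env` entry that carries a literal `value` instead of a `valueFrom` reference.

```python
def find_literal_env_vars(node) -> list:
    """Return the names of env entries that carry a literal `value`."""
    found = []
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "env" and isinstance(child, list):
                for entry in child:
                    # A literal value is visible to anyone who can read
                    # the yaml; valueFrom/secretKeyRef entries are not.
                    if isinstance(entry, dict) and "value" in entry:
                        found.append(entry.get("name"))
            else:
                found.extend(find_literal_env_vars(child))
    elif isinstance(node, list):
        for child in node:
            found.extend(find_literal_env_vars(child))
    return found
```

Not every literal environment variable is a secret, so the result is a list to review rather than a list of confirmed leaks.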

Work completed so far

#62 documents how environment variables passed to pipelines are visible in pipeline yaml files and thus are not suitable for secrets, especially for shared yaml files.

Work on enabling vault for personal secrets is nearly complete (this issue, see notes above from @justbert). This will allow:

  • users to define secrets in their own personal vault at the vault.covid...
  • users to access those secrets from a notebook in their namespace via injection through the vault sidecar
    This is great for secrets users interact with in their notebook servers.

...but if we wanted these secrets available to kubeflow during pipeline runtime, we have to give the entire vault (or maybe just an entire namespace in a vault? either way...) to all running kubeflow pipelines via the ServiceAccounts and namespace. That means vault secrets would be public to anyone inside the kubeflow pipeline runner's namespace (kubeflow). Right now we have no way of saying "this pod in the pipeline runner gets Vault.A, this pod gets Vault.B"

Still to do:

While we have ways of storing secrets, we still do not have a way of passing them from a user to a running pipeline.

Ideas

  1. (from @justbert) need to devise a trust model that would only inject the secrets that that user has access to, either through OPA or by giving users specific ServiceAccounts (maybe) that could be used for authentication?

  2. (me) the external examples of using secrets all revolve around them being available in the pipeline runner's namespace. Is there some pattern we're missing, maybe where typically in kfp implementations users will have access to the namespace of their runner? Can everyone have their own personal runner?

  3. ...?

@blairdrummond
Contributor

@chritter @ca-scribner , now that we have namespaced pipelines, you can use regular ol' Kubernetes secrets for this.

I think we should document this and place an example somewhere.
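
A possible starting point for such an example, as a sketch only: the helpers below build a plain Kubernetes Secret manifest for a user's own namespace and the matching `secretKeyRef` env entry shown earlier in this thread. The function names, the `minio-credentials` Secret name, and the `my-profile` namespace are all hypothetical.

```python
import base64


def build_secret_manifest(name: str, namespace: str, data: dict) -> dict:
    """Build an Opaque Secret manifest; values are base64-encoded as
    the `data` field of a Kubernetes Secret requires."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in data.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": encoded,
    }


def env_from_secret(env_name: str, secret_name: str, key: str) -> dict:
    """Build the env entry that references the Secret instead of
    embedding the value in the pipeline yaml."""
    return {
        "name": env_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }


manifest = build_secret_manifest(
    "minio-credentials", "my-profile", {"access_key": "example-value"}
)
env_entry = env_from_secret("MINIO_ACCESS_KEY", "minio-credentials", "access_key")
```

The manifest can be serialized with `json.dumps` and applied with `kubectl apply -f`, since kubectl accepts JSON manifests. Note that base64 is an encoding, not encryption, so the manifest itself should not be committed or shared.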

@blairdrummond
Contributor

Going to close in favour of #91

@wg102 wg102 mentioned this issue Jul 12, 2022