Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Support source references in remote values #11966

Closed
wants to merge 1 commit into from
Closed

fix: Support source references in remote values #11966

wants to merge 1 commit into from

Conversation

jkroepke
Copy link
Contributor

@jkroepke jkroepke commented Jan 12, 2023

fixes #11866

This PR includes support for multi-source applications in combination with helm-secrets. Before reviewing the code, I would like to discuss the concept and idea first.

instead use $ref for source references, use $(ref) which is district from env substitution. Kubernetes follows the same pattern. https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/#define-an-environment-dependent-variable-for-a-container

To prevent unexpected evaluation from URL parameters, URLs parameter should be URL encoded, e.g. %24%28values%29
to prevent replacing from ArgoCD itself.

This PR is a result from a discussion in CNCF slack (https://cloud-native.slack.com/archives/C01TSERG0KZ/p1673265568561419)


Note sure, if this is way to late for 2.6.0. But if 2.6.0 GA is released with the current implementation, we have think about breaking changes.


Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).

@jkroepke jkroepke changed the title Support source references in remote values fix Support source references in remote values Jan 12, 2023
@jkroepke jkroepke changed the title fix Support source references in remote values fix: Support source references in remote values Jan 12, 2023
Signed-off-by: Jan-Otto Kröpke <joe@cloudeteer.de>
@crenshaw-dev
Copy link
Member

I think a breaking change now is much preferable to a breaking change later.

I'd rather avoid doing arbitrary replacements in the valuesFile string. The substituted path is sensitive, because it includes a random UUID, meant to protect the repo cache location from path traversal. Arbitrary substitutions make it more likely that an attacker might convince Helm to leak that path value to some output.

Alternative recommendation: parse the valuesFile path as a URL and substituted only certain parts. For example, if there's no schema, we replace the first path component. If it's a secret:// schema, we replace the first segment after secret://. This gives us more confidence about how the (sensitive) substituted path is used.

Another alternative (better, but more work): don't substitute the long-lived cache path of the referenced repo. Instead, copy the one referenced file out of the referenced repo to a new, randomized, temporary path. This has three advantages:

  1. we don't care so much about the possibility of leaking the path - I think we could arbitrarily substitute the path into the valuesFile string
  2. we don't have to hold a lock on the referenced source path as long - we release the lock immediately after copying the one file out
  3. we no longer have to prevent referencing the same repo at a different revision, because we're no longer holding a lock on the referenced repo while generating the referencing repo's sources - I've seen at least one person who wanted this restriction lifted

@jkroepke
Copy link
Contributor Author

jkroepke commented Jan 12, 2023

I'd rather avoid doing arbitrary replacements in the valuesFile string. The substituted path is sensitive, because it includes a random UUID, meant to protect the repo cache location from path traversal. Arbitrary substitutions make it more likely that an attacker might convince Helm to leak that path value to some output.

Alternative recommendation: parse the valuesFile path as a URL and substituted only certain parts. For example, if there's no schema, we replace the first path component. If it's a secret:// schema, we replace the first segment after secret://. This gives us more confidence about how the (sensitive) substituted path is used.

The problem is that you can not assume a static file path after the scheme. In helm-secrets uses a lot logic inside the file url, e.g. load a gpg key into ArgoCD keychain (since ArgoCD does not support this)..

For example, import a gpg key and use it for decryption. This is required, since ArgoCD has no option to import a private GPG into the repo-server keychain.

# Pattern: secrets+gpg-import://<key-volume-mount>/<key-name>.asc?<relative/path/to/the/encrypted/secrets.yaml>
secrets+gpg-import:///helm-secrets-private-keys/key.asc?$(repo)/secrets.yaml

# Pattern: secrets+gpg-import-kubernetes://<namespace>/<secret-name>#<key-name>.asc?<relative/path/to/the/encrypted/secrets.yaml>
secrets+gpg-import-kubernetes://argocd/helm-secrets-private-keys#key.asc?$(repo)/secrets.yaml

In that case, the real file path is at the end of the URL, not at the start.


In general, helm-secrets bypasses a lot of from ArgoCD protections and that the reason why this is not allowed by default (helm scheme restriction). At least, there are some configurable restriction on helm-secrets to prevent path traversal https://github.com/jkroepke/helm-secrets/wiki/Security-in-shared-environments

My alternative recommendation would be a multiple repo-server and have the possibility to pin repo-server instances to ArgoCD Projects and assume that file leakage inside a ArgoCD project is fine

But now we are shifting a bit. From my point of view, its not worth that ArgoCD apply projection on URLs that used by 3rd party application.

Since 3rd party application like helm plugin are not allowed on default configuration, the behavior is fine.

@jannfis
Copy link
Member

jannfis commented Jan 12, 2023

My alternative recommendation would be a multiple repo-server and have the possibility to pin repo-server instances to ArgoCD Projects and assume that file leakage inside a ArgoCD project is fine

Yup, that's something I have thought about as well. I'm close to finishing a design proposal, which when submitted, you'd be invited to review and collaborate on.

@crenshaw-dev
Copy link
Member

@jkroepke how would you feel about the copy-then-substitute alternative? That would give us the ability to support all types of valuesFile paths, while also avoiding leaking the sensitive repo cache path.

@jkroepke
Copy link
Contributor Author

@crenshaw-dev Could you please more explain the "copy-then-substitute alternative" approach?

@crenshaw-dev
Copy link
Member

@jkroepke for sure!

Current implementation

Immediately before calling helm template, we loop over all the references from valuesFile. For each reference, we check out the referenced repo at the specified revision. We then acquire a lock on that repo, so that other processes don't check out a different commit before we call helm template. Finally, we substitute the path of the referenced source in the $ref position in the valuesFile string.

This approach has three disadvantages:

  1. We have to hold the lock on the referenced repo for a long time. This slows down other processes which need access to that repo.
  2. We cannot reference two different revisions of the same repo. Since a lock is held at one revision, a different source at a different revision will cause a deadlock.
  3. We have to use the long-lived, randomized repo cache path in place of $ref. The more places we use that path, the greater risk we leak it. The path is sensitive, because the random portion helps mitigate directory traversal attacks.

Proposed implementation

In the area of code where we currently loop, checkout, lock, and substitute paths for each source, we should instead:

  1. Loop over referenced sources.
  2. For each source, checkout the referenced repo at the specified revision, acquiring a lock.
  3. Copy the referenced file to a new, temporary, randomized location.
  4. Release the lock.
  5. Substitute the new, randomized path for $ref.
  6. After looping, finish running helm template.

This solves the disadvantages above. Solving problem 3 above makes me more comfortable substituting the path for arbitrary/unknown uses. If the file only exists for a few moments, we're not too worried about leaking the path.

One challenge: how do we know what the path is? For secrets://some/thing?path=$ref/file.yaml&other=blah, is the path file.yaml, or is it file.yaml&other=blah.

Proposal: parse the valuesFile field as a URL and perform substitution on 1) the path and 2) each query param individually. I think that'll cover the mentioned use cases.

@jkroepke
Copy link
Contributor Author

Before we are start to parse URLs and value files which would increase the complexity, I would also introduce another approach.

The idea that all sources from one Application have a shared root directory.

The working directory is always the shared root directory, if possible.

Example App

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
spec:
  project: default
  destination:
    server: https://kubernetes.default.svc/
    namespace: default
  sources:
    - chart: elasticsearch
      repoURL: https://helm.elastic.co/
      targetRevision: 8.5.1
      alias: elasticsearch
    - repoURL: https://github.com/argoproj/argocd-example-apps.git
      path: guestbook
      targetRevision: HEAD
      alias: guestbook

Out shared root is /tmp/uuid/ and its the working directory for all executions.

ArgoCD pull the sources into sub directories. If an alias is defined, it will be used als directory name. The first source lives in
/tmp/uuid/elasticsearch/ and the second source in /tmp/uuid/guestbook/.


Second example:

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  sources:
  - repoURL: 'https://prometheus-community.github.io/helm-charts'
    chart: prometheus
    targetRevision: 15.7.1
    helm:
      valueFiles:
      - 'app-values/charts/prometheus/values.yaml'
      - 'secrets://app-values/charts/prometheus/values.yaml'
      - 'secrets://some/thing?path=app-values/file.yaml&other=blah'
    alias: prometheus
  - repoURL: 'https://git.example.gom/org/value-files.git'
    targetRevision: dev
    alias: app-values

Since both sources live in a shared root, relative value file can be used. This approach completely omits references. Absolute file paths could be always denied without any exception. That is example, the helm command would be translated to

helm template ./prometheus -f app-values/charts/prometheus/values.yaml -f 'secrets://app-values/charts/prometheus/values.yaml' 'secrets://some/thing?path=app-values/file.yaml&other=blah'

No transformation of valuePaths are needed in that case.

@crenshaw-dev
Copy link
Member

@jkroepke I do like that. An upside would be very short lock time for all repos. The downside would be that you'd have to fully copy each repo to a new, temporary location for each manifest generation. I think you'd want to set a max number of parallel requests to avoid filling up the disk.

@jkroepke
Copy link
Contributor Author

I think you'd want to set a max number of parallel requests to avoid filling up the disk.

ArgoCD already has a configurable parallel requests limit

command.Flags().Int64Var(&parallelismLimit, "parallelismlimit", int64(env.ParseNumFromEnv("ARGOCD_REPO_SERVER_PARALLELISM_LIMIT", 0, 0, math.MaxInt32)), "Limit on number of concurrent manifests generate requests. Any value less the 1 means no limit.")
?


My question is, how we proceed here? I have some lack of knowledge regards golang and software development principals, feeling not in the position to develop such functionally here.

@crenshaw-dev
Copy link
Member

@jkroepke I think either solution will have to wait for 2.7. I'd like to personally look at whether copy-whole-repo is viable, but I won't be able to do that in time for 2.6. Copy-one-file is simpler, but I'm not sure I could get the URL parsing/substitution right in time for 2.6. But if we go that route, there may be some hope for including it in a patch release, since it's a smaller change.

@jkroepke
Copy link
Contributor Author

Alright, thanks for the input. 👍

@jkroepke jkroepke closed this Jan 18, 2023
@jkroepke jkroepke deleted the secrets-source-ref branch January 18, 2023 07:20
@donofriov
Copy link

@crenshaw-dev @jkroepke Is the new functionality still on schedule for 2.7? I haven't seen any other updates here or #11866 so just wanted to follow up to verify I'm watching the correct issue/pr. Thanks!

@jkroepke
Copy link
Contributor Author

jkroepke commented Apr 3, 2023

I have implement some workarounds on helm-secrets side. But I'm currently seen any open fixes related to the proposal from above yet. I could not see any fixes/features in 2.7 RC, I guess no.

From my point of view, I cant contribute a pull request based on the described proposal from above.

@crenshaw-dev
Copy link
Member

Probably worth seeing whether this PR solves your use case: #12508

It just copies the whole referenced repo.

@jwitko
Copy link

jwitko commented Feb 27, 2024

@jkroepke Any chance you could be convinced to reopen this PR? Based on #12508 (comment) which lead to #11866 (comment)

@jkroepke
Copy link
Contributor Author

I have the feeling, it would still not accepted by the maintainers.

@crenshaw-dev
Copy link
Member

I think this PR is in the right direction, but needs some additional work to match the design in #11966 (comment)

@jkroepke
Copy link
Contributor Author

Great to know. Unfortunately I deleted the whole fork in meantime, Github does not allow to reopen this PR. Sorry for that.

It would take some time to bring the PR up again. As I know, this PR was created in a time frame, before the feature was not GA and breaking changes are fine. The PR contains some breaking changes which need to remove first.

But unlike 1 year ago, I same some other maintainer assignments now. Since my time is limited, I expect would take some weeks.

I would more than happy, if someone else could continue this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new multiple sources doesn't work with helm-secrets
5 participants