Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HelmRelease install & upgrade timeout regex prevent release uninstallation #3227

Closed
1 task done
NicolasFloquet opened this issue Oct 20, 2022 · 2 comments · Fixed by fluxcd/helm-controller#549
Closed
1 task done

Comments

@NicolasFloquet
Copy link

Describe the bug

I have an HelmReleases installed with install & upgrade set as following:

spec:
  interval: 1m
  install:
    timeout: 60m
    remediation:
      retries: -1
  upgrade:
    timeout: 60m
    remediation:
      retries: -1

Which worked perfectly well with older flux version (0.26.0)

Now, we're trying to upgrade to latest flux version (0.35.0), and we're facing issues uninstalling such HelmRelease.

The HelmRelease resource itself is reporting reconciliation failure:
reconciliation failed: Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "ad-server": the object has been modified; please apply your changes to the latest version and try again

When checking the HelmController logs, following error is raised:

{
  "level": "error",
  "ts": "2022-10-20T09:30:30.357Z",
  "msg": "Reconciler error",
  "controller": "helmrelease",
  "controllerGroup": "helm.toolkit.fluxcd.io",
  "controllerKind": "HelmRelease",
  "HelmRelease": {
    "name": "release-name",
    "namespace": "namespace-name"
  },
  "namespace": "namespace-name",
  "name": "release-name",
  "reconcileID": "3deabf86-9f0c-41fb-8529-e8b22b7228e2",
  "error": "HelmRelease.helm.toolkit.fluxcd.io \"release-name\" is invalid: [spec.install.timeout: Invalid value: \"1h0m0s\": spec.install.timeout in body should match '^([0-9]+(\\.[0-9]+)?(ms|s|m))+$', spec.upgrade.timeout: Invalid value: \"1h0m0s\": spec.upgrade.timeout in body should match '^([0-9]+(\\.[0-9]+)?(ms|s|m))+$']"
}

From my understanding, the timeouts we provide are compliant with the new regex '^([0-9]+(\\.[0-9]+)?(ms|s|m))+$', but it is internally converted to 1h0m0s, which is not.

My questions are following:

  • Why is the regex so restrictive?
  • Although my timeout is compliant with the regex, it gets rejected when uninstalling. How can I work around that? Right now this service is installed on many clusters that I want to upgrade to new flux version, so I have a strong interdependency between my HelmRelease configuration and flux version, which is annoying

Thanks!

Steps to reproduce

  1. Install HelmRelease with install & upgrade timeouts set to 60m
  2. Uninstall previously installed HelmRelease

Expected behavior

We expect the helmRelease to be properly uninstalled.

Screenshots and recordings

No response

OS / Distro

Ubuntu 20.04

Flux version

v0.35.0

Flux check

N/A

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@hiddeco
Copy link
Member

hiddeco commented Oct 20, 2022

The setting is this restrictive because otherwise arbitrary timeouts can cause a denial of service. In case you would be using a controller with 4 workers, and multiple (malicious) release are added with an extremely high timeout this would block any other release from being processed. See GHSA-f4p5-x4vc-mh4v for more information.

My first question to address the issue with your release specifically is if it really takes this much time to run any release action? In case this is true, my advise would be to set it to 59m.

@NicolasFloquet
Copy link
Author

My first question to address the issue with your release specifically is if it really takes this much time to run any release action? In case this is true, my advise would be to set it to 59m.

Our use-case is indeed in "edge" environnement that, due to poor connectivity, may may take a long time to reconcile a release. We can of-course reduce it to 59m, but:

  • In my opinion, it's still a work around for an undocumented behavior
  • Our HelmRelease manifests are generated and updating those values will be somewhat painful
  • This work around doesn't work if we end-up needing longer timeouts

I understand our use-case is uncommon, and we'll most likely use your work around short term, but I feel this limitation should be either fixed or documented :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants