Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helm charts (almost) always re-deploy even when nothing has changed #3473

Closed
worldofgeese opened this issue Dec 30, 2022 · 4 comments · Fixed by #5138 or #6745
Closed

Helm charts (almost) always re-deploy even when nothing has changed #3473

worldofgeese opened this issue Dec 30, 2022 · 4 comments · Fixed by #5138 or #6745

Comments

@worldofgeese
Copy link
Contributor

worldofgeese commented Dec 30, 2022

Bug

I make use of a hefty Helm chart (CockroachDB's) inside a helm Garden module. Even when no values have been changed, on garden deploy it's very likely Garden will version and release a new chart to the cluster, which can take up to minutes. One of our community members also experiences this bug.

Current Behavior

Here are two separate releases deployed to my cluster of the same chart. You can see one update can take up to 164 seconds so it's quite costly.

Nothing has changed in the chart or the values provided to the chart.

2022-12-28!UNITO-UNDERSCORE!15h34!UNITO-UNDERSCORE!40

2022-12-29!UNITO-UNDERSCORE!11h02!UNITO-UNDERSCORE!27

Expected behavior

Charts should only cut a new release version when an input value has changed.

Reproducible example

Use my full-fat example with your own authenticated deployment registry defined.

Workaround

None.

Your environment

  • OS: Windows 11 running WSL
  • How I'm running Kubernetes: GKE Autopilot

garden version
0.12.48

@ashwiren
Copy link

ashwiren commented Jan 3, 2023

My experience has been that this only seems to happen if there are garden variables as values in the helm chart values.

If you have a static values in the helm chart then there's no redeployment.

examples:
No redeployment

kind: Module
description: In-cluster Redis for glx-connect
type: helm
name: redis
chartPath: ./redis
values:
  architecture: 'standalone'
  auth:
    enabled: false
  master:
    resources:
      limits:
        cpu: 500m
        memory: 256Mi
      requests:
        cpu: 50m
        memory: 128Mi

Redeploys.

kind: Module
description: In-cluster PostgreSQL for Keycloak
disabled: ${var.keycloak.database.rds} # if rds is enabled, see infrastructure-aws
type: helm
name: keycloak-pg
chartPath: ./postgresql
dependencies: 
  - infrastructure-aws
values:
  architecture: 'standalone'
  auth:
    database: ${var.keycloak.database.name}
    username: ${var.keycloak.database.username}
    password: "${runtime.services.infrastructure-aws.outputs.keycloak_db_password}"

@vvagaytsev
Copy link
Collaborator

Closing as done via #5138.

@vvagaytsev vvagaytsev self-assigned this Jan 25, 2024
@vvagaytsev
Copy link
Collaborator

It looks like this is happening again, see https://discord.com/channels/817392104711651328/1199912856355881121/1199912856355881121

@vvagaytsev vvagaytsev reopened this Jan 25, 2024
@vvagaytsev vvagaytsev removed their assignment Feb 20, 2024
@stefreak
Copy link
Member

stefreak commented Jan 24, 2025

This issue might finally be addressed by #6745 – I stumbled across this issue when refactoring the module resolver, when I noticed that we factor the value of overrides into the configuration itself, which affects version calculation.

Thinking about this, I realised that we should exclude variables and varfiles from version calculation, since only their effect on the spec and other relevant fields should be taken into account for version calculation.

WDYT, am I missing something @thsig @vvagaytsev?

See also 7aa19b6

stefreak added a commit that referenced this issue Jan 24, 2025
…ion calculation is only affected if the variable is used. fixes #3473
github-merge-queue bot pushed a commit that referenced this issue Jan 31, 2025
…gs when absolutely needed (#6745)

* perf: improve preprocess action perf by capturing context instead of
partially resolving template strings

* perf: use layered context and capture context instead of eager variable merging

* chore: fix some issues

* fix: layered context implementation and serialisation of partially
resolved values

* perf: make sure it resolves and improve performance further

* chore: remove yaml highlighting

The yaml highlighting did not work in production builds anyway and removing this functionality made it easier to compare performance between production builds and the dev build for commands like `get graph`.

* perf: do not call `resolveTemplateString` anymore, use `deepEvaluate` instead.

* chore: do not validate providers if not fully resolved yet.

* chore: remove `resolveTemplateStrings` and make sure that the template tests pass

* chore: fix docs generation

* chore: fix build by not ignoring parser.d.ts

* chore: fix vote example

* chore: fix bunch of tests

* fix: print correct wrapped error stack in tests

* chore: detect invalid keyed access of unresolved template values

fail early if we access an unresolved template value

* chore: change structural operator implementation to allow partial
evaluation

* chore: also detect unpermitted write access to unresolved template value

* fix: fix early provider resolution when resolving Garden params

* refactor: introduce type `UnresolvedProviderConfig`

* chore: fix provider config

Co-authored-by: Vladimir Vagaytsev <vvagaytsev@users.noreply.github.com>

* chore(template): primitives for partial resolution needed for module resolution

* chore: first step making module resolver work

* test: add extra test cases for template string access protection

* chore: fix lint errors

* chore: enhance error handling

* chore: add `toString()` for unresolved template values

* fix: fix the processing order in the context resolution loop

* chore: helper function to serialise unresolved template values in tests

Co-authored-by: Steffen Neubauer <steffen@garden.io>

* test: fix one test failure

* chore: fix more bugs

* refactor: introduce named type for symbol `CONTEXT_RESOLVE_KEY_NOT_FOUND`

* test: fix assertions in provider configuration tests

* chore: rewrite ConfigContext.resolve

* chore: do not pass unresolved action variables to configure handler (to avoid problems joi validation issues)

* chore: fix more tests and reintroduce nodePath

* chore: fix all ConfigContext tests

* chore: fix BuildCommand tests

* chore: fix a symlink test

* chore: fix CustomCommandWrapper tests

* chore: fix e2e test templated-k8s-container

* chore: make sure to forward crashes in prepareModuleResource

* chore: fix lint errors

* chore: fix templated-k8s-container-modules by skipping early prepareBuildDependencies

* chore: uncomment object freeze in parseTemplateCollection to fix some tests

* chore: fix "should correctly parse" test

* test: fix tests in RunWorkflowCommand

* chore: fix command error handling tests

* chore: fix further tests

* chore: fix output test

* refactor(test): parse whole workflow configs as template values

* chore: fix lint issues

* docs: re-generate docs

* test: fix test crashes

* test: fix assertions

* chore: fix some tests by not cloning actions

* test: fix crash in pickEnvironment tests

* chore: fix lint issues

* test: fix some test crashes in `scanAndAddConfigs`

* test: fix some test crashes in `getConfigGraph`

* test: fix template string assertion

* test: fix assertions for variables

* test: fix some assertions in template-string.ts

* test: fix some assertions in template-string.ts

* fix: process `$concat` correctly in `$forEach`

* fix: add work-around for nodejs exit 0 crash

* fix: parse template strings in project config tests

* fix: resolveProjectConfig test

* fix: throw on missing secret keys

* test: fix tests for `getActionTemplateReferences`

* fix: more informative message if no keys are available in the context

* test: fix assertions for "should handle keys with dots and unresolvable member expressions correctly"

* fix: update plugin context schema

* test: fix variable assertions in "pickEnvironment"

* fix: evaluate all values that do not have essential runtime references in partiallyEvaluateModule

(The comparison was inverted before)

* test: make ResolveActionTask pass

* test: fix default providers definition in test helpers

* chore: fix resolveWorkflowConfig tests

* fix: partiallyEvaluateModule accidentally compared the wrong thing

* fix: further test cases "ModuleResolver"

* test: trivial assertion corrections in "DefaultEnvironmentContext"

* test: fix assertions in provider configuration tests

Reverts wrong assertion changes made in 50e6d90.

* fix: set provider path before calling `configureProvider` handler

This fixes the test failure in "Garden.'resolveProviders.should call a configureProvider handler if applicable'"

* Revert "fix: set provider path before calling `configureProvider` handler"

This reverts commit 5e10469.

* test: fix assertion in "resolveProviders.should call a configureProvider handler if applicable"

* fix: remove unnecessary hack in `action.disable` flag resolution

* chore: extract defaults from input schema (best-effort) so we can apply the schema during action resolution

* fix: build

* fix: do not use capture in `renderModules`

* fix: partial module resolution dependency detection / BuildCommand tests

* test: remove test "should normalize build dependencies" - prepareModuleResource does not normalise build deps anymore.

* test: serialise unresolved templates before using expect

* fix: test "should remove null values in provider configs"

* fix: resolveConfigTemplate tests

* fix: make a couple of renderConfigTemplate tests work

* fix: ensure right-to-left the order of precedence in `LayeredContext`

* fix: renderConfigTemplate tests

* Revert "fix: remove unnecessary hack in `action.disable` flag resolution"

This reverts commit 3247a3c.

* fix: test "should resolve template strings in project source definitions"

* fix: test "should return a different version for a module when a variable used by it changes"

* chore: update assertions in test "should resolve modules from config templates and any modules referencing them"

* fix: test "sets variables for the action"

* fix: test "should respect the action variables < action varfile < CLI var precedence order"

* fix: test "should resolve disabled flag in actions and allow two actions with same key if one is disabled"

* fix: test should resolve actions from config templates

* chore: remove unused and misleading internal flag

* fix: "preprocessActionConfig" tests

* fix: rehydrate template strings after converting modules to actions

* fix: do not parse template on already parsed module

* fix: do early module spec validation only if inputs are resolved

* fix: do early module spec validation only if inputs are resolved

Follow-up fix for 57d968e.

* fix: perform toJSON serialization only for unresolved template values

* test: fix tests in "resolveProjectOutputs"

* chore: fix lint errors

* fix: enforce correct variable precedence centrally and make sure version calculation is only affected if the variable is used. fixes #3473

* fix: do not validate BaseAction instances in joi augmentGraph schema

Otherwise joi

* fix: additional test fixes

* refactor: use  @ts-expect-error todo comments instead of as unknown casts

* chore: (experiment) remove hack with `then` field that was referenced by js engine

* chore: fix lint issues

* chore: wrap config error into template string error for better ux

To be able to locate the problematic piece of the configuration.

* chore(test): lowercase actual err message only once

* test: fix error message assertions

Template string errors are rendered differently and include snippets of the original config.

* test: follow-up fix for test assertion

* test: restore secrets value in test data

The value was accidentally removed in 7aa19b6.

* refactor: declare context properties as readonly if possible

* test: parse template collection in exec plugin tests

Fixes "actions should be able to access exec provider script result"

* test: fix setup and assertions in k8s and helm integ tests

* chore: fix lint error

* chore: fix kubernetes-type handlers integ tests

* fix: throwOnMissingSecretKeys should not throw an error if secrets are optional in an expression

* test: restore original k8s resource state after each test

* test: separate context for tests w/o artifacts + avoid unnecessary tmp dir creation

* test: use standalone lightweight project to test PodRunner

We do not need to resolve graph for the heavy project defined in `test-project/container`

* chore(test): extract helper to create empty project with local-kubernetes provider

* test: simpler and faster test setup for ingress-controller.ts

* test: simpler and faster test setup for ingress.ts

* fix: integ test "should resolve fine with null values for manifests in
spec.files"

* refactor: extract local var

* test: prevent unsafe config modifications

Some actions are used by multiple tests and should not be modified to avoid dirty shared data state.

* test: fix test action spec definition

Follow-up for the prev commit

* test: prevent unsafe config modifications in "should only show logs returned from failed containers"

Some actions are used by multiple tests and should not be modified to avoid dirty shared data state.

* test: prevent unsafe config modifications in "should return events and Pod logs if the Pod can't start"

Some actions are used by multiple tests and should not be modified to avoid dirty shared data state.

* test: wrap logs- and events-test in a context

* chore: split tests to 2 semantic contexts

* test: avoid dirty state if the shared helm configs and resources

* test: add todo-comment

* chore: fix lint issues

* test: correct deploy action type + removed obsolete delete statement

* test: declare "getWorkloadPods" as top-level spec

* test: simplify test setup for "getWorkloadPods"

* chore: check if this helps with the timeouts

* chore: gather more information about nature of the hang

* Revert "chore: check if this helps with the timeouts"

This reverts commit a43d75c.

* test: run helm tests in different namespaces

Avoid interference by running helm tests in different namespaces.

* chore: check if this helps with the timeouts

* test: don't cache garden instances

* test: fix helm project setup in AEC test

* chore(test): simplify `getContainerTestGarden` helper

* chore: change the type of `var` field in some var contexts

* chore(test): clear diagnostic interval in `afterAll` global hook

* Revert "test: run helm tests in different namespaces"

This reverts commit 56847c4

* Revert "test: fix helm project setup in AEC test"

This reverts commit 4b46b45.

* chore: fix lint errors

* fix: correct rendering of available keys when key is missing in the context

* docs: update config-resolution.md

* test: restore allowPartial tests and refactor escaping code

* fix: do not use legacyAllowPartial in module resolution

It's very difficult to support the complexity of legacyAllowPartial combined with structural template operators. The code will be easier to maintain this way. The benefit is only marginal, and the old code also struggled with ForEach expressions in the module resolution.

* fix: use simpler context in caprturing

Co-authored-by: Steffen Neubauer <steffen@garden.io>

* chore: actualise todo-comment

* chore: fix lint issues

* chore: typos and spacing

* fix: restore key highlighting in template error messages

---------

Co-authored-by: Vladimir Vagaytsev <10628074+vvagaytsev@users.noreply.github.com>
Co-authored-by: Vladimir Vagaytsev <vvagaytsev@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment