Reconciliation failing after upgrade to 2.2.2 #4529
Comments
I can also see some desync between Flux and its HR?
Now that this HR is reconciled (with a suspend/resume), the error is gone...
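For reference, the suspend/resume cycle mentioned here can be done with the flux CLI (the release name and namespace below are placeholders):
❯ flux suspend helmrelease <name> -n <namespace>
❯ flux resume helmrelease <name> -n <namespace>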
We got the same error. We are on Flux 2.2.2 as well.
@funkymcb the list of resources should contain the one that failed; grep them and filter out the Unknown ones.
True. There was one deployment failing. Hard to spot at first sight among hundreds of resources. Thanks
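A minimal sketch of the filtering suggested above, assuming the kustomize-controller runs in flux-system (the grep patterns are illustrative, not prescribed):
❯ kubectl -n flux-system logs deploy/kustomize-controller \
    | grep 'health check failed' \
    | tr ',' '\n' \
    | grep -v "'Unknown'"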
@funkymcb I have created an issue for this. My proposal to solve it is to filter the resources and show only the failed ones: fluxcd/pkg#715
Who needs to increase resources? fluxcd or the application?
Describe the bug
When the kustomize-controller reconciles resources, it throws these logs:
server-side apply completed
(with all resources unchanged -- this is expected)
health check failed after 1.145123312s: failed early due to stalled resources: <resource> status: 'Unknown'
(listing all the managed resources)
Steps to reproduce
Well, I'm not sure. It just started to happen yesterday, roughly a day after a 2.1.0 => 2.2.0 upgrade. It looks like one of my images was updated through an update policy, then the reconciliation failed, and from then on the kustomize-controller threw the logs above every minute.
Expected behavior
Everything works (it actually does work, but the controller is telling me ALL the resources are in status "Unknown"). I've had to mute my notifications for now.
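In case it helps others hitting this: the controller message can be cross-checked against the actual workload state with kubectl (names below are placeholders):
❯ kubectl -n <namespace> get deployment <name>
❯ kubectl -n <namespace> rollout status deployment/<name>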
Screenshots and recordings
No response
OS / Distro
macOS latest version
Flux version
flux: v2.2.2
Flux check
❯ flux check
► checking prerequisites
✔ Kubernetes 1.26.11 >=1.26.0-0
► checking version in cluster
✔ distribution: flux-v2.2.2
✔ bootstrapped: true
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.37.2
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.2.3
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.2.3
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.37.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.31.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.2.1
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta3
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta2
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta3
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ all checks passed
Git provider
Github
Container Registry provider
No response
Additional context
Is it possible there is a threshold of some sort that was misconfigured when upgrading to 2.2.2?
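If a health-check timeout is the suspect, the relevant knobs live on the Kustomization spec; a minimal illustrative sketch (the name, path and values below are assumptions, not taken from this cluster):
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps             # illustrative name
  namespace: flux-system
spec:
  interval: 1m
  path: ./apps           # illustrative path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true             # wait for all applied resources to become ready
  timeout: 5m            # health checks give up after this duration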
Code of Conduct