Helm upgrade failed: another operation (install/upgrade/rollback) is in progress #149
Comments
Is this happening so often that it would be possible to enable the |
Not sure if it's related, but one potential source of releases getting stuck in pending-* would be non-graceful termination of the controller pods while a release action (install/upgrade/rollback) is in progress. I see that controller-runtime has some support for that; I'm not sure if we need to do anything to integrate with or test it, but it seems like at least lengthening the default termination grace period (currently 10 seconds) may make sense. |
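A minimal sketch of what lengthening the grace period could look like, as a patch on the helm-controller Deployment in flux-system (the 600-second value is only an illustrative assumption; it would need to cover the longest expected release action):

# illustrative patch for the helm-controller Deployment (flux-system)
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 600  # assumption: long enough for the slowest install/upgrade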
I also think it would be useful if Helm separated the deployment status from the wait status, and allowed running the wait as standalone functionality, and thus recovery from waits that failed or were interrupted. I'll try to get an issue created for that. |
Based on feedback from another user, it does not always seem to be related to pod restarts, but I am still waiting on logs to confirm this. I tried to build in some behavior to detect a "stuck version" in #166. Technically, without Helm offering full support for a context that can be cancelled, the graceful shutdown period would always require a configuration value equal to the highest timeout a |
We ran into the same issue at the company I work for. We discovered fluxcd and we wanted to use it,
Running into this same issue when updating datadog in one of our clusters. Any suggestions on how to handle this?
|
Can you provide additional information on the state of the release (as provided by |
The controller pod did not restart. I just see a bunch of the error above in the helm-controller log messages. I did notice this though when digging into the history a bit: it tried upgrading the chart on the 26th and failed, which is probably when I saw the error message that there was another operation in progress.
I was able to do a rollback to revision 2 and then ran the helm reconcile, and it seems to have gone through just now. |
Try
I did a helm upgrade by hand, and then it reconciled in flux too. |
I see this constantly on GKE at the moment, especially if I try to recreate a cluster from scratch.
The helm-controller therefore also cannot reach the source-controller:
Not sure if flux is the cause, flooding the k8s API until some limits are reached? |
@monotek can you try setting the |
I'm using the fluxcd terraform provider. My pod args look like:
So I guess the default value of 4 is used? I've changed it via "kubectl edit deploy" for now. The cluster has kind of settled, as there were no new fluxcd pod restarts today. I'll give feedback on whether adjusting the value helps if we get an unstable master API again. Thanks for your help :) |
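For reference, a sketch of what the helm-controller container args might look like with the concurrency lowered (assuming the flux2 helm-controller and its --concurrent flag; the surrounding flags and values are illustrative and vary by version):

# helm-controller Deployment container args (illustrative)
args:
  - --watch-all-namespaces=true
  - --log-level=info
  - --enable-leader-election
  - --concurrent=2  # default is 4; lower it to reduce pressure on the API server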
I got the same issue. Please check the following first. I was not even able to list the release with the usual command
it was responding with an empty result, which is odd behavior from Helm.
Make sure your context is set to the correct Kubernetes cluster. Then the next step is to
try applying the rollback using the above command.
|
Use the following command to see charts in all namespaces, including the ones where an installation is in progress.
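For example, a Helm 3 invocation along these lines lists releases in every namespace, including ones that are not in a deployed state (flag names assume a recent Helm 3 CLI):

helm list --all-namespaces --all
# or only the ones stuck mid-operation:
helm list --all-namespaces --pending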
|
this is also happening in flux2. It seems to be the same problem and it is happening very frequently. I have to delete these failed helmreleases to recreate them and sometimes the recreation doesn't even work. Before the helmreleases failed, I wasn't modifying them in VCS. But somehow they failed all of a sudden.
|
also saw this today... no previous revision to roll back to, so had to delete the helmrelease and start the reconciliation again. |
I have encountered the same problem multiple times on different clusters. To fix the HelmRelease state I applied the workaround from this issue comment: helm/helm#8987 (comment), as deleting the HelmRelease could have unexpected consequences. Some background that might be helpful in identifying the problem:
|
Same here, and we constantly manually apply |
For all folks who do not experience helm-controller crashes: could you try adding a bigger timeout to the HelmRelease |
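A minimal sketch of what that suggestion might look like on a HelmRelease (assuming the spec.timeout field of the helm.toolkit.fluxcd.io API, which is passed to Helm as the per-action timeout; names and values are illustrative):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: monitoring
spec:
  interval: 5m
  timeout: 20m  # illustrative: give slow upgrades more headroom than the default
  chart:
    spec:
      chart: kube-prometheus-stack
      sourceRef:
        kind: HelmRepository
        name: prometheus-community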
(Now I realize that I missed the NOT; the suggestion was only for the folks who are NOT experiencing crashes, clearly not meant for me :D ...) Adding
After:
and the release gets pending:
|
@mfamador your issue is different, based on “ Failed to update lock: Put "https://10.0.0.1:443/apis/coordination.k8s.io/v1/namespaces/flux-system/leases/5b6ca942.fluxcd.io": context deadline exceeded” I would say that your AKS network is broken or the Azure proxy for the Kubernetes API is crashing. Please reach out to Azure support, as this is not something we can fix for you. |
You're probably right @stefanprodan, I'll reach out to them, but to clarify, this is a brand new AKS cluster, which has been destroyed and recreated from scratch multiple times, and it always ends up with Flux v2 crashing, most of the time when installing the kube-prometheus-stack helm chart, other times with Loki or Tempo. We've been creating several AKS clusters and we're only seeing this when using Flux 2, so I find it hard to believe that it's an AKS problem. |
@mfamador if Flux leader election times out then I don’t see how any other controller would work, we don’t do anything special here, leader election is implemented with upstream Kubernetes libraries. Check out the AKS FAQ, seems that Azure has serious architectural issues as they use some proxy called tunnelfront or aks-link that you need to restart from time to time 😱 https://docs.microsoft.com/en-us/azure/aks/troubleshooting
|
If on AKS the cluster API for some reason becomes overwhelmed by the requests (that should be cached, sane, and not cause much pressure on an average cluster), another thing you may want to try is to trim down on the concurrent processing for at least the Helm releases / helm-controller by tweaking the |
Thanks, @stefanprodan and @hiddeco, I'll give it a try |
@hiddeco, the controllers are still crashing after setting |
Then I think it is as @stefanprodan describes, and probably related to some CNI/tunnel front issue in AKS. |
Thanks, @hiddeco, yes, I think you might be right. That raises many concerns for me about using AKS in production; I'll try another CNI to see if it gets better. |
@marcocaberletti we fixed the OOM issues, we'll do a flux release today. Please see: https://github.com/fluxcd/helm-controller/blob/main/CHANGELOG.md#0122 |
Great! Thanks! |
This saved my Christmas Eve dinner, thank you so much! |
We have experienced this behavior when a helm upgrade fails and helm-controller fails to update the status on the HelmRelease. The status update failure is due to the HelmRelease object being modified. At that point the HelmRelease is stuck in "another operation is in progress" even though there is no other operation pending. Maybe there should be a retry on the status updates in helm-controller. |
@hiddeco / @stefanprodan - is it possible to get an update on this issue? |
Yes. The helm-controller is scheduled to see the same refactoring round as the source-controller recently did, in which reconciliation logic in the broadest sense will be improved, and long-standing issues will be taken care of. I expect to start on this at the beginning of next week. |
That's great news. I sometimes feel that Helm is a second-class citizen in the Flux world; it's about time it got some love. Don't get me wrong, I love Flux. But sometimes I do miss ArgoCD's approach where everything is managed through the same object kind, "Application", meaning Helm gets all the same benefits as any other deployment. |
Just going to throw some context here on where I'm seeing this issue... It seems consistently tied to a helm-controller crash during reconciliation and leaving the
Not sure if this context is helpful or repetitive - I'd assume this behavior is already being tracked in relation to the refactoring mentioned above? I can provide additional logs and observations if that would help. |
We've experienced issues where some of our releases get stuck on:
The only way we know how to get past this is by deleting the release. Would the refactoring of the helm-controller address this, or is there an alternative way to get the release rolled out without having to delete it?
|
@sharkztex this is the same problem I commonly see. The workarounds I know of are:

# example w/ kiali
HR_NAME=kiali
HR_NAMESPACE=kiali
kubectl get secrets -n ${HR_NAMESPACE} | grep ${HR_NAME}
# example output:
sh.helm.release.v1.kiali.v1 helm.sh/release.v1 1 18h
sh.helm.release.v1.kiali.v2 helm.sh/release.v1 1 17h
sh.helm.release.v1.kiali.v3 helm.sh/release.v1 1 17m
# Delete the most recent one:
kubectl delete secret -n ${HR_NAMESPACE} sh.helm.release.v1.${HR_NAME}.v3
# suspend/resume the hr
flux suspend hr -n ${HR_NAMESPACE} ${HR_NAME}
flux resume hr -n ${HR_NAMESPACE} ${HR_NAME}

Alternatively you can use helm rollback:

HR_NAME=kiali
HR_NAMESPACE=kiali
# Run a helm history command to get the latest release before the issue (should show deployed)
helm history ${HR_NAME} -n ${HR_NAMESPACE}
# Use that revision in this command
helm rollback ${HR_NAME} <revision> -n ${HR_NAMESPACE}
flux reconcile hr bigbang -n ${HR_NAMESPACE} |
I'm currently experiencing this error, but I'm seeing an endless loop of helm releases every few seconds.
I'm new to flux and helm, so I may be missing something obvious, but the output of some of the suggested commands does not show the expected results. As you can see, one of my releases has 8702 revisions and counting.
As you can also see, the revision has gone up in between running these two commands:
There's no "good" revision to roll back to in the output. The latest one has status
In addition, when trying to suspend the helm release, I get this error message.
Which confuses me, since the |
This problem still persists. It is very annoying that we cannot rely on GitOps to eventually converge a cluster to the expected state, as it gets stuck on affected HelmRelease objects, stuck in
Is there anything that prevents us from adding a new feature to the helm-controller to detect stuck (locked) HelmReleases and automatically fix them by rolling them back, immediately followed by a reconciliation? |
I see this problem frequently, mainly with Helm upgrades that take a while to complete - e.g. |
@alex-berger @danports the locking issue seems to happen only if helm-controller is OOMKilled (due to out of memory) or SIGKILLed (if the node where it's running dies without evicting pods first). Is this the case for you? |
I'll have to take a deeper look at logs/metrics to confirm, but that wouldn't surprise me, since I've been having intermittent OOM issues on the nodes where helm-controller runs. If that's the case, this seems less like a Flux bug and more like a node stability issue on my end, though a self-healing feature along the lines of what @alex-berger suggested would be nice. |
Helm itself places a lock when it starts an upgrade; if you kill Helm while it is doing so, it leaves the lock in place, preventing any further upgrade operations. Doing a rollback is very expensive and can have grave consequences for charts with statefulsets or charts that contain hooks which perform db migrations and other state-altering operations. We'll need to find a way to remove the lock without affecting the deployed workloads. |
Actually, I doubt this only happens on
My educated guess is that this problem happens whenever there are
After all, GitOps should not have to rely on humans waiting for pager calls just to manually run |
Until we figure out how to recover the Helm storage at restart, I suggest you move helm-controller to EKS Fargate or some dedicated node outside of Karpenter; this would allow helm-controller to perform hours-long upgrades without interruption. |
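A rough sketch of pinning helm-controller to such a dedicated node group (the label and taint names are assumptions, not Flux defaults):

# illustrative scheduling patch for the helm-controller Deployment
spec:
  template:
    spec:
      nodeSelector:
        dedicated: flux-helm  # assumption: label applied to the dedicated, non-Karpenter nodes
      tolerations:
        - key: dedicated
          operator: Equal
          value: flux-helm
          effect: NoSchedule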
Moving helm-controller to EKS Fargate might mitigate the problem (a bit), and actually this is what we are currently working on. However, as we are using Cilium (CNI) this also needs changes to the NetworkPolicy objects deployed by FluxCD, especially the following spec:

spec:
  podSelector: {}
  ingress:
    - {} # All ingress
  egress:
    - {} # All egress
  policyTypes:
    - Ingress
    - Egress

As you can see, the work-around using EKS Fargate weakens security, as we had to open up the NetworkPolicy quite a bit. In our case we can take the risk as we only run trusted workloads on those clusters. But for other users this might be a no-go. |
Closing this in favor of #644. Thank you all! |
factory.ClientType = Helm3 factory.NewClientFn = helm3.NewClient diff --git c/pkg/helm/helm3/helm3.go i/pkg/helm/helm3/helm3.go index 8eb59b8..8ec1a0a 100644 --- c/pkg/helm/helm3/helm3.go +++ i/pkg/helm/helm3/helm3.go @@ -3,6 +3,7 @@ package helm3 import ( "bytes" "fmt" + "log/slog" "os/exec" "sort" "strings" @@ -87,7 +88,7 @@ func (h *Helm3Client) initAndVersion() error { } stdout = strings.Join([]string{stdout, stderr}, "\n") stdout = strings.ReplaceAll(stdout, "\n", " ") - log.Infof("Helm 3 version: %s", stdout) + log.Info("Helm 3 version", slog.String("version", stdout)) return nil } @@ -161,12 +162,20 @@ func (h *Helm3Client) UpgradeRelease(releaseName string, chart string, valuesPat args = append(args, setValue) } - h.Logger.Infof("Running helm upgrade for release '%s' with chart '%s' in namespace '%s' ...", releaseName, chart, namespace) + h.Logger.Info("Running helm upgrade for release ...", + slog.String("release", releaseName), + slog.String("chart", chart), + slog.String("namespace", namespace)) stdout, stderr, err := h.cmd(args...) if err != nil { return fmt.Errorf("helm upgrade failed: %s:\n%s %s", err, stdout, stderr) } - h.Logger.Infof("Helm upgrade for release '%s' with chart '%s' in namespace '%s' successful:\n%s\n%s", releaseName, chart, namespace, stdout, stderr) + h.Logger.Info("Helm upgrade for release successful", + slog.String("release", releaseName), + slog.String("chart", chart), + slog.String("namespace", namespace), + slog.String("stdout", stdout), + slog.String("stderr", stderr)) return nil } @@ -191,7 +200,7 @@ func (h *Helm3Client) GetReleaseValues(releaseName string) (utils.Values, error) } func (h *Helm3Client) DeleteRelease(releaseName string) (err error) { - h.Logger.Debugf("helm release '%s': execute helm uninstall", releaseName) + h.Logger.Debug("helm release: execute helm uninstall", slog.String("release", releaseName)) args := []string{ "uninstall", releaseName, @@ -202,7 +211,7 @@ func (h *Helm3Client) DeleteRelease(releaseName string) (err error) { return fmt.Errorf("helm uninstall %s invocation error: %v\n%v %v", releaseName, err, stdout, stderr) } - h.Logger.Debugf("helm release %s deleted", releaseName) + h.Logger.Debug("helm release deleted", slog.String("release", releaseName)) return } @@ -230,9 +239,9 @@ func (h *Helm3Client) ListReleasesNames() ([]string, error) { return nil, fmt.Errorf("helm list failed: %v\n%s %s", err, stdout, stderr) } - list := []struct { + var list []struct { Name string `json:"name"` - }{} + } if err := k8syaml.Unmarshal([]byte(stdout), &list); err != nil { return nil, fmt.Errorf("helm list returned invalid json: %v", err) @@ -278,12 +287,15 @@ func (h *Helm3Client) Render(releaseName string, chart string, valuesPaths []str args = append(args, setValue) } - h.Logger.Debugf("Render helm templates for chart '%s' in namespace '%s' ...", chart, namespace) + h.Logger.Debug("Render helm templates for chart ...", + slog.String("chart", chart), + slog.String("namespace", namespace)) stdout, stderr, err := h.cmd(args...) 
if err != nil { return "", fmt.Errorf("helm upgrade failed: %s:\n%s %s", err, stdout, stderr) } - h.Logger.Infof("Render helm templates for chart '%s' was successful", chart) + h.Logger.Info("Render helm templates for chart was successful", + slog.String("chart", chart)) return stdout, nil } diff --git c/pkg/helm/helm3lib/helm3lib.go i/pkg/helm/helm3lib/helm3lib.go index bacfe50..c01115f 100644 --- c/pkg/helm/helm3lib/helm3lib.go +++ i/pkg/helm/helm3lib/helm3lib.go @@ -2,7 +2,9 @@ package helm3lib import ( "context" + "errors" "fmt" + "log/slog" "os" "sort" "strconv" @@ -18,7 +20,7 @@ import ( "helm.sh/helm/v3/pkg/releaseutil" "helm.sh/helm/v3/pkg/storage" "helm.sh/helm/v3/pkg/storage/driver" - "k8s.io/apimachinery/pkg/api/errors" + apierrors "k8s.io/apimachinery/pkg/api/errors" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/cli-runtime/pkg/genericclioptions" "k8s.io/client-go/rest" @@ -125,7 +127,7 @@ func (h *LibClient) actionConfigInit() error { // If env is empty - default storage backend ('secrets') will be used helmDriver := os.Getenv("HELM_DRIVER") - err := ac.Init(getter, options.Namespace, helmDriver, h.Logger.Debugf) + err := ac.Init(getter, options.Namespace, helmDriver, h.Logger.Debug) if err != nil { return fmt.Errorf("init helm action config: %v", err) } @@ -141,7 +143,7 @@ func (h *LibClient) initAndVersion() error { return err } - log.Infof("Helm 3 version: %s", chartutil.DefaultCapabilities.HelmVersion.Version) + log.Info("Helm 3 version", slog.String("version", chartutil.DefaultCapabilities.HelmVersion.Version)) return nil } @@ -154,7 +156,7 @@ func (h *LibClient) LastReleaseStatus(releaseName string) (revision string, stat // return nil, errors.Errorf("no revision for release %q", name) // } // that's why we also check string representation - if err == driver.ErrReleaseNotFound || strings.HasPrefix(err.Error(), "no revision for release") { + if errors.Is(err, driver.ErrReleaseNotFound) || strings.HasPrefix(err.Error(), "no revision for release") { return "0", "", fmt.Errorf("release '%s' not found\n", releaseName) } return "", "", err @@ -173,7 +175,7 @@ func (h *LibClient) UpgradeRelease(releaseName string, chartName string, valuesP } return h.upgradeRelease(releaseName, chartName, valuesPaths, setValues, namespace) } - h.Logger.Debugf("helm release %s upgraded", releaseName) + h.Logger.Debug("helm release upgraded", slog.String("version", releaseName)) return nil } @@ -217,12 +219,15 @@ func (h *LibClient) upgradeRelease(releaseName string, chartName string, valuesP resultValues = chartutil.CoalesceTables(resultValues, m) } - h.Logger.Infof("Running helm upgrade for release '%s' with chart '%s' in namespace '%s' ...", releaseName, chartName, namespace) + h.Logger.Info("Running helm upgrade for release", + slog.String("release", releaseName), + slog.String("chart", chartName), + slog.String("namespace", namespace)) histClient := action.NewHistory(actionConfig) // Max is not working!!! 
Sort the final of releases by your own // histClient.Max = 1 releases, err := histClient.Run(releaseName) - if err == driver.ErrReleaseNotFound { + if errors.Is(err, driver.ErrReleaseNotFound) { instClient := action.NewInstall(actionConfig) if namespace != "" { instClient.Namespace = namespace @@ -237,7 +242,7 @@ func (h *LibClient) upgradeRelease(releaseName string, chartName string, valuesP _, err = instClient.Run(chart, resultValues) return err } - h.Logger.Debugf("%d old releases found", len(releases)) + h.Logger.Debug("old releases found", slog.Int("count", len(releases))) if len(releases) > 0 { // https://github.com/fluxcd/helm-controller/issues/149 // looking through this issue you can find the common error: another operation (install/upgrade/rollback) is in progress @@ -245,7 +250,10 @@ func (h *LibClient) upgradeRelease(releaseName string, chartName string, valuesP releaseutil.Reverse(releases, releaseutil.SortByRevision) latestRelease := releases[0] nsReleaseName := fmt.Sprintf("%s/%s", latestRelease.Namespace, latestRelease.Name) - h.Logger.Debugf("Latest release '%s': revision: %d has status: %s", nsReleaseName, latestRelease.Version, latestRelease.Info.Status) + h.Logger.Debug("Latest release info", + slog.String("release", nsReleaseName), + slog.Int("version", latestRelease.Version), + slog.String("status", string(latestRelease.Info.Status))) if latestRelease.Info.Status.IsPending() { objectName := fmt.Sprintf("%s.%s.v%d", storage.HelmStorageType, latestRelease.Name, latestRelease.Version) kubeClient, err := actionConfig.KubernetesClientSet() @@ -255,24 +263,36 @@ func (h *LibClient) upgradeRelease(releaseName string, chartName string, valuesP // switch between storage types (memory, sql, secrets, configmaps) - with secrets and configmaps we can deal a bit more straightforward than doing a rollback switch actionConfig.Releases.Name() { case driver.ConfigMapsDriverName: - h.Logger.Debugf("ConfigMap for helm revision %d of release %s in status %s, driver %s: will be deleted", latestRelease.Version, nsReleaseName, latestRelease.Info.Status, driver.ConfigMapsDriverName) + h.Logger.Debug("ConfigMap for helm", + slog.Int("version", latestRelease.Version), + slog.String("release", nsReleaseName), + slog.String("status", string(latestRelease.Info.Status)), + slog.String("driver", driver.ConfigMapsDriverName)) err := kubeClient.CoreV1().ConfigMaps(latestRelease.Namespace).Delete(context.TODO(), objectName, metav1.DeleteOptions{}) - if err != nil && !errors.IsNotFound(err) { + if err != nil && !apierrors.IsNotFound(err) { return fmt.Errorf("couldn't delete configmap %s of release %s: %w", objectName, nsReleaseName, err) } - h.Logger.Debugf("ConfigMap %s was deleted", objectName) + h.Logger.Debug("ConfigMap was deleted", slog.String("name", objectName)) case driver.SecretsDriverName: - h.Logger.Debugf("Secret for helm revision %d of release %s in status %s, driver %s: will be deleted", latestRelease.Version, nsReleaseName, latestRelease.Info.Status, driver.SecretsDriverName) + h.Logger.Debug("Secret for helm will be deleted", + slog.Int("version", latestRelease.Version), + slog.String("release", nsReleaseName), + slog.String("status", string(latestRelease.Info.Status)), + slog.String("driver", driver.ConfigMapsDriverName)) err := kubeClient.CoreV1().Secrets(latestRelease.Namespace).Delete(context.TODO(), objectName, metav1.DeleteOptions{}) - if err != nil && !errors.IsNotFound(err) { + if err != nil && !apierrors.IsNotFound(err) { return fmt.Errorf("couldn't delete secret 
%s of release %s: %w", objectName, nsReleaseName, err) } - h.Logger.Debugf("Secret %s was deleted", objectName) + h.Logger.Debug("Secret was deleted", slog.String("name", objectName)) default: // memory and sql storages a bit more trickier - doing a rollback is justified - h.Logger.Debugf("Helm revision %d of release %s in status %s, driver %s: will be rolledback", latestRelease.Version, nsReleaseName, latestRelease.Info.Status, actionConfig.Releases.Name()) + h.Logger.Debug("Helm will be rollback", + slog.Int("version", latestRelease.Version), + slog.String("release", nsReleaseName), + slog.String("status", string(latestRelease.Info.Status)), + slog.String("driver", driver.ConfigMapsDriverName)) h.rollbackLatestRelease(releases) } } @@ -282,7 +302,10 @@ func (h *LibClient) upgradeRelease(releaseName string, chartName string, valuesP if err != nil { return fmt.Errorf("helm upgrade failed: %s\n", err) } - h.Logger.Infof("Helm upgrade for release '%s' with chart '%s' in namespace '%s' successful", releaseName, chartName, namespace) + h.Logger.Info("Helm upgrade successful", + slog.String("release", releaseName), + slog.String("chart", chartName), + slog.String("namespace", namespace)) return nil } @@ -291,14 +314,16 @@ func (h *LibClient) rollbackLatestRelease(releases []*release.Release) { latestRelease := releases[0] nsReleaseName := fmt.Sprintf("%s/%s", latestRelease.Namespace, latestRelease.Name) - h.Logger.Infof("Trying to rollback '%s'", nsReleaseName) + h.Logger.Info("Trying to rollback", slog.String("release", nsReleaseName)) if latestRelease.Version == 1 || options.HistoryMax == 1 || len(releases) == 1 { rb := action.NewUninstall(actionConfig) rb.KeepHistory = false _, err := rb.Run(latestRelease.Name) if err != nil { - h.Logger.Warnf("Failed to uninstall pending release %s: %s", nsReleaseName, err) + h.Logger.Warn("Failed to uninstall pending release", + slog.String("release", nsReleaseName), + lo…
Sometimes helm releases are not installed because of the error in the title.
When this happens, the helm release is stuck in a pending status.
We have not found any corresponding log entry for the actual installation. Is this a concurrency bug?