Panic Error: Invalid Memory Address or Nil Pointer Dereference during Taint Deletion in AWSManagedMachinePool #5021

nueavv · 2024-06-13T07:58:21Z

/kind bug

What steps did you take and what happened:
While managing an EKS cluster using AWSManagedMachinePool, a panic error occurs when trying to delete a taint after adding it. Below are the related log messages.

Initial Error Log

E0613 05:58:09.596065 1 controller.go:329] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmanagedmachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSManagedMachinePool" AWSManagedMachinePool="capi-managed-cluster/test-cluster" namespace="capi-managed-cluster" name="test-cluster" reconcileID="12345678-1234-1234-1234-123456789012"
I0613 05:58:09.917851 1 awsmanagedmachinepool_controller.go:200] "Reconciling AWSManagedMachinePool"
I0613 05:58:09.918011 1 launchtemplate.go:73] "checking for existing launch template"
E0613 05:58:10.373797 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 464 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1234560?, 0x1234560})
/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/runtime/runtime.go:75 +0x85
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:108 +0xb2
panic({0x1234560?, 0x1234560})
/usr/local/go/src/runtime/panic.go:914 +0x21f
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/converters.TaintsFromSDK({0xc001604d60, 0x1, 0x25})
/workspace/pkg/cloud/converters/eks.go:115 +0x258
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).createTaintsUpdate(0xc002522d20, {0x0, 0x0, 0x0}, 0xc00075c5a0)
/workspace/pkg/cloud/services/eks/nodegroup.go:420 +0x21d
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).reconcileNodegroupConfig(0xc002522d20, 0xc00075c5a0)
/workspace/pkg/cloud/services/eks/nodegroup.go:469 +0x36e
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).reconcileNodegroup(0xc002522d20, {0x1234560, 0xc002528570})
/workspace/pkg/cloud/services/eks/nodegroup.go:571 +0x72c
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).ReconcilePool(0xc002522d20, {0x1234560, 0xc002528570})
/workspace/pkg/cloud/services/eks/eks.go:109 +0x11a
sigs.k8s.io/cluster-api-provider-aws/v2/exp/controllers.(*AWSManagedMachinePoolReconciler).reconcileNormal(0xc001541980, {0x1234560, 0xc002528570}, 0xc0010baf00, {0x1234560, 0xc002138100})
/workspace/exp/controllers/awsmanagedmachinepool_controller.go:239 +0x4da
sigs.k8s.io/cluster-api-provider-aws/v2/exp/controllers.(*AWSManagedMachinePoolReconciler).Reconcile(0xc001541980, {0x1234560, 0xc002528570}, {{{0xc000913320?, 0x0?}, {0xc000913380?, 0xc000d75d08?}}})
/workspace/exp/controllers/awsmanagedmachinepool_controller.go:192 +0x8b7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1234560?, {0x1234560?, 0xc002528570?}, {{{0xc000913320?, 0xb?}, {0xc000913380?, 0x0?}}})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0014f4500, {0x1234560, 0xc0014264b0}, {0x1234560?, 0xc002360ea0?})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0014f4500, {0x1234560, 0xc0014264b0})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg.internal.controller.(*Controller).Start.func2 in goroutine 321
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:223 +0x565

Updated Error Log

E0613 07:06:08.244153 1 controller.go:329] "Reconciler error" err=<
failed to reconcile machine pool for AWSManagedMachinePool capi-managed-cluster/test-cluster: failed to reconcile nodegroup config: creating taints update payload: converting taints: taint has nil fields: {
Effect: "PREFER_NO_SCHEDULE",
Key: "test"
}

What did you expect to happen:
Taints should be deleted without causing a panic error.

Anything else you would like to add:

The panic error occurs when trying to delete a taint after adding it.
The error message indicates that some fields of the taint are nil, but the provided taints are complete.

Current TaintsFromSDK function implementation:

import (
	"fmt"
	"log"
)

// TaintsFromSDK is used to convert an array of AWS SDK taints to CAPA Taints.
func TaintsFromSDK(taints []*eks.Taint) (expinfrav1.Taints, error) {
	converted := expinfrav1.Taints{}

	// If taints array is empty, log the event
	if len(taints) == 0 {
		log.Println("Taints array is empty")
		return converted, nil
	}

	for _, taint := range taints {
		if taint.Effect == nil || taint.Key == nil {
			return nil, fmt.Errorf("taint has nil fields: %+v", taint)
		}

		// Handle nil Value field by setting it to an empty string
		value := ""
		if taint.Value != nil {
			value = *taint.Value
		}

		convertedEffect, err := TaintEffectFromSDK(*taint.Effect)
		if err != nil {
			return nil, fmt.Errorf("converting taint effect %s: %w", *taint.Effect, err)
		}
		converted = append(converted, expinfrav1.Taint{
			Effect: convertedEffect,
			Key:    *taint.Key,
			Value:  value,
		})
	}

	return converted, nil
}

Suggestions:

Resolve the panic error that occurs during taint deletion.
When the taint array is empty, ensure that existing taints are deleted.
Review the logic handling empty taint arrays and nil fields.

Environment:

Cluster-api-provider-aws version: [version] 2.5.0
Kubernetes version: (use kubectl version): [version] 1.25.5 on local kubernetes
OS (e.g. from /etc/os-release): [version] ubuntu 20.04

The text was updated successfully, but these errors were encountered:

k8s-ci-robot · 2024-06-13T07:58:30Z

This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

nueavv · 2024-06-13T08:39:46Z

The issue appears to be caused by the value field in the taint having an empty string. The taint configuration is as follows:

This empty string in the value field might be leading to the observed behavior.

This commit addresses a panic error that occurs when attempting to delete a taint in AWSManagedMachinePool. The issue was caused by a nil pointer dereference when the taint's value was nil. By handling nil values for taint fields, this fix ensures that taints with nil values are correctly processed, preventing runtime errors. Fixes [kubernetes-sigs#5021](kubernetes-sigs#5021) Signed-off-by: nueavv <nuguni@kakao.com>

k8s-triage-robot · 2024-09-11T09:39:31Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

This commit addresses a panic error that occurs when attempting to delete a taint in AWSManagedMachinePool. The issue was caused by a nil pointer dereference when the taint's value was nil. By handling nil values for taint fields, this fix ensures that taints with nil values are correctly processed, preventing runtime errors. Fixes [kubernetes-sigs#5021](kubernetes-sigs#5021) Signed-off-by: nueavv <nuguni@kakao.com>

k8s-triage-robot · 2024-10-11T10:22:22Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority labels Jun 13, 2024

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 13, 2024

nueavv linked a pull request Jun 13, 2024 that will close this issue

🐛 fix(taint deletion): allow taint changes and prevent panic by handling nil values #5022

Open

5 tasks

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic Error: Invalid Memory Address or Nil Pointer Dereference during Taint Deletion in AWSManagedMachinePool #5021

Panic Error: Invalid Memory Address or Nil Pointer Dereference during Taint Deletion in AWSManagedMachinePool #5021

nueavv commented Jun 13, 2024 •

edited

Loading

k8s-ci-robot commented Jun 13, 2024

nueavv commented Jun 13, 2024

k8s-triage-robot commented Sep 11, 2024

k8s-triage-robot commented Oct 11, 2024

Panic Error: Invalid Memory Address or Nil Pointer Dereference during Taint Deletion in AWSManagedMachinePool #5021

Panic Error: Invalid Memory Address or Nil Pointer Dereference during Taint Deletion in AWSManagedMachinePool #5021

Comments

nueavv commented Jun 13, 2024 • edited Loading

Initial Error Log

Updated Error Log

Suggestions:

k8s-ci-robot commented Jun 13, 2024

nueavv commented Jun 13, 2024

k8s-triage-robot commented Sep 11, 2024

k8s-triage-robot commented Oct 11, 2024

nueavv commented Jun 13, 2024 •

edited

Loading