Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For EKS, CAPI truncates the version field in AWSManagedControlPlane spec after apply #3594

Closed
yogeek opened this issue Jul 13, 2022 · 14 comments · Fixed by #3682
Closed

For EKS, CAPI truncates the version field in AWSManagedControlPlane spec after apply #3594

yogeek opened this issue Jul 13, 2022 · 14 comments · Fixed by #3682
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@yogeek
Copy link
Contributor

yogeek commented Jul 13, 2022

/kind bug

What steps did you take and what happened:

When setting k8s version for nodegroups in CAPI YAML manifest (AWSManagedControlPlane.spec.template.spec.version), EKS only allows MAJOR.MINOR (e.g. 1.22) but ClusterAPI requires a "semantic version" MAJOR.MINOR.PATCH (e.g. 1.22.10)

  • when setting 1.22 in the manifest, clusterAPI error is : invalid: spec.template.spec.version: Invalid value: "v1.22": must be a valid semantic version
  • when setting 1.22.10, clusterAPI is ok but it modifies the final resources in K8S by truncating to 1.22 which causes a diff in gitops tool like ArgoCD (and if we ignore this diff, we will not be able to detect a version change later)

What did you expect to happen:

I expected ClusterAPI to not change a resource after it has been applied and to let version the same as declared in the initial manifest (in GIT).

Anything else you would like to add:
This issue has already been discussed orally with @richardcase

Environment:

  • Cluster-api-provider-aws version: 1.4.1
  • Kubernetes version: 1.22.9
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority labels Jul 13, 2022
@k8s-ci-robot
Copy link
Contributor

@yogeek: This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 13, 2022
@sedefsavas
Copy link
Contributor

MachinePool version looks like having a validation check for updates as well, so should not be changeable to v1.22 looking at this:
https://github.com/kubernetes-sigs/cluster-api/blob/ef7172a853344d3b047fb0ace3f349079f3eef1f/exp/api/v1beta1/machinepool_webhook.go#L149

Could you confirm if MachinePool.spec.template.spec.version is set to v1.22?

@yogeek
Copy link
Contributor Author

yogeek commented Jul 19, 2022

Hi @sedefsavas thanks for your help
Sorry, I mixed up resources and indeed, it is not MachinePool version which was truncated but AWSManagedControlPlane version
I am going to update the issue title accordingly

Here is the GIT manifest :

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: AWSManagedControlPlane
metadata:
  name: capi-eks-mycluster-control-plane
  namespace: mycluster-capi
spec:
  [...]
  version: v1.22.0

and the result after cluster api created the cluster /

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: AWSManagedControlPlane
metadata:
  creationTimestamp: '2022-07-18T10:18:10Z'
name: capi-mycluster-kubeflow-control-plane
  namespace: mycluster-capi
  ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Cluster
      name: capi-eks-mycluster
[...]
spec:
  version: v1.22

@yogeek yogeek changed the title For EKS, CAPI truncates the version field in MachinePool spec after apply For EKS, CAPI truncates the version field in AWSManagedControlPlane spec after apply Jul 19, 2022
@richardcase
Copy link
Member

I'm going to re-create this so:

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jul 20, 2022
@richardcase
Copy link
Member

If a version number is specified on the AWSManagedControlPlane then we "normalize" this here. In this instance "normalize" means converting the version number to MAJOR.MINOR format and storing the result back in the spec. So as @yogeek has seen if you specify v1.22.10 in the awsmcp spec it will get re-written to 1.22.

We only need to do this normalizing of the version number at the point of creating resources in EKS (i.e. EKS cluster). My suggestion is that we do the following:

  • Remove this defaulting behaviour from the awsmcp webhook
  • Check where the awsmcp.Spec.Version is used and ensure that its converted at time of use. Like we do when we create the EKS cluster in AWS

@sedefsavas, @Skarlso, @pydctw , @Ankitasw - wdyt?

@Skarlso
Copy link
Contributor

Skarlso commented Aug 19, 2022

@richardcase I just noticed this. :OOOO This was back in July 😁. I shell read this and answer.

@Skarlso
Copy link
Contributor

Skarlso commented Aug 19, 2022

Yeah, I agree with that fix. I can take that on. I'm sure that there was a good reason for that normalization?

@richardcase
Copy link
Member

/assign Skarlso

@Skarlso
Copy link
Contributor

Skarlso commented Aug 19, 2022

So, the validation also has to be updated, right? Otherwise, you can't set any value other than MAJOR.MINOR. Which might be okay, because we would do that anyways... I can't pass in a different value during validation because that's an internal process of the webhook service using the regex defined on the field:

// +kubebuilder:validation:Pattern:=^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.?$

So, what should we do about that?

@Skarlso
Copy link
Contributor

Skarlso commented Aug 19, 2022

This is this part in the Handler:

	// Default the object
	obj.Default()
	marshalled, err := json.Marshal(obj)
	if err != nil {
		return Errored(http.StatusInternalServerError, err)
	}

	// Create the patch
	return PatchResponseFromRaw(req.Object.Raw, marshalled)

It fails during marhasling. So I can't really influence that unless I overwrite the spec.Version which we don't want.

So we will have to actually require our restrictions. :D

@Skarlso
Copy link
Contributor

Skarlso commented Aug 19, 2022

It appears, just removing the normalise function, works.... gonna run some tests. At least there are no diffs.... So the managed controlplane will now require the version to be passed in correctly.

@Skarlso
Copy link
Contributor

Skarlso commented Aug 19, 2022

Yep, cluster appears to be working:

cluster clusterctl describe cluster managed-cluster
NAME                                                                   READY  SEVERITY  REASON                                SINCE  MESSAGE
Cluster/managed-cluster                                                True                                                   1s
├─ControlPlane - AWSManagedControlPlane/managed-cluster-control-plane  True                                                   1s
└─Workers
  └─MachineDeployment/managed-cluster-md-0                             False  Warning   WaitingForAvailableMachines           11m    Minimum availability requires 1 replicas, current 0 available
    └─Machine/managed-cluster-md-0-59f6c7774c-nzqz8                    False  Info      WaitingForBootstrapData               14s    0 of 2 completed
      └─BootstrapConfig - EKSConfig/managed-cluster-md-0-fgtzl         False  Info      WaitingForControlPlaneInitialization  14s    0 of 1 completed

With following config:

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: "managed-cluster"
spec:
  infrastructureRef:
    kind: AWSManagedControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    name: "managed-cluster-control-plane"
  controlPlaneRef:
    kind: AWSManagedControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    name: "managed-cluster-control-plane"
---
kind: AWSManagedControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
metadata:
  name: "managed-cluster-control-plane"
spec:
  region: "eu-central-1"
  sshKeyName: "capa-key"
  version: "v1.22"
  addons:
    - name: "vpc-cni"
      version: "v1.11.0-eksbuild.1"
      conflictResolution: "overwrite"
    - name: "coredns"
      version: "v1.8.7-eksbuild.1"
    - name: "kube-proxy"
      version: "v1.22.6-eksbuild.1"

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: "managed-cluster-md-0"
spec:
  clusterName: "managed-cluster"
  replicas: 1
  selector:
    matchLabels:
  template:
    spec:
      clusterName: "managed-cluster"
      version: "v1.22.0"
      bootstrap:
        configRef:
          name: "managed-cluster-md-0"
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: EKSConfigTemplate
      infrastructureRef:
        name: "managed-cluster-md-0"
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: "managed-cluster-md-0"
spec:
  template:
    spec:
      instanceType: "t3.small"
      iamInstanceProfile: "nodes.cluster-api-provider-aws.sigs.k8s.io"
      sshKeyName: "capa-key"
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: EKSConfigTemplate
metadata:
  name: "managed-cluster-md-0"
spec:
  template: {}

Will create a PR later on.

@Skarlso
Copy link
Contributor

Skarlso commented Aug 20, 2022

Another option is to relax the regex and edit the version after it has been provided. I think that's what you suggested Richard but I might not have understood it.

So, changing the regex in the cluster config for the version to be different so existing configs don't break. And then we normalise the version internally whenever it's needed.

@Skarlso
Copy link
Contributor

Skarlso commented Aug 20, 2022

I'm going to relax the version regex to accommodate 4 scenarios:

  • 1.22
  • v1.22
  • 1.22.0
  • v1.22.0

These will make sure no existing configs will break, and internally, we normalise it to the needed format anyways.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
None yet
5 participants