Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initiate update on machine template ref change #813

Merged
merged 2 commits into from
Nov 26, 2024

Conversation

makhov
Copy link
Contributor

@makhov makhov commented Nov 14, 2024

Fixes #808

Signed-off-by: Alexey Makhov <amakhov@mirantis.com>
@makhov makhov marked this pull request as ready for review November 14, 2024 12:00
@makhov makhov requested a review from a team as a code owner November 14, 2024 12:00
Copy link
Contributor

@apedriza apedriza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good but I think we have some issue in our machines sync algo. Following the new e2e test added in the PR, if a second update is applied, cluster update gets stuck in this state:
Screenshot 2024-11-18 at 11 56 17

I applied the following update:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
metadata:
 name: docker-test-cp-template-new2
 namespace: default
spec:
 template:
   spec:
     customImage: kindest/node:v1.31.1
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0sControlPlane
metadata:
 name: docker-test
spec:
 replicas: 3
 version: v1.31.2+k0s.0
 updateStrategy: Recreate
 k0sConfigSpec:
   k0s:
     apiVersion: k0s.k0sproject.io/v1beta1
     kind: ClusterConfig
     metadata:
       name: k0s
     spec:
       api:
         extraArgs:
           anonymous-auth: "true"
       telemetry:
         enabled: false
 machineTemplate:
   infrastructureRef:
     apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
     kind: DockerMachineTemplate
     name: docker-test-cp-template-new2
     namespace: default

@makhov makhov force-pushed the fix-808 branch 4 times, most recently from abc978e to d35adea Compare November 22, 2024 09:38
@makhov
Copy link
Contributor Author

makhov commented Nov 22, 2024

@apedriza yeah, thanks! I fixed the issues and added second update to the test case.


if oldMachines > 0 {
if clusterIsUpdating {
log.Log.Info("Cluster is updating", "currentVersion", currentVersion, "newVersion", kcp.Spec.Version, "strategy", kcp.Spec.UpdateStrategy)
clusterIsUpdating = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
clusterIsUpdating = true

at that point it is always true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks

machine := machines.Oldest()
machine := machines.Filter(func(m *clusterv1.Machine) bool {
return machineNamesToDelete[m.Name]
}).Newest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return newest or oldest ? Can you explain reason one instead of the other? Maybe a comment in this lines explaining why delete oldest/newest helps. (Currently, log message says found oldest and we are retrieving newest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really doesn't matter. I change it back to oldest to keep it consistent

logger := log.FromContext(ctx).WithValues("controlNode", name)
//err := clientset.RESTClient().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove this lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks

Comment on lines +339 to +348
i := 0
for len(desiredMachineNames) < int(kcp.Spec.Replicas) {
name := machineName(kcp.Name, i)
log.Log.Info("desire machine", "name", len(desiredMachineNames))
_, ok := machineNamesToDelete[name]
if !ok {
_, exists := machines[name]
desiredMachineNames[name] = exists
}
i++
}
log.Log.Info("Desired machines", "count", len(desiredMachineNames))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about use random suffix at the end of the name instead the current integer? With the current approach we are creating new machines using same name as old ones (check e2e test added) had which maybe could led us in some weird scenario. i dont know if it is possible use random suffix but we definitely would not have these cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've been thinking about it and even tried it. Having predictable machine names keeps names consistent across reconciliation loops and it is convenient for tests. So I didn't proceed with those changes, at least for now.

Signed-off-by: Alexey Makhov <amakhov@mirantis.com>
@makhov makhov merged commit efc4a4c into k0sproject:main Nov 26, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Control plane contoller doesn't react on machineTemplate ref changes
2 participants