-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initiate update on machine template ref change #813
Conversation
Signed-off-by: Alexey Makhov <amakhov@mirantis.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks good but I think we have some issue in our machines sync algo. Following the new e2e test added in the PR, if a second update is applied, cluster update gets stuck in this state:
I applied the following update:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
metadata:
name: docker-test-cp-template-new2
namespace: default
spec:
template:
spec:
customImage: kindest/node:v1.31.1
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0sControlPlane
metadata:
name: docker-test
spec:
replicas: 3
version: v1.31.2+k0s.0
updateStrategy: Recreate
k0sConfigSpec:
k0s:
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
name: k0s
spec:
api:
extraArgs:
anonymous-auth: "true"
telemetry:
enabled: false
machineTemplate:
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
name: docker-test-cp-template-new2
namespace: default
abc978e
to
d35adea
Compare
@apedriza yeah, thanks! I fixed the issues and added second update to the test case. |
|
||
if oldMachines > 0 { | ||
if clusterIsUpdating { | ||
log.Log.Info("Cluster is updating", "currentVersion", currentVersion, "newVersion", kcp.Spec.Version, "strategy", kcp.Spec.UpdateStrategy) | ||
clusterIsUpdating = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clusterIsUpdating = true | |
at that point it is always true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks
machine := machines.Oldest() | ||
machine := machines.Filter(func(m *clusterv1.Machine) bool { | ||
return machineNamesToDelete[m.Name] | ||
}).Newest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we return newest or oldest ? Can you explain reason one instead of the other? Maybe a comment in this lines explaining why delete oldest/newest helps. (Currently, log message says found oldest
and we are retrieving newest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really doesn't matter. I change it back to oldest to keep it consistent
logger := log.FromContext(ctx).WithValues("controlNode", name) | ||
//err := clientset.RESTClient(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove this lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks
i := 0 | ||
for len(desiredMachineNames) < int(kcp.Spec.Replicas) { | ||
name := machineName(kcp.Name, i) | ||
log.Log.Info("desire machine", "name", len(desiredMachineNames)) | ||
_, ok := machineNamesToDelete[name] | ||
if !ok { | ||
_, exists := machines[name] | ||
desiredMachineNames[name] = exists | ||
} | ||
i++ | ||
} | ||
log.Log.Info("Desired machines", "count", len(desiredMachineNames)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about use random suffix at the end of the name instead the current integer? With the current approach we are creating new machines using same name as old ones (check e2e test added) had which maybe could led us in some weird scenario. i dont know if it is possible use random suffix but we definitely would not have these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've been thinking about it and even tried it. Having predictable machine names keeps names consistent across reconciliation loops and it is convenient for tests. So I didn't proceed with those changes, at least for now.
Signed-off-by: Alexey Makhov <amakhov@mirantis.com>
Fixes #808