-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make bootstrap operations deterministic #403
Conversation
If you wish to write a test for this (using fakeclient), then it's possible with some helpers already in cass-operator. For example, https://github.com/k8ssandra/cass-operator/blob/master/pkg/reconciliation/reconcile_racks_test.go#L1625 calls startCassandra, but the more interesting for you is of course https://github.com/k8ssandra/cass-operator/blob/master/pkg/reconciliation/reconcile_racks_test.go#L1406 It shows how to mock the pod endpoint that the fakeclients then call, without doing too much coding (it's not intuitive at all, but copy-pasting parts of those existing tests usually ends up making one understand how to achieve something). And you can call a single test from the editor which makes iterating quite quick. |
b5ada74
to
c17f80f
Compare
@burmanm thank you. I managed to create a unit test that is not too lame. Would you mind reviewing this? It should be a quick review. |
@@ -617,11 +618,13 @@ func (rc *ReconciliationContext) CheckPodsReady(endpointData httphelper.CassMeta | |||
return result.RequeueSoon(5) | |||
} | |||
|
|||
needsMoreNodes, err := rc.startAllNodes(endpointData) | |||
// step 4 - make sure all nodes are ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the step 3 comment is wrong as it used to be this step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK will fix comment for step 3
|
||
labelSeedBeforeStart := readySeeds == 0 && len(rc.Datacenter.Spec.AdditionalSeeds) == 0 && externalSeedPoints == 0 | ||
for podRankWithinRack := 0; ; podRankWithinRack++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of interest, would you explain the benefit vs, lets say following algorithm (the problem here looks very much like a dynamic programming)
smallestRack := 0
for i := 0; i < len(rc.statefulSets); i++ {
if rc.statefulSets[i].Status.ReadyReplicas < rc.statefulSets[smallestRack].Status.ReadyReplicas {
smallestRack = i
}
}
fmt.Printf("Next rack to start is: %d\n", smallestRack)
Also, is .Replicas really the number you want to compare to vs. ReadyReplicas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, my understanding is that we want here to use the desired number of pods for each sts, thus we should use .Replicas
.
Second, switching to .ReadyReplicas
could indeed simplify the code, but can we guarantee that if a pod is counted as a ready replica, it is not only ready, but also started and has joined the ring? Because if that's not the case, we should NOT keep bootstrapping the next pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would Replicas help with that behavior? Isn't .Spec.Replicas updated when we add the pods? They don't have to be ready at all. Our startup process should take care of preventing next startup (there's one pod that's Started-but-not-Ready) before starting next ones. Though for that reason, perhaps it doesn't matter which one we use, since when we get to this point, we should have no starting pods anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than of course for the algorithm I mentioned, there the ReadyReplicas does matter since Replicas has wrong calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what should I do here? Should I change as you suggested above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not necessarily. I only want to ensure that whoever reads it next can easily see what it's doing (and so avoid anything that looks pretty and complex). Would you think from that perspective the code is self-explanatory enough (the comment on the method only describes what we want to achieve, not how we achieve it) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it is self-explanatory as it is now, however that's a subjective notion. You are never going to get everybody on the same page on such matters.
@@ -1676,15 +1675,21 @@ func TestFailedStart(t *testing.T) { | |||
|
|||
func TestReconciliationContext_startAllNodes(t *testing.T) { | |||
|
|||
// A boolean representing the state of a pod (started or not). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very helpful. Thank you!
What this PR does:
Makes bootstrap order 100% balanced and deterministic.
Which issue(s) this PR fixes:
Fixes #381 .
Checklist