
requeue after the last node has its node-state label set to Started during cluster creation #77

Merged: 5 commits into datastax:master on May 8, 2020

Conversation

@jsanda (Collaborator) commented May 6, 2020

This PR is for #72. It fixes an issue in which status updates appear out of order: the .Status.SuperUserUpserted property gets set before the last node is added to .Status.NodeStatuses. The actual order of operations is correct; the superuser is not created until all C* nodes are started. It is only the status updates that get recorded out of order.

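For context, here is a minimal sketch of the ordering the fix enforces, written against controller-runtime's reconcile conventions. Every name, signature, and the simplified control flow below are assumptions for illustration only; the actual change is the small addition to CheckPodsReady shown in the diff further down.

package sketch

import (
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// reconcilerSketch is a stand-in for the operator's ReconciliationContext;
// every name and signature in this sketch is hypothetical.
type reconcilerSketch struct{}

// findStartingNodeSketch stands in for findStartingNodes(): it sets the
// node-state label to Started on any node that just finished starting,
// records that node in .Status.NodeStatuses, and reports whether it did so.
func (r *reconcilerSketch) findStartingNodeSketch() (nodeStarted bool, err error) {
	// ... label the node and update .Status.NodeStatuses ...
	return false, nil
}

// checkPodsReadySketch mirrors the shape of the fix: when a node was just
// marked Started, requeue instead of continuing, so that the NodeStatuses
// update is persisted before anything sets .Status.SuperUserUpserted.
func (r *reconcilerSketch) checkPodsReadySketch(desiredSize, readyPodCount, startedLabelCount int) (reconcile.Result, error) {
	nodeStarted, err := r.findStartingNodeSketch()
	if err != nil {
		return reconcile.Result{}, err
	}

	if desiredSize == readyPodCount && desiredSize == startedLabelCount {
		if nodeStarted {
			// The last node was just recorded in .Status.NodeStatuses;
			// requeue so a later pass sees that update before the superuser
			// step sets .Status.SuperUserUpserted.
			return reconcile.Result{Requeue: true}, nil
		}
		// All nodes were already started on an earlier pass, so it is safe
		// to continue on to the superuser step.
	}

	return reconcile.Result{}, nil
}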

var _ = Describe(testName, func() {
	Context("when in a new cluster", func() {
		Specify("the operator can scale up a datacenter", func() {
Collaborator

Might want to make this a bit more descriptive of what you are actually testing. We have a couple of other scale-up tests that do nothing but increase the node count, and we probably want to distinguish the messages in this one.

Collaborator Author

yeah, I just copied/pasted from the scale up test. I will update this.

Collaborator Author

@sandoichi wdyt about the updated message?

Collaborator

Looks good

step = "checking that all nodes have been started"
nodeStatusesHostIds := ns.GetNodeStatusesHostIds(dcName)
Expect(len(nodeStatusesHostIds), 6)
})
Collaborator

Most of our tests check that the dc can be successfully deleted at the end of the test scenario. It probably isn't super important to have on every single one (especially since you aren't doing any crazy patching or unusual things), so I'll let you decide whether to add that logic to this test. See our scale-up test for reference if you choose to (again, purely optional from my perspective).

Collaborator Author

I will add the deletion check(s). I would rather be consistent with other tests.
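
For reference, a rough sketch of what a deletion check at the end of the test could look like. It shells out to kubectl directly rather than using the repo's own test helpers, and the namespace (test-ns), datacenter name (dc1), label selector, and timeouts are all assumptions for illustration.

package sketch_test

import (
	"os/exec"
	"time"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

var _ = Describe("datacenter deletion (sketch)", func() {
	Specify("the datacenter's pods go away after the dc is deleted", func() {
		By("deleting the CassandraDatacenter")
		err := exec.Command("kubectl", "-n", "test-ns",
			"delete", "cassandradatacenter", "dc1").Run()
		Expect(err).ToNot(HaveOccurred())

		By("waiting for the datacenter pods to terminate")
		Eventually(func() string {
			out, _ := exec.Command("kubectl", "-n", "test-ns",
				"get", "pods", "-l", "cassandra.datastax.com/datacenter=dc1",
				"-o", "name").Output()
			return string(out)
		}, 10*time.Minute, 10*time.Second).Should(BeEmpty())
	})
})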

@sandoichi (Collaborator) left a comment

Just a few minor comments, otherwise looks good!

@@ -476,6 +476,9 @@ func (rc *ReconciliationContext) CheckPodsReady(endpointData httphelper.CassMeta
	desiredSize := int(rc.Datacenter.Spec.Size)

	if desiredSize == readyPodCount && desiredSize == startedLabelCount {
		if nodeStarted {
Collaborator

I think you should add a one-line comment here.

Also, why not move this directly under

	nodeIsStarting, nodeStarted, err := rc.findStartingNodes()

?

Collaborator Author

Comment added.

I hadn't really considered moving the if block. That would require some further discussion to make sure I don't wind up introducing regressions.

Collaborator

Yeah, let's not introduce instability. I can tackle this when refactoring some stuff in an upcoming PR.
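
To make the placement question concrete, here is a rough sketch of the two options discussed above. The surrounding code and return convention are simplified assumptions, not the operator's actual CheckPodsReady.

package sketch

// placementSketch contrasts the two spots discussed above for the
// nodeStarted requeue check; everything here is simplified for illustration.
func placementSketch(nodeStarted bool, desiredSize, readyPodCount, startedLabelCount int) (requeue bool) {
	// Option raised in review: requeue immediately after findStartingNodes()
	// reports that a node was just started, before any size comparisons run:
	//
	//   nodeIsStarting, nodeStarted, err := rc.findStartingNodes()
	//   if nodeStarted {
	//       return requeue
	//   }

	// Placement in this PR: only requeue once every pod is ready and carries
	// the Started label, i.e. inside the branch below (matching the diff).
	if desiredSize == readyPodCount && desiredSize == startedLabelCount {
		if nodeStarted {
			return true // requeue so the status update is observed first
		}
	}

	return false
}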

@jsanda (Collaborator Author) commented May 8, 2020

I ran all the integration tests locally and got one failure:

• Failure [1244.484 seconds]
Add racks
/Users/jsanda/Development/gocode/src/github.com/datastax/cass-operator/tests/add_racks/add_racks_suite_test.go:40
  when in a new cluster
  /Users/jsanda/Development/gocode/src/github.com/datastax/cass-operator/tests/add_racks/add_racks_suite_test.go:41
    racks can be added if the size is increased accordingly [It]
    /Users/jsanda/Development/gocode/src/github.com/datastax/cass-operator/tests/add_racks/add_racks_suite_test.go:42

    Unexpected error:
        <*errors.errorString | 0xc000246390>: {
            s: "Timed out waiting for value. Expected to output to contain: true, but got .",
        }
        Timed out waiting for value. Expected to output to contain: true, but got .
    occurred

I also ran the tests in master and got the same failure.

@jimdickinson (Collaborator)
We'll see if that's a GKE vs KIND issue...

@jimdickinson merged commit b96bfd7 into datastax:master on May 8, 2020
@bradfordcp added this to the 1.2 milestone on Oct 21, 2020