Multi-node cluster replication is racy (testing) #1556
Because these messages were being sent to replicas (data nodes), the data nodes were sometimes receiving messages out of order from the broker, where order is defined strictly by the Raft log index. This causes our tests to fail, because the new "wait" endpoint on the broker assumes this can never happen: when told to wait for, say, index 23, but it sees that it is already at 25, it responds with 200, even though the message at index 23 has yet to arrive. This change also means that data nodes will now panic if they see unknown messages, which should flag this issue much sooner. Fixes issue #1556
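A minimal sketch of the kind of check described here (hypothetical type names and message-type constants, not the actual influxdb code): the data node enforces strict log-index ordering and panics on unknown message types so the problem surfaces immediately instead of as a flaky test.

```go
package main

import "fmt"

// Hypothetical message type constants; the real influxdb constants differ.
const (
	writeSeriesMessageType = 0x10 // a data message carrying a write
	brokerOnlyMessageType  = 0x01 // a broker/config message carrying no data
)

type message struct {
	Type  uint16
	Index uint64 // Raft log index
}

type dataNode struct {
	index uint64 // highest log index applied so far
}

// apply enforces the invariant described above: messages must arrive in
// strict log-index order, and unknown message types cause a panic.
func (n *dataNode) apply(m *message) {
	if m.Index <= n.index {
		panic(fmt.Sprintf("message index %d arrived after index %d", m.Index, n.index))
	}
	switch m.Type {
	case writeSeriesMessageType:
		// ... apply the write to local storage ...
	case brokerOnlyMessageType:
		// Broker-only messages advance the index but carry no data, which is
		// why "my index is >= N" alone does not prove that write N arrived.
	default:
		panic(fmt.Sprintf("unknown message type: %x", m.Type))
	}
	n.index = m.Index
}

func main() {
	n := &dataNode{}
	n.apply(&message{Type: writeSeriesMessageType, Index: 23})
	n.apply(&message{Type: brokerOnlyMessageType, Index: 24})
	fmt.Println("applied up to index", n.index)
}
```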
So here is a clearer example of what is going on. A 5-node cluster is created, and all nodes join. A large batch write is then sent to the node listening on port 8590. When the test finishes, the node listening on port 8591 does not have the data, even though its index implies that it should. I patched the code simply as follows:
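(The actual patch is not reproduced here; the following is only a sketch of a debug patch of that shape, with hypothetical Message and DataNode types, that logs the index and type of every message as the data node receives it.)

```go
package main

import "log"

// Hypothetical stand-ins for the broker message and the data node; the real
// influxdb types differ.
type Message struct {
	Type  uint16
	Index uint64
}

type DataNode struct {
	addr string
}

// processMessage logs every incoming message's index and type before
// handling it, so out-of-order delivery shows up directly in the test output.
func (n *DataNode) processMessage(m *Message) {
	log.Printf("%s received message index=%d type=%x", n.addr, m.Index, m.Type)
	// ... existing message handling would follow ...
}

func main() {
	n := &DataNode{addr: "localhost:8591"}
	n.processMessage(&Message{Type: 0x10, Index: 23})
}
```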
And see this output on 8591:
Note how the last message that 8591 receives is out of order, and is preceded by two broker-only messages. So this is what is happening, as far as I can see.
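This is also why the "wait" check described above returns 200 prematurely. A minimal sketch of that check (a hypothetical handler, not the actual influxdb endpoint) shows the assumption:

```go
package main

import (
	"net/http"
	"strconv"
	"sync/atomic"
)

// currentIndex would be the highest log index the node has seen so far.
var currentIndex uint64 = 25

// waitHandler returns 200 as soon as currentIndex reaches the requested
// index. The racy assumption: if we have seen index 25, then index 23 must
// already be applied, which only holds when messages arrive in strict
// log-index order.
func waitHandler(w http.ResponseWriter, r *http.Request) {
	target, err := strconv.ParseUint(r.URL.Query().Get("index"), 10, 64)
	if err != nil {
		http.Error(w, "invalid index", http.StatusBadRequest)
		return
	}
	if atomic.LoadUint64(&currentIndex) >= target {
		w.WriteHeader(http.StatusOK) // may be premature under out-of-order delivery
		return
	}
	http.Error(w, "not caught up", http.StatusServiceUnavailable)
}

func main() {
	http.HandleFunc("/wait", waitHandler)
	http.ListenAndServe(":8591", nil)
}
```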
Because the implementation of the
@benbjohnson has confirmed this is a real issue that needs some thought.
All fixed!
Replication to a 3-node and 5-node cluster appears to be racy, so that test is currently disabled in the integration testing. The 3-node case does not appear to be racy locally, but both are racy in Travis.
https://github.com/influxdb/influxdb/blob/master/cmd/influxd/server_integration_test.go