Inconsistent Revisions Across Members (v3.3.3) #10594
Comments
@davissp14 Startup logs would be good for verification of flags. The DB size range is very different. Have you been running defrag on some members but not all?
There were a few changes around compaction between 3.3.2 and 3.3.3; one theory is that you have run the cluster with a mixture of 3.3.2 and 3.3.3 members? Check out the various changes: https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md#v332-2018-03-08 Specifically, this bug looks suspect.
Defrags are running daily per node and I have confirmed this in the logs. It's the revisions that I am most concerned about though. Our users are seeing inconsistent key-value read results across members.
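To make the inconsistent-read symptom concrete, one way to compare what each member returns is to query every endpoint individually with serializable (member-local) reads. A minimal sketch; the node IPs are the ones from this thread, while the scheme, client port, and key are assumptions:

```bash
# Read the same key from each member individually. --consistency=s makes the
# read serializable (served from that member's local store, no quorum), so
# per-member differences become visible. Key and port are placeholders.
for ep in 10.213.214.2 10.213.214.3 10.213.214.4; do
  echo "== ${ep} =="
  ETCDCTL_API=3 etcdctl --endpoints="http://${ep}:2379" \
    get /some/key --consistency=s
done
```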
It is concerning that there is such a difference between the members. Can you provide more details on this? In etcd, the relevant definition is in [1].

[1] etcd/etcdserver/etcdserverpb/rpc.proto, lines 414 to 420 (at 952b9e7)
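For context, each member's current store revision is reported in the ResponseHeader of every RPC and can also be read via `endpoint status`. A minimal sketch for comparing it per member, assuming the node IPs from this thread and the default client port:

```bash
# endpoint status queries each endpoint separately; in the JSON output,
# Status.header.revision is that member's current revision and dbSize is its
# backend size, which are the values that look divergent in this issue.
ETCDCTL_API=3 etcdctl \
  --endpoints=http://10.213.214.2:2379,http://10.213.214.3:2379,http://10.213.214.4:2379 \
  endpoint status -w json
```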
@davissp14 it would help if we could see some logs around compaction for these members so we can understand the timing. Also, as I asked before, it would be nice to verify the startup logs and/or flags for each member so that we could attempt to reproduce. That is, unless you have a way to reproduce this already? 3.3.3 is fairly old; because of the changes that were made in 3.3.2 and 3.3.3 with respect to compaction, I am curious whether this would correct itself with a newer version of etcd. Could you test this theory in a dev environment? The last question is where these etcd binaries came from; I hate to assume that these are release assets, and a SHA from each binary would help confirm.
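On the binary-provenance question, one way to check is to compare a checksum of each member's running binary against the official v3.3.3 release asset. A sketch; the install path is an assumption:

```bash
# On each member: checksum the binary actually being run (path assumed).
sha256sum /usr/local/bin/etcd

# For comparison, checksum the etcd binary from the official release tarball.
curl -sL -o etcd-v3.3.3-linux-amd64.tar.gz \
  https://github.com/etcd-io/etcd/releases/download/v3.3.3/etcd-v3.3.3-linux-amd64.tar.gz
tar xzf etcd-v3.3.3-linux-amd64.tar.gz
sha256sum etcd-v3.3.3-linux-amd64/etcd
```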
Hello, I'm helping @davissp14 get the logs that were needed here. So far we don't know how to reproduce the issue, so trying an upgraded etcd in a dev env is going to be difficult. We'll have to reach out to the user and see if he agrees to upgrade his instance. With that said, I think I got everything else that was needed; see below.

Compaction logs

Only one node shows auto-compaction logs; compaction runs every hour on the clock.

Node 10.213.214.4:
Defrag logs

Node 10.213.214.2:
Node 10.213.214.3:
Node 10.213.214.4:
Etcd version

Node 10.213.214.2:
Node 10.213.214.3:
Node 10.213.214.4:
Startup logs

Node 10.213.214.2:
Node 10.213.214.3:
Node 10.213.214.4:
Flags

Node 10.213.214.2:
Node 10.213.214.3:
Node 10.213.214.4:
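As a quick cross-check on the versions reported above, each member's client `/version` endpoint reports both the server binary version and the cluster-wide API version. A sketch; the scheme and port are assumptions (adjust if the cluster serves clients over TLS):

```bash
# Each member reports its own binary version ("etcdserver") and the
# cluster-wide API version ("etcdcluster").
for ep in 10.213.214.2 10.213.214.3 10.213.214.4; do
  curl -s "http://${ep}:2379/version"; echo
done
```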
@anthonyalberto thanks a lot for the logs.

Curious that only one of the configs uses the
@hexfusion That was my fault, as I did not communicate this to @anthonyalberto. That discrepancy was due to my earlier testing. All nodes were consistently set to
from the logs above
Just to add more info, I don't see any change in behaviour after switching all 3 nodes to
@hexfusion Compaction should only be running on the leader node, yeah?
Yeah, you're right. Does this cluster also have v2 data?
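For both points, a quick way to see which member is currently the leader and whether any v2 (legacy store) keys exist; endpoints and port are the same assumptions as above:

```bash
# The IS LEADER column identifies the member that currently drives
# auto-compaction, per the discussion above.
ETCDCTL_API=3 etcdctl \
  --endpoints=http://10.213.214.2:2379,http://10.213.214.3:2379,http://10.213.214.4:2379 \
  endpoint status -w table

# List any keys in the legacy v2 store via the v2 etcdctl interface.
ETCDCTL_API=2 etcdctl --endpoints=http://10.213.214.2:2379 ls / --recursive
```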
I wonder if this flag is somehow causing the issue. Is this still required?
Nah, we have been enabling that specifically for v2 API emulation. We were supporting the v2 storage backend for a while and this was enabled to help ease the migration to v3.
I would be surprised if this is causing the issue, as it's enabled across all of our etcd clusters. Hard to say if it's still required or not; it would depend on how the end user is accessing their data.
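Independent of config files, one way to confirm exactly which v2-related options each member is actually running with is to inspect the live process command line. A sketch that assumes flags are passed on the command line (not via environment variables) and a single etcd process per host:

```bash
# Print the running etcd process's arguments one per line, then filter for
# v2-related flags. Assumes exactly one etcd process on the host.
tr '\0' '\n' < "/proc/$(pgrep -x etcd)/cmdline" | grep -i v2
```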
There are some similarities with #9630, just in case that could provide any other clues (in that case the backing store was inconsistent at the time of upgrade; here there had also been an in-place upgrade from 3.3.2 -> 3.3.3).
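If store inconsistency along the lines of #9630 is the suspicion, etcdctl in 3.3 can hash each member's KV store. A sketch with the same assumed endpoints; note the caveat that hashes are only expected to match when members share the same revision and compaction state, so with the revisions already diverging this is a starting point rather than proof of corruption:

```bash
# HashKV hashes the MVCC key-value contents of each member's store.
ETCDCTL_API=3 etcdctl \
  --endpoints=http://10.213.214.2:2379,http://10.213.214.3:2379,http://10.213.214.4:2379 \
  endpoint hashkv
```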
@davissp14 do you enable auth? If auth is enabled, it is possible to encounter data inconsistency in all etcd3 versions when you restart etcd (#11651).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Etcd Version: 3.3.3
OS: Ubuntu Linux 14.04.1
Health
Table output
JSON Output
As you can see, the revisions across members are vastly different. Each of these nodes is configured with --auto-compaction-retention 1. Any thoughts on what's going on here?
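For reference, per-member health output like the section above is typically gathered along these lines (the table and JSON views come from `endpoint status -w table` and `-w json`); node IPs are the ones from this thread, port and scheme are assumptions:

```bash
# Per-member health check corresponding to the "Health" section above.
ETCDCTL_API=3 etcdctl \
  --endpoints=http://10.213.214.2:2379,http://10.213.214.3:2379,http://10.213.214.4:2379 \
  endpoint health
```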