Remove memberID from data corruption alarm #14849

Closed
ahrtr opened this issue Nov 24, 2022 · 6 comments

@ahrtr
Member

ahrtr commented Nov 24, 2022

What happened?

Only the leader performs the corruption check, and it always assumes that one of the followers has the corrupted data. That isn't correct: the leader's own data may be what is corrupted.

Please refer to the discussion in #14828.

What did you expect to happen?

Remove memberID from the data corruption alarm, or set it to 0.

How can we reproduce it (as minimally and precisely as possible)?

Trigger a data corruption.

Anything else we need to know?

We only need to fix this for 3.5 and 3.4.


@ahrtr
Member Author

ahrtr commented Nov 24, 2022

The easiest solution might be to intentionally set the memberID to 0; otherwise we need to update the alarm structure. WDYT? @serathius
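To illustrate the "set memberID to 0" option, here is a minimal Go sketch. The `Alarm` and `AlarmType` types, the `unknownMember` constant, and the `corruptAlarmFor` helper are all hypothetical stand-ins written for this example; they are not etcd's actual types or functions. The only idea taken from this issue is that the alarm keeps its existing shape and the member field carries 0 instead of the follower's ID.

```go
package main

import "fmt"

// AlarmType and Alarm are hypothetical stand-ins used for illustration only.
// They are NOT etcd's real alarm types.
type AlarmType int

const (
	AlarmNone AlarmType = iota
	AlarmCorrupt
)

type Alarm struct {
	MemberID uint64 // 0 is used here to mean "no specific member is blamed"
	Type     AlarmType
}

// unknownMember is the sentinel value proposed in this issue (assumed name).
const unknownMember uint64 = 0

// corruptAlarmFor builds the alarm raised when the leader's hash and a
// peer's hash disagree. Both IDs are deliberately ignored: a mismatch
// between two members does not tell us which of the two is corrupted.
func corruptAlarmFor(leaderID, peerID uint64) Alarm {
	return Alarm{MemberID: unknownMember, Type: AlarmCorrupt}
}

func main() {
	a := corruptAlarmFor(0x8e9e05c52164694d, 0x91bc3c398fb3c146)
	fmt.Printf("memberID=%d corrupt=%t\n", a.MemberID, a.Type == AlarmCorrupt)
}
```

Keeping the field and using a sentinel avoids changing the alarm structure, which is the trade-off raised in the comment above.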

@serathius
Member

Sounds Good

@ahrtr
Member Author

ahrtr commented Nov 24, 2022

Related to #14272

@serathius
Member

Note, can we develop the changes on the main branch first?

@ahrtr
Member Author

ahrtr commented Nov 25, 2022

Note, can we develop the changes on the main branch first?

Only 3.4 and 3.5 need this change. main will follow #14828

@ahrtr
Member Author

ahrtr commented Nov 25, 2022

Resolved.

@ahrtr ahrtr closed this as completed Nov 25, 2022
ahrtr added a commit to ahrtr/etcd that referenced this issue Nov 25, 2022
…orrupted member

If quorum doesn't exist, we don't know which members' data are
corrupted. In such a situation, we intentionally set the memberID
to 0, which means the alarm affects the whole cluster.
This aligns with what we did for 3.4 and 3.5 in
etcd-io#14849

Signed-off-by: Benjamin Wang <wachao@vmware.com>
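One possible reading of the rule in the commit message above, sketched in Go. The `corruptedMembers` function, its map-based input, and the use of `uint32` hashes are assumptions made for this example and are not taken from the etcd code base. The sketch assumes that when a quorum of members agree on a hash, the members that disagree can be named, and that otherwise the result is the single ID 0.

```go
package main

import (
	"fmt"
	"sort"
)

// corruptedMembers is a hypothetical helper, not an etcd function.
// hashes maps member ID -> hash reported by that member.
// It returns:
//   - nil if a quorum agrees on a hash and nobody deviates,
//   - the sorted IDs of deviating members if a quorum agrees on a hash,
//   - []uint64{0} if no hash is held by a quorum, meaning the corrupted
//     member cannot be identified.
func corruptedMembers(hashes map[uint64]uint32) []uint64 {
	quorum := len(hashes)/2 + 1

	// Count how many members report each hash.
	counts := make(map[uint32]int)
	for _, h := range hashes {
		counts[h]++
	}

	// At most one hash can reach quorum, so iteration order does not matter.
	for h, c := range counts {
		if c < quorum {
			continue
		}
		var bad []uint64
		for id, mh := range hashes {
			if mh != h {
				bad = append(bad, id)
			}
		}
		sort.Slice(bad, func(i, j int) bool { return bad[i] < bad[j] })
		return bad
	}

	// No quorum: memberID 0 stands for "the whole cluster is affected".
	return []uint64{0}
}

func main() {
	// Two of three members agree, so member 3 is the outlier.
	fmt.Println(corruptedMembers(map[uint64]uint32{1: 0xaa, 2: 0xaa, 3: 0xbb}))
	// All three disagree, so no member can be singled out.
	fmt.Println(corruptedMembers(map[uint64]uint32{1: 0xaa, 2: 0xbb, 3: 0xcc}))
}
```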
ahrtr added a commit to ahrtr/etcd that referenced this issue Nov 26, 2022
ahrtr added a commit to ahrtr/etcd that referenced this issue Nov 26, 2022
serathius pushed a commit to serathius/etcd that referenced this issue Dec 2, 2022
…orrupted member

If quorum doesn't exist, we don't know which members' data are
corrupted. In such a situation, we intentionally set the memberID
to 0, which means the alarm affects the whole cluster.
This aligns with what we did for 3.4 and 3.5 in
etcd-io#14849

Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>