Failed to restore etcd from a snapshot due to resolving peer URL failure #14456
Comments
Thanks @xzycn for raising this ticket. It looks like an issue to me. The error is coming from VerifyBootstrap. Specifically, it's coming from netutil.URLStringsEqual. When The proposed fix is to add a flag something like " It should be an easy fix. Anyone, please feel free to deliver a PR for this, and we can have more discussion under the PR.
Why do we need to bypass this during restore? Is there any specific reason why the URL resolution would fail in this case?
I would like to work on this. To replicate this, should the etcd commands [1] alone be enough, or are there other circumstances that cause the URL resolution to fail?
Because the etcd pod isn't running when restoring from the snapshot, the URL something like
Please feel free to deliver a PR. Please follow #14456 (comment) to reproduce and fix this issue. I think the command alone should be enough to reproduce and fix this issue. But eventually we need to verify the real scenario raised by the reporter (@xzycn).
@ahrtr This command comes from the Helm chart https://github.com/apache/apisix-helm-chart/tree/master/charts/apisix/charts; **etcd** is a subchart of the chart called apisix.
My comment below is with respect to 3.5.*. I did hit the issue. Adding a workaround for the time being, without touching the chart. Assume you have a snapshot and the etcd cluster is down. Steps:
@hasethuraman
Correct. The restore command arguments I tried are the same as in https://etcd.io/docs/v3.3/op-guide/recovery/#restoring-a-cluster
I am having trouble replicating the problem. I created 2 etcd members (static configuration) in a cluster, similar to @xzycn's command line. When I restore, it seems to work fine without producing the problem log message. Note that I used etcd version 3.5.5 and etcdutl (rather than etcdctl, whose snapshot restore command is deprecated). I have given the command lines below. It could be because I am using IP addresses rather than hostnames. Also, I noticed that the message is a warning and not fatal; does it prevent etcd from completing the restore?
etcd Version: 3.5.5
Create cluster (2 such instances)
Create snapshot
Restore from snapshot
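The hunch about IP addresses versus hostnames can be illustrated with a small sketch. This is not etcd's code; it is a minimal Python illustration, assuming that a peer URL whose host is an IP literal (or localhost) never needs a DNS lookup, while any other hostname does. The function name `needs_dns` and the sample URLs are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def needs_dns(peer_url: str) -> bool:
    """Return True if the host part of peer_url would require a DNS lookup."""
    host = urlsplit(peer_url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {peer_url!r}")
    # Assumption: localhost is treated like an IP literal and is never looked up.
    if host == "localhost":
        return False
    try:
        # Succeeds for IPv4 and IPv6 literals, which need no resolution.
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so comparing by address needs a lookup.
        return True
    return False


if __name__ == "__main__":
    for url in (
        "http://10.0.0.1:2380",
        "http://etcd-0.etcd-headless.default.svc.cluster.local:2380",
    ):
        print(url, "needs DNS:", needs_dns(url))
```

Under that assumption, a cluster configured purely with IP addresses cannot hit a resolution failure during restore, which would explain why the setup above did not reproduce the warning.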
You need to reproduce this issue using an unresolvable URL such as
@ahrtr I want to work on this issue. Could you please assign it to me?
Thanks @sanjeev98kumar. @pchan, are you still working on this issue?
Yes, I will implement the following part. I expect to have a PR or an update soon.
Thanks @pchan for the update. @sanjeev98kumar Please find something else to work on. FYI. find-something-to-work-on
@ahrtr I have created a PR (#14546) that attempts to fix this by adding a flag. Can you please review it and add reviewers? I wasn't able to follow everything from the Contributing guide. It passes
I just realized that actually the
The original PR is #13224
The backported fix front-loads the URL comparison between the advertised peer URLs (--initial-advertise-peer-urls) and the initial cluster (--initial-cluster), so that resolve is not called when the URLs already match as strings. If a user gives different URLs that resolve to the same IP address, the issue will still manifest, and the only way to prevent that is to use the flag. I checked the reporter's description, and the backport should be enough.
The fix will be included in 3.5.6 and 3.4.22. @pchan please add a changelog item for both 3.4 and 3.5. FYI. #14573 (comment)
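The idea of front-loading the comparison can be sketched roughly as follows. This is a Python illustration of the technique, not etcd's Go implementation: the function names, the injectable `resolve` parameter, and the order-insensitive comparison are all assumptions made for the example.

```python
import socket
from typing import Callable, Sequence
from urllib.parse import urlsplit


def default_resolver(host: str) -> str:
    """Resolve a hostname to one IPv4 address; raises OSError on failure."""
    return socket.gethostbyname(host)


def urls_equal(a: Sequence[str], b: Sequence[str],
               resolve: Callable[[str], str] = default_resolver) -> bool:
    """Compare two peer URL lists, touching DNS only when strings differ."""
    # Fast path: if the raw strings already match, no lookup is needed,
    # so the check passes even when the hosts cannot be resolved.
    if sorted(a) == sorted(b):
        return True

    def normalize(url: str) -> str:
        parts = urlsplit(url)
        ip = resolve(parts.hostname)  # may raise if the host is unknown
        return f"{parts.scheme}://{ip}:{parts.port}"

    # Slow path: different strings may still point at the same addresses.
    return sorted(normalize(u) for u in a) == sorted(normalize(u) for u in b)


if __name__ == "__main__":
    def no_dns(host: str) -> str:
        # Simulates the restore scenario: peers are down, names do not resolve.
        raise OSError(f"no such host: {host}")

    peers = ["http://etcd-0.example.internal:2380"]  # hypothetical hostname
    print(urls_equal(peers, list(peers), resolve=no_dns))
```

With identical URL strings the fast path returns before any lookup happens. With different strings that would resolve to the same address, the slow path still calls the resolver and fails while the peers are down, which is the remaining case the comment says needs the flag.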
I have tried the following steps:
But the command has a problem: the pods have been shut down, so the pod domain names no longer exist, and I get errors:
If I restore without extra options:
Everything is OK except that the node starts as a single-node cluster; etcdctl member list only shows itself :(
So, how should I restore etcd deployed in K8s? Thank you in advance.
etcd Version: 3.4.16
Git SHA: d19fbe5
Go Version: go1.12.17
Go OS/Arch: linux/amd64