ETCD regenerated its certificate within 15 days of installation #10436
Comments
etcd has no mechanism to generate certs, so what changed them?
Well, I take that back: are you using auto-tls? Can you show your startup flags?
We are not using auto-tls. This is my startup script:
[Service]
Environment="ETCD_IMAGE_TAG=v3.3.9"
ExecStartPre=/usr/bin/mkdir --parents /var/lib/coreos
[Install]
Also, one more thing: even though the certs were changed 2 months back, how was etcd working with the old certs until yesterday?
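For context, the distinction matters here: with auto-tls, etcd generates its own self-signed certificates, whereas with static certs the server is started with explicit TLS flags. A minimal sketch of the static-cert form (the flag names are etcd's documented flags; the name, addresses, and file paths below are illustrative assumptions, not taken from this setup):

# Sketch only: etcd serving clients over TLS with operator-provided certs instead of --auto-tls.
# All names, paths, and addresses are placeholders.
etcd --name infra0 \
  --cert-file=/etc/ssl/etcd/server.crt \
  --key-file=/etc/ssl/etcd/server.key \
  --trusted-ca-file=/etc/ssl/etcd/ca.crt \
  --client-cert-auth \
  --listen-client-urls=https://10.0.0.10:2379 \
  --advertise-client-urls=https://10.0.0.10:2379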
Did you recently update etcd? Before 3.3.4 we had a TLS reload bug, #9541.
I have not upgraded manually; probably an auto upgrade? It's clear that we were running 3.3.4 before (quay.io/coreos/etcd:v3.3.4, running 4 months ago). Does 3.3.4 also have this issue?
The fix was released with v3.3.4, so it would affect etcd versions prior to that in the 3.3.x line.
Ok, then we were running the fixed version at that time.
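For reference, a quick way to confirm which etcd version is actually serving is to query the /version endpoint on the client URL (a sketch; the address and certificate paths are assumptions for illustration):

# Sketch: query etcd's /version endpoint; address and paths are placeholders.
curl --cacert /etc/ssl/etcd/ca.crt \
     --cert /etc/ssl/etcd/client.crt \
     --key /etc/ssl/etcd/client.key \
     https://127.0.0.1:2379/version
# Expected output is JSON along the lines of {"etcdserver":"3.3.9","etcdcluster":"3.3.0"}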
Well, as I said, etcd does not generate these certs. What changed them, and can you be sure that was not the cause of the problem? There seemed to be some question as to what changed the certs, hence the issue, so can you elaborate on that?
Basically, if you ran ...
Also, this could probably happen if DNS changed or was not available, giving the impression it was the cert.
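Since the open question is what changed the certs, one way to compare the old and new certificates is to dump their validity window, fingerprint, and issuer with openssl (a sketch; the file names are placeholders):

# Sketch: compare the two certificates on disk; file names are illustrative.
openssl x509 -in old-server.crt -noout -subject -issuer -dates -fingerprint
openssl x509 -in new-server.crt -noout -subject -issuer -dates -fingerprint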
We are not using SAN. Also, no DNS updates. These are the certs.
Old one:
Yes
So your certs have no X509v3 extensions? I am most curious about the cert info below where you show them.
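For completeness, the full decoded certificate, including any X509v3 extensions such as the Subject Alternative Name, can be printed with openssl (a sketch; the file name is a placeholder):

# Sketch: print the full decoded certificate, including X509v3 extensions.
openssl x509 -in server.crt -noout -text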
Sorry, pasting the complete certs here:
Certificate:
Certificate:
rkt was probably downloading etcd, but I am not a rkt expert.
Or maybe bootkube; you have not talked about that. Basically, etcd is reporting that it sees "server" in the SAN and it doesn't match the calling IP. This is expected; the rest of the equation is not etcd, in my opinion, so you need to figure that out.
This is your CA; we need your cert output.
Ok, here you go. Here is the cert:
Here is your SAN:
"server" needs to resolve correctly to the calling IP. Dig around and see why this is failing.
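One way to check that end to end is to resolve the name from the SAN and then inspect the certificate etcd actually presents for it (a sketch; the hostname, address, port, and CA path are placeholders, not taken from this setup):

# Sketch: confirm what "server" resolves to and which cert etcd presents for that name.
dig +short server
openssl s_client -connect 10.0.0.10:2379 -servername server -CAfile /etc/ssl/etcd/ca.crt </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -dates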
Seems to be working fine. No issue with resolving to the IP, and there is only one record.
This error was gone once I replaced the certs with the new ones on the calling service (the Kubernetes apiserver in my case).
OK, not sure what the issue is then. We can be sure of a few things. Is there anything that etcd did that you do not understand? My theory for the bad certificate is some sort of DNS issue, perhaps intermittent; not sure.
@vpramod0310 have we answered your questions?
@hexfusion Thanks. One thing still: with version 3.3.4 the TLS auto reload should have been solved, so why did this occur in our case?
@vpramod0310 nothing here clearly tells me that etcd is to blame for the errors seen in the logs. The bug patched in 3.3.4 was for certs with IP-only SANs; your certs have both an IP and a domain, so it would not be to blame. From what I see here, I feel the blame is probably on bootkube: https://coreos.com/tectonic/docs/latest/troubleshooting/bootkube_recovery_tool.html#bootkube-recover Unless you can recreate this, I am afraid there is not much more we can do. Feel free to reopen if you get more details or have full reproduction steps.
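As a sanity check on whether that fix could even apply, the SAN entries of the serving certificate can be listed with openssl; if both DNS: and IP Address: entries appear, the IP-only SAN bug is not in play (a sketch; the file name is a placeholder):

# Sketch: list the Subject Alternative Name entries of the serving cert.
openssl x509 -in server.crt -noout -text | grep -A1 'Subject Alternative Name'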
Problem:
I am running etcd version 3.3.9 on top of CoreOS with rkt pods. etcd had been up and running since Nov 7th, and its certificate had 1 year of validity. But on Nov 22nd, after 15 days, etcd automatically regenerated its certificates, and it kept accepting connections with the old client certs until yesterday. Yesterday, at some point in time, it started to reject connections with "tls: bad certificate". I don't have many logs from the point in time when the certs were regenerated.
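One way to narrow down whether the server cert or the client cert is the problem is to hit the endpoint directly with the old client credentials (a sketch; the endpoint address and file paths are placeholders, not taken from this setup):

# Sketch: check whether the old client cert is still accepted by etcd.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ca.crt \
  --cert=/path/to/old-client.crt \
  --key=/path/to/old-client.key \
  endpoint health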
Some logs which I have are pasted below.
-- Logs begin at Thu 2018-11-22 15:46:05 UTC, end at Tue 2019-01-29 02:16:46 UTC. --
Nov 22 15:46:05 rkt[27757]: Downloading sha256:0dc0248c936 651 KB / 651 KB
Nov 22 15:46:05 rkt[27757]: Downloading sha256:60d10af8752 369 B / 369 B
Nov 22 15:46:05 rkt[27757]: Downloading sha256:c99d4145780 5.83 MB / 110 MB
Nov 22 15:46:05 rkt[27757]: Downloading sha256:291f3eef6bb 5.59 MB / 45.7 MB
Nov 22 15:46:05 rkt[27757]: Downloading sha256:d093dc0d702 8.04 MB / 17.9 MB
Nov 22 15:46:05 rkt[27757]: Downloading sha256:5f2ca80930a 4.3 KB / 4.3 KB
Nov 22 15:46:05 rkt[27757]: Downloading sha256:2b6ac55109d 398 B / 398 B
Nov 22 15:46:05 rkt[27757]: Downloading sha256:7460c29cea5 7.47 MB / 15.7 MB
Nov 22 15:46:05 rkt[27757]: Downloading sha256:0dc0248c936 651 KB / 651 KB
Nov 22 15:46:05 rkt[27757]: Downloading sha256:60d10af8752 369 B / 369 B
....
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000265 W | etcdserver: read-only range request "key:\"/registry/*******\" count_only:true " took too long (1.760117035s) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000676 W | etcdserver: read-only range request "key:\"/registry/\" " took too long (1.012883935s) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000692 W | etcdserver: read-only range request "key:\"/registry/*****" " took too long (1.039691933s) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000733 W | etcdserver: read-only range request "key:\"/registry/*****\" " took too long (1.571963253s) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000768 W | etcdserver: read-only range request "key:\"/registry/****\" " took too long (177.812904ms) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000799 W | etcdserver: read-only range request "key:\"/registry/****** took too long (1.57192529s) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000839 W | etcdserver: read-only range request "key:\"/registry/******\" " took too long (989.986857ms) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000873 W | etcdserver: read-only range request "key:\"/registry/******\"/registry/validatingwebhookconfigurationt\" count_only>
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.000950 W | etcdserver: read-only range request "key:\"/registry/******\" " took too long (984.033524ms) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.001016 W | etcdserver: read-only range request "key:\"/registry/*****\" " took too long (1.57193116s) to e>
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.001043 W | etcdserver: read-only range request "key:\"/registry/******\" " took too long (1.010760633s) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.001095 W | etcdserver: read-only range request "key:\"/registry/****\" " took too long (1.57192198s) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.001148 W | etcdserver: read-only range request "key:\"/registry/******\" " took too long (984.109373ms) to execute
Nov 22 15:47:23 etcd-wrapper[2167]: 2018-11-22 15:47:23.001190 W | etcdserver: read-only range request "key:\"/registry/**********\" " took too long (1.571964522s) to execute
Nov 22 15:47:23 rkt[27757]: sha512-d20266172092543414f77999b0a43614
Nov 22 15:47:23 systemd[1]: Started Download bootkube.
Nov 22 15:50:17 etcd-wrapper[2167]: 2018-11-22 15:50:17.085915 I | mvcc: store.index: compact 2792125
Nov 22 15:50:17 etcd-wrapper[2167]: 2018-11-22 15:50:17.088152 I | mvcc: finished scheduled compaction at 2792125 (took 1.327087ms)
Nov 22 15:52:18 systemd-networkd[1116]: enp0s29u1u1u5: DHCP: No routes received from DHCP server: No data available
Nov 22 15:55:17 etcd-wrapper[2167]: 2018-11-22 15:55:17.089263 I | mvcc: store.index: compact 2792622
Nov 22 15:55:17 etcd-wrapper[2167]: 2018-11-22 15:55:17.091663 I | mvcc: finished scheduled compaction at 2792622 (took 1.408709ms)
Nov 22 15:57:18 systemd-networkd[1116]: enp0s29u1u1u5: DHCP: No routes received from DHCP server: No data available
Nov 22 16:00:17 etcd-wrapper[2167]: 2018-11-22 16:00:17.092611 I | mvcc: store.index: compact 2793121
Nov 22 16:00:17 etcd-wrapper[2167]: 2018-11-22 16:00:17.094868 I | mvcc: finished scheduled compaction at 2793121 (took 1.368126ms)
etcd logs
Jan 28 05:00:10 etcd-wrapper[2167]: 2019-01-28 05:00:10.845957 I | embed: rejected connection from "x.x.x.x:52938" (error "remote error: tls: bad certificate", ServerName "server")
Jan 28 05:00:42 etcd-wrapper[2167]: 2019-01-28 05:00:42.085582 I | embed: rejected connection from "x.x.x.x:53916" (error "remote error: tls: bad certificate", ServerName "server")
Jan 28 05:01:15 etcd-wrapper[2167]: 2019-01-28 05:01:15.517653 I | embed: rejected connection from "x.x.x.x:55844" (error "remote error: tls: bad certificate", ServerName "server")
Jan 28 05:02:04 etcd-wrapper[2167]: 2019-01-28 05:02:04.328073 I | embed: rejected connection from "x.x.x.x:36194" (error "remote error: tls: bad certificate", ServerName "server")
Jan 28 05:03:17 etcd-wrapper[2167]: 2019-01-28 05:03:17.267019 I | embed: rejected connection from "x.x.x.x:52518" (error "remote error: tls: bad certificate", ServerName "server")
Old cert
Validity
    Not Before: Nov 7 17:56:36 2018 GMT
    Not After : Nov 7 17:56:36 2019 GMT
Subject: O=etcd, CN=etcd-ca
New cert
Validity
    Not Before: Nov 22 15:40:27 2018 GMT
    Not After : Nov 22 15:40:27 2019 GMT
Subject: O=etcd, CN=etcd-ca