[nginx-ingress-controller] Nginx aborts reloads w/ certificate errors #1314
@fcvarela do you have steps to reproduce this error?
@aledbf Unfortunately no, it requires setting up a few hundred ingress rules spanning a few dozen hosts, all using TLS secrets. I set the logging level to --v=5 and noticed the nginx error interleaved with logging from ssl.go updating the PEM files from secrets. This indicates that the nginx reload signal was triggered while the files were being written non-atomically, hence the switch to writing a temporary file and renaming it over the target instead of writing in place.
@fcvarela how many certificates are you trying to use?
Not that many: we are using 12 certificates covering ~200 hosts (one of the certs is a wildcard) and ~500 ingress rules. My impression is that once the PEM files are generated, everything works fine. But if there is continuous activity generating ingress rules and adding hosts, or containers are crash-looping (and therefore registering/unregistering endpoints as upstream servers), then PEM generation runs continuously. The problem happens when nginx is reloading from a previous iteration while new PEM files (with the same content) are being written over the existing ones. You should be able to see it if you keep the ingress generation loop going indefinitely or crash-loop the destination backend. In any case, there is no locking in place to prevent nginx from reloading while PEM files are being written, so they should be written atomically. That guarantees nginx will always "see" a complete PEM file.
Here's the error output (--v=5): … A few seconds later, a manual … The patched version has been running on our cluster since I submitted the PR, and we haven't seen this happen again.
Addresses #1314 [nginx-ingress-controller ssl nginx reload abort]
Removed comment to be consistent w/ rest of code
Fixes typo and string concat
Addresses kubernetes-retired#1314 [nginx-ingress-controller ssl nginx reload abort]
The ingress controller writes certificates to /etc/nginx-ssl on one goroutine and reloads nginx on another. On a busy controller, I've observed nginx throwing errors like:
2016/07/05 11:32:56 [emerg] 13#13: PEM_read_bio_X509("/etc/nginx-ssl/xxx-yyy") failed (SSL: error:0906D066:PEM routines:PEM_read_bio:bad end line)
After this happens, any rc/pod/deployment restart that gives pods new IPs makes the entire upstream unavailable (nginx returns 503), because the aborted reload leaves nginx unaware of those changes.
Manual inspection shows that nginx.conf does contain the new IPs in the upstream, and that the certificates and keys are correct.
A second manual reload solves the issue.
This is caused by writing certificates/keys with ioutil, which truncates and rewrites each file in place (a non-atomic operation), while nginx is trying to read them. I have a working and verified patch that changes this to writing temp files and atomically renaming them into place.