Lua land OCSP stapling improvements #5397

Closed · 7 tasks
ElvinEfendi opened this issue Apr 18, 2020 · 13 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ElvinEfendi
Member

ElvinEfendi commented Apr 18, 2020

Since #5133 and #5387, ingress-nginx supports OCSP stapling in Lua land (as a replacement for vanilla Nginx's SSL stapling).

In this issue I'd like us to collect user feedback and track a list of things to do to improve the OCSP support further:

  • Eagerly fetch OCSP responses for certificates to make sure the first request can be stapled
  • Improve documentation
  • Cache better using nextUpdate attribute. This is blocked on OCSP nextupdate openresty/lua-resty-core#296
  • Use a lock in fetch_and_cache_ocsp_response to avoid a scenario where multiple Nginx workers send requests to the OCSP responder for the same certificate (see the sketch after this list)
  • Do negative caching in fetch_and_cache_ocsp_response. We might want to differentiate negative caching between validation failures and HTTP request failures - I think each should be handled differently. HTTP request failures are likely to be intermittent and should be cached short term, while validation failures are long term and should be negatively cached for longer. We could go even further and differentiate between validation failure errors - for example, validation fails when the status is "revoked"; in those cases we should cache forever, right? In case the status is "unauthorized" though, I'm not sure - maybe we can cache forever again, because it means the certificate is borked and has an incorrect OCSP responder URL?
  • The implementation, contrary to vanilla Nginx, does not provide any alternative to ssl_stapling_verify and always passes the OCSP_NOVERIFY flag when calling OCSP_basic_verify. There's also a test case for that: https://github.com/openresty/lua-resty-core/blob/1bd74f8f502a9dc0a7099110d0ee184187698d01/t/ocsp.t#L1120. https://www.openssl.org/docs/man1.1.1/man3/OCSP_basic_verify.html explains this further. My understanding is that by passing OCSP_NOVERIFY we effectively skip verification of the certificate that was used to sign the OCSP response; in other words, the OCSP response could have been served by an attacker and we would not know. But is this really an issue for OCSP stapling? The browser should do full verification upon receiving the response, no?
  • tbd
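
A minimal sketch of the locking item above, assuming lua-resty-lock is available and an extra lua_shared_dict ocsp_locks is configured; fetch_ocsp_from_responder and expiry_for are hypothetical placeholders for the actual fetch and expiry logic, not existing ingress-nginx functions:

local resty_lock = require("resty.lock")

local function fetch_and_cache_ocsp_response_locked(uid, der_cert)
  local cache = ngx.shared.ocsp_response_cache

  -- fast path: another worker may already have cached the response
  local cached = cache:get(uid)
  if cached then
    return cached
  end

  local lock, err = resty_lock:new("ocsp_locks", { timeout = 5 })
  if not lock then
    return nil, "failed to create lock: " .. tostring(err)
  end

  local elapsed, lock_err = lock:lock("ocsp:" .. uid)
  if not elapsed then
    return nil, "failed to acquire lock: " .. tostring(lock_err)
  end

  -- re-check after acquiring the lock: the worker that held it before
  -- us may have populated the cache in the meantime
  cached = cache:get(uid)
  if cached then
    lock:unlock()
    return cached
  end

  -- hypothetical helpers standing in for the real fetch/expiry logic
  local ocsp_response, fetch_err = fetch_ocsp_from_responder(der_cert)
  if ocsp_response then
    cache:set(uid, ocsp_response, expiry_for(ocsp_response))
  end

  lock:unlock()
  return ocsp_response, fetch_err
end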
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 18, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 17, 2020
@wenzong
Contributor

wenzong commented Sep 16, 2020

Hi @ElvinEfendi

I am using quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.33.0 and have enabled OCSP stapling with enable-ocsp: "true" in cm/nginx-configuration.

I am also using cert-manager with a letsencrypt issuer, and I put the cert-manager.io/cluster-issuer: "letsencrypt" annotation on the Ingress.

Everything worked as expected at the beginning: cert-manager generated a certificate for my domain, and the ingress controller fetched an OCSP response and did the OCSP stapling for my domain.


The problem is that when cert-manager renews the certificate, the ingress controller keeps using the old OCSP response.

Let's say the certificate's validity is Not Before: Sep 16 00:00:00 2020 GMT, but the OCSP response is:

    Produced At: Sep 13 01:00:00 2020 GMT
    Cert Status: good
    This Update: Sep 13 01:00:00 2020 GMT
    Next Update: Sep 20 01:00:00 2020 GMT

I guess that would cause an inconsistency between the certificate and the OCSP response?

  • (correct me if I am wrong)
  • (I lost the environment and can't reproduce it on short notice, so I can't compare the certificate serial number between the cert and the OCSP response)

From what I have on hand, browsers and iOS native apps ignore that inconsistency, but Android complains with Caused by: java.security.cert.CertPathValidatorException: OCSP response does not include a response for a certificate supplied in the OCSP request. (For background, I am using gRPC on Android, which uses Android Java underneath.)


Maybe the ingress controller should expire the old OCSP response when a new certificate is generated? Or is it cert-manager's duty to notify about certificate renewals?

I will try to understand rootfs/etc/nginx/lua/certificate.lua's behavior, but I figured it's better to put the information here first and let people discuss it.

@ElvinEfendi
Member Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 16, 2020
@ElvinEfendi
Member Author

ElvinEfendi commented Sep 16, 2020

@wenzong thanks for the report. I think that's because we use the k8s secret's UID as a cache key, and when cert-manager updates the certificate, the corresponding secret's cache key remains unchanged.

If you trace back uid from

success, err, forcible = ocsp_response_cache:set(uid, ocsp_response, expiry)
you will see what I'm talking about.

I think the fix would be to use the certificate's serial number as the cache key. Or, if Lua does not expose any API to get the serial number, we can simply md5 hash the certificate payload itself and use that as the cache key.
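
A rough illustration of the md5 idea, not the actual ingress-nginx code (ngx.md5 is available in OpenResty; pem_cert_chain stands for the PEM payload already stored in ngx.shared.certificate_data):

-- derive the OCSP cache key from the certificate payload instead of the
-- secret UID, so a renewed certificate automatically gets a new key
local function ocsp_cache_key(pem_cert_chain)
  return ngx.md5(pem_cert_chain)
end

-- used in place of the secret UID:
-- local key = ocsp_cache_key(pem_cert_chain)
-- success, err, forcible = ocsp_response_cache:set(key, ocsp_response, expiry)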

I'd be happy to guide you if you want to take a stab at this.

@wenzong
Contributor

wenzong commented Sep 17, 2020

@wenzong thanks for the report. I think that's because we use the k8s secret's UID as a cache key, and when cert-manager updates the certificate, the corresponding secret's cache key remains unchanged.

If you trace back uid from

success, err, forcible = ocsp_response_cache:set(uid, ocsp_response, expiry)

you will see what I'm talking about.
I think the fix would be to use the certificate's serial number as the cache key. Or, if Lua does not expose any API to get the serial number, we can simply md5 hash the certificate payload itself and use that as the cache key.

I'd be happy to guide you if you want to take a stab at this.

Thanks for your reply.

I see, so the ingress controller watches secret changes and posts certificate information via /configuration/servers:

statusCode, _, err := nginx.NewPostStatusRequest("/configuration/servers", "application/json", configuration)

Lua receives it in

local success, set_err, forcible = certificate_servers:set(server, uid)
and saves it into an ngx.shared.DICT.

There are three relationships:

  • POST by ingress-controller
    • server name -> UID: ngx.shared.certificate_servers
    • UID -> certificate: ngx.shared.certificate_data
  • Fetch And Cache
    • UID -> ocsp response: ngx.shared.ocsp_response_cache

An incoming request uses the hostname (server name) to look up the certificate to serve, and fetch_and_cache_ocsp_response is called to staple the OCSP response.
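
A simplified sketch of that lookup chain using the shared dicts named above (the real logic lives in rootfs/etc/nginx/lua/certificate.lua):

local function certificate_for_request(host)
  -- server name -> UID
  local uid = ngx.shared.certificate_servers:get(host)
  if not uid then
    return nil
  end

  -- UID -> PEM certificate payload
  local pem_cert = ngx.shared.certificate_data:get(uid)

  -- UID -> cached OCSP response (may be stale after a renewal,
  -- which is the problem discussed in this issue)
  local ocsp_response = ngx.shared.ocsp_response_cache:get(uid)

  return pem_cert, ocsp_response
end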


I guess it's better to change the UID to something globally unique per certificate? A serial number is only unique within one specific CA. The poster suggests using either issuer name + serial number or the certificate's thumbprint; the latter is actually the SHA-1 hash of the DER encoding of the certificate.

I noticed that there is a PemSHA

PemSHA string `json:"pemSha"`
in the SSLCert struct. Maybe we can use that instead of the UID; then the relationships would look like:

  • server name -> PemSHA: ngx.shared.certificate_servers
  • PemSHA -> certificate: ngx.shared.certificate_data
  • PemSHA -> ocsp response: ngx.shared.ocsp_response_cache

Any thoughts?

@ElvinEfendi
Member Author

in the SSLCert struct. Maybe we can use that instead of the UID; then the relationships would look like:

The problem with this is that we would pollute ngx.shared.certificate_data: every certificate renewal would create a new entry in the dictionary without deleting the previous one. The benefit of using the UID is that it remains unchanged, so we end up doing an update instead of an insert/add when a certificate is renewed.

--

Maybe in

local success, set_err, forcible = certificate_data:set(uid, cert)
you can check whether the certificate has changed, and if so bust the OCSP cache entry for the uid under consideration?
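
Something along these lines, as a sketch of the idea rather than the final implementation:

-- when storing a certificate, compare it with the previously stored
-- payload and drop the stale OCSP response if it changed
local function set_certificate(uid, cert)
  local certificate_data = ngx.shared.certificate_data
  local ocsp_response_cache = ngx.shared.ocsp_response_cache

  local old_cert = certificate_data:get(uid)
  if old_cert and old_cert ~= cert then
    -- certificate was renewed: invalidate the cached OCSP response
    ocsp_response_cache:delete(uid)
  end

  return certificate_data:set(uid, cert)
end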

@wenzong
Contributor

wenzong commented Sep 18, 2020

in the SSLCert struct. Maybe we can use that instead of the UID; then the relationships would look like:

The problem with this is that we would pollute ngx.shared.certificate_data: every certificate renewal would create a new entry in the dictionary without deleting the previous one. The benefit of using the UID is that it remains unchanged, so we end up doing an update instead of an insert/add when a certificate is renewed.

--

Maybe in

local success, set_err, forcible = certificate_data:set(uid, cert)

you can check whether the certificate has changed, and if so bust the OCSP cache entry for the uid under consideration?

You are right, expiring the cache is more reasonable.

@wenzong
Contributor

wenzong commented Sep 18, 2020

@ElvinEfendi I just rethought the implementation and realized that there is a potential concurrency problem.

So in /configuration/servers:

  • after ocsp_response_cache:delete(uid)
  • before certificate_data:set(uid)

Since the two steps are not an atomic operation, there is a small chance that some Nginx worker misses the cache, reads the old certificate, and then fetches and caches the old ocsp_response again.

I will try to solve it tomorrow.


The PR is #6200
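
For context, one way to shrink that window is to store the new certificate before invalidating the OCSP cache, so a worker that misses the OCSP cache can only re-fetch for the new certificate. This is only a sketch of the idea; the actual change is whatever landed in the PR above:

local function update_certificate(uid, new_cert)
  -- set the new certificate first...
  local success, set_err = ngx.shared.certificate_data:set(uid, new_cert)
  if not success then
    return nil, set_err
  end

  -- ...then drop the stale OCSP response; a worker that misses the OCSP
  -- cache after this point reads the new certificate and fetches a
  -- matching response instead of re-caching one for the old certificate
  ngx.shared.ocsp_response_cache:delete(uid)
  return true
end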

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 17, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
