
jwks: start attempting to refresh cache right away #264

Closed
wants to merge 1 commit

Conversation

@dlwyatt commented Jul 26, 2020

The existing code doesn't attempt to refresh the remote keyset cache until a call to Verify() fails against all of the currently cached keys. This can result in a situation where a new key has been present in the remote keyset for some time, but go-oidc doesn't know about it until the first time it encounters a JWT signed with that new key (even if the cache expired some time before that). If the HTTP request to refresh the cache fails at that point, the JWT validation also fails.

This small enhancement still returns successful validations immediately based on cached keys, but starts attempting to renew the cache in the background as soon as it goes stale. That way, even over a spotty network connection, the cache is likely to be renewed before the new key is actually used to sign a JWT.
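
To make the idea concrete, here's a rough sketch of the behavior I'm describing (the names below are made up for illustration; they aren't go-oidc's actual internals or this PR's diff):

```go
package jwkscache

import (
	"context"
	"sync"
	"time"
)

type cachedKeySet struct {
	mu         sync.Mutex
	keys       []string // stand-in for the parsed JWKs
	expiry     time.Time
	refreshing bool

	// fetchRemoteKeys stands in for the HTTP request to the jwks_uri.
	fetchRemoteKeys func(ctx context.Context) ([]string, time.Time, error)
}

// keysFor returns the cached keys immediately and, if the cache has expired,
// starts a single background refresh instead of waiting for a Verify failure.
func (s *cachedKeySet) keysFor() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if time.Now().After(s.expiry) && !s.refreshing {
		s.refreshing = true
		go func() {
			keys, exp, err := s.fetchRemoteKeys(context.Background())
			s.mu.Lock()
			defer s.mu.Unlock()
			s.refreshing = false
			if err != nil {
				return // swallowed here; Verify still surfaces it on a cache miss
			}
			s.keys, s.expiry = keys, exp
		}()
	}
	return s.keys
}
```

The point is only that the refresh starts as soon as staleness is detected, while verification keeps using whatever keys are already cached.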

@ericchiang (Collaborator)

Can you take a look at #259, which I merged for the V3 branch and which removes the expiry logic?

I think doing more frequent refreshes would be good, but some issues that come to mind are:

  • We don't want to swallow errors
  • This package currently doesn't log anything; all errors are returned to the caller

Ideally NewRemoteKeySet() would just create a background goroutine that occasionally refreshed the key set (maybe once every minute?), but then we'd need to figure out how to surface errors to the caller and potentially back off. Maybe require the caller to pass a logger to create a remote key set?
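
Something like the following, purely as a sketch (none of these names exist in go-oidc, and the interval, fetch hook, and error callback are all assumptions):

```go
package keyset

import (
	"context"
	"time"
)

// RefreshingKeySet is illustrative only; go-oidc's RemoteKeySet has no such option.
type RefreshingKeySet struct {
	fetch func(ctx context.Context) error // stand-in for fetching the jwks_uri
}

// NewRefreshingKeySet starts a background refresh loop tied to ctx. onErr lets
// the caller decide whether to log, record a metric, or ignore failed refreshes.
func NewRefreshingKeySet(ctx context.Context, interval time.Duration,
	fetch func(context.Context) error, onErr func(error)) *RefreshingKeySet {

	ks := &RefreshingKeySet{fetch: fetch}
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return // cancelling the context ends the goroutine
			case <-ticker.C:
				if err := ks.fetch(ctx); err != nil && onErr != nil {
					onErr(err) // surfaced to the caller instead of swallowed
				}
			}
		}
	}()
	return ks
}
```

Tying the goroutine to a context answers the lifetime question, but it does mean more API for the caller to get right.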

As a hack, you can feed a token signed by an invalid key every now and again to force it to refresh the remote keys.

How are you using this package where you're seeing this issue? Is it through another project or are you using go-oidc directly?

@dlwyatt (Author) commented Jul 26, 2020

I'm just starting to use the package and reviewing the code; I'll be using go-oidc directly to verify tokens for an Azure AD app. I haven't actually run into any issues with this; the code just jumped out at me as a place where intermittent network blips hitting the jwks URI could cause easily avoidable errors. It's interesting that the OIDC spec recommends the same approach, though: "The verifier knows to go back to the jwks_uri location to re-retrieve the keys when it sees an unfamiliar kid value." Why wait that long to re-retrieve when the new keys have probably been published at the jwks_uri for a while before the provider actually started signing anything with them?

I agree that refreshing the cache in the background on a schedule would be better than having it driven by client requests. I wasn't sure whether that type of change would be accepted here, so what I did was a minimal modification of the existing code. Refreshing every minute is excessive, though; the refresh interval would typically be around once per day, but that may depend on the provider.

I don't personally mind swallowing errors on the background cache refreshes, because they're eventually going to surface if a caller encounters a signing key that isn't yet in the cache (the same as happens today). The background refresh attempts are just there to get ahead of that, so there are potentially multiple attempts to fetch the latest JWKS before a verification error eventually occurs. Logging is fine too, of course. :)
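
For what it's worth, here's a minimal sketch of why the errors still surface even when background refreshes fail silently (again, hypothetical names rather than go-oidc's internals): on an unfamiliar kid the cache is re-fetched synchronously, so a failing jwks_uri request turns into a verification error for the caller.

```go
package keyset

import (
	"context"
	"fmt"
)

type keyCache struct {
	keysByID map[string]string // kid -> key material (stand-in)
	fetch    func(ctx context.Context) (map[string]string, error)
}

// keyFor looks up a token's kid in the cache; on a miss it re-fetches the
// remote key set synchronously, so a failing jwks_uri request becomes an
// error returned to the caller.
func (c *keyCache) keyFor(ctx context.Context, kid string) (string, error) {
	if k, ok := c.keysByID[kid]; ok {
		return k, nil
	}
	keys, err := c.fetch(ctx)
	if err != nil {
		return "", fmt.Errorf("fetching remote keys: %w", err)
	}
	c.keysByID = keys
	if k, ok := c.keysByID[kid]; ok {
		return k, nil
	}
	return "", fmt.Errorf("no key with id %q in remote key set", kid)
}
```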

@ericchiang (Collaborator)

Good to hear that someone actually reviews their dependencies :)

This strategy was used because it's straightforward. We currently don't have to manage a goroutine's lifetime, which would require more API surface. If you stop using the remote key set, it gets garbage collected.

I think different users will have different opinions about unhandled errors. But if we're going to start making calls on the user's behalf, we probably want a way to surface those to the caller. Some will want to log them with their logging package of choice; others may want to expose them as monitoring metrics.

If this isn't something you're actually hitting, I'm inclined not to optimize it. I've got a limited amount of time to maintain open source projects, and I try to prioritize things that are actively blocking users.

@ericchiang closed this Jul 26, 2020