-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix JWK expiry time parsing from header on startup behaviour (fix #3655) #3779
fix JWK expiry time parsing from header on startup behaviour (fix #3655) #3779
Conversation
- fix when header parsing fails on startup, do not exit - change qualified import of aeson in Auth.JWT from A to J to be consistent with the rest of the codebase
- not sure why this was added, but I think it is not required
Review app for commit 18b1778 deployed to Heroku: https://hge-ci-pull-3779.herokuapp.com |
Review app for commit 1dfd928 deployed to Heroku: https://hge-ci-pull-3779.herokuapp.com |
b6c2070
to
32539bf
Compare
Review app for commit 32539bf deployed to Heroku: https://hge-ci-pull-3779.herokuapp.com |
When you say 'cannot be parsed', does it mean that the provided values for |
@0x777 the headers can have other directives, and nothing related to
Yes, basically this. For example Cache-Control header might just have |
What is the behaviour when the JWK server responds with |
During startup it will log that it could not parse any max age from the header and will not start any background refresh thread. From the above PR description:
In the case, when graphql-engine has started the background refresh thread, and in one of those cycles the JWK server responded with a header where we could not parse the expiry time, it will log and default to 1 min of refresh cycle. |
It feels a little inconsistent (continuing to refresh in 1min intervals when the server's caching strategy clearly changed), but I guess it is quite uncommon for a server to change its caching strategy for us to worry about these edge cases. Can you update the docs clearly documenting the behaviour when a |
Yeah, I was not sure what our strategy should be in this scenario. It did not make sense to cancel the refresh thread either, as it could be because of temporary errors from the server? (e.g. the actual JWK server is down and temp responses come from gateway/reverse proxy)
👍 |
Just to clarify, this will happen when we cannot parse any expiry time from the response headers [1]. If any subsequent response returns a valid expiry time, it will use that to refresh accordingly. [1] and presumably we could parse a expiry time before, hence the refresh thread is running. If we could not parse any expiry time in the beginning, the refresh thread is not started. I will document these scenarios in detail. |
Shouldn't this same logic apply during startup (some transient error causing the parse to fail)? Would it not be better to have the refresh thread running in all cases? ^ This is my one concern. If it's super easy to undo the Aeson import alias change that would be good to do in my opinion, but I'll leave that up to you! |
Deploy preview for hasura-docs ready! Built with commit 5da4021 |
Review app for commit d360920 deployed to Heroku: https://hge-ci-pull-3779.herokuapp.com |
Review app for commit 5da4021 deployed to Heroku: https://hge-ci-pull-3779.herokuapp.com |
I was thinking the same, but it also does seem wrong to keep refreshing a JWK which never changes. Our current 'trying to infer' the key's behaviour from the server's response headers can only go so far, which I think is not bad, covers typical cases but as you pointed out there are some edge cases. I think a better option would be to have a flag like |
@rikinsk Oops I didn't realize the console files changed. Probably because of the pre-commit hook. |
Review app https://hge-ci-pull-3779.herokuapp.com is deleted |
Description
Whenever there is an error on fetching or parsing the JWK, on startup, graphql-engine exits with an error.
This is not the correct behaviour when
Cache-Control
orExpires
header value cannot be parsed on startup. This PR fixes this.So with this PR -
Affected components
Related Issues
#3655
Solution and Design
The possible errors while fetching and parsing JWKs are more explicit now. And the
mkJwtCtx
function now explicitly pattern matches over the errors and ignores if it is only expiry parsing error (JFEExpiryParseError
). But throws error if it is any other error (consequently, that results in exit during startup).Due to the above error ADT changes, it changes the log schema (internal log) of
jwk-refresh-log
(more details are noted below). Should we make sure to not change the schema? This can be done, but not sure if the extra effort is required.Side note: This was the (incorrect) behaviour on graphql-engine previously as well. But after we introduced parsing cache-control header, this issue became apparent because
Cache-Control
header can have other directives than just specifying the expiry (or the caching) time. Previously, for theExpires
header, the only valid value it could have was expiry of the response, hence parse errors on it was never encountered (in my hypothesis).Steps to test and verify
Cache-Control
header which does not havemax-age
ors-maxage
directive.Limitations, known bugs & workarounds
Server checklist
Catalog upgrade
Does this PR change Hasura Catalog version?
Metadata
Does this PR add a new Metadata feature?
GraphQL
Breaking changes
There are breaking changes:
Logging
JSON
schema has changedOf the internal log type
jwk-refresh-log
. The old and new schemas are noted below. Do we need to ensure compatibility for this?type
names have changedThe breaking change on
jwk-refresh-log
schemaOn http exception
Old
New
On HTTP error (non-200 response)
Old
New
On JWK parsing error
Old
New
On header parsing errors
Old
New
Just info logs:
Old
New