
feat: make id_token mutator cache configurable #1177

Open · @David-Wobrock wants to merge 3 commits into master from feat/id-token-mutator-cache-configurable
Conversation

@David-Wobrock (Contributor) commented on Aug 9, 2024:

Make the id_token mutator cache configurable:

  • can be enabled/disabled
  • can set the max_cost

➡️ And set a TTL on the cached JWTs in the id_token mutator.
This part fixes a memory leak in Oathkeeper ⚠️
We are internally running a fork of Oathkeeper with this patch applied; the resulting memory footprint:
[screenshot: memory footprint after deploying the patch]

Related issue(s)

Follow up of #1171

Related docs PR: ory/docs#1820

Checklist

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change
    introduces a new feature.
  • I am following the
    contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security vulnerability, I
    confirm that I got the approval (please contact
    security@ory.sh) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

Further Comments

Comment on lines 218 to 222
    BufferItems: 64,
    Cost: func(value interface{}) int64 {
        return 1
    },
    IgnoreInternalCost: true,
Contributor Author:

To be discussed: whether this is really the best strategy for id_token.

I'm less familiar with other use cases of the generated id_token, and with how ristretto computes the "cost".

For now, this is copied from the AuthN OAuth2 introspection handler. But perhaps counting each id_token as a cost of 1 would cache too many id_tokens.

I'd like @aeneasr's opinion here :)

Contributor Author:

In the current state it does not work, but we could at least do

    NumCounters: 10000,
    BufferItems: 64,
    MaxCost:     cost,

to keep the same behaviour as previously.
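
For context, a minimal sketch of what that would look like, assuming the pre-generics dgraph-io/ristretto API shown in the diff above (the package and function names are illustrative, and maxCost stands for the configured value):

    package mutate

    import "github.com/dgraph-io/ristretto"

    // newIDTokenCache is a sketch that only keeps the previous behaviour: the
    // configured value simply becomes the cache's total cost budget.
    func newIDTokenCache(maxCost int64) (*ristretto.Cache, error) {
        return ristretto.NewCache(&ristretto.Config{
            NumCounters: 10000,   // number of keys whose access frequency is tracked
            BufferItems: 64,      // number of keys per internal Get buffer
            MaxCost:     maxCost, // total cost budget of the cache
        })
    }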

Member:

cost = 1 << 25

as the default is definitely too many now that the cost function returns 1 (that would allow roughly 33.5 million cached tokens)

Member:

Since JWTs can be quite long, it may make more sense to calculate the cost based on the string length, to get an approximation of storage use.
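
As an illustration only (the container type follows the snippets in this PR, but the exact shape of such a function is an assumption, not the committed code):

    // jwtCost prices a cache entry by the length of the signed JWT it holds, so
    // MaxCost approximates the total bytes of cached tokens rather than a count.
    func jwtCost(value interface{}) int64 {
        if c, ok := value.(*idTokenCacheContainer); ok && len(c.Token) > 0 {
            return int64(len(c.Token))
        }
        return 1 // fallback for values of an unexpected type
    }

Such a function could be plugged into the Cost field of the ristretto config, or its result passed directly as the cost argument of SetWithTTL.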

Contributor Author:

That could make sense indeed 👍
With the default limit of 1 << 25 (~33.5 million) and an estimated average JWT length of a few thousand characters, we should be able to store some tens of thousands of keys, which sounds reasonable.

@David-Wobrock marked this pull request as ready for review on August 9, 2024 15:11
@David-Wobrock force-pushed the feat/id-token-mutator-cache-configurable branch from 6ad3106 to 7a46fd0 on August 22, 2024 11:57
@David-Wobrock force-pushed the feat/id-token-mutator-cache-configurable branch from 7a46fd0 to 2b89d1e on August 29, 2024 12:56
Comment on lines 112 to 119
    a.tokenCache.SetWithTTL(
        key,
        &idTokenCacheContainer{
            TTL:       ttl,
            ExpiresAt: expiresAt,
            Token:     token,
        },
        0,
        ttl,
    )
Contributor Author:

Another probable reason for the memory leak:

Keys are only deleted from the cache when they are requested and found to be expired.

I don't know whether ristretto flushes other keys when the max cost is reached, but this could be another lead.

Contributor Author:

We deployed this in production, and it fixed the memory leak indeed :)

@David-Wobrock force-pushed the feat/id-token-mutator-cache-configurable branch from 2f91f5e to e78b048 on September 13, 2024 07:44
"cache": {
"additionalProperties": false,
"type": "object",
"properties": {
Collaborator:

We have a bunch of other caches in the config schema already. Can you make your change more similar (ideally identical) to those other cache configurations? The max_cost parameter in particular is really opaque, and it's impossible to come up with a reasonable value without knowing the implementation.

Contributor Author:

I agree with the remark on max_cost 👍
A value with similar semantics is used today by the OAuth2 introspection authenticator, so I mainly mimicked that one. There is also max_tokens, used by the OAuth2 client credentials authenticator.

I think the reason for using a cost is that the cached objects have variable lengths, so storing a given number of objects results in different cache memory usage depending on your config.

That said, we could also decide to leave this trade-off to the user anyway :)
Let me know what makes most sense to you, and what Ory's strategy is around these questions (should the product ship the same defaults for everyone, or is the user trusted to configure this appropriately?).


For the enabled/disabled value, I didn't re-use the existing

    "handlerSwitch": {
      "title": "Enabled",
      "type": "boolean",
      "default": false,
      "examples": [true],
      "description": "En-/disables this component."
    },

to avoid introducing a breaking change.

The id_token cache is currently enabled by default, and I didn't want to change the default value to false, so I couldn't re-use this definition.


And finally, this cache config has no TTL, because the id_token mutator already has a TTL config value for the JWT expiration. It makes sense to me to re-use the same value: cache for 15 min if the JWT is valid for 15 min.
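
For illustration, the settings discussed here could boil down to something like the following Go shape (field names are assumptions, not the final schema):

    // IDTokenCacheConfig is a hypothetical view of the new settings discussed in
    // this thread: the cache can be switched off and its cost budget tuned, while
    // the TTL is reused from the mutator's existing JWT ttl setting.
    type IDTokenCacheConfig struct {
        Enabled bool  `json:"enabled"`  // defaults to true to stay backwards compatible
        MaxCost int64 `json:"max_cost"` // ristretto cost budget
    }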

Member:

Ideally, we would probably have one namespaced cache for all of oathkeeper and then use that everywhere, but this is good for now!

            Token:     token,
        },
        0,
        ttl,
Collaborator:

In other caches, I believe we set the TTL to min(TTL, time.Until(expiresAt)). That would make sense here too IMO.
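
Expressed as a small helper (a sketch of the suggestion, not existing Oathkeeper code; the package name is illustrative):

    package mutate

    import "time"

    // cacheTTL caps an entry's cache lifetime at the token's remaining validity,
    // i.e. min(TTL, time.Until(expiresAt)).
    func cacheTTL(ttl time.Duration, expiresAt time.Time) time.Duration {
        if remaining := time.Until(expiresAt); remaining < ttl {
            return remaining
        }
        return ttl
    }

The result would then be passed as the last argument of SetWithTTL.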

Contributor Author:

They should normally be the same, since the method calling this does:

    now := time.Now().UTC()
    exp := now.Add(ttl)
    [...]
    a.tokenToCache(c, session, templateClaims, ttl, exp, signed)

Or what am I missing? :)

Member:

LGTM

    func (a *MutatorIDToken) SetCaching(token bool) {
        a.tokenCacheEnabled = token
    }

    type idTokenCacheContainer struct {
        ExpiresAt time.Time
Collaborator:

I haven't looked at this code in detail, but ExpiresAt+TTL seem to be semantically the same thing? Do we even need to store these in the cache?

Contributor Author (@David-Wobrock), Sep 13, 2024:

Good catch, we don't!

I think that's what was buggy in the first place: we don't set an actual TTL in the cache, but instead cache the TTL along with the token.

I pushed an edit.
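
For readers following along, this is roughly the shape of the fix, reusing the names from the snippet commented on above (the exact committed code may differ): the container no longer needs to carry TTL/ExpiresAt, and expiry is delegated to ristretto itself.

    a.tokenCache.SetWithTTL(
        key,
        &idTokenCacheContainer{Token: token},
        0,   // cost 0 defers to the configured Cost function, if any
        ttl, // a real cache TTL, so stale tokens are evicted automatically
    )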

Member (@aeneasr) left a comment:

This only changes cache sizes and not the actual caching function itself, right? If so I think we’re very close!

Contributor Author (@David-Wobrock):

> This only changes cache sizes and not the actual caching function itself, right? If so I think we’re very close!

I'm glad to read this 😁

At the time of writing, this patch:

  • changes behaviour:
    • sets a TTL on keys in the id_token mutator cache
    • sets a cost function in the id_token mutator cache
  • adds new configuration:
    • allows disabling the id_token mutator cache
    • allows changing the max cost of the id_token mutator cache

@David-Wobrock force-pushed the feat/id-token-mutator-cache-configurable branch from 8c5806e to d48aa10 on October 8, 2024 07:44
@David-Wobrock force-pushed the feat/id-token-mutator-cache-configurable branch from d48aa10 to 4229ee6 on October 28, 2024 09:21
Contributor Author (@David-Wobrock):

Hey @aeneasr, I hope you're well :)

Did you get a chance to have a look again? 😇
We are still running our forked and patched Oathkeeper, but we would obviously prefer running the Ory upstream version.

Perhaps this can make it into the next version?
