
fix(cache): mlcache invalidation use separate cluster event channels to avoid useless invalidation #12321

Merged

windmgc merged 7 commits from fix-mlcache-invalidate-split-channel into master on Mar 5, 2024

Conversation

windmgc
Member

@windmgc windmgc commented Jan 9, 2024

Summary

Currently, kong_db_cache and kong_core_db_cache share a single cluster event channel named "invalidations" in the cluster event hook. In a traditional cluster (in which multiple Kong nodes use the same database and communicate with each other through cluster events), whenever an invalidation happens on node A, every other node X in the cluster calls invalidate_local on both kong_db_cache and kong_core_db_cache, even though the entity usually exists in only one of the two caches; this generates useless worker events. I am not sure whether this is an intentional design.

This PR gives every kong.cache instance its own invalidations channel, so that useless "invalidations" worker events are no longer generated in a traditional cluster.
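To make the effect concrete, here is a minimal Python model of the pub/sub behavior described above. This is not Kong's actual code: EventBus and Cache are hypothetical stand-ins for cluster_events and the kong.cache instances, sketched only to show why a shared channel fans one broadcast out to both caches while per-cache channels do not.

```python
# Hypothetical model of cluster-event invalidation routing; EventBus and
# Cache are illustrative stand-ins, not Kong's actual cluster_events API.

class EventBus:
    def __init__(self):
        self.handlers = {}  # channel name -> list of subscriber callbacks

    def subscribe(self, channel, fn):
        self.handlers.setdefault(channel, []).append(fn)

    def broadcast(self, channel, key):
        for fn in self.handlers.get(channel, []):
            fn(key)

class Cache:
    def __init__(self, name):
        self.name = name
        self.invalidations = []  # records each invalidate_local call

    def invalidate_local(self, key):
        self.invalidations.append(key)

# Before: both caches subscribe to the single shared channel, so one
# broadcast triggers invalidate_local on both of them.
bus = EventBus()
db_cache, core_cache = Cache("kong_db_cache"), Cache("kong_core_db_cache")
for c in (db_cache, core_cache):
    bus.subscribe("invalidations", c.invalidate_local)
bus.broadcast("invalidations", "services:c9ff5ac9")
assert len(db_cache.invalidations) == 1 and len(core_cache.invalidations) == 1

# After: each cache subscribes to "invalidations_" .. shm_name, so the
# broadcast reaches only the cache that actually holds the entity.
bus = EventBus()
db_cache, core_cache = Cache("kong_db_cache"), Cache("kong_core_db_cache")
for c in (db_cache, core_cache):
    bus.subscribe("invalidations_" + c.name, c.invalidate_local)
bus.broadcast("invalidations_kong_core_db_cache", "services:44e977b8")
assert db_cache.invalidations == []
assert core_cache.invalidations == ["services:44e977b8"]
```

In the "before" half, every broadcast costs one useless invalidate_local (and a useless worker event) on the cache that does not hold the entity; the "after" half matches the debug logs below, where only kong_core_db_cache reacts.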

Checklist

Issue reference

Tries to alleviate FTI-5559

@windmgc windmgc marked this pull request as ready for review January 9, 2024 15:20
@windmgc
Member Author

windmgc commented Jan 9, 2024

Single service creation test

Before:

2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:228: process_event(): [cluster_events] new event (channel: 'invalidations') data: 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300' nbf: 'none' shm exptime: 10
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:144: [DB cache] kong_db_cache received invalidate event from cluster for key: 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:233: invalidate_local(): [DB cache] kong_db_cache invalidating (local): 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:144: [DB cache] kong_core_db_cache received invalidate event from cluster for key: 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:233: invalidate_local(): [DB cache] kong_core_db_cache invalidating (local): 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'

After:

2024/01/09 22:57:56 [debug] 93118#0: *41 [lua] init.lua:228: process_event(): [cluster_events] new event (channel: 'invalidations_kong_core_db_cache') data: 'services:44e977b8-00d9-4604-bf0b-36e2d121a40d:::::ab2e831b-d916-4389-a714-caf1e558c300' nbf: 'none' shm exptime: 10
2024/01/09 22:57:56 [debug] 93118#0: *41 [lua] init.lua:144: [DB cache] kong_core_db_cache received invalidate event from cluster for key: 'services:44e977b8-00d9-4604-bf0b-36e2d121a40d:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:57:56 [debug] 93118#0: *41 [lua] init.lua:243: invalidate_local(): [DB cache] kong_core_db_cache invalidating (local): 'services:44e977b8-00d9-4604-bf0b-36e2d121a40d:::::ab2e831b-d916-4389-a714-caf1e558c300'

@windmgc windmgc changed the title fix(cache): mlcache invalidation use split cluster event channels to avoid useless invalidation fix(cache): mlcache invalidation use separate cluster event channels to avoid useless invalidation Jan 9, 2024
@@ -140,8 +140,18 @@ function _M.new(opts)
neg_ttl = neg_ttl,
}

local ok, err = cluster_events:subscribe("invalidations_" .. shm_name, function(key)
Member

This will already result in a new channel for each cache: invalidations_kong_db_cache and invalidations_kong_core_db_cache. Can we remove the old channel? It isn't part of a public interface, so we should be fine removing it.

Member Author

I left the old channel in place deliberately because I've seen some custom plugins using it for their own invalidations. Considering that this channel has existed for a long time, I think removing it suddenly would cause some noise.

Member

In this case, I feel it would be better to only have one extra new channel, leaving the old one for the core cache, for example. Thoughts?

Member Author

@windmgc windmgc Feb 26, 2024

Hmm, if any of that wild custom code pushes events to the old channel, we won't know whether it is a cache or a core_cache entity it is trying to refresh, so leaving the old channel wired to both is probably the better idea. How about I add a deprecation warning when the old channel receives events? Then we can fully remove the old channel in the next major version.
What do you think?
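The suggestion above could look roughly like the following Python sketch. It is purely illustrative (make_legacy_handler is a hypothetical name, and Kong's real handler would be Lua in kong/cache/init.lua): the legacy channel keeps working, but every event it receives also emits a deprecation warning.

```python
import warnings

def make_legacy_handler(invalidate_local):
    # Hypothetical wrapper: keep serving the legacy "invalidations"
    # channel, but emit a deprecation warning on every event so custom
    # plugins can migrate before the channel is removed in the next major.
    def handler(key):
        warnings.warn(
            'the "invalidations" cluster event channel is deprecated; '
            "broadcast to the per-cache channel instead",
            DeprecationWarning,
        )
        invalidate_local(key)
    return handler

seen = []
legacy = make_legacy_handler(seen.append)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy("services:example-key")
assert seen == ["services:example-key"]  # the event is still processed
assert caught[0].category is DeprecationWarning
```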

Contributor

If there is significant risk that removing the event handler for the older invalidations channel will break existing custom plugins, then I think I could be in favor of keeping it.

Just to sanity check, though: doesn't this argument also apply to our removal of self.cluster_events:broadcast("invalidations", ...) in favor of the newer channel name? If some custom plugin has registered an event handler for this channel, they will break, after all. Or is this a much less likely scenario?

kong/cache/init.lua (outdated review thread; resolved)
@gszr
Member

gszr commented Feb 26, 2024

@windmgc

A few thoughts...

  • The existing channel has historically been used all over by kong.cache (the non-core cache);
  • No plugin currently bundled with Kong (OSS or Enterprise) refers to kong.core_cache. There might be custom plugins out there that do, although I doubt it. Further, the PDK docs only mention the kong.cache object, not kong.core_cache;
  • In the vast majority of cases, an :invalidate call in plugin code will want to invalidate a plugin's custom entity, not a core one;
  • Would it make sense for a plugin to invalidate a core entity (i.e., an entity cached in kong.core_cache)? Maybe there are valid use cases for that. In those (rare?) cases, the new core cache channel would have to be used -- changelog entry + docs change.

Given this, things we could do for now (patch):

  • Leave the existing channel only for kong.cache -- this is what most (> 95%?) of plugins want when they do :invalidate
  • Create a separate channel for the kong.core_cache
  • Proper changelog warnings; plus docs changes (if any)
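The patch plan above boils down to a channel-naming rule. A hypothetical sketch (channel_for is not Kong code, just an illustration of the agreed mapping): the legacy "invalidations" channel keeps serving kong.cache (kong_db_cache), so existing custom plugins that broadcast to it keep working, while every other cache instance gets its own suffixed channel.

```python
def channel_for(shm_name):
    # Hypothetical illustration of the agreed plan: the legacy channel
    # is reserved for the non-core cache; any other cache gets a channel
    # derived from its shm name.
    if shm_name == "kong_db_cache":
        return "invalidations"
    return "invalidations_" + shm_name

assert channel_for("kong_db_cache") == "invalidations"
assert channel_for("kong_core_db_cache") == "invalidations_kong_core_db_cache"
```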

Stretch goals - (3.8?):

  • Improve the cache vs. core cache abstraction; there's some attempt at doing that already (see, for example, the constants file enumerating core entities);
  • Maybe add a check + warning log if a plugin tries to mess with the core cache? This is potentially dangerous, and the reason why those caches were split in 2.0
  • Maybe it would make sense to standardize the cache as part of the PDK -- a good place for those checks?

@gszr
Member

gszr commented Feb 27, 2024

@windmgc - I see you pushed some changes, but there are still two channels being created; did you agree with this:

- Create a separate channel for the kong.core_cache

If not, let me know what your thoughts are.

@windmgc
Member Author

windmgc commented Feb 27, 2024

Given this, things we could do for now (patch):
Leave the existing channel only for kong.cache -- this is what most (> 95%?) of plugins want when they do :invalidate
Create a separate channel for the kong.core_cache
Proper changelog warnings; plus docs changes (if any)

@gszr Yes! I agree with the patch plan, will update the PR accordingly

@windmgc windmgc force-pushed the fix-mlcache-invalidate-split-channel branch from 2a4bfd6 to 58c1473 Compare February 27, 2024 14:38
@pull-request-size pull-request-size bot added size/M and removed size/S labels Feb 27, 2024
@windmgc
Member Author

windmgc commented Feb 28, 2024

@gszr @flrgh Please review again, thanks!

I'm now following @gszr's advice: the old channel is left for db_cache, a new channel is created for core_cache (and any other cache instance), and a deprecation warning has been added to the changelog.

For further action items, I created KAG-3866 for tracking.

@windmgc windmgc force-pushed the fix-mlcache-invalidate-split-channel branch from 5eb7e8d to fa8f18d Compare February 28, 2024 06:30
@windmgc windmgc requested review from gszr and flrgh March 4, 2024 07:46
kong/cache/init.lua (outdated review thread; resolved)
@windmgc windmgc merged commit 49c1ea0 into master Mar 5, 2024
27 checks passed
@windmgc windmgc deleted the fix-mlcache-invalidate-split-channel branch March 5, 2024 06:21
@flrgh
Contributor

flrgh commented Mar 5, 2024

@windmgc does this need a cherry-pick to kong-ee?

@flrgh flrgh added the cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee label Mar 14, 2024
@team-gateway-bot
Collaborator

Successfully created cherry-pick PR for master:

@windmgc
Member Author

windmgc commented Mar 15, 2024

@flrgh Thank you!

Labels: cherry-pick kong-ee (schedule this PR for cherry-picking to kong/kong-ee), size/M