
fix(cache): mlcache invalidation use separate cluster event channels to avoid useless invalidation #12321

Merged

windmgc merged 7 commits from fix-mlcache-invalidate-split-channel into master on Mar 5, 2024

Conversation

windmgc
Member

@windmgc windmgc commented Jan 9, 2024

Summary

Currently, kong_db_cache and kong_core_db_cache share a single cluster event channel named "invalidations" in the cluster event hook. In a traditional cluster (in which multiple Kong nodes use the same database and communicate with each other through cluster events), whenever an invalidation happens on node A, every other node X in the cluster calls invalidate_local on both kong_db_cache and kong_core_db_cache, even though the entity usually exists in only one of the two caches; this generates useless worker events. I am not sure whether this is an intentional design.

This PR gives every kong.cache instance its own invalidations channel, so that useless "invalidations" worker events are no longer generated in a traditional cluster.
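To make the effect concrete, here is a minimal Python model of the pub/sub behavior described above. This is not Kong's actual code: EventBus and Cache are hypothetical stand-ins for cluster_events and the kong.cache instances, sketched only to show why a shared channel fans one broadcast out to both caches while per-cache channels do not.

```python
# Hypothetical model of cluster-event invalidation routing; EventBus and
# Cache are illustrative stand-ins, not Kong's actual cluster_events API.

class EventBus:
    def __init__(self):
        self.handlers = {}  # channel name -> list of subscriber callbacks

    def subscribe(self, channel, fn):
        self.handlers.setdefault(channel, []).append(fn)

    def broadcast(self, channel, key):
        for fn in self.handlers.get(channel, []):
            fn(key)

class Cache:
    def __init__(self, name):
        self.name = name
        self.invalidations = []  # records each invalidate_local call

    def invalidate_local(self, key):
        self.invalidations.append(key)

# Before: both caches subscribe to the single shared channel, so one
# broadcast triggers invalidate_local on both of them.
bus = EventBus()
db_cache, core_cache = Cache("kong_db_cache"), Cache("kong_core_db_cache")
for c in (db_cache, core_cache):
    bus.subscribe("invalidations", c.invalidate_local)
bus.broadcast("invalidations", "services:c9ff5ac9")
assert len(db_cache.invalidations) == 1 and len(core_cache.invalidations) == 1

# After: each cache subscribes to "invalidations_" .. shm_name, so the
# broadcast reaches only the cache that actually holds the entity.
bus = EventBus()
db_cache, core_cache = Cache("kong_db_cache"), Cache("kong_core_db_cache")
for c in (db_cache, core_cache):
    bus.subscribe("invalidations_" + c.name, c.invalidate_local)
bus.broadcast("invalidations_kong_core_db_cache", "services:44e977b8")
assert db_cache.invalidations == []
assert core_cache.invalidations == ["services:44e977b8"]
```

In the "before" half, every broadcast costs one useless invalidate_local (and a useless worker event) on the cache that does not hold the entity; the "after" half matches the debug logs below, where only kong_core_db_cache reacts.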

Checklist

Issue reference

Tries to alleviate FTI-5559

@windmgc windmgc marked this pull request as ready for review January 9, 2024 15:20
@windmgc
Member Author

windmgc commented Jan 9, 2024

Single service creation test

Before:

2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:228: process_event(): [cluster_events] new event (channel: 'invalidations') data: 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300' nbf: 'none' shm exptime: 10
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:144: [DB cache] kong_db_cache received invalidate event from cluster for key: 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:233: invalidate_local(): [DB cache] kong_db_cache invalidating (local): 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:144: [DB cache] kong_core_db_cache received invalidate event from cluster for key: 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:16:00 [debug] 15117#0: *236 [lua] init.lua:233: invalidate_local(): [DB cache] kong_core_db_cache invalidating (local): 'services:c9ff5ac9-6833-4ba2-be5d-460c8aad47b5:::::ab2e831b-d916-4389-a714-caf1e558c300'

After:

2024/01/09 22:57:56 [debug] 93118#0: *41 [lua] init.lua:228: process_event(): [cluster_events] new event (channel: 'invalidations_kong_core_db_cache') data: 'services:44e977b8-00d9-4604-bf0b-36e2d121a40d:::::ab2e831b-d916-4389-a714-caf1e558c300' nbf: 'none' shm exptime: 10
2024/01/09 22:57:56 [debug] 93118#0: *41 [lua] init.lua:144: [DB cache] kong_core_db_cache received invalidate event from cluster for key: 'services:44e977b8-00d9-4604-bf0b-36e2d121a40d:::::ab2e831b-d916-4389-a714-caf1e558c300'
2024/01/09 22:57:56 [debug] 93118#0: *41 [lua] init.lua:243: invalidate_local(): [DB cache] kong_core_db_cache invalidating (local): 'services:44e977b8-00d9-4604-bf0b-36e2d121a40d:::::ab2e831b-d916-4389-a714-caf1e558c300'

@windmgc windmgc changed the title fix(cache): mlcache invalidation use split cluster event channels to avoid useless invalidation fix(cache): mlcache invalidation use separate cluster event channels to avoid useless invalidation Jan 9, 2024
@@ -140,8 +140,18 @@ function _M.new(opts)
neg_ttl = neg_ttl,
}

local ok, err = cluster_events:subscribe("invalidations_" .. shm_name, function(key)
Member

This will already result in a new channel for each cache: invalidations_kong_db_cache and invalidations_kong_core_db_cache. Can we remove the old channel? It isn't part of a public interface, so we should be fine removing it.

Member Author

I left the old channel in place deliberately because I've seen some custom plugins using it for their own invalidations. Considering that this channel has existed for a long time, I think removing it suddenly would cause some noise.

Member

In this case, I feel it would be better to only have one extra new channel, leaving the old one for the core cache, for example. Thoughts?

Member Author

@windmgc windmgc Feb 26, 2024

Hmm, if any of that wild custom code pushes events to the old channel, we won't know whether it is a cache or a core_cache entity it is trying to refresh, so leaving the old channel wired to both is probably the better idea. How about I add a deprecation warning when the old channel receives events? Then we can fully remove the old channel in the next major version.
What do you think?
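The suggestion above could look roughly like the following Python sketch. It is purely illustrative (make_legacy_handler is a hypothetical name, and Kong's real handler would be Lua in kong/cache/init.lua): the legacy channel keeps working, but every event it receives also emits a deprecation warning.

```python
import warnings

def make_legacy_handler(invalidate_local):
    # Hypothetical wrapper: keep serving the legacy "invalidations"
    # channel, but emit a deprecation warning on every event so custom
    # plugins can migrate before the channel is removed in the next major.
    def handler(key):
        warnings.warn(
            'the "invalidations" cluster event channel is deprecated; '
            "broadcast to the per-cache channel instead",
            DeprecationWarning,
        )
        invalidate_local(key)
    return handler

seen = []
legacy = make_legacy_handler(seen.append)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy("services:example-key")
assert seen == ["services:example-key"]  # the event is still processed
assert caught[0].category is DeprecationWarning
```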

Contributor

If there is significant risk that removing the event handler for the older invalidations channel will break existing custom plugins, then I think I could be in favor of keeping it.

Just to sanity check, though: doesn't this argument also apply to our removal of self.cluster_events:broadcast("invalidations", ...) in favor of the newer channel name? If some custom plugin has registered an event handler for this channel, they will break, after all. Or is this a much less likely scenario?

kong/cache/init.lua (outdated review thread; resolved)
@gszr
Member

gszr commented Feb 26, 2024

@windmgc

A few thoughts...

  • The existing channel has historically been used all over by kong.cache (the non-core cache);
  • No plugin currently bundled with Kong (OSS or Enterprise) refers to kong.core_cache. There might be custom plugins out there that do, although I doubt it. Further, the PDK docs only mention the kong.cache object, not kong.core_cache;
  • In the vast majority of cases, an :invalidate call in plugin code will want to invalidate a plugin's custom entity, not a core one;
  • Would it make sense for a plugin to invalidate a core entity (i.e., an entity cached in kong.core_cache)? Maybe there are valid use cases for that. In those (rare?) cases, the new core cache channel would have to be used -- changelog entry + docs change.

Given this, things we could do for now (patch):

  • Leave the existing channel only for kong.cache -- this is what most (> 95%?) of plugins want when they do :invalidate
  • Create a separate channel for the kong.core_cache
  • Proper changelog warnings; plus docs changes (if any)
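The patch plan above boils down to a channel-naming rule. A hypothetical sketch (channel_for is not Kong code, just an illustration of the agreed mapping): the legacy "invalidations" channel keeps serving kong.cache (kong_db_cache), so existing custom plugins that broadcast to it keep working, while every other cache instance gets its own suffixed channel.

```python
def channel_for(shm_name):
    # Hypothetical illustration of the agreed plan: the legacy channel
    # is reserved for the non-core cache; any other cache gets a channel
    # derived from its shm name.
    if shm_name == "kong_db_cache":
        return "invalidations"
    return "invalidations_" + shm_name

assert channel_for("kong_db_cache") == "invalidations"
assert channel_for("kong_core_db_cache") == "invalidations_kong_core_db_cache"
```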

Stretch goals - (3.8?):

  • Improve the cache vs. core cache abstraction; there's some attempt at doing that already (see, for example, the constants file enumerating core entities);
  • Maybe add a check + warning log if a plugin tries to mess with the core cache? This is potentially dangerous, and the reason why those caches were split in 2.0
  • Maybe it would make sense to standardize the cache as part of the PDK -- a good place for those checks?

@gszr
Member

gszr commented Feb 27, 2024

@windmgc - I see you pushed some changes, but there are still two channels being created; did you agree with this:

- Create a separate channel for the kong.core_cache

If not, let me know what your thoughts are.

@windmgc
Member Author

windmgc commented Feb 27, 2024

Given this, things we could do for now (patch):
Leave the existing channel only for kong.cache -- this is what most (> 95%?) of plugins want when they do :invalidate
Create a separate channel for the kong.core_cache
Proper changelog warnings; plus docs changes (if any)

@gszr Yes! I agree with the patch plan, will update the PR accordingly

@windmgc windmgc force-pushed the fix-mlcache-invalidate-split-channel branch from 2a4bfd6 to 58c1473 Compare February 27, 2024 14:38
@pull-request-size pull-request-size bot added size/M and removed size/S labels Feb 27, 2024
@windmgc
Member Author

windmgc commented Feb 28, 2024

@gszr @flrgh Please review again, thanks!

I'm now following @gszr's advice: the old channel is left for db_cache, a new channel is created for core_cache (and any other cache instance), and a deprecation warning has been added to the changelog.

For further action items, I created KAG-3866 for tracking.

@windmgc windmgc force-pushed the fix-mlcache-invalidate-split-channel branch from 5eb7e8d to fa8f18d Compare February 28, 2024 06:30
@windmgc windmgc requested review from gszr and flrgh March 4, 2024 07:46
kong/cache/init.lua (outdated review thread; resolved)
@windmgc windmgc merged commit 49c1ea0 into master Mar 5, 2024
27 checks passed
@windmgc windmgc deleted the fix-mlcache-invalidate-split-channel branch March 5, 2024 06:21
@flrgh
Contributor

flrgh commented Mar 5, 2024

@windmgc does this need a cherry-pick to kong-ee?

@flrgh flrgh added the cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee label Mar 14, 2024
@team-gateway-bot
Collaborator

Successfully created cherry-pick PR for master:

@windmgc
Member Author

windmgc commented Mar 15, 2024

@flrgh Thank you!

Labels: cherry-pick kong-ee (schedule this PR for cherry-picking to kong/kong-ee), size/M