[FEATURE] 🏷️ Tagging #319

Open
jodydonetti opened this issue Oct 20, 2024 · 43 comments

Comments

@jodydonetti
Collaborator

jodydonetti commented Oct 20, 2024

The Need

Time and time again, there have been requests from the community to support some form of "grouping together cache entries", primarily for multi-invalidation purposes.

Here are some examples:

On top of this, the upcoming HybridCache from Microsoft, which will tentatively be released around the .NET 9 timeframe and for which FusionCache wants to be an available implementation (and actually the first one!), will seemingly support tagging.

So, it seems the time has come to finally deal with this monumental beast.

[Image: Shadow of the Colossus. Copyright © Sony, All Rights Reserved]

Scenario

As we all know, cache invalidation is in general an uber complex beast to approach, and this is true even with "just" an L1 cache (only the first local level, in memory).

Add to this the fact that when we talk about a hybrid cache like FusionCache, we can have 2 levels (L1 + L2, memory + distributed) and multi-node invalidation (see: horizontal scalability), and it's even worse.

Finally, as a cherry on top, in the case of FusionCache add the fact that it automatically handles transient errors thanks to features like fail-safe, soft timeouts, auto-recovery and more, and you have a seemingly insurmountable task ahead.

Or is it?

Limitations

Aside from the complexity of the problem itself, which as said is already kind of crazy hard, we need to deal first and foremost with a design decision that has sat at the foundation of FusionCache since the beginning (and the same is true for the upcoming HybridCache from Microsoft): the available abstractions to work with, for L1 but even more so for L2, are quite limited.
In particular for L2 that means IDistributedCache, with its very limited set of available functionalities.

The design decision of using IDistributedCache has paid a lot of dividends over the years, because any implementation of IDistributedCache is automatically usable with FusionCache: since there are a lot of them readily available covering very different needs, this is a very powerful characteristic to have.

On the other hand, as said, we basically have at our disposal only 3 methods:

  • Set
  • Get
  • Remove

That's it, and it's not a lot to work with.

So, what can we do?

R&D Experiments

Since, as we can see, there's no native support for tagging, over the years I've experimented multiple times with an approach which I called "Client-Assisted Tag Invalidation", which basically means "do something client-side only, with no server-side support, that for all intents and purposes would get us the same result from the outside".

This, in turn, translates to not actually doing a real "evict by tag" on the underlying caches (eg: on Redis, Memcached, etc) but instead keeping track of "something" only on the client side, to be checked before returning data to callers.
This would logically work as a sort of "barrier" or "low-pass filter" to "hide" data that is logically expired because of one or more of the associated tags.

There are different ways to try to achieve this, but in general it would have consisted of something like:

  • locally (in memory), an extra dictionary of tags-related invalidations timestamps
  • remotely, in the L2, an extra "special" cache entry dedicated to contain the data about all expired tags, like the timestamp at which each tag-based expiration occurred

So, "removing by tag" then means getting the special entry, adding the new bit of information, and saving it back.

But, as I explained in my comment regarding a similar approach for the upcoming HybridCache from Microsoft, this can have severe limitations and practical problems, like:

  • SIZE: excessive growth of a single cache entry for any real-world usage (eg: a lot of tag-based invalidations would require a ton of data squeezed into a single cache entry). This can hit even harder on some cache backends like Redis with the notorious conveyor-belt problem (eg: one big entry would block a lot of subsequent smaller entries), but even on different cache backends it's generally a bad thing to not be able to handle growth for something that we already know will realistically grow over time. As a different scenario but with similar characteristics, think about a hypothetical HTTP API endpoint that returns the list of products (which naturally grows over time) but where it's not possible to do some sort of paging
  • CONCURRENCY: concurrency issues with multiple tag evictions happening at the same time on different nodes (and, maybe, even on the same node at the same time) where, because of the limited methods available on IDistributedCache, it's not possible to concurrently add 2 different pieces of information to the same cache entry at the same time. This typically results in a last-wins effect, basically cancelling the previous ones done in the same "update time-window". Even accounting for some special data structure like hashsets on Redis (on server-side cache backends that support such a thing), concurrency would be theoretically solved but the first point (size) would still remain
  • COLD STARTS: issues with cold starts, where the initial loading of that special cache entry with tag eviction data would either be blocking (and therefore potentially slowing things down, but avoiding dirty reads of expired data) or non-blocking (and therefore not slowing things down, but allowing for dirty reads). This, along with the size issue, would mean loading a lot of data all at once: again the "list endpoint without paging" issue mentioned above
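
To make the CONCURRENCY point concrete, here is a minimal, language-neutral sketch in Python (not FusionCache code). It assumes a hypothetical `TinyDistributedCache` stand-in that, like IDistributedCache, only offers get/set/remove, and a made-up key name for the single "special" entry. Two nodes do a read-modify-write of that entry and one invalidation is silently lost.

```python
import json


class TinyDistributedCache:
    """Hypothetical stand-in for an L2 that only exposes get/set/remove."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def remove(self, key):
        self._data.pop(key, None)


SPECIAL_KEY = "all-tag-invalidations"  # assumed name, for illustration only


def read_invalidations(cache):
    raw = cache.get(SPECIAL_KEY)
    return json.loads(raw) if raw is not None else {}


def write_invalidations(cache, invalidations):
    cache.set(SPECIAL_KEY, json.dumps(invalidations))


l2 = TinyDistributedCache()

# Two nodes read the special entry at (roughly) the same time...
seen_by_node_a = read_invalidations(l2)
seen_by_node_b = read_invalidations(l2)

# ...each adds its own tag invalidation to its private copy...
seen_by_node_a["tag1"] = 1000
seen_by_node_b["tag2"] = 1001

# ...and each writes the whole entry back: the last write wins.
write_invalidations(l2, seen_by_node_a)
write_invalidations(l2, seen_by_node_b)

final_state = read_invalidations(l2)
print(final_state)  # {'tag2': 1001} -> the invalidation of "tag1" is gone
```

With only get/set/remove there is no atomic "append to entry" operation, so nothing in this sketch can detect or prevent the overwrite.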

All of this is why, after multiple experiments over the years, I was basically convinced that the only way to add proper tagging support would've been to go with a "Server-Assisted Tag Invalidation" approach, meaning creating a new abstraction like IFusionCacheLevel or something (either an interface or an abstract class, it's not the point here) to model a generic and more powerful "cache level", with native support for tagging and more.

This would simplify a lot of things but, at the same time, would take away the ability to use any existing IDistributedCache implementation out there. FusionCache though already works with vanilla IDistributedCache and this should not go away, so it means FusionCache must be able to work with both vanilla and extended at the same time: this would not be a problem per se, since I can check at runtime which abstraction the L2 implements and act accordingly, but it also means that for users NOT using an extended L2 implementation, extra features like tagging would NOT be available.

And I don't like this.

And I would really really like to give tagging to all FusionCache users, all of them.

Epiphany

Recently I went to South Korea for my (very late) summer vacations.

In Seoul there's a good jazz scene, with multiple places that deserve a visit like Libro for some live performances which is really beautiful or the nice and cozy Coltrane for some vinyl listening, both highly recommended.

One evening, while drinking a glass of Ardbeg at Coltrane, Land of Make Believe by Chuck Mangione started playing.

And I suddenly had an epiphany.

Why not look at it from a different angle, get to a delicate balance between absolute performance and features available, think about how it would actually be used in the real world from a statistical perspective, and "simply" use the pieces already there to find an overall equilibrium?

By not using a single cache entry to store all tag invalidation infos we would be able to guarantee scalability with whatever number of tags, virtually without limits.

Solution

I'm proposing a solution I call "Client-Assisted Tag Invalidation", meaning it does not require extra assistance from the server side.

On one hand it's true that by looking at an entire system in production we'll probably have a lot of tag invalidations over time, and this is a given.

On the other hand it's also true that, by their own nature, a lot of tags will be shared between cache entries: this is the whole point of it anyway.

On top of this, we can set some reasonable limits: for example when working with metrics in OTEL systems, it is a known best practice to not have a huge amount of tags and to not have tags with a huge amount of different values (known as "high cardinality").
So we can say the same here.

By accepting this small fact, by understanding the probabilistic nature of tag usage and sharing and, most importantly, by relying on all the existing plumbing that FusionCache already provides (like L1+L2 support, fail-safe, non-blocking background distributed operations, auto-recovery, etc) we can "simply" say that, for each tag, we'll automatically handle a cache entry with the data needed, basically meaning the timestamp of when the expiration was last requested for that tag.

Regarding the probabilistic nature: basically a lot of tags will be shared between multiple cache entries, think of the Birthday Paradox.

So, a RemoveByTag("tag123") would simply set internally an entry with a key like "__fc:t:tag123" or something like that, containing the current timestamp. Also note that the concrete cache key will also consider any potential cache-key prefix, so multiple named caches on shared cache backends would automatically be supported, too.

Then when getting a cache entry, after getting it from L1/L2 but before returning it to the outside world, FusionCache would see if it has tags attached to it and, in that case and only in that case (so no extra costs when not used), it would get the expiration timestamp for each tag to see if it's expired and when.

For each related tag, if an expiration timestamp is present and it is greater than the timestamp at which the cache entry was created, the entry should be considered expired.
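
The rule above can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not FusionCache's implementation: `TaggedCacheSketch`, its method names and the logical clock are hypothetical, a single dictionary plays the role of L1/L2, and only the "__fc:t:" key shape and the timestamp comparison come from the description above.

```python
import itertools

_clock = itertools.count(1)  # logical clock, so the example is deterministic


def now():
    return next(_clock)


TAG_KEY_PREFIX = "__fc:t:"  # key shape suggested in the proposal


class TaggedCacheSketch:
    """Hypothetical sketch: one small cache entry per tag, holding a timestamp."""

    def __init__(self, key_prefix=""):
        self._key_prefix = key_prefix
        self._store = {}  # plays the role of L1/L2

    def _tag_key(self, tag):
        # The cache-key prefix is part of the tag key too.
        return f"{self._key_prefix}{TAG_KEY_PREFIX}{tag}"

    def set(self, key, value, tags=()):
        self._store[self._key_prefix + key] = {
            "value": value,
            "created_at": now(),
            "tags": tuple(tags),
        }

    def remove_by_tag(self, tag):
        # No entries are touched: only the tag's timestamp is written.
        self._store[self._tag_key(tag)] = now()

    def try_get(self, key):
        entry = self._store.get(self._key_prefix + key)
        if entry is None:
            return (False, None)
        for tag in entry["tags"]:  # empty when no tags: no extra lookups
            expired_at = self._store.get(self._tag_key(tag))
            if expired_at is not None and expired_at > entry["created_at"]:
                return (False, None)
        return (True, entry["value"])


cache = TaggedCacheSketch(key_prefix="products:")
cache.set("foo", 1, tags=["tag1", "tag2"])
cache.set("bar", 2, tags=["tag3"])
cache.remove_by_tag("tag2")

foo_result = cache.try_get("foo")   # (False, None): tagged "tag2", created before the removal
bar_result = cache.try_get("bar")   # (True, 2): not tagged "tag2"

cache.set("foo", 3, tags=["tag1", "tag2"])
foo_again = cache.try_get("foo")    # (True, 3): created after the removal
```

Note that an entry written after the RemoveByTag call is valid again without resetting anything, because the comparison is purely "tag timestamp vs entry creation timestamp".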

Regarding the Duration of such special entries with tag expiration data, a value would be configurable via options but a sensible default (like 24h) would be provided that would cover most cases.

This can be considered a "passive" approach (waiting for each read to see if it's expired) instead of an "active" one (actually go and massively expire data immediately everywhere).

When get-only methods (eg: TryGet, GetOrDefault) are called and a cache entry is found to be expired because of tags, FusionCache will not only hide it from the outside but also effectively expire it which, thanks to FusionCache's normal behaviour, means locally in the L1, on L2 and on each other node's L1 remotely (thanks to the backplane).

When get-set methods (eg: GetOrSet) are called and a cache entry is found to be expired because of tags, FusionCache just skips it internally and calls the factory, since that would produce a new value and resolve the problem anyway, just in a different way: the internal set will again automatically save the new value locally in the L1, on L2 and on each other node's L1 remotely (thanks again to the backplane).

So the system would automatically update itself internally based on actual usage, only if and when needed, without massive updates having to be made when expiring by a tag.
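
A rough sketch of this "passive" behaviour, in Python and with hypothetical names (`PassiveExpirySketch` is not a FusionCache type, and a single in-memory dictionary stands in for L1, L2 and the backplane): removing by tag is one small write, the get-only path evicts on read, and the get-set path replaces via the factory.

```python
import itertools

_ticks = itertools.count(1)  # logical clock


class PassiveExpirySketch:
    """Hypothetical sketch of expire-on-read; missing and None values are not distinguished."""

    def __init__(self):
        self.entries = {}          # key -> (value, created_at, tags)
        self.tag_expirations = {}  # tag -> timestamp of the last RemoveByTag

    def _is_expired_by_tag(self, created_at, tags):
        return any(self.tag_expirations.get(t, 0) > created_at for t in tags)

    def set(self, key, value, tags=()):
        self.entries[key] = (value, next(_ticks), tuple(tags))

    def remove_by_tag(self, tag):
        self.tag_expirations[tag] = next(_ticks)

    def try_get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, created_at, tags = entry
        if self._is_expired_by_tag(created_at, tags):
            del self.entries[key]  # get-only path: actually expire the entry
            return None
        return value

    def get_or_set(self, key, factory, tags=()):
        entry = self.entries.get(key)
        if entry is not None:
            value, created_at, entry_tags = entry
            if not self._is_expired_by_tag(created_at, entry_tags):
                return value
        # get-set path: skip the stale entry and let the factory replace it
        value = factory()
        self.set(key, value, tags)
        return value


cache = PassiveExpirySketch()
cache.set("a", "old-a", tags=["t"])
cache.set("b", "old-b", tags=["t"])

cache.remove_by_tag("t")                    # one small write, entries untouched
entries_after_remove = len(cache.entries)   # still 2

a_value = cache.try_get("a")                # None, and "a" is now really gone
b_value = cache.get_or_set("b", lambda: "new-b", tags=["t"])  # "new-b"
```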

Nice.

What about app restarts? No big deal: since everything is based on the common plumbing of FusionCache, all will work normally and tag-eviction data will get re-populated automatically, lazily and based only on actual usage.

Performance considerations

But wait, this is probably ringing a bell for a lot of people reading this: isn't this a variation of the dreaded "SELECT N+1 problem"?

No, at least realistically that is not the case, mostly because of probability and adaptive loading based on concrete usage.

Let me explain.

A typical SELECT N+1 problem happens when, to get a piece of data, we do a first select that returns N elements and then, for each element, we do an additional SELECT.

Here this does not happen, because:

  • as soon as a tag returns a timestamp that marks the entry as expired, the process stops, reducing the SELECT amount
  • because of how tags are used (shared between different entries), one load of tag expiration data will be used for multiple entries, reducing again the SELECT amount

As an example if we are loading, either concurrently or one after the other, these cache entries:

  • key "foo", tagged "tag1" and "tag2"
  • key "bar", tagged "tag2" and "tag3"
  • key "baz", tagged "tag1" and "tag3"

The expiration data for "tag1" will be loaded lazily (only when needed) and only once, and automatically shared between the processing of cache entries for both "foo" and "baz".
And since as said tags are frequently shared between different cache entries, this means that the system will automatically load only what's needed, when it's needed, and only once.
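
The foo/bar/baz example can be traced with a small sketch. This is Python with hypothetical names (`CountingL2`, `TagExpirationLookup`), and it simplifies things by keeping tag data in a local dictionary forever, whereas the proposal gives those entries a configurable Duration. Creation timestamps and the single expired tag are made-up values.

```python
class CountingL2:
    """Hypothetical L2 stand-in that records every read it serves."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = []

    def get(self, key):
        self.reads.append(key)
        return self._data.get(key)


class TagExpirationLookup:
    """Loads each tag's expiration lazily and shares it across entries."""

    def __init__(self, l2):
        self._l2 = l2
        self._l1 = {}  # tag -> timestamp (0 means "never expired")

    def expiration_for(self, tag):
        if tag not in self._l1:
            self._l1[tag] = self._l2.get("__fc:t:" + tag) or 0
        return self._l1[tag]

    def is_expired(self, created_at, tags):
        # any() stops at the first tag that marks the entry as expired
        return any(self.expiration_for(t) > created_at for t in tags)


# Only "tag2" has ever been removed, at (logical) time 50.
l2 = CountingL2({"__fc:t:tag2": 50})
lookup = TagExpirationLookup(l2)

entries = {
    "foo": (10, ["tag1", "tag2"]),  # created before the removal of "tag2"
    "bar": (60, ["tag2", "tag3"]),  # created after it
    "baz": (60, ["tag1", "tag3"]),
}

results = {
    key: lookup.is_expired(created_at, tags)
    for key, (created_at, tags) in entries.items()
}

print(results)        # {'foo': True, 'bar': False, 'baz': False}
print(len(l2.reads))  # 3 remote reads for 6 (entry, tag) pairs
```

In this run "baz" triggers no remote read at all, because both of its tags were already loaded while processing "foo" and "bar".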

Some extra reads would be needed, yes, but definitely not the SELECT N+1 case, which would only remain as a worst-case scenario, and not for every single cache read.

What about needing tag expiration for "tag1" by 2 different cache entries at the same time? Will it be loaded multiple times?
Nope, we are covered, thanks to the Cache Stampede protection.

What about tag expiration data being propagated to the other nodes?
We are covered, thanks to the Backplane.

And what if tags are based on the data returned from the factory, so that it is not known upfront?
No worries, Adaptive Caching will be extended to support tagging, too.

What about potential transient errors?
We are covered, thanks to Fail-Safe.

What about slow distributed operations?
Again we are covered, thanks to advanced Timeouts and Background Distributed Operations.

What about recovering from distributed errors? Should users need to handle them manually?
Nope, also covered, thanks to Auto-Recovery.

All of this because of the solid foundations that have been built in FusionCache for years 💪

What about Clear() ?

If all of this works out, and up until now it seems so, this same approach may also be used to finally implement something else: a proper Clear() method, one that actually supports all scenarios:

  • can work on L1
  • can work on L2
  • can work on multiple nodes
  • can work with multiple named caches and cache-key prefix

But how?

By simply adding support for a special "*" tag (star, meaning "all") we can achieve that.

This tag can also receive a special treatment, like being immediately read from L2 when an update notification is received, for performance reasons.
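
A minimal sketch of the "*" idea combined with cache-key prefixes, in Python with hypothetical names (`NamedCacheSketch` is not a FusionCache type). Two named caches share one backend dictionary; because the "*" tag key includes the prefix, clearing one cache leaves the other untouched.

```python
import itertools

_ticks = itertools.count(1)  # logical clock
CLEAR_TAG = "*"              # the special "all" tag from the proposal


class NamedCacheSketch:
    """Hypothetical sketch: Clear() is just a RemoveByTag on the "*" tag."""

    def __init__(self, shared_backend, key_prefix):
        self._backend = shared_backend
        self._prefix = key_prefix

    def _tag_key(self, tag):
        return f"{self._prefix}__fc:t:{tag}"

    def set(self, key, value):
        self._backend[self._prefix + key] = (value, next(_ticks))

    def clear(self):
        self._backend[self._tag_key(CLEAR_TAG)] = next(_ticks)

    def try_get(self, key):
        entry = self._backend.get(self._prefix + key)
        if entry is None:
            return None
        value, created_at = entry
        # Every entry is implicitly tagged "*", so it is always checked.
        cleared_at = self._backend.get(self._tag_key(CLEAR_TAG), 0)
        return None if cleared_at > created_at else value


backend = {}  # one shared backend, two named caches
products = NamedCacheSketch(backend, "products:")
users = NamedCacheSketch(backend, "users:")

products.set("1", "widget")
users.set("1", "alice")

products.clear()

product_after_clear = products.try_get("1")  # None
user_after_clear = users.try_get("1")        # "alice"
```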

Server-Assisted Tag Invalidation?

Does this approach exclude a hypothetical "Server-Assisted Tag Invalidation" with an extended IFusionCacheLevel or similar?

No, actually not! But supporting tagging without that means that the feature can be available right now, for any existing IDistributedCache implementation, without requiring any extra assistance from 3rd party packages, and with maybe a couple of extra reads here and there.

In the future though I think I will also explore the server-assisted route, because it can lead to a good perf boost: the nice thing about doing the client-assisted approach first though is that the feature will be available in both ways, and when using the eventual extended abstraction you'll "just" get an extra perf boost, but in both cases no limitations at all.

I think this is the best approach overall.

Where are we now?

Right now I have an implementation working on a local branch, which is already something damn awesome to be honest.

I'm currently in the process of fine tuning it, benchmarking it, testing edge cases, tracing/logging the hell out of it to also see the extra work required while simulating real-world scenarios and so on.

If all goes well this feature will be included in FusionCache v2.0, which would be released at around the same time as .NET 9, including support for the new HybridCache from Microsoft.

Your help is needed

But, honestly, it still seems too good to be true, and I may be missing something.

So, here we go: can you, dear user, please reason about the approach, about pros/cons, and try to see if you can spot any problem?

It would be a really invaluable thing for me to have, and I thank you for that in advance.

Thanks 🙏

@aKzenT

aKzenT commented Oct 20, 2024

Great work!

One thing I was curious about while reading is your decision to implement your own strategy first, with the plan of leveraging .NET 9's new tagging support later. Doesn’t this carry the risk that you might later discover FusionCache's tagging approach isn’t fully compatible with .NET 9? There could be subtle issues that aren't obvious at the start. Personally, I would have done it the other way around—starting with a version that only supports .NET 9, and then adding a more general approach afterward. This might also prevent users from switching from FusionCache to HybridCache in .NET 9 if they’re aiming for the best performance.

Another key point for me, in my own projects, is ensuring seamless support for named caches. In many cases, you may not even need multiple tags if your cache entries are stored in separate named caches. So my main interest is in optimizing the performance of the Clear() method—specifically making sure there's no interference between different named caches. You already mentioned that this will be supported, so I’m glad it’s covered. I'm also happy to hear you’re considering performance improvements for the Clear() method by special-casing the * tag.

What wasn’t entirely clear is whether this tagging feature will be opt-in or enabled by default. Since using the Clear() method already requires the * tag, I’m guessing there’s no way to enable this feature without some performance impact, even if tags or the Clear() method aren’t used at all?

Another scenario worth considering is making it easy to have a setup where most nodes use the cache normally, but one specific node handles invalidation. For example, in a web server connected to a CMS, you could have a hook in the CMS that triggers an Azure function or similar process to invalidate cached entries when content changes. This node would start with an empty cache but would immediately remove entries by key or call Clear(). While I assume this would work, it may be worth optimizing for this type of scenario.

Lastly, based on my experience implementing cache invalidation in a cluster environment, it's fairly easy to get it working 99.9% of the time. However, reaching 100% reliability can be difficult, and the last 0.1% often leads to nasty bugs like persistent stale caches and the need for manual cache flushes. So, I’d recommend dedicating time to addressing edge cases like node restarts, unexpected shutdowns, and parallel operations on tags from multiple nodes.

Thanks!

@jodydonetti
Collaborator Author

jodydonetti commented Oct 20, 2024

Hi @aKzenT

One thing I was curious about while reading is your decision to implement your own strategy first, with the plan of leveraging .NET 9's new tagging support later.

No, not really: as explained here I plan to support the Microsoft HybridCache abstraction, not the implementation.
This means that people will be able to use FusionCache "as an implementation of" HybridCache, but FusionCache will keep its own design and implementation separate.

Doesn’t this carry the risk that you might later discover FusionCache's tagging approach isn’t fully compatible with .NET 9?

I don't think so, since what must be respected are the abstraction and behaviour, meaning the public api surface area + the end result which, in both cases, should be that when users "evict by tag FOO" -> every entry tagged FOO will be, for all intents and purposes, evicted (or look like evicted).

Also, I don't know the actual timing for the release of Microsoft HybridCache: it should've been with .NET 9, but it may have been delayed (here the question was about multi-node notifications, what in FusionCache is the backplane, and the answer is "no, we haven't done it yet and no, it won't be there on day zero"), so I really don't know.

There could be subtle issues that aren't obvious at the start.

This is true, but my main point is to give FusionCache users the feature, and later see how to make it work with the new Microsoft abstraction.

Personally, I would have done it the other way around—starting with a version that only supports .NET 9, and then adding a more general approach afterward. This might also prevent users from switching from FusionCache to HybridCache in .NET 9 if they’re aiming for the best performance.

I wouldn't assume the best performance is over there (may be, mind you, but may be not).
But again, just like with the IDistributedCache abstraction, api surface area + behaviour should be maintained, that's all (I think).

Opinions?

Another key point for me, in my own projects, is ensuring seamless support for named caches.

FYI: HybridCache from Microsoft will not support multiple named caches, nor DI keyed services.

In many cases, you may not even need multiple tags if your cache entries are stored in separate named caches.

Agree, for most cases this is true.

So my main interest is in optimizing the performance of the Clear() method—specifically making sure there's no interference between different named caches. You already mentioned that this will be supported, so I’m glad it’s covered.
I'm also happy to hear you’re considering performance improvements for the Clear() method by special-casing the * tag.

Good 😬

What wasn’t entirely clear is whether this tagging feature will be opt-in or enabled by default. Since using the Clear() method already requires the * tag, I’m guessing there’s no way to enable this feature without some performance impact, even if tags or the Clear() method aren’t used at all?

On the contrary, the idea is always (as much as possible) pay-per-use, so if you will not do anything tagging related, no extra cost will be involved.

Now, to be even more precise, yes technically there will be a fixed "extra cost"... in the form of a null check to see if a cache entry has tags: no tags, no extra cost.
But I think we can agree that a single null check is basically free right?

Btw when I'm done with the feature I'll profile it even more and will warn about any extra cost associated with it, even when not used at all, so everyone will be informed and can make an informed decision.

Another scenario worth considering is making it easy to have a setup where most nodes use the cache normally, but one specific node handles invalidation. For example, in a web server connected to a CMS, you could have a hook in the CMS that triggers an Azure function or similar process to invalidate cached entries when content changes. This node would start with an empty cache but would immediately remove entries by key or call Clear(). While I assume this would work, it may be worth optimizing for this type of scenario.

In general it already works like this (meaning one "cms node" can be the one triggering evictions, while the other frontend nodes just receive the evictions), but I'll add this to my list of things to check for the Clear() approach, thanks!

Lastly, based on my experience implementing cache invalidation in a cluster environment, it's fairly easy to get it working 99.9% of the time. However, reaching 100% reliability can be difficult, and the last 0.1% often leads to nasty bugs like persistent stale caches and the need for manual cache flushes.

Eh, tell me about it 😅
You are absolutely right, and I've been there too: for example that's why with Auto-Recovery I tried to give a ready-made solution that would automatically cover 90% or more of the recovery scenarios, but I'm always open to new inputs.

But there's always more that can be done: if you have some experience there, some edge case to cover or any info at all please share them with me, it would be helpful to cover even more.

Oh, also: the public preview I'll release should also be good for that, so anyone can play with it and see how it works.

So, I’d recommend dedicating time to addressing edge cases like node restarts, unexpected shutdowns, and parallel operations on tags from multiple nodes.

On one hand: yes, totally.
On the other hand: all the things you described are already handled by the plumbing in FusionCache (like fail-safe, soft timeout, cache stampede protection, auto-recovery, etc) and that is why the idea to build tagging on top of the existing features is so nice, I think.

Thanks!

@aKzenT

aKzenT commented Oct 20, 2024

Hi @aKzenT

One thing I was curious about while reading is your decision to implement your own strategy first, with the plan of leveraging .NET 9's new tagging support later.

No, not really: as explained here I plan to support the Microsoft HybridCache abstraction, not the implementation. This means that people will be able to use FusionCache "as an implementation of" HybridCache, but FusionCache will keep its own design and implementation separate.

You are right, I think what I should have asked is, what about the planned IDistributedCacheInvalidation interface that is proposed here: dotnet/aspnetcore#55308 ? Will this be compatible with the design proposed here so that FusionCache can take advantage of it?

[...]

Another key point for me, in my own projects, is ensuring seamless support for named caches.

FYI: HybridCache from Microsoft will not support multiple named caches, nor DI keyed services.

I know, I think it's a real bummer and something that should be there from the start.

[...]
Now, to be even more precise, yes technically there will be a fixed "extra cost"... in the form of a null check to see if a cache entry has tags: no tags, no extra cost. But I think we can agree that a single null check is basically free right?

But as I understood, the Clear() method requires a "*" tag to be present. Even if that tag does not really exist on the wire, you would still need to check for the expiration of the "*" tag, wouldn't you? Since you can't be sure if any node has called Clear(), you would need to make the Clear() functionality opt-in or pay the price of querying for "*" tag expiration regularly, unless I misunderstood something.

[...]
But there's always more that can be done: if you have some experience there, some edge case to cover or any info at all please share them with me, it would be helpful to cover even more.

I'm not sure that there is something specific that I could share with you. In our project we opted for another approach for cache invalidation. Basically we assumed that reading from the cache is a lot more frequent than writing, so we tried to optimize the reading path. In our approach, each time you write an entry to the cache, we add the key to a redis hash set with a specific key that represents this cache group (tag). When we want to invalidate the cache, we iterate through the list of keys and delete them one by one. Of course this requires us to use some redis commands beyond what IDistributedCache offers and it probably would not work together with all the other features of FusionCache, but it works well for us.
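
For comparison, the key-set approach described above could look roughly like this. It is a Python sketch with hypothetical names, and plain in-memory sets stand in for the per-tag Redis sets, so no real Redis commands are shown or implied.

```python
from collections import defaultdict


class KeySetInvalidationSketch:
    """Hypothetical sketch: track keys per tag on write, delete them on invalidation."""

    def __init__(self):
        self._entries = {}
        self._keys_by_tag = defaultdict(set)  # stands in for one set per tag
        self.deletes = 0

    def set(self, key, value, tags=()):
        self._entries[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)  # extra work on the write path

    def get(self, key):
        return self._entries.get(key)  # read path: no tag checks at all

    def remove_by_tag(self, tag):
        # Iterate the tracked keys and delete them one by one.
        for key in self._keys_by_tag.pop(tag, set()):
            if key in self._entries:
                del self._entries[key]
                self.deletes += 1


cache = KeySetInvalidationSketch()
cache.set("user:1:profile", "p1", tags=["user:1"])
cache.set("user:1:orders", "o1", tags=["user:1"])
cache.set("user:2:profile", "p2", tags=["user:2"])

cache.remove_by_tag("user:1")

deleted_count = cache.deletes                 # 2
remaining = cache.get("user:2:profile")       # "p2"
```

The trade-off is the mirror image of the client-assisted one: reads stay as cheap as possible, while the cost of an invalidation grows with the number of keys tracked for that tag.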

@jodydonetti
Collaborator Author

jodydonetti commented Oct 21, 2024

You are right, I think what I should have asked is, what about the planned IDistributedCacheInvalidation interface that is proposed here: dotnet/aspnetcore#55308 ? Will this be compatible with the design proposed here so that FusionCache can take advantage of it?

Ah, I see what you were thinking about, good point.

My idea about that part is to add support for the new abstractions, like IDistributedCacheInvalidation or IBufferDistributedCache, and when possible support them automatically.

In particular, IDistributedCacheInvalidation is basically their version of the Backplane with the part about tags included: again I don't feel like the "single entry with all tag invalidation info" approach is the best one, but apart from that I think I can support that interface too and, if the IDistributedCache passed to FusionCache supports it, it may use that instead of the normal one.

Since IDistributedCacheInvalidation comes from Microsoft, it is more probable that 3rd party IDistributedCache implementers would implement it than FusionCache's own IFusionCacheBackplane, and that is why I'll add support for it: one thing to be sure about is whether it will support the necessary core pieces. For example, without support from the get go for multiple named caches, I don't know if it will be able to support notifications for different caches via different named Redis pub/sub channels (in the case of Redis).

Having said that, a nice thing is that this is not strictly necessary: if there's a benefit to it, good, otherwise users will simply have the feature with FusionCache as the other features already available, even when passing from the new HybridCache abstraction.

In general though there are a lot of moving parts, and we'll have to wait and see, but the general approach I think is sound is this:

  • provide useful features for FusionCache users
  • optionally add support for new abstractions down the road, if and when it makes sense
  • make some features available when using FusionCache as a HybridCache implementation

One thing to remember: binary compatibility between HybridCache and FusionCache is not there, and not needed: when using FusionCache you are using FusionCache, the only thing to respect is the public api surface area.
To be more clear: having on one node the Microsoft impl of HybridCache and on another node FusionCache and having them talk to each other is not something that will be supported, or that even makes sense (imho).

I know, I think it's a real bummer and something that should be there from the start.

Having been there, done that, I know it's a lot of work for them too, and everywhere there are time constraints, resources constraints, etc including at Microsoft, so I feel for them.

But as I understood, the Clear() method requires a "*" tag to be present. Even if that tag does not really exist on the wire, you would still need to check for the expiration of the "*" tag, wouldn't you? Since you can't be sure if any node has called Clear(), you would need to make the Clear() functionality opt-in or pay the price of querying for "*" tag expiration regularly, unless I misunderstood something.

Right, I now see what you meant.

Technically you are correct, but we are talking about a single cache entry, shared with the entire cache, so the cost of handling it I think is negligible. The cost of checking it will be in 99.999% of the cases a single in-memory lookup.
On top of that I'm thinking about some extra optimization for it, so that it is always immediately available even without a lookup.
Finally, as you mentioned, I can add an extra option in FusionCacheOptions to disable support for it, and squeeze some extra perf if you really don't need it.

All needs to be measured, of course, but as it stands: what do you think?

I'm not sure that there is something specific that I could share with you. In our project we opted for another approach for cache invalidation. Basically we assumed that reading from the cache is a lot more frequent than writing

Agree, this is true in 99% of the cases in the real world: in write-heavy systems, where writes are way more frequent than reads, caches are way less useful.

so we tried to optimize the reading path. In our approach, each time you write an entry to the cache, we add the key to a redis hash set with a specific key that represents this cache group (tag). When we want to invalidate the cache, we iterate through the list of keys and delete them one by one.

Makes sense, and that would've been the other approach, the one I called Server-Assisted: as said I will still play with it in the future.
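For readers following along, the key-set-per-tag approach quoted above can be sketched roughly as follows. This is a toy in-memory model in Python, not the quoted project's code and not FusionCache: the class `TagIndexedCache` and its members are hypothetical, and a plain dictionary of sets stands in for the Redis structure mentioned in the quote.

```python
from collections import defaultdict


class TagIndexedCache:
    """Toy in-memory stand-in for a cache plus one key set per tag."""

    def __init__(self):
        self.entries = {}                    # key -> value
        self.keys_by_tag = defaultdict(set)  # tag -> keys written with that tag

    def set(self, key, value, tags=()):
        # Write path: store the value and record the key under each tag.
        self.entries[key] = value
        for tag in tags:
            self.keys_by_tag[tag].add(key)

    def get(self, key):
        # Read path: a plain lookup, no tag checks needed.
        return self.entries.get(key)

    def remove_by_tag(self, tag):
        # Invalidation path: walk the tag's key set and delete one by one.
        for key in self.keys_by_tag.pop(tag, set()):
            self.entries.pop(key, None)


cache = TagIndexedCache()
cache.set("user:1:profile", {"name": "Ann"}, tags=["user:1"])
cache.set("user:1:orders", [101, 102], tags=["user:1"])
cache.set("user:2:profile", {"name": "Bob"}, tags=["user:2"])
cache.remove_by_tag("user:1")
```

The trade-off is visible in the shape of the code: reads stay as cheap as possible, while every write pays for the extra bookkeeping and an invalidation costs one delete per tagged key.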

Thanks again, this is a very useful conversation!

@angularsen

angularsen commented Oct 29, 2024

I don't have much feedback yet other than that this looks very interesting to help solve my use case of invalidating all cache for a particular user 🤩

The Client-Assisted Tag Invalidation approach should work well for us I think, we don't have any strong performance requirements yet that would require a server-assisted approach although I see the benefits of that too. I like the reasoning of providing both, where client-side works with all IDistributedCache, and maybe plugins can add support for various server-side solutions?

I'm following this closely and am eager to test it 🙌

@jodydonetti
Collaborator Author

Hi @aKzenT

But as I understood, the Clear() method requires a "*" tag to be present. Even if that tag does not really exist on the wire, you would still need to check for the expiration of the "*" tag, don't you? Since you can't be sure if any node has called Clear(), you would need to make the Clear() functionality opt-in or pay for the price of querying for "*" tag expiration regularly, unless I misunderstood something.

Update on this: currently on my experimental branch the Clear() support has been special cased as I planned, so right now it's just a very fast long > long check 🚀

Also, I added support for a real Clear() underneath, when FusionCache detects that it's physically possible to do that: basically when there's only L1 (no L2 or backplane) and total ownership of the inner MemoryCache. By special casing that too it now doesn't even need to use the "*" tag in those scenarios, which is a huge win.
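To make the two special cases concrete, here is a minimal sketch of the idea in Python. It is only an illustration under stated assumptions, not FusionCache's code: `ClearableCache`, `l1_only_and_owned` and the logical clock are hypothetical names, and a counter stands in for real timestamps.

```python
import itertools

_clock = itertools.count(1)  # logical clock standing in for real timestamps


class ClearableCache:
    def __init__(self, l1_only_and_owned):
        self.entries = {}       # key -> (value, written_at)
        self.cleared_at = 0     # timestamp of the last logical clear
        self.l1_only_and_owned = l1_only_and_owned

    def set(self, key, value):
        self.entries[key] = (value, next(_clock))

    def clear(self):
        if self.l1_only_and_owned:
            # Nothing else shares this memory: really drop everything.
            self.entries.clear()
        else:
            # Otherwise only record when the clear happened.
            self.cleared_at = next(_clock)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, written_at = item
        # The whole "was this cleared?" test is one integer comparison.
        if self.cleared_at > written_at:
            return None
        return value


shared = ClearableCache(l1_only_and_owned=False)
shared.set("a", 1)
shared.clear()
shared.set("b", 2)

owned = ClearableCache(l1_only_and_owned=True)
owned.set("a", 1)
owned.clear()
```

In the `shared` case the entry for `"a"` is still physically stored but is no longer returned, while `"b"`, written after the clear, is served normally. In the `owned` case the memory is released right away.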

Will update more in the next few days, and a preview version is right around the corner 😬

@jodydonetti
Collaborator Author

jodydonetti commented Oct 29, 2024

Hi @angularsen

The Client-Assisted Tag Invalidation approach should work well for us I think, we don't have any strong performance requirements yet that would require a server-assisted approach although I see the benefits of that too.

One thing to notice is that the performance impact would likely be there for the server-assisted version, too, just in a different way: in short it would equate to a massive UPDATE/DELETE FROM cache WHERE ... which not a lot of caches can do, and even when they can - like with Redis + HSET - it would still not be cheap.

The nice advantage of the client-assisted approach is that it's automatically balanced between all nodes/caches, distributed over time, lazy (only when in fact needed) and self-cleaning.

I keep thinking about the details and behavior of such an approach, and it may very well be the nicest, most balanced one all things considered.

Will post more of my considerations soon.

I'm following this closely and am eager to test it 🙌

That's great: a preview version will be out soon, thanks!

@b-twis

b-twis commented Oct 30, 2024

Hi @jodydonetti,

This is excellent news as I have been wanting to use Fusion Cache for some time and this was considered a blocker based on how we currently utilise our in-house L1 (MemoryCache) + L2 (Redis) system.

Since we are using Client-Assisted invalidation, would it make sense to consider using expiry tokens in the local cache? Or would that cause too many complications with other FusionCache functionality like Fail-Safe?

In regards to Clear(), the proposed Client-Assisted approach makes the * essentially a filter on the cached data which is still held in L1.

Also, I added support for a real Clear() underneath, when FusionCache detects that it's physically possible to do that: basically when there's only L1 (no L2 or backplane) and total ownership of the inner MemoryCache. By special casing that too it now doesn't even need to use the "*" tag in those scenarios, which is a huge win.

What is the primary driver for the above mentioned special case? Is it that the expiry timestamp values are stored in the same cache as the actual data, and clear would also remove these?

As per my understanding, it would not be safe to remove the L1 instance of these timestamps as they are required to determine if the L2 values are marked as expired.

If that is the case, can the L1 expiry timestamps be stored in a separate location within IFusionCache (like a singleton, or an isolated MemoryCache). That way Clear may not need so much special handling and it could proactively trigger a .Clear() on all other nodes in the backplane when the * value timestamp is changed.

Regardless of the above, the use of L2 means that the * tag will always be needed, as Clear functionality is neither available nor safe on a shared L2 cache.

-B

@jodydonetti
Collaborator Author

jodydonetti commented Oct 30, 2024

Hi @b-twis

This is excellent news as I have been wanting to use Fusion Cache for some time and this was considered a blocker based on how we currently utilise our in-house L1 (MemoryCache) + L2 (Redis) system.

That is nice to know 😬

Since we are using Client-Assisted invalidation, would it make sense to consider using expiry tokens in the local cache? Or would that cause too many complications with other FusionCache functionality like Fail-Safe?

I think that would be, as you guessed, problematic. Not really for fail-safe, at least not at first, but mostly because of the differences between L1 and L2, the different internal update flows and so on.
But I will give it a try nonetheless, will think about it.

In regards to Clear(), the proposed Client-Assisted approach makes the * essentially a filter on the cached data which is still held in L1.

Yes, but also not just that: it will act as a filter, yes, but also it will automatically clean up entries as they are discovered to be expired by a tag. This means that the system will automatically clean up as it is being used, which I think is a really nice additional bonus.
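The "filter plus clean up on discovery" behavior can be sketched like this. It is a simplified Python model, not the actual implementation: `TaggedCache` and `tag_removed_at` are hypothetical names, a counter stands in for real timestamps, and the per-tag timestamps live in a plain dictionary here purely to keep the sketch short.

```python
import itertools

_clock = itertools.count(1)  # logical clock standing in for real timestamps


class TaggedCache:
    def __init__(self):
        self.entries = {}         # key -> (value, tags, written_at)
        self.tag_removed_at = {}  # tag -> timestamp of its last removal

    def set(self, key, value, tags=()):
        self.entries[key] = (value, tuple(tags), next(_clock))

    def remove_by_tag(self, tag):
        # Cheap: one small write, no scan over the tagged entries.
        self.tag_removed_at[tag] = next(_clock)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, tags, written_at = item
        for tag in tags:
            if self.tag_removed_at.get(tag, 0) > written_at:
                # Filter and clean up: drop the entry now that it was found stale.
                del self.entries[key]
                return None
        return value


cache = TaggedCache()
cache.set("user:1:profile", "Ann", tags=["user:1"])
cache.set("user:2:profile", "Bob", tags=["user:2"])
cache.remove_by_tag("user:1")
cache.set("user:1:orders", [101], tags=["user:1"])  # written after the removal
```

Removing a tag touches nothing but the tag's own timestamp. A stale entry is only deleted when something tries to read it, so the cleanup cost is spread over normal usage.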

What is the primary driver for the above mentioned special case?

To effectively release memory when the scenario allows for that: normally, data would either expire after Duration or be cleaned up one by one as stated above. This would do more, when possible.

Is it that the expiry timestamp values are stored in the same cache as the actual data, and clear would also remove these?

Eheh also that, you spotted it 😬

As per my understanding, it would not be safe to remove the L1 instance of these timestamps as they are required to determine if the L2 values are marked as expired.

Correct, and that's why I stated (look for the bold part):

Also, I added support for a real Clear() underneath, when FusionCache detects that it's physically possible to do that: basically when there's only L1 (no L2 or backplane) and total ownership of the inner MemoryCache. By special casing that too it now doesn't even need to use the "*" tag in those scenarios, which is a huge win.

If that is the case, can the L1 expiry timestamps be stored in a separate location within IFusionCache (like a singleton or an isolated MemoryCache).

I saw others are exploring the dictionary approach (I think HybridCache is doing this), but that means that the dictionary would grow forever until restarts, which is not good imho.
And if you add an expiration to a dictionary you get... a separate cache.
But a purely in-memory cache would also need to be maintained separately, and also store data in a (separate?) L2, and also notify other nodes of evictions, and... and that is basically another FusionCache. That is why I picked the route of internal cache entries in the same cache: to share the same plumbing and configuration of the FusionCache already in use.

Thoughts?

That way Clear may not need so much special handling and it could proactively trigger a .Clear() on all other nodes in the backplane when the * value timestamp is changed.

You should also consider the case of a shared memory cache as the L1: this is also used, and it's why I stated above "and total ownership of the inner MemoryCache", exactly to avoid problems in this case.

To give you an idea, for every feature I basically need to consider these possible scenarios:

  • L1 only
  • L1 + L2
  • L1 + backplane (yes, somebody uses a backplane without an L2...)
  • L1 + L2 + backplane
  • all of the above, for both a shared L1 and an owned/isolated L1
  • all of the above that has an L2, for both a shared L2 (eg: same Redis instance and database) and an owned/isolated L2
  • maybe throw a cache key prefix into the mix, or not

Yeah, I know, the number of permutations of all possible scenarios is quite daunting 😅

Note that for people using an L2 but not a backplane, the solution is normally to have a low L1 Duration and a higher L2 Duration.

Regardless of the above, the use of L2 means that the * tag will always be needed, as Clear functionality is neither available nor safe on a shared L2 cache.

I'm not sure I totally understood this last part, so I'll try with 2 different meanings:

  • if you mean doing a real Clear() with an L2, then the answer as said is that I will not do that (limited to L1 only + other restrictions)
  • if you mean the simplified long > long check, then yes, I would still need the special cache entry for the "*" tag, but only to be consistent and to have an L2 fallback for cold starts (L1 empty) and for transient issues. I am also updating the in-memory "clear timestamp" immediately via backplane notifications, so this basically gets the best of both worlds
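As a rough illustration of that second point, the sketch below models where the "clear timestamp" comes from on each node. It is a hedged Python model, not FusionCache's code: `Node`, `CLEAR_KEY`, `on_backplane_clear` and the free function `clear` are hypothetical, a shared dictionary stands in for L2, and a direct method call stands in for a backplane notification. Handling of transient issues is left out.

```python
class Node:
    """One cache node; `l2` is a dict shared by all nodes (stand-in for L2)."""

    CLEAR_KEY = "__clear_timestamp"  # hypothetical key name

    def __init__(self, l2):
        self.l2 = l2
        self.clear_timestamp = None  # None = not known yet (cold start)

    def on_backplane_clear(self, timestamp):
        # Backplane notification: update the in-memory value immediately.
        self.clear_timestamp = timestamp

    def current_clear_timestamp(self):
        if self.clear_timestamp is None:
            # Cold start: nothing in memory yet, so read it from L2 once.
            self.clear_timestamp = self.l2.get(self.CLEAR_KEY, 0)
        return self.clear_timestamp

    def is_cleared(self, written_at):
        # Afterwards the check is a plain integer comparison in memory.
        return self.current_clear_timestamp() > written_at


def clear(origin, other_nodes, timestamp):
    origin.l2[Node.CLEAR_KEY] = timestamp  # persist for nodes that start later
    origin.clear_timestamp = timestamp
    for node in other_nodes:               # notify the nodes that are running
        node.on_backplane_clear(timestamp)


l2 = {}
a, b = Node(l2), Node(l2)
clear(a, [b], timestamp=10)
late = Node(l2)  # started after the clear, never saw the notification
```

Node `b` learns about the clear through the notification, while `late` recovers the same timestamp from L2 on its first check.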

Again, building on the existing plumbing and features would make this really a good way, imho.

Thoughts?

Thanks for sharing, this is really important for me to validate and fix my approach for Tagging!

@jodydonetti
Collaborator Author

Hi all, v2.0.0-preview-1 is out 🥳
This includes Tagging and Clear() support!

🙏 Please, if you can try it out and let me know what you think, how it feels to use it or anything else really: your contribution is essential, thanks!

@jrlost

jrlost commented Nov 11, 2024

Have been looking forward to this release for a while. I pulled it into our platform today to start experimenting. First, the interfaces all make sense and incorporating it into existing implementations was reasonably seamless.
I am currently running into an issue where some of my objects do not seem to be persisting the tags into L2 (so likely are not in L1); this brings me to my recommendation. Is there any chance we can get the logging expanded to include the tags? That would help me try to diagnose my issue.

Again, just wanted to say thank you for putting this together.

@jodydonetti
Collaborator Author

Hi @jrlost first of all thank you for trying it out.

Have been looking forward to this release for a while. I pulled it into our platform today to start experimenting. First, the interfaces all make sense and incorporating it into existing implementations was reasonably seamless.

This is really good to know, I tried to make the design seamless with existing code, so it's good to know that.

I am currently running into an issue where some of my objects do not seem to be persisting the tags into L2 (so likely are not in L1); this brings me to my recommendation. Is there any chance we can get the logging expanded to include the tags? That would help me try to diagnose my issue.

I left this note in my code:

image

I think I have an answer then 😬

So yeah I'll add tags in the next preview.

Out of curiosity, and to help me test things out, which L2 and serializer are you using?

Again, just wanted to say thank you for putting this together.

Thank you again for trying it out!

@jrlost

jrlost commented Nov 11, 2024 via email

@jodydonetti
Collaborator Author

System.text.json and redis.

Thanks, are you also using the backplane?

@jrlost

jrlost commented Nov 12, 2024 via email

@jodydonetti
Collaborator Author

Hi @jrlost I just enabled tags logging locally and it's working well, will release a new preview version soon.

Meanwhile: are you able to come up with an MRE of it not working as expected?

Thanks!

@jrlost

jrlost commented Nov 14, 2024 via email

@jodydonetti
Collaborator Author

Awesome. Unfortunately I haven't had time to dig into it any further to understand why it works sometimes and not other times.

Ok this is already an indication, good to know.

After you push out the logging stuff, that should help me isolate what's special about it. From a high level, my best guess is that it seems to work with setAsync but not getOrSetAsync.

Another piece of info, good.

I'll try to look into it to see if I find something.

Thanks!

@jodydonetti
Collaborator Author

Hi @jrlost I just released preview-2.

Just set IncludeTagsInLogs to true in the options and do some tests.

Let me know, thanks!

@jrlost

jrlost commented Nov 14, 2024 via email

@angelofb

Is it possible to update one or more tags of an existing cached Item?

@jodydonetti
Collaborator Author

Is it possible to update one or more tags of an existing cached Item?

Hi @angelofb , partial updates of a cache entry's data are not supported.
Think of it like this: every cache entry, which is the combination of value + tags + metadata (like expiration, etc), can only be updated atomically.

To update tags for a cache entry you need to do a SET-like operation (eg: Set/GetOrSet) and overwrite it all, since this will allow FusionCache to do its things like events, backplane notifications, etc...
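A tiny model of that "all or nothing" rule, in Python and with hypothetical names (`Entry`, `store`, `set_entry`), assuming nothing about FusionCache's internals:

```python
from dataclasses import dataclass
from typing import Any, FrozenSet


@dataclass(frozen=True)
class Entry:
    # Value, tags and metadata travel together and cannot be mutated in place.
    value: Any
    tags: FrozenSet[str]
    duration_seconds: float


store = {}


def set_entry(key, value, tags, duration_seconds):
    # The only write operation: replaces value, tags and metadata together.
    store[key] = Entry(value, frozenset(tags), duration_seconds)


set_entry("product:42", {"price": 10}, ["products"], 60)

# "Changing the tags" means writing the whole entry again.
old = store["product:42"]
set_entry("product:42", old.value, old.tags | {"on-sale"}, old.duration_seconds)
```

Because every change goes through the single write path, that path is the one place where side effects such as events or notifications would need to be triggered.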

Any use case in particular you'd like to share?

@angelofb

thank you, I don't have a use case, I was just wondering.
great job, so far no problem with 2.0.0-preview-2.

@jrlost

jrlost commented Nov 15, 2024

@jodydonetti , I pulled preview-2 down and tried it out, thanks again BTW. I can confirm that all entries where a tag was added via GetOrSetAsync result in [T=] in the logs; which aligns with what I was seeing in Redis. All entries where a tag was added via SetAsync contain a value in the logs as well as in L2.

@jodydonetti
Collaborator Author

Awesome!

I'm now adding specific SkipMemoryCacheRead, SkipMemoryCacheWrite, SkipDistributedCacheRead and SkipDistributedCacheWrite as discussed here since they have been asked, and also to be aligned with HybridCache which will be released soon.

Also, talking about HybridCache, I'm working on the compatible version which is coming along very nicely.

Damn, I also need to create specific issues to track those activities 🥲

Anyway will update soon.

@jodydonetti
Collaborator Author

Hi all, I just published a dedicated issue for the Clear() feature.

It contains some details about the mechanics behind it, the design of it, performance considerations and more.

@Jaben

Jaben commented Dec 5, 2024

@jodydonetti , I pulled preview-2 down and tried it out, thanks again BTW. I can confirm that all entries where a tag was added via GetOrSetAsync result in [T=] in the logs; which aligns with what I was seeing in Redis. All entries where a tag was added via SetAsync contain a value in the logs as well as in L2.

@jodydonetti to be clear: I'm seeing the same issue with preview-2 -- TAGS don't work with GetOrSetAsync() call. Switched to using SetAsync() (which isn't ideal) but the tags appear in Redis.

@jodydonetti
Collaborator Author

jodydonetti commented Dec 5, 2024

@jodydonetti to be clear: I'm seeing the same issue with preview-2 -- TAGS don't work with GetOrSetAsync() call. Switched to using SetAsync() (which isn't ideal) but the tags appear in Redis.

Damn, I only now realized I misinterpreted @jrlost comment!

Oh dear, I even answered "awesome!" 🤣

Sorry all, anyway I'm about to get out with a new preview with a lot of new stuff, and I think I even know why that was happening: you are probably passing the tags directly to the GetOrSet method call, right?
If you try to set the tags via the factory's ctx it'll probably work.

Anyway I'll update you soon, sorry again.

@jrlost

jrlost commented Dec 5, 2024 via email

@Jaben

Jaben commented Dec 5, 2024

@jodydonetti to be clear: I'm seeing the same issue with preview-2 -- TAGS don't work with GetOrSetAsync() call. Switched to using SetAsync() (which isn't ideal) but the tags appear in Redis.

Damn, I only now realized I misinterpreted @jrlost comment!

Oh dear, I even answered "awesome!" 🤣

Sorry all, anyway I'm about to get out with a new preview with a lot of new stuff, and I think I even know why that was happening: you are probably passing the tags directly to the GetOrSet method call, right? If you try to set the tags via the factory's ctx it'll probably work.

Anyway I'll update you soon, sorry again.

Bug appears to be here in both FusionCache_Async and Sync:

image

The tags param passed into the ExecuteEagerRefreshWithAsyncFactory method isn't being used.

@jodydonetti
Collaborator Author

Ah, so it was about eager refresh, good catch.

I'll make a test for that right now, will update later.

@jodydonetti
Collaborator Author

Update:

image

Will publish a new preview version this weekend!

@jodydonetti
Collaborator Author

Hi all, v2.0.0-preview-3 is out 🥳

@gerwim

gerwim commented Dec 11, 2024

@jodydonetti Awesome feature! Are there also plans to add methods like GetByTags?

@jrlost

jrlost commented Dec 12, 2024

@jodydonetti , I can confirm that preview 3 has solved my issues. I'm now able to see tags in both the L2 and logs for every object, not just the ones added via SetAsync. Thank you for pushing this out, I'll continue to poke at it, but initial impressions are great.

@onionhammer
Contributor

@jodydonetti Awesome feature! Are there also plans to add methods like GetByTags?

Wouldn't that potentially be a heterogeneous set of result types?

@jodydonetti
Collaborator Author

@jodydonetti , I can confirm that preview 3 has solved my issues. I'm now able to see tags in both the L2 and logs for every object, not just the ones added via SetAsync. Thank you for pushing this out, I'll continue to poke at it, but initial impressions are great.

Awesome, thanks for the great feedback, really appreciated!

@gerwim

gerwim commented Dec 12, 2024

@jodydonetti Awesome feature! Are there also plans to add methods like GetByTags?

Wouldn't that potentially be a heterogeneous set of result types?

Yes, you are absolutely right. What I actually meant was GetKeysByTags.

We have a very specific use case, where we log health check results and cache them for a short time (and tag them if they fail). Since this is so specific, and thinking about it, I'm not sure my suggested method is worth implementing.

@jodydonetti
Collaborator Author

Hi @gerwim

@jodydonetti Awesome feature! Are there also plans to add methods like GetByTags?

They recently asked the same thing over on Twitter, see here.
The answer is no, it's not possible and probably never will be because of concrete limitations of distributed caches out there (eg: Redis, Memcached, etc).
Even Marc Gravell (see HybridCache from Microsoft) thinks the same.

Hope this helps.

@Abdragiz

Abdragiz commented Dec 25, 2024

I’d like to share some thoughts about the tag support feature in FusionCache. This feature is also used in the Output Caching middleware in ASP.NET Core, which could make it worth considering adding an implementation of IOutputCacheStore for FusionCache.

I believe that adding tag support would make it possible for FusionCache to implement the IOutputCacheStore interface, similar to the Redis implementation provided in the ASP.NET Core repository (see: RedisOutputCacheStore.cs). This would allow FusionCache to seamlessly integrate with ASP.NET Core applications that use output caching. More details about the Output Caching middleware can be found here: Output Caching documentation.

Interestingly, FusionCache also has a client-side counterpart to output caching called Conditional Refresh. By implementing IOutputCacheStore, it might be possible to use a single Redis instance for both Conditional Refresh and the Output Caching middleware, reducing the need to maintain separate Redis instances for client-side and server-side caching.

@jodydonetti
Collaborator Author

I’d like to share some thoughts about the tag support feature in FusionCache. This feature is also used in the Output Caching middleware in ASP.NET Core, which could make it worth considering adding an implementation of IOutputCacheStore for FusionCache.

I believe that adding tag support would make it possible for FusionCache to implement the IOutputCacheStore interface, similar to the Redis implementation provided in the ASP.NET Core repository (see: RedisOutputCacheStore.cs).

Yup, I know: stay tuned 😬

@DGibbsCrafted

Hey @jodydonetti - this is a really useful feature so kudos to you for bringing it to fruition!

I just have a question around invalidating many tags at the same time.

I have a use case where I will need to evict ~30 tags in one go and I'm just looping the collection of tags and calling RemoveByTag. Is there a better way of doing this, e.g. can I pass a collection of tags and have it happen in a single call? Unfortunately I am limited to using the sync version of this method and it's blocking for ~2 secs. Not the end of the world but just wondering if there's a better way to achieve what I'm trying to do?

Apologies if this isn't the right place to ask.

Thanks!

@jodydonetti
Collaborator Author

Hi @DGibbsCrafted

I just have a question around invalidating many tags at the same time.

I have a use case where I will need to evict ~30 tags in one go and I'm just looping the collection of tags and calling RemoveByTag. Is there a better way of doing this, e.g. can I pass a collection of tags and have it happen in a single call? Unfortunately I am limited to using the sync version of this method and it's blocking for ~2 secs. Not the end of the world but just wondering if there's a better way to achieve what I'm trying to do?

The HybridCache abstraction has a RemoveByTag overload that takes a list of tags, but as you can see here what it does (except for some minor special-case optimizations) is just a foreach over the tags.

Since I suppose you are using an L2, the bottleneck may be the IO; everything is thread-safe anyway, so one thing you can try is a Parallel.ForEach over the tags so that they are processed in parallel.
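The general shape of that suggestion, shown here in Python rather than C# and with a hypothetical `remove_by_tag` stand-in that only simulates a blocking round trip, would be something like:

```python
import time
from concurrent.futures import ThreadPoolExecutor

removed = []


def remove_by_tag(tag):
    # Hypothetical stand-in for a blocking, thread-safe RemoveByTag call.
    time.sleep(0.05)  # simulated L2 round trip
    removed.append(tag)


tags = [f"tag-{i}" for i in range(30)]

# Run the blocking calls on a small thread pool and wait for all of them.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(remove_by_tag, tags))
```

With IO-bound calls the total wait drops roughly in proportion to the number of workers, but the right degree of parallelism depends on the L2 and should be measured rather than assumed.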

Having said that, I would try to see what is the real benefit for it by measuring on a real system, to be sure it's not just overkill.

Anyway, one last thing I may do before going GA is to add to FusionCache the same RemoveByTag overload with multiple tags, but let me check first for edge cases or strange things.

Will update you asap.

Thanks!
