[Proposal] use Timer in MemoryCache #45842

Closed
adamsitnik wants to merge 8 commits

Conversation

@adamsitnik (Member) commented Dec 9, 2020

This is my last proposal for improving MemoryCache, at least for a while (I've run out of ideas).

After #45281, #45280, #45410, #45563, aspnet/Benchmarks#1603, and aspnet/Benchmarks#1607, we got to the point where the RPS for the TE Caching benchmark has increased by 15% (from ~230k to ~270k).

Now, most of the caching-related time that is not spent in the ConcurrentDictionary (most of the time overall goes to JSON serialization) is spent getting the current time, to tell whether a given entry has expired or not:

(image)

In the case of the TE Caching benchmark, none of the cache entries have an expiration date, so this work is redundant and could be detected as such before getting the current time.

The problem is that MemoryCache checks in every public method whether it has to scan the existing cache items for expired ones, and to do that it needs the current time.

We could track the fact that none of the items in a given cache can expire, but that would rather be a hack.

Instead, we can introduce a Timer to MemoryCache and have it perform the scan in the background at a configured time interval. That is what this proposal is about.
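
A rough, self-contained sketch of the proposed shape, mirroring the constructor snippet quoted in the review thread below (simplified for illustration; `ScanForExpiredItems` stands in for the cache's real scan logic):

```csharp
using System;
using System.Timers;

// Sketch only: the cache owns a timer that fires the expired-entry scan every
// ExpirationScanFrequency, so Get/Set/TryGetValue no longer need the current
// time just to decide whether a scan is due.
internal sealed class TimerScanningCacheSketch : IDisposable
{
    private readonly System.Timers.Timer _timer;

    public TimerScanningCacheSketch(TimeSpan expirationScanFrequency)
    {
        _timer = new System.Timers.Timer(Math.Max(1, expirationScanFrequency.TotalMilliseconds));
        _timer.Elapsed += OnTimerElapsed;
        _timer.Start();
    }

    private void OnTimerElapsed(object? sender, ElapsedEventArgs e) => ScanForExpiredItems();

    private void ScanForExpiredItems()
    {
        // Walk the entries once per interval and remove the expired ones.
    }

    public void Dispose() => _timer.Dispose();
}
```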

I assume that most of our users have a single instance of the cache and they use the default expiration scan frequency (1 minute).

This change gives us a 10% boost for the TE Caching benchmark (from 274k to 307k RPS). After it, the ConcurrentDictionary becomes the bottleneck. But even if we replace the cache with an array, the RPS is around 330k, so this puts us very close to the maximum limit.

FWIW the microbenchmark results:

| Method | Toolchain | Mean | Ratio |
|--------|-----------|-----:|------:|
| GetHit | \after\CoreRun.exe | 22.46 ns | 0.21 |
| GetHit | \before\CoreRun.exe | 108.11 ns | 1.00 |
| TryGetValueHit | \after\CoreRun.exe | 21.71 ns | 0.20 |
| TryGetValueHit | \before\CoreRun.exe | 111.43 ns | 1.00 |
| GetMiss | \after\CoreRun.exe | 14.41 ns | 0.15 |
| GetMiss | \before\CoreRun.exe | 92.94 ns | 1.00 |
| TryGetValueMiss | \after\CoreRun.exe | 14.35 ns | 0.15 |
| TryGetValueMiss | \before\CoreRun.exe | 93.17 ns | 1.00 |
| SetOverride | \after\CoreRun.exe | 296.49 ns | 0.99 |
| SetOverride | \before\CoreRun.exe | 297.80 ns | 1.00 |
| AddThenRemove_NoExpiration | \after\CoreRun.exe | 46,419.16 ns | 0.93 |
| AddThenRemove_NoExpiration | \before\CoreRun.exe | 50,281.66 ns | 1.00 |
| AddThenRemove_AbsoluteExpiration | \after\CoreRun.exe | 51,470.07 ns | 0.97 |
| AddThenRemove_AbsoluteExpiration | \before\CoreRun.exe | 52,867.99 ns | 1.00 |
| AddThenRemove_RelativeExpiration | \after\CoreRun.exe | 51,951.99 ns | 0.95 |
| AddThenRemove_RelativeExpiration | \before\CoreRun.exe | 54,729.02 ns | 1.00 |
| AddThenRemove_SlidingExpiration | \after\CoreRun.exe | 44,456.62 ns | 0.86 |
| AddThenRemove_SlidingExpiration | \before\CoreRun.exe | 51,965.55 ns | 1.00 |

@eerhardt @davidfowl @maryamariyan @Tratcher

@adamsitnik added the NO-MERGE, tenet-performance, discussion, and area-Extensions-Caching labels on Dec 9, 2020
@ghost commented Dec 9, 2020

Tagging subscribers to this area: @eerhardt, @maryamariyan
See info in area-owners.md if you want to be subscribed.

Comment on lines +64 to +66
_timer = new System.Timers.Timer(Math.Max(1, _options.ExpirationScanFrequency.TotalMilliseconds));
_timer.Elapsed += OnTimerElapsed;
_timer.Start();
Member

This is bad for testability. Any timing mechanism needs to be abstracted for the unit tests.

Member Author

I agree, but it's still doable.

@Tratcher (Member) commented Dec 9, 2020

> Now, most of the caching-related time that is not spent in the ConcurrentDictionary (most of the time overall goes to JSON serialization) is spent getting the current time, to tell whether a given entry has expired or not.

Have you tried making the current time check more efficient?

@adamsitnik (Member Author)

> Have you tried making the current time check more efficient?

I did: #45281. Now it's just a syscall, and I can't see a way to optimize it further.

@Clockwork-Muse (Contributor)

...because these should always be absolute stamps, in theory you could get the current process/system time once at cache initialization, via something like Environment.TickCount64 or similar (obviously you'd need an abstraction for testing purposes, and TickCount64 is in milliseconds, which might not be sufficient), and then use that for tracking expiration. This should remove the verification when getting the current time, although I don't know how much that would help.
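
A minimal sketch of that idea, assuming millisecond resolution is acceptable (the type and member names are illustrative, not existing MemoryCache API):

```csharp
using System;

// Capture the wall-clock time once, then derive "now" from the monotonic
// Environment.TickCount64 delta, avoiding DateTimeOffset.UtcNow on every operation.
internal sealed class TickBasedClock
{
    private readonly DateTimeOffset _baseTime = DateTimeOffset.UtcNow;
    private readonly long _baseTicks = Environment.TickCount64;

    // Millisecond resolution only, which may not be sufficient for all expiration scenarios.
    public DateTimeOffset UtcNow =>
        _baseTime + TimeSpan.FromMilliseconds(Environment.TickCount64 - _baseTicks);
}
```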

@Tratcher (Member) left a comment

Embedding a timer in the implementation makes the component impossible to test reliably. Tests must be in full control of any timing. This is a blocking design point to resolve.

@drieseng (Contributor) commented Dec 9, 2020

@adamsitnik From a quick look at MemoryCache, there are a few (small) quick wins left:

Don't hesitate to correct me!

@@ -24,10 +22,10 @@ public class MemoryCache : IMemoryCache

private readonly MemoryCacheOptions _options;
private readonly ConcurrentDictionary<object, CacheEntry> _entries;
private readonly System.Timers.Timer _timer;
Member

Besides the abstraction thing, why not use System.Threading.Timer? It should have less overhead, as System.Timers.Timer itself relies on System.Threading.Timer.
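
A minimal sketch of that suggestion (illustrative only, not the PR's code; `ScanForExpiredItems` is a stand-in for the cache's real scan method):

```csharp
using System;
using System.Threading;

// System.Threading.Timer drives the periodic expiration scan directly, without the
// Elapsed event infrastructure that System.Timers.Timer layers on top of it.
internal sealed class ThreadingTimerScanSketch : IDisposable
{
    private readonly Timer _timer;

    public ThreadingTimerScanSketch(TimeSpan scanFrequency)
    {
        // A static lambda plus an explicit state object avoids capturing `this` in a closure.
        _timer = new Timer(
            static state => ((ThreadingTimerScanSketch)state!).ScanForExpiredItems(),
            this,
            dueTime: scanFrequency,
            period: scanFrequency);
    }

    // Stand-in for walking the cache entries and removing expired ones.
    private void ScanForExpiredItems()
    {
    }

    public void Dispose() => _timer.Dispose();
}
```

The underlying scheduling is the same either way; the difference is mostly dropping the event wrapper and its extra allocations.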

Member

That's right, nothing should be using System.Timers.Timer.

Member Author

I've used this particular timer because it was the first one VS suggested ;)

I wanted to have a proof of concept to get some numbers and get feedback before I invest more in the implementation.

@adamsitnik (Member Author)

@drieseng you are right, but the places you have pointed out are not on the hot path that I am optimizing, so even if I optimize them it won't have any effect.

@adamsitnik (Member Author)

> Embedding a timer in the implementation makes the component impossible to test reliably. Tests must be in full control of any timing. This is a blocking design point to resolve.

It's harder, but not impossible to mock.

The question is whether the testability of MemoryCache should be exposed to users. That would require me to introduce an official public abstraction for the timer, similar to what you did with ISystemClock.
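
To make that concrete, here is a hypothetical sketch of what such an abstraction could look like (neither the interface nor the names are an actual proposed API):

```csharp
using System;

// Hypothetical timer abstraction in the spirit of ISystemClock: production code uses a
// real timer, while tests supply a fake and trigger the scan callback deterministically.
public interface IExpirationScanTimer : IDisposable
{
    void Start(Action scanCallback, TimeSpan period);
}

internal sealed class ThreadingExpirationScanTimer : IExpirationScanTimer
{
    private System.Threading.Timer? _timer;

    public void Start(Action scanCallback, TimeSpan period) =>
        _timer = new System.Threading.Timer(_ => scanCallback(), null, period, period);

    public void Dispose() => _timer?.Dispose();
}
```

A test double would simply store the callback and expose a method to invoke it on demand.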

@adamsitnik (Member Author)

> This should remove the verification when getting the current time

I've already removed the redundant verification in #45281. Whatever Environment.TickCount64 uses, it's most probably a syscall and would not solve our problem.

@drieseng (Contributor) commented Dec 9, 2020

> @drieseng you are right, but the places you have pointed out are not on the hot path that I am optimizing, so even if I optimize them it won't have any effect.

It should affect the AddThenRemove and SetOverride benchmark results you've mentioned in the description... BUT... I'm saying this without having actually seen the benchmark code and without having tested the proposed changes :p

If you can guarantee that I can have the dotnet runtime built on my Windows box in 30 minutes, I definitely want to try to prove you wrong (or admit my defeat).

@edevoogd commented Dec 9, 2020

I may have missed some crucial implementation details, but could this work instead to achieve the objective?

  • Change the signature of CacheEntry.CheckExpired() to take a Lazy<DateTimeOffset> instead of a DateTimeOffset. The implementation of CacheEntry.CheckForExpiredTime() would then invoke the factory (now becomes now.Value) only if AbsoluteExpiration.HasValue or _slidingExpiration.HasValue is true (see the sketch after this list).
  • Set CacheEntry.LastAccessed only if _slidingExpiration.HasValue is true.
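
A minimal sketch of the lazy-time idea (the class below is illustrative and simplified, not the real CacheEntry; it assumes the entry tracks an absolute and a sliding expiration as described above):

```csharp
using System;

internal sealed class CacheEntrySketch
{
    private readonly TimeSpan? _slidingExpiration;

    public CacheEntrySketch(DateTimeOffset? absoluteExpiration, TimeSpan? slidingExpiration)
    {
        AbsoluteExpiration = absoluteExpiration;
        _slidingExpiration = slidingExpiration;
        LastAccessed = DateTimeOffset.UtcNow;   // creation counts as the first access
    }

    public DateTimeOffset? AbsoluteExpiration { get; }
    public DateTimeOffset LastAccessed { get; private set; }

    // The current time is materialized only when some expiration is actually configured.
    public bool CheckExpired(Lazy<DateTimeOffset> now)
    {
        if (AbsoluteExpiration.HasValue && AbsoluteExpiration.Value <= now.Value)
        {
            return true;
        }

        if (_slidingExpiration.HasValue)
        {
            if (now.Value - LastAccessed >= _slidingExpiration.Value)
            {
                return true;
            }

            // LastAccessed only matters for sliding expiration, so update it only here.
            LastAccessed = now.Value;
        }

        return false;
    }
}
```

Callers would pass something like `new Lazy<DateTimeOffset>(() => DateTimeOffset.UtcNow)`, so entries with no expiration configured never touch the clock at all.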

@filipnavara (Member) commented Dec 10, 2020

> Whatever Environment.TickCount64 uses, it's most probably a syscall

Not necessarily: Windows and some versions of Linux have a special read-only memory page mapped into all processes, and this value is directly readable from it without a syscall.

It is also monotonically increasing, unlike the real clock (which may be an attack vector for cache performance degradation).

@Clockwork-Muse (Contributor)

> I've already removed the redundant verification in #45281

... and, at least on Windows, there's a whole bunch of stuff that happens when calling DateTime.UtcNow, like leap second handling.

@adamsitnik (Member Author)

> at least on Windows, there's a whole bunch of stuff that happens when calling DateTime.UtcNow, like leap second handling

I've optimized everything there in the past, except leap second handling: dotnet/coreclr#26046

And now @GrabYourPitchforks is working on leap second handling perf: #44771

@gfoidl mentioned this pull request Dec 14, 2020
@edevoogd commented Jan 6, 2021

@adamsitnik In a previous comment, I did not take into account that CacheEntry.LastAccessed is updated in many of the public cache operations. You may want to have a look at a suggested direction for further optimization that I outlined here.

@adamsitnik (Member Author)

@edevoogd big thanks for your proposal! I really like the idea described in https://github.com/edevoogd/ClockQuantization!

@adamsitnik (Member Author)

For the reasons described by @DamianEdwards on Twitter, I've decided not to pursue this idea. If we ever decide that we want to be in the Top 5 for this benchmark, we should reconsider it.

@adamsitnik closed this Jan 15, 2021
@davidfowl (Member)

Yes, we do want to do this.

@filipnavara (Member)

I think you should still re-evaluate using Environment.TickCount64 instead of DateTime.UtcNow. The performance characteristics ought to be slightly different.

@edevoogd commented Jan 27, 2021

FWIW, local experimentation with changes in edevoogd/ClockQuantization#1 shows the additional benefit of using Environment.TickCount64.

@filipnavara @adamsitnik

@ghost locked as resolved and limited conversation to collaborators Feb 26, 2021