
Reduce CacheEntry size #45962

Merged: 19 commits into dotnet:master, Jan 7, 2021

Conversation

adamsitnik (Member)
Based on customer feedback received in #45592 (comment), I decided to try to shrink CacheEntry a bit more. I was able to reduce the managed memory needed to allocate a CacheEntry from 280 to 224 bytes (on x64).

Main changes:

  • remove two enum fields (2× sizeof(int)) and store the information in a union struct called CacheEntryState. It now consists of two bytes for the mentioned enums and one byte for all the boolean flags. One byte is still unused, so we can add more state in the future ;). This looks like magic, and @roji is responsible for teaching me this kind of crazy stuff ;)
  • move the 3 token-related lists to a separate type called CacheEntryTokens. This saves space when the tokens are not used (the most common scenario)
  • remove the _lock field and use the field that stores CacheEntryTokens for locking instead (the lock guards access to the tokens and is needed only when working with them)
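The byte-packing idea in the first bullet can be sketched roughly like this. This is an illustrative guess at the layout, not the actual implementation; the field offsets, the IsDisposed flag, and the simplified EvictionReason enum are assumptions:

```csharp
using System.Runtime.InteropServices;

// Simplified stand-in for the real enum; shape only.
internal enum EvictionReason : byte { None, Removed, Replaced, Expired, TokenExpired, Capacity }

// Sketch of a 4-byte "union" state struct: two byte-sized enums plus a
// flags byte replace what used to be two 4-byte enum fields.
[StructLayout(LayoutKind.Explicit)]
internal struct CacheEntryState
{
    [FieldOffset(0)] private byte _flags;          // all boolean flags packed into one byte
    [FieldOffset(1)] private byte _evictionReason; // byte-sized instead of a 4-byte enum field
    [FieldOffset(2)] private byte _priority;       // byte-sized instead of a 4-byte enum field
    // offset 3 is still unused, leaving room for future state

    internal EvictionReason EvictionReason
    {
        get => (EvictionReason)_evictionReason;
        set => _evictionReason = (byte)value;
    }

    internal bool IsDisposed
    {
        get => (_flags & 1) != 0;
        set => _flags = value ? (byte)(_flags | 1) : (byte)(_flags & ~1);
    }
}
```

Embedding one such struct in CacheEntry replaces two int-sized enum fields (8 bytes plus padding) with a single 4-byte field, which is where part of the 280 → 224 byte reduction comes from.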

The CPU time has not changed, but allocations did:

| Method                           | Toolchain                | Allocated |
|----------------------------------|--------------------------|-----------|
| SetOverride                      | \after_size\CoreRun.exe  | 224 B     |
| SetOverride                      | \before_size\CoreRun.exe | 280 B     |
| CreateEntry                      | \after_size\CoreRun.exe  | 224 B     |
| CreateEntry                      | \before_size\CoreRun.exe | 280 B     |
| AddThenRemove_NoExpiration       | \after_size\CoreRun.exe  | 42,370 B  |
| AddThenRemove_NoExpiration       | \before_size\CoreRun.exe | 47,058 B  |
| AddThenRemove_AbsoluteExpiration | \after_size\CoreRun.exe  | 41,265 B  |
| AddThenRemove_AbsoluteExpiration | \before_size\CoreRun.exe | 47,490 B  |
| AddThenRemove_RelativeExpiration | \after_size\CoreRun.exe  | 54,482 B  |
| AddThenRemove_RelativeExpiration | \before_size\CoreRun.exe | 61,090 B  |
| AddThenRemove_SlidingExpiration  | \after_size\CoreRun.exe  | 41,313 B  |
| AddThenRemove_SlidingExpiration  | \before_size\CoreRun.exe | 46,002 B  |
| AddThenRemove_ExpirationTokens   | \after_size\CoreRun.exe  | 53,706 B  |
| AddThenRemove_ExpirationTokens   | \before_size\CoreRun.exe | 54,634 B  |

@ghost commented Dec 11, 2020

Tagging subscribers to this area: @eerhardt, @maryamariyan
See info in area-owners.md if you want to be subscribed.

Author: adamsitnik
Assignees: -
Labels: area-Extensions-Caching, tenet-performance
Milestone: -

@hopperpl

Question: why is the atomic interlocked-exchange needed?

If you update the whole 4-byte field, the write needs to be atomic, yes. But if you update just an individual field, shouldn't the JIT guarantee that only that field's bytes are written, atomically?

    internal EvictionReason EvictionReasonSimple
    {
        get => (EvictionReason)_evictionReason;
        set => _evictionReason = (byte)value;
    }

results in ...

Microsoft.Extensions.Caching.Memory.CacheEntry+CacheEntryState.set_EvictionReasonSimple(Microsoft.Extensions.Caching.Memory.EvictionReason)
    L0000: mov [rcx+1], dl
    L0003: ret

which is just a single-byte update that does not touch the surrounding bytes. It's an atomic operation on the CPU, so the fields can be changed independently without interfering with one another.

I would get rid of _state and update the fields directly; the JIT should emit the correct atomic single-byte stores.

Unless I'm overlooking something here.

@adamsitnik (Member, Author)

> Question: why is the atomic interlocked-exchange needed?

@hopperpl I've analysed the code again (and the way state can be updated) and you were right, it was not needed. I've added a simple test to ensure that.

@hopperpl

> > Question: why is the atomic interlocked-exchange needed?
>
> @hopperpl I've analysed the code again (and the way state can be updated) and you were right, it was not needed. I've added a simple test to ensure that.

If you need to avoid a race condition when different flags are set/unset at the same time, use Interlocked.Or and Interlocked.And (on the 32-bit _state value). These atomic operations won't interfere with the other fields:

    private void SetFlag(Flags option, bool value)
    {
        // _flags must be at offset 0, otherwise shift "option" into position;
        // a big-endian platform check would also be needed

        if (value)
            Interlocked.Or(ref _state, (int)option);
        else
            Interlocked.And(ref _state, ~(int)option);
    }

@adamsitnik (Member, Author)
> If you need to avoid a race condition when different flags are set/unset at the same time, use Interlocked.Or, Interlocked.And

@hopperpl thanks! I've analysed the code and there is no need for it (and this particular lib targets .NET Standard 2.0, while these methods were introduced in .NET 5).
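For reference, on .NET Standard 2.0 (where Interlocked.Or/Interlocked.And are unavailable) an equivalent lock-free flag update could be sketched with a CompareExchange loop. This is a hypothetical helper, not part of this PR:

```csharp
using System.Threading;

internal static class FlagHelper
{
    // Sketch: atomically set or clear a flag bit in a 32-bit state word
    // using only APIs available on .NET Standard 2.0. The loop retries
    // until no other thread modified the state between read and write.
    internal static void SetFlag(ref int state, int flagBit, bool value)
    {
        int oldState, newState;
        do
        {
            oldState = state;
            newState = value ? (oldState | flagBit) : (oldState & ~flagBit);
        }
        while (Interlocked.CompareExchange(ref state, newState, oldState) != oldState);
    }
}
```

As the PR author notes, the analysis showed no such atomicity was actually required here, so neither variant was needed.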


Review comment on:

    internal bool CheckForExpiredTokens(CacheEntry cacheEntry)
    {
        if (_expirationTokens != null)

Member:

Why doesn't this need a lock like the other two methods?

Member Author:

Review comment on:

    {
        lock (this)
        {
            lock (parentEntry.GetOrCreateTokens())

Member:

Where else is this locked on?

Member Author:

Member:

I'm not sure how that answers my question. My primary concern would be if it'd ever be possible for something to take the second lock and then try to take the first, creating a priority inversion and deadlock potential. I'm simply wondering if that's possible. Does the child/parent nature of this guarantee locks are always acquired in the same order?

Member Author:

> Does the child/parent nature of this guarantee locks are always acquired in the same order?

I think it does in the case when the cache item is being added to the cache (by calling the Dispose method):

    if (_previous != null && CanPropagateOptions())
    {
        PropagateOptions(_previous);

but I am not 100% sure that it would not be possible to write code that does this and abuses the second possible call to this method, from Cache.TryGetValue (which is very unintuitive to me):

    // When this entry is retrieved in the scope of creating another entry,
    // that entry needs a copy of these expiration tokens.
    entry.PropagateOptions(CacheEntryHelper.Current);

FWIW, I have not changed the locking order; it's the same as it used to be:

    lock (_lock)
    {
        lock (parent._lock)
        {
            foreach (IChangeToken expirationToken in _expirationTokens)
            {
                parent.AddExpirationToken(expirationToken);
            }
        }
    }
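The concern in this thread is the classic lock-ordering problem: a deadlock is only possible if two code paths can acquire the same pair of locks in opposite orders. A minimal sketch of the hazard, using hypothetical types that are not part of the cache code:

```csharp
// Illustration only: none of these members exist in CacheEntry.
internal sealed class LockOrderDemo
{
    private readonly object _childLock = new object();
    private readonly object _parentLock = new object();

    // Path A: child -> parent (the order the PropagateOptions code uses).
    internal void PropagateFromChild()
    {
        lock (_childLock)
        {
            lock (_parentLock) { /* copy tokens to the parent */ }
        }
    }

    // Path B: if any code path ever took parent -> child instead, two
    // threads (one in each path) could each hold one lock and wait
    // forever on the other.
    internal void HypotheticalReverseOrder()
    {
        lock (_parentLock)
        {
            lock (_childLock) { /* ... */ }
        }
    }
}
```

As long as every path acquires child before parent, as the quoted code does, the cycle needed for a deadlock cannot form.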

Review comment on:

    {
        lock (parentEntry.GetOrCreateTokens())
        {
            foreach (IChangeToken expirationToken in _expirationTokens)

Member:

Why in some cases are we using a for loop with direct indexing to enumerate the contents and in others we're using foreach?

Member Author:

I don't know, as I am not the author of the original code. To minimize the diff and make the review easier, I decided to change as few things as possible.

@eerhardt (Member) left a comment:

LGTM. Nice improvement here!

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
@adamsitnik adamsitnik merged commit 533a807 into dotnet:master Jan 7, 2021
@adamsitnik adamsitnik deleted the reduceCacheEntrySize branch January 7, 2021 19:57
@ghost ghost locked as resolved and limited conversation to collaborators Feb 6, 2021
7 participants