Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in MemoryCache will crash the process #61032

Closed
ayende opened this issue Oct 30, 2021 · 9 comments · Fixed by #61187
Closed

Race condition in MemoryCache will crash the process #61032

ayende opened this issue Oct 30, 2021 · 9 comments · Fixed by #61187

Comments

@ayende
Copy link
Contributor

ayende commented Oct 30, 2021

Description

I'm running some benchmark on MemoryCache, and I got the following error from a background thread, which killed the process.

It looks like a race condition in the background compaction process, and I'm assuming that the following line is responsible for that:

https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Caching.Memory/src/MemoryCache.cs#L451

The issue here is that we are are updating the LastAccessed value on TryGetValue, but if this is running concurrently with the Sort() call, this means that the sort order of an entry has changed, leading to this issue.

The error is:

Unhandled exception. System.ArgumentException: Unable to sort because the IComparer.Compare() method returns inconsistent results. Either a value does not compare equal to itself, or one value repeatedly compared to another value yields different results. IComparer: 'System.Comparison`1[Microsoft.Extensions.Caching.Memory.CacheEntry]'.
   at System.Collections.Generic.ArraySortHelper`1.Sort(Span`1 keys, Comparison`1 comparer) in System.Private.CoreLib.dll:token 0x60066cd+0x1d
   at System.Collections.Generic.List`1.Sort(Comparison`1 comparison) in System.Private.CoreLib.dll:token 0x600688b+0x3
   at Microsoft.Extensions.Caching.Memory.MemoryCache.<Compact>g__ExpirePriorityBucket|27_0(Int64& removedSize, Int64 removalSizeTarget, Func`2 computeEntrySize, List`1 entriesToRemove, List`1 priorityEntries) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000061+0x21
   at Microsoft.Extensions.Caching.Memory.MemoryCache.Compact(Int64 removalSizeTarget, Func`2 computeEntrySize) in Microsoft.Extensions.Caching.Memory.dll:token 0x600005b+0xff
   at Microsoft.Extensions.Caching.Memory.MemoryCache.OvercapacityCompaction(MemoryCache cache) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000059+0xad
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in System.Private.CoreLib.dll:token 0x6002b7c+0x110
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in System.Private.CoreLib.dll:token 0x6002c66+0x67
   at System.Threading.Thread.StartCallback() in System.Private.CoreLib.dll:token 0x600280f+0xe

Reproduction Steps

The following code will reproduce the behavior in about 5 or so minutes of runtime.
This is actually a reproduction for another issue, but I run into the exception and it very much looks like a serious problem for consumers of MemoryCache.

using Microsoft.Extensions.Caching.Memory;
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Security.Cryptography.X509Certificates;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp5
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var list = new List<string>();

            for (int i = 0; i < 1024; i++)
            {
                list.Add(i.ToString());
            }

            var hasher = new FileHasher();
            var tasks = new List<Task>();  
            for (int i = 0; i < 512; i++)
            {
                var t = Task.Run(() => ValidateHashEntries(list, hasher));
                tasks.Add(t);
            }

            Task.WaitAll(tasks.ToArray());


        }

        private static void ValidateHashEntries(List<string> list, FileHasher hasher)
        {
            for (int i = 0; i < list.Count * 32; i++)
            {
                var item = list[Random.Shared.Next(list.Count)];
                var hash = hasher.ComputeHash(item);
                if (hash.All(x => x == 0))
                {
                    Console.WriteLine("Found invalid value");
                }
            }
        }
    }

    public class FileHasher
    {
        private MemoryCache _cache;
        public FileHasher()
        {
            _cache = new MemoryCache(new MemoryCacheOptions
            {
                SizeLimit = 1024
            });
        }

        [ThreadStatic]
        private static SpinWait _sleep;

        public byte[] ComputeHash(string file)
        {
            var hash = (byte[])_cache.Get(file);
            if(hash != null)
            {
                _sleep.SpinOnce();
            }
            return ComputeHashAndPutInCache(file);
        }

        private byte[] ComputeHashAndPutInCache(string file)
        {
            byte[] hash = ArrayPool<byte>.Shared.Rent(32);
            HashTheFile(file, hash);
            _cache.Set(file, hash, new MemoryCacheEntryOptions
            {
                Size = 32,
                PostEvictionCallbacks =
                    {
                        new PostEvictionCallbackRegistration
                        {
                            EvictionCallback = EvictionCallback
                        }
                    }
            });
            return hash;
        }

        private void EvictionCallback(object key, object value, EvictionReason reason, object state)
        {
            Array.Clear((byte[])value);
            ArrayPool<byte>.Shared.Return((byte[])value);
        }

        private static void HashTheFile(string file, byte[] hash)
        {
            // let's pretend to hash the file here
            var _ = file;
            Random.Shared.NextBytes(hash);
        }
    }
}

Expected behavior

There should not be a crash.

Actual behavior

There is a crash and the process dies.

Regression?

No response

Known Workarounds

none

Configuration

No response

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added area-Extensions-Caching untriaged New issue has not been triaged by the area owner labels Oct 30, 2021
@ghost
Copy link

ghost commented Oct 30, 2021

Tagging subscribers to this area: @eerhardt, @maryamariyan, @michaelgsharp
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I'm running some benchmark on MemoryCache, and I got the following error from a background thread, which killed the process.

It looks like a race condition in the background compaction process, and I'm assuming that the following line is responsible for that:

https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Caching.Memory/src/MemoryCache.cs#L451

The issue here is that we are are updating the LastAccessed value on TryGetValue, but if this is running concurrently with the Sort() call, this means that the sort order of an entry has changed, leading to this issue.

The error is:

Unhandled exception. System.ArgumentException: Unable to sort because the IComparer.Compare() method returns inconsistent results. Either a value does not compare equal to itself, or one value repeatedly compared to another value yields different results. IComparer: 'System.Comparison`1[Microsoft.Extensions.Caching.Memory.CacheEntry]'.
   at System.Collections.Generic.ArraySortHelper`1.Sort(Span`1 keys, Comparison`1 comparer) in System.Private.CoreLib.dll:token 0x60066cd+0x1d
   at System.Collections.Generic.List`1.Sort(Comparison`1 comparison) in System.Private.CoreLib.dll:token 0x600688b+0x3
   at Microsoft.Extensions.Caching.Memory.MemoryCache.<Compact>g__ExpirePriorityBucket|27_0(Int64& removedSize, Int64 removalSizeTarget, Func`2 computeEntrySize, List`1 entriesToRemove, List`1 priorityEntries) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000061+0x21
   at Microsoft.Extensions.Caching.Memory.MemoryCache.Compact(Int64 removalSizeTarget, Func`2 computeEntrySize) in Microsoft.Extensions.Caching.Memory.dll:token 0x600005b+0xff
   at Microsoft.Extensions.Caching.Memory.MemoryCache.OvercapacityCompaction(MemoryCache cache) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000059+0xad
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in System.Private.CoreLib.dll:token 0x6002b7c+0x110
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in System.Private.CoreLib.dll:token 0x6002c66+0x67
   at System.Threading.Thread.StartCallback() in System.Private.CoreLib.dll:token 0x600280f+0xe

Reproduction Steps

The following code will reproduce the behavior in about 5 or so minutes of runtime.
This is actually a reproduction for another issue, but I run into the exception and it very much looks like a serious problem for consumers of MemoryCache.

using Microsoft.Extensions.Caching.Memory;
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Security.Cryptography.X509Certificates;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp5
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var list = new List<string>();

            for (int i = 0; i < 1024; i++)
            {
                list.Add(i.ToString());
            }

            var hasher = new FileHasher();
            var tasks = new List<Task>();  
            for (int i = 0; i < 512; i++)
            {
                var t = Task.Run(() => ValidateHashEntries(list, hasher));
                tasks.Add(t);
            }

            Task.WaitAll(tasks.ToArray());


        }

        private static void ValidateHashEntries(List<string> list, FileHasher hasher)
        {
            for (int i = 0; i < list.Count * 32; i++)
            {
                var item = list[Random.Shared.Next(list.Count)];
                var hash = hasher.ComputeHash(item);
                if (hash.All(x => x == 0))
                {
                    Console.WriteLine("Found invalid value");
                }
            }
        }
    }

    public class FileHasher
    {
        private MemoryCache _cache;
        public FileHasher()
        {
            _cache = new MemoryCache(new MemoryCacheOptions
            {
                SizeLimit = 1024
            });
        }

        [ThreadStatic]
        private static SpinWait _sleep;

        public byte[] ComputeHash(string file)
        {
            var hash = (byte[])_cache.Get(file);
            if(hash != null)
            {
                _sleep.SpinOnce();
            }
            return ComputeHashAndPutInCache(file);
        }

        private byte[] ComputeHashAndPutInCache(string file)
        {
            byte[] hash = ArrayPool<byte>.Shared.Rent(32);
            HashTheFile(file, hash);
            _cache.Set(file, hash, new MemoryCacheEntryOptions
            {
                Size = 32,
                PostEvictionCallbacks =
                    {
                        new PostEvictionCallbackRegistration
                        {
                            EvictionCallback = EvictionCallback
                        }
                    }
            });
            return hash;
        }

        private void EvictionCallback(object key, object value, EvictionReason reason, object state)
        {
            Array.Clear((byte[])value);
            ArrayPool<byte>.Shared.Return((byte[])value);
        }

        private static void HashTheFile(string file, byte[] hash)
        {
            // let's pretend to hash the file here
            var _ = file;
            Random.Shared.NextBytes(hash);
        }
    }
}

Expected behavior

There should not be a crash.

Actual behavior

There is a crash and the process dies.

Regression?

No response

Known Workarounds

none

Configuration

No response

Other information

No response

Author: ayende
Assignees: -
Labels:

untriaged, area-Extensions-Caching

Milestone: -

@ayende ayende changed the title MemoryCache error crash the process Race condition in MemoryCache will crash the process Oct 30, 2021
eerhardt added a commit to eerhardt/runtime that referenced this issue Nov 3, 2021
During cache compaction, we are sorting entries based on their LastAccessed time. However, since the cache entries can still be used concurrently on other threads, the LastAccessed time may be updated in the middle of sorting the entries. This leads to exceptions in a background thread, crashing the process.

The fix is to cache the LastAccessed time outside of the entry when we are adding it to the list. This will ensure the time is stable during the compaction process.

Fix dotnet#61032
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 3, 2021
@eerhardt
Copy link
Member

eerhardt commented Nov 3, 2021

Thanks for the bug report, @ayende. With your above description, I was easily able to reproduce the problem myself and you were correct in your analysis. The issue is that we are sorting based on LastAccessed time, but that value may be updated concurrently by a different thread. This causes the Sort to fail, and since it happens in a background thread, the app crashes.

I have created a fix for this in 7.0. This may be a candidate we can backport to 6.0. How urgent do you see this issue being?

@eerhardt eerhardt added this to the 7.0.0 milestone Nov 3, 2021
@eerhardt eerhardt removed the untriaged New issue has not been triaged by the area owner label Nov 3, 2021
@ayende
Copy link
Contributor Author

ayende commented Nov 4, 2021

I'm actually using a copy of the code, so I can patch this locally. However, I would estimate that this is likely to be causing application crashes in production (and nearly impossible to reproduce ones).
My estimation is that this is a serious issue, and should be backported.

@davidfowl
Copy link
Member

Yes we need to back port this one

@eerhardt eerhardt modified the milestones: 7.0.0, 6.0.0, 6.0.x Nov 4, 2021
eerhardt added a commit that referenced this issue Dec 2, 2021
* Cache LastAccessed during MemoryCache compaction

During cache compaction, we are sorting entries based on their LastAccessed time. However, since the cache entries can still be used concurrently on other threads, the LastAccessed time may be updated in the middle of sorting the entries. This leads to exceptions in a background thread, crashing the process.

The fix is to cache the LastAccessed time outside of the entry when we are adding it to the list. This will ensure the time is stable during the compaction process.

Fix #61032
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 2, 2021
@eerhardt
Copy link
Member

eerhardt commented Dec 2, 2021

Re-opening to track porting this fix to 6.0.

@eerhardt eerhardt reopened this Dec 2, 2021
eerhardt added a commit to eerhardt/runtime that referenced this issue Dec 2, 2021
* Cache LastAccessed during MemoryCache compaction

During cache compaction, we are sorting entries based on their LastAccessed time. However, since the cache entries can still be used concurrently on other threads, the LastAccessed time may be updated in the middle of sorting the entries. This leads to exceptions in a background thread, crashing the process.

The fix is to cache the LastAccessed time outside of the entry when we are adding it to the list. This will ensure the time is stable during the compaction process.

Fix dotnet#61032
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 2, 2021
safern pushed a commit that referenced this issue Dec 15, 2021
* Cache LastAccessed during MemoryCache compaction (#61187)

* Cache LastAccessed during MemoryCache compaction

During cache compaction, we are sorting entries based on their LastAccessed time. However, since the cache entries can still be used concurrently on other threads, the LastAccessed time may be updated in the middle of sorting the entries. This leads to exceptions in a background thread, crashing the process.

The fix is to cache the LastAccessed time outside of the entry when we are adding it to the list. This will ensure the time is stable during the compaction process.

Fix #61032

* Update fix for 6.0.x servicing.

1. Remove the dependency on ValueTuple and use a custom struct instead.
2. Add servicing package changes.
3. In the tests, move the DisableParallelization collection declaration in the test project, since it is only in "Common" in main.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 15, 2021
@deeprobin
Copy link
Contributor

Is this issue still relevant (#62286 is merged)?

@eerhardt
Copy link
Member

eerhardt commented Feb 2, 2022

This can now be closed. The fix will ship with version 6.0.2.

@eerhardt eerhardt closed this as completed Feb 2, 2022
@ErikApption
Copy link

@eerhardt I just had that error with 6.0.200-preview.22055.15 - is that error going to be fixed in the next version?

@eerhardt
Copy link
Member

eerhardt commented Feb 3, 2022

Yes, it will be fixed in the next servicing release of .NET. The runtime version will be 6.0.2. The version you have above is the SDK version - 6.0.200-preview.22055.15, which includes the 6.0.1 runtime version.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants