Atomic GetOrAdd

Use a cache builder to create a cache with atomic GetOrAdd.

To mitigate cache stampedes, caches can be configured with atomic GetOrAdd. When multiple threads attempt to insert a value for the same key, the first caller invokes the valueFactory and stores the value. Subsequent callers block until the new value has been generated, then return it.

Atomic GetOrAdd is enabled by creating a cache using a cache builder.
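For example, here is a sketch of building an LRU cache with atomic GetOrAdd enabled (assuming ConcurrentLruBuilder mirrors the ConcurrentLfuBuilder used later on this page):

using BitFaster.Caching;
using BitFaster.Caching.Lru;

// Sketch: ConcurrentLruBuilder is assumed to expose the same
// WithAtomicGetOrAdd/WithCapacity/Build chain as ConcurrentLfuBuilder below.
ICache<string, string> cache = new ConcurrentLruBuilder<string, string>()
    .WithAtomicGetOrAdd()
    .WithCapacity(128)
    .Build();

// The first caller for a given key runs the valueFactory; concurrent
// callers for the same key block until the value has been generated.
string value = cache.GetOrAdd("key", k => k.ToUpperInvariant());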

Without Atomic GetOrAdd, cache behavior matches ConcurrentDictionary

By default, both ConcurrentLru and ConcurrentLfu behave the same as ConcurrentDictionary: GetOrAdd is not atomic with respect to the valueFactory. Modifications to the internal hash map are protected by locks, but the valueFactory delegate is invoked outside those locks to avoid the problems that can arise from executing unknown code under a lock.

Because another thread can insert a key/value pair while valueFactory is still generating a value, the fact that valueFactory executed does not guarantee that the value it produced will be stored and returned. When GetOrAdd is called simultaneously on different threads, valueFactory may run multiple times, but only one key/value pair is added to the dictionary: the first write wins.

Under load, this can result in a cascading failure known as a cache stampede.
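The race is easy to observe with ConcurrentDictionary itself. The sketch below (plain .NET, not BitFaster.Caching code) counts how many times the valueFactory runs when several threads call GetOrAdd for the same key:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var dictionary = new ConcurrentDictionary<string, int>();
int factoryCalls = 0;

// Several threads race to add the same key. Only one write wins,
// but the valueFactory may execute more than once.
Parallel.For(0, 16, _ =>
{
    dictionary.GetOrAdd("key", k =>
    {
        Interlocked.Increment(ref factoryCalls);
        Thread.Sleep(10); // simulate an expensive value computation
        return 42;
    });
});

Console.WriteLine(dictionary.Count);  // 1: only one key/value pair was stored
Console.WriteLine(factoryCalls);      // often > 1: the factory ran concurrently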

Under the hood

For sync caches, atomic GetOrAdd is functionally equivalent to caching a Lazy, except exceptions are not cached. For async caches, subsequent callers will await the first caller's task. If the valueFactory throws an exception, all callers concurrently awaiting the task will see the same exception. Once the valueFactory has failed, a subsequent call to GetOrAddAsync will start fresh.
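As a sketch of the async behavior (this assumes the builder's AsAsyncCache() option and a GetOrAddAsync overload taking a Func<K, Task<V>>; check the library for the exact signatures):

using System;
using System.Threading.Tasks;
using BitFaster.Caching;
using BitFaster.Caching.Lru;

IAsyncCache<string, string> cache = new ConcurrentLruBuilder<string, string>()
    .WithAtomicGetOrAdd()
    .AsAsyncCache()
    .WithCapacity(128)
    .Build();

// The first caller starts the factory task for "key".
var first = cache.GetOrAddAsync("key", async k =>
{
    await Task.Delay(100); // simulate slow I/O
    return k + "-value";
});

// A concurrent caller for the same key awaits the first caller's task;
// its own valueFactory is never invoked. If the first factory throws,
// both callers observe the same exception.
var second = cache.GetOrAddAsync("key", k => Task.FromResult("never used"));

string a = await first;
string b = await second; // same value as a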

The valueFactory delegate is not cached internally, and values created atomically do not hold references to the factory delegate. This avoids unexpectedly rooting object graphs and causing memory leaks.

Exceptions store partially initialized entries

Let's assume the cache is populated and then the value factory throws an exception, as in the example below:

// Build an LFU cache with atomic GetOrAdd and capacity for 3 items
var cache = new ConcurrentLfuBuilder<string, bool>().WithAtomicGetOrAdd().WithCapacity(3).Build();

// a is accessed once; b and c are accessed twice, so a is the least frequently used entry
cache.GetOrAdd("a", _ => true);
cache.GetOrAdd("b", _ => true);
cache.GetOrAdd("b", _ => true);
cache.GetOrAdd("c", _ => true);
cache.GetOrAdd("c", _ => true);

string keys = string.Join(", ", cache.Keys); // "c, b, a"

// Now throw!
cache.GetOrAdd("d", k => throw new Exception(k));

Underneath, a value wrapper similar to Lazy<T> is added to the cache. During GetOrAdd, the wrapper is added first (there is exactly one wrapper per key in the cache), and then the factory method is invoked against that wrapper under a lock, so factory invocation is serialized and completes atomically, much like Lazy<T>.
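Conceptually, the wrapper behaves like the hypothetical sketch below (the library's real types differ and handle memory ordering; this only illustrates the shape):

using System;

// Hypothetical per-key value wrapper; one instance is stored per key.
class AtomicValue<K, V>
{
    private readonly object sync = new object();
    private bool isInitialized;
    private V value;

    public V GetValue(K key, Func<K, V> valueFactory)
    {
        if (isInitialized) { return value; } // simplified: no volatile/memory-barrier handling

        lock (sync)
        {
            if (!isInitialized)
            {
                // If valueFactory throws here, the exception propagates to the caller,
                // but this wrapper has already been stored in the cache.
                value = valueFactory(key);
                isInitialized = true;
            }

            return value;
        }
    }
}

// GetOrAdd first inserts the (uninitialized) wrapper into the hash map,
// then initializes it under the wrapper's lock:
//
//   var wrapper = map.GetOrAdd(key, _ => new AtomicValue<K, V>());
//   return wrapper.GetValue(key, valueFactory);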

When the factory method throws, the wrapper has already been stored in the cache. Because the cache capacity in this example is 3 and the cache already contains 3 entries, attempting to add d evicts a, even though creating d did not succeed. Inspecting the cache keys shows this to be the case:

string keys = string.Join(", ", cache.Keys); // "c, b"

The underlying cache contains a partially initialized entry for d, but it is not observable through the public API of the cache.
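For example, assuming the cache's TryGet method for lookups that do not add:

// The entry for "d" is not visible to callers:
bool found = cache.TryGet("d", out bool unused); // false

// Because exceptions are not cached, a later GetOrAdd for "d" starts fresh
// and invokes the valueFactory again.
bool added = cache.GetOrAdd("d", _ => true); // true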

This approach favors high concurrent throughput and lower latency in the normal case, at the expense of cache pollution in the exception case. Locks are taken only when executing the value factory, and there is one lock per key, avoiding unnecessary contention.