
Conversation

bitfaster (Owner) commented Nov 20, 2024

LruItem.WasAccessed was previously volatile to ensure that thread A marking an item as accessed is visible to thread B cycling the cache. Under the covers, volatile equates to a half fence for reads and writes.

From the .NET memory model:

  • Volatile reads have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read.
  • Volatile writes have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable.
  • Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order.

Immediately before calling ConcurrentLruCore.Cycle, there is always an interlocked call. We can thus piggy-back on interlocked and avoid the half fences.

Without the check in MarkAccessed, this does not yield the same throughput boost as #643, because x64 has a strong memory model (the write already has release semantics and generates traffic to ensure CPU cache coherence).
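The piggy-backing idea can be sketched as follows. This is a Java analogue of the C# change (Java's `volatile` and `AtomicLong` play the roles of C# `volatile` and `Interlocked`); the class and method names `LruItemBefore`, `LruItemAfter`, `CacheSketch`, `getOrAdd`, and `cycle` are illustrative, not the library's actual API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Before: the accessed flag was volatile, so every read paid an acquire
// barrier and every write a release barrier.
class LruItemBefore<V> {
    volatile boolean wasAccessed; // half fence on every access
    V value;
}

// After: the flag is a plain field. Visibility is guaranteed because the
// caller always performs a full-fence atomic operation (the analogue of
// .NET's Interlocked) immediately before the cache cycles, so the plain
// write is published by the surrounding atomic at no extra cost.
class LruItemAfter<V> {
    boolean wasAccessed; // plain field: no per-access barrier
    V value;
}

class CacheSketch<V> {
    private final AtomicLong count = new AtomicLong();

    void getOrAdd(LruItemAfter<V> item) {
        item.wasAccessed = true; // plain write, no fence
        count.incrementAndGet(); // full fence: publishes the write above
        cycle();                 // the cycling thread observes wasAccessed
    }

    long size() { return count.get(); }

    void cycle() { /* eviction logic elided */ }
}
```

The key ordering property is that the full-fence atomic increment sits between the plain write and the code that needs to observe it, so no fence is paid on the hot read path.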

Before: [benchmark chart Results_Read_500_base]

After: [benchmark chart Results_Read_500]

coveralls commented Nov 20, 2024

Coverage Status: coverage 99.218% (+0.07%) from 99.149% when pulling 7c55eba on users/alexpeck/barrier into aeae236 on main.

bitfaster (Owner, Author) commented:

7c55ebaab8329906cf9d5d5eeead46f2f266be4c
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.2314)
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
| Method | Runtime | Mean | Error | Ratio | Code Size |
|---|---|---|---|---|---|
| ConcurrentDictionary | .NET 6.0 | 7.375 ns | 0.0925 ns | 1.00 | 1,521 B |
| FastConcurrentLru | .NET 6.0 | 8.569 ns | 0.0252 ns | 1.16 | 7,039 B |
| ConcurrentLru | .NET 6.0 | 15.214 ns | 0.0378 ns | 2.06 | 7,286 B |
| AtomicFastLru | .NET 6.0 | 27.285 ns | 0.0655 ns | 3.70 | NA |
| FastConcurrentTLru | .NET 6.0 | 11.830 ns | 0.0299 ns | 1.60 | 6,222 B |
| FastConcLruAfterAccess | .NET 6.0 | 12.148 ns | 0.2415 ns | 1.65 | 8,001 B |
| FastConcLruAfter | .NET 6.0 | 13.938 ns | 0.1153 ns | 1.89 | 8,083 B |
| ConcurrentTLru | .NET 6.0 | 16.870 ns | 0.0669 ns | 2.29 | 7,752 B |
| ConcurrentLfu | .NET 6.0 | 27.989 ns | 0.5687 ns | 3.80 | NA |
| ClassicLru | .NET 6.0 | 43.475 ns | 0.0806 ns | 5.90 | NA |
| RuntimeMemoryCacheGet | .NET 6.0 | 111.069 ns | 0.3001 ns | 15.06 | 89 B |
| ExtensionsMemoryCacheGet | .NET 6.0 | 47.346 ns | 0.3871 ns | 6.42 | 119 B |
| ConcurrentDictionary | .NET Framework 4.8 | 15.274 ns | 0.1652 ns | 1.00 | 4,127 B |
| FastConcurrentLru | .NET Framework 4.8 | 15.951 ns | 0.0542 ns | 1.04 | 27,388 B |
| ConcurrentLru | .NET Framework 4.8 | 20.185 ns | 0.1386 ns | 1.32 | 27,692 B |
| AtomicFastLru | .NET Framework 4.8 | 37.835 ns | 0.2130 ns | 2.48 | 358 B |
| FastConcurrentTLru | .NET Framework 4.8 | 28.312 ns | 0.2128 ns | 1.85 | 27,572 B |
| FastConcLruAfterAccess | .NET Framework 4.8 | 30.603 ns | 0.1348 ns | 2.00 | 358 B |
| FastConcLruAfter | .NET Framework 4.8 | 32.583 ns | 0.2912 ns | 2.13 | 358 B |
| ConcurrentTLru | .NET Framework 4.8 | 32.897 ns | 0.0534 ns | 2.15 | 27,924 B |
| ConcurrentLfu | .NET Framework 4.8 | 52.025 ns | 0.4951 ns | 3.41 | NA |
| ClassicLru | .NET Framework 4.8 | 56.101 ns | 0.5643 ns | 3.67 | NA |
| RuntimeMemoryCacheGet | .NET Framework 4.8 | 297.775 ns | 1.1600 ns | 19.50 | 79 B |
| ExtensionsMemoryCacheGet | .NET Framework 4.8 | 93.033 ns | 0.3655 ns | 6.09 | 129 B |

@bitfaster bitfaster changed the title Remove volatile from LruItem Optimize ConcurrentLru read throughput Nov 20, 2024
@bitfaster bitfaster marked this pull request as ready for review November 20, 2024 04:02
bitfaster (Owner, Author) commented Nov 20, 2024

Adds 2 instructions to GetOrAdd:

[disassembly image]

@bitfaster bitfaster merged commit 25ea2bd into main Nov 20, 2024
13 checks passed
@bitfaster bitfaster deleted the users/alexpeck/barrier branch November 20, 2024 23:23