
Conversation

bitfaster (Owner) commented Nov 20, 2024

LruItem.WasAccessed was previously volatile to ensure that thread A marking an item as accessed is visible to thread B cycling the cache. Under the covers, volatile equates to a half fence for reads and writes.

From the .NET memory model:

  • Volatile reads have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read.
  • Volatile writes have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable.
  • Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order.

Immediately before calling ConcurrentLruCore.Cycle, there is always an interlocked call. We can thus piggy-back on interlocked and avoid the half fences.

Without the check in MarkAccessed, this does not yield the same throughput boost as #643, because x64 has a strong memory model (the write already has release semantics and generates traffic to ensure CPU cache coherence).
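The piggy-backing idea can be sketched as follows. This is a Java analogue of the C# change (Java's `volatile` and `AtomicLong` play the roles of C# `volatile` and `Interlocked`); the class and method names `LruItemBefore`, `LruItemAfter`, `CacheSketch`, `getOrAdd`, and `cycle` are illustrative, not the library's actual API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Before: the accessed flag was volatile, so every read paid an acquire
// barrier and every write a release barrier.
class LruItemBefore<V> {
    volatile boolean wasAccessed; // half fence on every access
    V value;
}

// After: the flag is a plain field. Visibility is guaranteed because the
// caller always performs a full-fence atomic operation (the analogue of
// .NET's Interlocked) immediately before the cache cycles, so the plain
// write is published by the surrounding atomic at no extra cost.
class LruItemAfter<V> {
    boolean wasAccessed; // plain field: no per-access barrier
    V value;
}

class CacheSketch<V> {
    private final AtomicLong count = new AtomicLong();

    void getOrAdd(LruItemAfter<V> item) {
        item.wasAccessed = true; // plain write, no fence
        count.incrementAndGet(); // full fence: publishes the write above
        cycle();                 // the cycling thread observes wasAccessed
    }

    long size() { return count.get(); }

    void cycle() { /* eviction logic elided */ }
}
```

The key ordering property is that the full-fence atomic increment sits between the plain write and the code that needs to observe it, so no fence is paid on the hot read path.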

Before: [benchmark chart Results_Read_500_base]

After: [benchmark chart Results_Read_500]

coveralls commented Nov 20, 2024

Coverage Status: coverage 99.218% (+0.07%) from 99.149% when pulling 7c55eba on users/alexpeck/barrier into aeae236 on main.

bitfaster (Owner, Author) commented:

7c55ebaab8329906cf9d5d5eeead46f2f266be4c
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.2314)
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
| Method | Runtime | Mean | Error | Ratio | Code Size |
|---|---|---|---|---|---|
| ConcurrentDictionary | .NET 6.0 | 7.375 ns | 0.0925 ns | 1.00 | 1,521 B |
| FastConcurrentLru | .NET 6.0 | 8.569 ns | 0.0252 ns | 1.16 | 7,039 B |
| ConcurrentLru | .NET 6.0 | 15.214 ns | 0.0378 ns | 2.06 | 7,286 B |
| AtomicFastLru | .NET 6.0 | 27.285 ns | 0.0655 ns | 3.70 | NA |
| FastConcurrentTLru | .NET 6.0 | 11.830 ns | 0.0299 ns | 1.60 | 6,222 B |
| FastConcLruAfterAccess | .NET 6.0 | 12.148 ns | 0.2415 ns | 1.65 | 8,001 B |
| FastConcLruAfter | .NET 6.0 | 13.938 ns | 0.1153 ns | 1.89 | 8,083 B |
| ConcurrentTLru | .NET 6.0 | 16.870 ns | 0.0669 ns | 2.29 | 7,752 B |
| ConcurrentLfu | .NET 6.0 | 27.989 ns | 0.5687 ns | 3.80 | NA |
| ClassicLru | .NET 6.0 | 43.475 ns | 0.0806 ns | 5.90 | NA |
| RuntimeMemoryCacheGet | .NET 6.0 | 111.069 ns | 0.3001 ns | 15.06 | 89 B |
| ExtensionsMemoryCacheGet | .NET 6.0 | 47.346 ns | 0.3871 ns | 6.42 | 119 B |
| ConcurrentDictionary | .NET Framework 4.8 | 15.274 ns | 0.1652 ns | 1.00 | 4,127 B |
| FastConcurrentLru | .NET Framework 4.8 | 15.951 ns | 0.0542 ns | 1.04 | 27,388 B |
| ConcurrentLru | .NET Framework 4.8 | 20.185 ns | 0.1386 ns | 1.32 | 27,692 B |
| AtomicFastLru | .NET Framework 4.8 | 37.835 ns | 0.2130 ns | 2.48 | 358 B |
| FastConcurrentTLru | .NET Framework 4.8 | 28.312 ns | 0.2128 ns | 1.85 | 27,572 B |
| FastConcLruAfterAccess | .NET Framework 4.8 | 30.603 ns | 0.1348 ns | 2.00 | 358 B |
| FastConcLruAfter | .NET Framework 4.8 | 32.583 ns | 0.2912 ns | 2.13 | 358 B |
| ConcurrentTLru | .NET Framework 4.8 | 32.897 ns | 0.0534 ns | 2.15 | 27,924 B |
| ConcurrentLfu | .NET Framework 4.8 | 52.025 ns | 0.4951 ns | 3.41 | NA |
| ClassicLru | .NET Framework 4.8 | 56.101 ns | 0.5643 ns | 3.67 | NA |
| RuntimeMemoryCacheGet | .NET Framework 4.8 | 297.775 ns | 1.1600 ns | 19.50 | 79 B |
| ExtensionsMemoryCacheGet | .NET Framework 4.8 | 93.033 ns | 0.3655 ns | 6.09 | 129 B |

@bitfaster bitfaster changed the title Remove volatile from LruItem Optimize ConcurrentLru read throughput Nov 20, 2024
@bitfaster bitfaster marked this pull request as ready for review November 20, 2024 04:02
bitfaster (Owner, Author) commented Nov 20, 2024

Adds 2 instructions to GetOrAdd:

[disassembly image]

@bitfaster bitfaster merged commit 25ea2bd into main Nov 20, 2024
13 checks passed
@bitfaster bitfaster deleted the users/alexpeck/barrier branch November 20, 2024 23:23