ReaderWriterLockSlim scales badly if high contention is generated across lots of processors #8539
kouvel referenced this issue in kouvel/coreclr on Aug 6, 2017
Fixes #12780

- Introduced some fairness to _myLock acquisition by deprioritizing attempts to acquire _myLock that are not likely to make progress on the RW lock
- Limited spinning in some cases where it is very unlikely that spinning would help

```
Baseline (left) vs Changed (right)                        Left score        Right score       ∆ Score %
-------------------------------------------------------   ---------------   ---------------   ---------
Concurrency_12Readers                                      22576.33 ±0.61%   22485.13 ±0.84%      -0.40%
Concurrency_12Writers                                      22661.57 ±0.36%   23159.64 ±0.41%       2.20%
Concurrency_12Readers_4Writers                             1363.40 ±11.54%    7721.94 ±2.66%     466.37%
Concurrency_48Readers_8Writers                                6.01 ±3.81%     5495.41 ±4.76%   91334.09%
Concurrency_192Readers_16Writers                              6.99 ±2.55%     4071.60 ±6.46%   58152.74%
Concurrency_768Readers_32Writers                             15.23 ±3.74%     1296.81 ±4.83%    8412.51%
Contention_AcquireReleaseReadLock_When_BgWriteLock            3.81 ±0.81%        3.65 ±0.68%      -4.25%
Contention_AcquireReleaseWriteLock_When_BgReadLock            3.78 ±0.63%        3.66 ±0.69%      -3.20%
Contention_UpgradeDowngradeLock_When_BgReadLock               3.76 ±0.72%        3.59 ±0.80%      -4.63%
Micro_AcquireReleaseReadLock_When_BgReadLock               2578.81 ±0.41%    2566.67 ±0.04%      -0.47%
Micro_AcquireReleaseReadLock_When_NoLock                   2586.77 ±0.12%    2592.37 ±0.03%       0.22%
Micro_AcquireReleaseReadLock_When_ReadLock                 2777.25 ±0.03%    2770.76 ±0.37%      -0.23%
Micro_AcquireReleaseUpgradeableReadLock_When_BgReadLock    2595.37 ±0.18%    2605.86 ±0.49%       0.40%
Micro_AcquireReleaseUpgradeableReadLock_When_NoLock        2646.20 ±0.12%    2609.14 ±0.13%      -1.40%
Micro_AcquireReleaseWriteLock_When_NoLock                  2548.90 ±0.04%    2621.03 ±0.18%       2.83%
Micro_AcquireReleaseWriteLock_When_WriteLock               2650.63 ±0.32%    2660.43 ±0.37%       0.37%
Micro_UpgradeDowngradeLock_When_ReadLock                   2535.17 ±0.10%    2512.07 ±0.09%      -0.91%
Timeout_AcquireReleaseReadLock_When_BgWriteLock            4016.95 ±1.00%    4012.96 ±0.23%      -0.10%
Timeout_AcquireReleaseWriteLock_When_BgReadLock            4014.43 ±0.14%    4119.19 ±0.24%       2.61%
Timeout_UpgradeDowngradeLock_When_BgReadLock               3979.77 ±0.07%    3893.49 ±0.04%      -2.17%
-------------------------------------------------------   ---------------   ---------------   ---------
Total                                                       532.66 ±1.40%    1395.93 ±1.24%     162.07%
```

@ChrisAhna's repro from #12780
------------------------------

Baseline

```
Running NonReentrantRwlockSlim write lock contention scenarios with process affinity set to 0x0000000000000fff...
Initial output should appear in approximately 5 seconds...
ThreadCount=0001: Elapsed=5008ms FinalCount=146778099 CountsPerSecond=29303915.9706614
ThreadCount=0002: Elapsed=5013ms FinalCount=144566761 CountsPerSecond=28832836.5040624
ThreadCount=0004: Elapsed=5030ms FinalCount=142987794 CountsPerSecond=28424230.6837138
ThreadCount=0008: Elapsed=5046ms FinalCount=140895614 CountsPerSecond=27919285.270997
ThreadCount=0016: Elapsed=5061ms FinalCount=131126565 CountsPerSecond=25904896.8847266
ThreadCount=0032: Elapsed=5046ms FinalCount=118985913 CountsPerSecond=23578687.3922634
ThreadCount=0064: Elapsed=5077ms FinalCount=87382990 CountsPerSecond=17209966.7389088
ThreadCount=0128: Elapsed=5046ms FinalCount=13983552 CountsPerSecond=2771044.65190866
ThreadCount=0256: Elapsed=5061ms FinalCount=926020 CountsPerSecond=182954.410859513
ThreadCount=0512: Elapsed=5061ms FinalCount=554880 CountsPerSecond=109633.000133682
ThreadCount=1024: Elapsed=5047ms FinalCount=403372 CountsPerSecond=79907.8598817266
ThreadCount=2048: Elapsed=5158ms FinalCount=427853 CountsPerSecond=82937.9986349298
ThreadCount=4096: Elapsed=6843ms FinalCount=1454282 CountsPerSecond=212510.216120728
```

Changed

```
Running NonReentrantRwlockSlim write lock contention scenarios with process affinity set to 0x0000000000000fff...
Initial output should appear in approximately 5 seconds...
ThreadCount=0001: Elapsed=5011ms FinalCount=146023913 CountsPerSecond=29136243.0844929
ThreadCount=0002: Elapsed=5029ms FinalCount=148085765 CountsPerSecond=29443444.9191416
ThreadCount=0004: Elapsed=5030ms FinalCount=147435037 CountsPerSecond=29307748.3158908
ThreadCount=0008: Elapsed=5046ms FinalCount=135669584 CountsPerSecond=26884746.2829748
ThreadCount=0016: Elapsed=5046ms FinalCount=117172253 CountsPerSecond=23219237.5243882
ThreadCount=0032: Elapsed=5077ms FinalCount=123019081 CountsPerSecond=24227324.0685217
ThreadCount=0064: Elapsed=5061ms FinalCount=114036461 CountsPerSecond=22528225.1624931
ThreadCount=0128: Elapsed=5061ms FinalCount=114874563 CountsPerSecond=22694305.2318717
ThreadCount=0256: Elapsed=5077ms FinalCount=111656891 CountsPerSecond=21990952.9701927
ThreadCount=0512: Elapsed=5092ms FinalCount=108080691 CountsPerSecond=21224516.1624796
ThreadCount=1024: Elapsed=5091ms FinalCount=101505410 CountsPerSecond=19936231.0295513
ThreadCount=2048: Elapsed=5168ms FinalCount=90210271 CountsPerSecond=17452847.1281253
ThreadCount=4096: Elapsed=5448ms FinalCount=70413247 CountsPerSecond=12923136.9608753
```

Monitor

```
Running StandardObjectLock write lock contention scenarios with process affinity set to 0x0000000000000fff...
Initial output should appear in approximately 5 seconds...
ThreadCount=0001: Elapsed=5003ms FinalCount=256096538 CountsPerSecond=51188037.8514373
ThreadCount=0002: Elapsed=5014ms FinalCount=247492238 CountsPerSecond=49357843.8844401
ThreadCount=0004: Elapsed=5015ms FinalCount=233682614 CountsPerSecond=46594332.7385551
ThreadCount=0008: Elapsed=5014ms FinalCount=202084181 CountsPerSecond=40295977.2812599
ThreadCount=0016: Elapsed=5015ms FinalCount=160327931 CountsPerSecond=31967286.7931112
ThreadCount=0032: Elapsed=5015ms FinalCount=159973407 CountsPerSecond=31896067.0536476
ThreadCount=0064: Elapsed=5016ms FinalCount=159925779 CountsPerSecond=31881983.1519616
ThreadCount=0128: Elapsed=5018ms FinalCount=160565171 CountsPerSecond=31994598.5148935
ThreadCount=0256: Elapsed=5027ms FinalCount=160346276 CountsPerSecond=31893699.5203601
ThreadCount=0512: Elapsed=5059ms FinalCount=160314101 CountsPerSecond=31688269.2933108
ThreadCount=1024: Elapsed=5106ms FinalCount=160400801 CountsPerSecond=31409342.086444
ThreadCount=2048: Elapsed=5198ms FinalCount=160176757 CountsPerSecond=30814118.2536987
ThreadCount=4096: Elapsed=5398ms FinalCount=160109421 CountsPerSecond=29660512.8891984
```
kouvel referenced this issue in dotnet/coreclr on Aug 23, 2017
* Improve ReaderWriterLockSlim scalability

Subset of #13243, fixes #12780

- Prevented waking more than one waiter when only one of them may acquire the lock
- Limited spinning in some cases where it is very unlikely that spinning would help

The _myLock spin lock runs into some bad scalability issues. For example:

1. Readers can starve writers for an unreasonable amount of time. Typically there would be more readers than writers, and it doesn't take many readers to starve a writer. On my machine with 6 cores (12 logical processors with hyperthreading), 6 to 16 reader threads attempting to acquire the spin lock to acquire or release a read lock can starve one writer thread from acquiring the spin lock for several or many seconds. The issue magnifies with more reader threads.
2. Readers, and especially writers, that hold the RW lock can be starved from even releasing their lock. Releasing an RW lock requires acquiring the spin lock, so releasers are easily starved by acquirers. How badly they are starved depends on how many acquirers there are, and it doesn't take many to show a very noticeable scalability issue. Often, these acquirers are the ones that cannot acquire the RW lock until one or more releasers release their lock, so the acquirers effectively starve themselves.

This PR does not solve (1), but solves (2) to a degree that could be considered sufficient. #13243 solves both (1) and (2), and for (2) it is still better by an order of magnitude than this PR in several cases that I believe are not extreme; but if the complexity of deprioritization is deemed excessive for the goals, then of what I have tried so far this is perhaps the simplest and most reasonable way to solve (2). I believe this PR would also be a reasonably low-risk one to port back to .NET Framework.
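The writer-starvation behavior described in point (1) can be observed with a small standalone program. The sketch below is hypothetical and is not the harness used for any of the numbers in this thread: it simply spins up many reader threads that churn EnterReadLock/ExitReadLock and then times a single EnterWriteLock on the main thread. The reader count and the choice of the main thread as the writer are arbitrary.

```csharp
// Hypothetical repro sketch, not the benchmark harness referenced above.
using System;
using System.Diagnostics;
using System.Threading;

class WriterStarvationSketch
{
    static void Main()
    {
        var rwLock = new ReaderWriterLockSlim();
        bool stop = false;

        // More reader threads than logical processors, to generate contention.
        int readerCount = Environment.ProcessorCount * 2;
        var readers = new Thread[readerCount];
        for (int i = 0; i < readerCount; i++)
        {
            readers[i] = new Thread(() =>
            {
                while (!Volatile.Read(ref stop))
                {
                    rwLock.EnterReadLock();
                    rwLock.ExitReadLock();
                }
            });
            readers[i].Start();
        }

        // How long does one write-lock acquisition take under reader pressure?
        var sw = Stopwatch.StartNew();
        rwLock.EnterWriteLock();
        sw.Stop();
        rwLock.ExitWriteLock();
        Console.WriteLine($"EnterWriteLock took {sw.ElapsedMilliseconds} ms with {readerCount} readers");

        Volatile.Write(ref stop, true);
        foreach (var t in readers) t.Join();
    }
}
```

How pronounced the delay is depends heavily on the core count and runtime version; on a machine with few processors the effect may be much smaller than what is described above.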
kouvel referenced this issue in dotnet/coreclr on Sep 14, 2017
Improve ReaderWriterLockSlim scalability

Fixes #12780

The _myLock spin lock runs into some bad scalability issues. For example:

- Readers can starve writers for an unreasonable amount of time. Typically there would be more readers than writers, and it doesn't take many readers to starve a writer. On my machine with 6 cores (12 logical processors with hyperthreading), 6 to 16 reader threads attempting to acquire the spin lock to acquire or release a read lock can starve one writer thread from acquiring the spin lock for several or many seconds. The issue magnifies with more reader threads.
- Readers, and especially writers, that hold the RW lock can be starved from even releasing their lock. Releasing an RW lock requires acquiring the spin lock, so releasers are easily starved by acquirers. How badly they are starved depends on how many acquirers there are, and it doesn't take many to show a very noticeable scalability issue. Often, these acquirers are the ones that cannot acquire the RW lock until one or more releasers release their lock, so the acquirers effectively starve themselves.

Took some suggestions from @vancem and landed on the following after some experiments:

- Introduced some fairness to _myLock acquisition by deprioritizing attempts to acquire _myLock that are not likely to make progress on the RW lock
- Limited spinning in some cases where it is very unlikely that spinning would help
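For readers unfamiliar with the "limited spinning" idea, the sketch below shows the general spin-briefly-then-block shape such a change refers to: a thread spins on a contended flag only while spinning is still cheap, then registers as a waiter and blocks, and each release wakes at most one waiter. This is purely illustrative; it is not the actual ReaderWriterLockSlim implementation, and the SpinThenBlockLock type and its members are invented for this example.

```csharp
// Illustrative only: a generic "spin briefly, then block" lock, not the coreclr change.
using System.Threading;

sealed class SpinThenBlockLock
{
    private readonly SemaphoreSlim _wakeEvent = new SemaphoreSlim(0);
    private int _state;        // 0 = free, 1 = held
    private int _waiterCount;  // threads blocked (or about to block)

    public void Enter()
    {
        var spinner = new SpinWait();
        while (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
        {
            if (!spinner.NextSpinWillYield)
            {
                spinner.SpinOnce();      // spinning is still cheap, keep trying
                continue;
            }

            // Spinning is no longer productive; register as a waiter and block.
            Interlocked.Increment(ref _waiterCount);
            try
            {
                // Re-check after registering so a concurrent Exit is not missed.
                if (Interlocked.CompareExchange(ref _state, 1, 0) == 0)
                    return;
                _wakeEvent.Wait();
            }
            finally
            {
                Interlocked.Decrement(ref _waiterCount);
            }
        }
    }

    public void Exit()
    {
        Volatile.Write(ref _state, 0);
        // Wake at most one blocked waiter instead of all of them.
        if (Volatile.Read(ref _waiterCount) > 0)
            _wakeEvent.Release();
    }
}
```

The structure above is only meant to convey the idea behind the bullets; the deprioritization of spin-lock attempts that cannot make progress is a separate mechanism and is not modeled here.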
AnthonySteele referenced this issue in AnthonySteele/aws-sdk-net on Jul 24, 2018
…nwait, which scales badly if high contention is generated across lots of threads (https://github.com/dotnet/coreclr/issues/12780). We think that this is behind one of our production issues with CPU shooting up to 100% under load in ASP.NET Core 2.1. .NET reference assignment is an atomic operation (https://stackoverflow.com/questions/11745440/what-operations-are-atomic-in-c), so there is no half-way state that needs locking.
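The point in that commit is that a field holding a single object reference can be published and read without any lock, because reference writes are atomic in .NET. A minimal sketch of that pattern is below; the CredentialsCache and Snapshot names are invented for this example and are not taken from the AWS SDK change.

```csharp
// Hypothetical illustration: reading and publishing one reference needs no
// ReaderWriterLockSlim, because reference assignment is atomic.
using System.Threading;

sealed class Snapshot
{
    public string Value { get; }
    public Snapshot(string value) => Value = value;
}

sealed class CredentialsCache
{
    private Snapshot _current = new Snapshot("initial");

    // Readers: no lock, just a volatile read of the reference.
    public Snapshot Read() => Volatile.Read(ref _current);

    // Writers: build a new immutable snapshot and swap it in atomically.
    public void Publish(string newValue) =>
        Volatile.Write(ref _current, new Snapshot(newValue));
}
```

Whether this applies depends on the access pattern: it works when readers only need a consistent snapshot behind one reference, not when several fields must be updated together.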
Experiments indicate that ReaderWriterLockSlim basically "falls apart" (i.e., generates catastrophically bad throughput) whenever it is subjected to high contention generated across a large number of processors.
I have been using the following test app to investigate this (against .NET 4.7): https://gist.github.com/ChrisAhna/37731dc47c30fa4080e9b21f5158bd14
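The gist itself is not reproduced here. A minimal sketch of the kind of write-lock throughput measurement it performs might look like the following, with the thread counts and 5-second window chosen purely for illustration; replacing the EnterWriteLock/ExitWriteLock pair with lock (sync) { count++; } gives the Monitor comparison mentioned further down.

```csharp
// Hypothetical sketch of a write-lock throughput scenario, not the actual gist.
using System;
using System.Diagnostics;
using System.Threading;

class WriteLockThroughputSketch
{
    static void Main()
    {
        var rwLock = new ReaderWriterLockSlim();
        foreach (int threadCount in new[] { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 })
        {
            long count = 0;
            bool stop = false;
            var threads = new Thread[threadCount];
            for (int i = 0; i < threadCount; i++)
            {
                threads[i] = new Thread(() =>
                {
                    while (!Volatile.Read(ref stop))
                    {
                        rwLock.EnterWriteLock();
                        count++;                    // protected by the write lock
                        rwLock.ExitWriteLock();
                    }
                });
                threads[i].Start();
            }

            var sw = Stopwatch.StartNew();
            Thread.Sleep(5000);                     // measurement window
            Volatile.Write(ref stop, true);
            sw.Stop();
            foreach (var t in threads) t.Join();

            Console.WriteLine(
                $"ThreadCount={threadCount:0000}: Elapsed={sw.ElapsedMilliseconds}ms " +
                $"FinalCount={count} CountsPerSecond={count / sw.Elapsed.TotalSeconds:F1}");
        }
    }
}
```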
Running on a 16-core machine with Hyperthreading enabled (i.e., %NUMBER_OF_PROCESSORS% is 32), ReaderWriterLockSlim EnterWriteLock/ExitWriteLock generates the following aggregate throughput as contention is added:
Note that the total throughput with 512 threads is about 13000 times slower than the total throughput with 2 threads.
In contrast, on the same machine, using "lock (obj) { ... }" instead of the EnterWriteLock/ExitWriteLock sequence generates throughput at high thread counts that is not even 2 times slower than throughput with 2 threads:
@vancem @kouvel