-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use System.Threading.Lock in Kestrel #57236
base: main
Are you sure you want to change the base?
Conversation
The team members can run those for you - it needs write permission on the repo. |
In theory using |
/benchmark plaintext aspnet-citrine-lin kestrel |
Yes that is my impression as well. |
Benchmark started for plaintext on aspnet-citrine-lin with kestrel. Logs: link |
plaintext - aspnet-citrine-lin
|
CI failure seems likely to be addressed by #57269. |
My concern is that under lock contention the new lock does not seem to perform that well in my tests on x64. Looks nice on the first benchmark, then I repeat the tests and consistently get 30% slower results for all following benchmarks |
I wouldn't expect these locks to be too contended under normal usage with the exception of Has @kouvel Do you have any idea why @ladeak might be seeing a slowdown with |
@ladeak, can you share your tests? Do the tests do some warmup to ensure that methods get tiered up? There can be slight differences in performance due to different code gen and TLS access, but when using the |
@kouvel let me attach the benchmarks and the results: BenchmarkRunner.Run<Benchmarks>();
[SimpleJob]
public class Benchmarks
{
private int _value = 0;
private const int N = 10000;
private readonly object _regularLock = new();
private readonly Lock _newLock = new();
[Benchmark]
public async Task RegularLock()
{
var t1 = Task.Run(Update);
var t2 = Task.Run(Update);
var t3 = Task.Run(Update);
var t4 = Task.Run(Update);
await Task.WhenAll(t1, t2, t3, t4);
void Update()
{
for (int i = 0; i < N; i++)
{
lock (_regularLock)
_value++;
}
}
}
[Benchmark]
public async Task NewLock()
{
var t1 = Task.Run(Update);
var t2 = Task.Run(Update);
var t3 = Task.Run(Update);
var t4 = Task.Run(Update);
await Task.WhenAll(t1, t2, t3, t4);
void Update()
{
for (int i = 0; i < N; i++)
{
lock (_newLock)
_value++;
}
}
}
} This is the result I get most of the times:
The numbers slightly vary for RegularLock 1.6-2.9ms and NewLock 2.4-3.1ms, for example another run:
I expect larger variance due to the nature of the tests. I tried different Very rarely I get results as follows (or even the NewLock being faster slightly):
Single TaskWhen I remove the parallel tasks and have an
3 TasksWith 3 parallel tasks (instead of 4), I get results that are pretty close, but still favor of the old lock consistently:
2 TasksLooks similar to 3 tasks:
|
I'm seeing opposite results on my machine, but in any case maybe an issue is that 10000 is small enough that one started task progresses significantly before the next one is started and since uncontended lock acquisitions are much faster, they could progress a lot, and that would I imagine introduce a lot of variability to the results. So it's hard to say how much contention is actually happening during each iteration of the test. I typically test it differently, having multiple threads continuously doing the contentious work, and warming up and measuring changes to some count over time in some other thread. Maybe with BDN a setup/cleanup could achieve something like that. With this kind of lock usage, I would not expect |
What do you see with very large N, like 1,000,000? Too large an N may cause the method to not be called enough to get tiered up, but anyway just curious. |
@kouvel with
and after a second run:
and a 3rd run:
|
Ok, maybe there's a processor difference here. Based on the processor spec it seems like a low-wattage mobile processor. Maybe there are differences there that I'm not aware of. I'll try to post another test case that's similar to yours but without BDN, curious what you would see there. May be a bit before I get back though. |
Take your time, I am going to sleep in the next few hours. |
@ladeak, can you try this on your machine: using System.Diagnostics;
static class Program
{
private static void Main()
{
TestMonitor();
TestLock();
}
private static void TestMonitor()
{
object lockObj = new();
int value = 0;
Test(
"Monitor",
ref value,
n =>
{
for (int i = 0; i < n; i++)
{
lock (lockObj)
value++;
}
});
}
private static void TestLock()
{
Lock lockObj = new();
int value = 0;
Test(
"Lock",
ref value,
n =>
{
for (int i = 0; i < n; i++)
{
lock (lockObj)
value++;
}
});
}
private static void Test(string name, ref int value, Action<int> update)
{
bool stop = false;
ThreadStart threadMain = () =>
{
while (!stop)
update(100);
};
const int ThreadCount = 4;
var threads = new Thread[ThreadCount];
for (int i = 0; i < ThreadCount; i++)
{
var t = new Thread(threadMain);
t.IsBackground = true;
t.Start();
threads[i] = t;
}
var sw = new Stopwatch();
Thread.Sleep(500); // warmup
for (int i = 0; i < 8; i++)
{
sw.Restart();
int startValue = value;
Thread.Sleep(500);
sw.Stop();
int changes = value - startValue;
Console.WriteLine($"{name,10}: {sw.Elapsed.TotalNanoseconds / changes,10:0.000} ns/change");
}
stop = true;
foreach (var t in threads)
t.Join();
}
} On my machine (AMD Ryzen 9 5950X), I'm seeing numbers similar to these:
|
and then I run it again:
and again:
and a 4th time:
I think the results on this look better in the sense as in BDN the new |
Ok maybe just a processor difference there, my results are varying a bit too between runs, but mostly similar. There are more things that can be improved for Lock, such as dotnet/runtime#79485. I'm not too worried about the difference there. |
@kouvel in your suggested measurement code I updated the
2nd run
3rd run:
I repeated the tests a few times with different thread counts, and it seems with |
Yea it is possible actually. In the test, the more number of times one thread can reacquire the lock in sequence, the better the result would be. For instance, using a Sleep(1) upon contention would make the test show very good results, but lock acquisition latency would get very high. Slight differences in timing could affect the result, it's probably more nuanced to benchmark performance under contention. I typically randomize how long the lock is held to help with stabilizing the results. There are some small spin-waiting differences in Lock to yield the thread more quickly upon contention, which could also affect the timing, and helps to not waste time spinning when other threads can do more useful work. |
@BrennanConroy and @halter73 I think all questions are clarified. |
If we're worried about regressing contended performance, should we continue to use Similar logic goes for The other locks seem like a purer win, since they will almost always be accessed from the application thread, and the locks are mostly there to handle early aborts. |
I have been trying a few other things out, ie. WSL (Ubuntu 22.04.03), same machine 4 threads (the variance is huge between executions):
With 2 threads the results are pretty similar as on Windows, but with somewhat more variance between runs:
|
Some updates: with the latest preview the |
src/Servers/Kestrel/Core/src/Internal/Http2/FlowControl/InputFlowControl.cs
Show resolved
Hide resolved
- Replacing lock objects with Use System.Threading.Lock - HTTP3 has a few dictionaries/lists locked that is not addressed in this PR
fcab2b3
to
8878dad
Compare
src/Servers/Kestrel/Core/src/Internal/Http2/FlowControl/InputFlowControl.cs
Outdated
Show resolved
Hide resolved
src/Servers/Kestrel/perf/Microbenchmarks/Http2/FlowControl/InputFlowControl.cs
Outdated
Show resolved
Hide resolved
src/Servers/Kestrel/Core/test/Http2/FlowControl/InputFlowControlTests.cs
Outdated
Show resolved
Hide resolved
src/Servers/Kestrel/Core/src/Internal/Http2/FlowControl/InputFlowControl.cs
Outdated
Show resolved
Hide resolved
@halter73 would you have further suggestions? |
Thanks for all the work you've put into this! I think it would be better to first merge the System.Threading.Lock changes and separate the lock-free changes into another PR. And I'd keep the The lock-free changes look good to me too, but I would appreciate like @BrennanConroy @amcasey or maybe even @benaadams looking at it. Putting it in a separate PR to make will make it easier for others to review it on its own and make it easier to revert if need be. |
bfc2e12
to
5d7287b
Compare
Reverted this PR to use the scope of using |
Use System.Threading.Lock in Kestrel
Replacing lock
object
s with System.Threading.LockDescription
Lock
type performed marginally better compared to the regular Monitor Lock when there is no contention./benchmark plaintext aspnet-citrine-lin kestrel
or is there a particular benchmark that might be interesting to run? Not sure if there is anything designed for locking and contention.Lock
to a method acceptingobject
and locking on theobject
results a problematic behavior as I see)Linked #56794