Process.ProcessorAffinity returns a 32-bit value on ARM64 #28051
Comments
Was it possible that 16 of the cores were asleep? As mentioned here |
Looks like the CLR already fixed this (when getting the core count for the GC): dotnet/coreclr#18053. As for Environment.GetProcessorCount, it eventually gets to |
One place this shows up is in the spectralnorm-3 perf test, which uses
Console.WriteLine($"ProcessArchitecture={System.Runtime.InteropServices.RuntimeInformation.ProcessArchitecture}");
Console.WriteLine($"IntPtrSize = {IntPtr.Size}");
ulong a = (ulong) System.Diagnostics.Process.GetCurrentProcess().ProcessorAffinity;
Console.WriteLine("ProcessorAffinity = " + a.ToString("X16"));
Console.WriteLine($"ProcessorCount = {Environment.ProcessorCount}");
gives
|
I can see what's going on. System.Diagnostics.Process.ProcessorAffinity is implemented on Linux in CoreFX by the SystemNative_SchedGetAffinity function, and there is a bug in it. See the
intptr_t bits = 0;
for (int cpu = 0; cpu < maxCpu; cpu++)
{
    if (CPU_ISSET(cpu, &set))
    {
        bits |= (1u << cpu);
    }
}
It should be e.g. |
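A minimal sketch of the difference, not the actual CoreFX code: the `cpu_present` array is a stand-in for the `cpu_set_t` that `CPU_ISSET` queries, and `build_affinity_mask` is a hypothetical helper name. The point it illustrates is that the constant being shifted has to be as wide as the mask. Assuming a 32-bit `unsigned int`, `1u << cpu` is a 32-bit shift, and shifting by 32 or more is undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: builds a 64-bit affinity mask from a per-CPU
 * presence array (a stand-in for CPU_ISSET on a real cpu_set_t). */
uint64_t build_affinity_mask(const bool *cpu_present, int max_cpu)
{
    uint64_t bits = 0;
    /* A 64-bit mask can only describe CPUs 0..63. */
    int limit = max_cpu < 64 ? max_cpu : 64;
    for (int cpu = 0; cpu < limit; cpu++)
    {
        if (cpu_present[cpu])
        {
            /* Widen the constant before shifting. With 1u the shift is
             * done in 32 bits, so CPUs 32 and above cannot be
             * represented (formally, the shift is undefined). */
            bits |= (UINT64_C(1) << cpu);
        }
    }
    return bits;
}
```

With 48 CPUs marked present, this sketch returns a mask with the low 48 bits set rather than only the low 32.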
I've created a PR with a fix: dotnet/corefx#33825 |
With the fix from @janvorli we can remove "restrictive affinity" from the list. I just ran spectralnorm-3 on my 12-core x64 Windows machine (6 real cores, HTx2) with the Concurrency Visualizer profiler enabled ( The CV tells me that on Windows we use up to 5 cores out of 12: And that 90% of the time is spent on synchronization: So @AndyAyersMS, you are most probably right about higher synchronization costs on ARM. |
I just read the code of the benchmark. As of today, it creates an array of just 100 doubles and divides it into 100 is a very small input here. When I set it to and it spends less time on synchronization: So it looks like, as of today, this benchmark is measuring the performance of synchronization? |
@danmosemsft I did not know that this is possible. I guess this is why ARM has lower energy consumption. @brianrob, if we ever run the perf tests for ARM, we need to make sure this setting is off. |
I opened dotnet/corefx#33838 for the other issue |
@kouvel have you looked (or are you looking) at synchronization costs on ARM? I am not familiar with this benchmark but |
@danmosemsft this is arm64, not arm. I guess you meant that, but wanted to make it clear for others. |
@janvorli thanks, it's confusing that we often call 32-bit ARM just ARM. Is it common for 32-bit and 64-bit ARM to have quite different performance issues? |
@danmosemsft they have different instruction sets with different performance characteristics, so I would expect them to be quite different. Btw, the official naming is ARM and ARM64. |
I haven't looked at synchronization costs on arm64 and I'm not aware of issues related to synchronization there. I suspect that spectralnorm-3 would be mostly measuring |
If anyone has ready access to an Ubuntu arm64 machine, could you please run the following code? I just want to make sure the thread pool is also getting the correct number of processors; if it is not, that would explain why it is so slow.
Console.WriteLine($"ProcessorCount: {Environment.ProcessorCount}");
int w, c;
ThreadPool.GetMinThreads(out w, out c);
Console.WriteLine($"ThreadPool min thread counts: {w}, {c}");
|
Last line is
|
Thanks, that looks correct, I'll take a closer look. |
@adamsitnik it might make sense to move this to a new issue |
The function was incorrectly using the unsigned int constant 1 as the value that is shifted to form the bit for each processor present, which is then OR-ed into the final mask. So on machines with more than 32 cores, it returned at most 32 set bits.
When we run the following code on an ARM64 machine with 48 cores (Ubuntu), without explicitly setting the CPU affinity:
We get
00000000FFFFFFFF
which is 2^32 - 1
while it should be 2^48 - 1
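For reference, the expected mask for n usable cores with no affinity restriction is 2^n - 1. A small sketch of that arithmetic follows; `full_mask` and `format_mask` are hypothetical helper names, and the 16-digit upper-case hex format is chosen to match the output shown above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: mask with the low n bits set, i.e. 2^n - 1. */
uint64_t full_mask(unsigned n)
{
    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    return n >= 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}

/* Hypothetical helper: writes the mask as 16 upper-case hex digits. */
void format_mask(uint64_t mask, char out[17])
{
    snprintf(out, 17, "%016" PRIX64, mask);
}
```

Here `full_mask(32)` formats as 00000000FFFFFFFF, the value reported above, while `full_mask(48)` formats as 0000FFFFFFFFFFFF.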
I don't know if this is specific to ARM64, to Ubuntu, or to 64-bit in general. I just don't have access to a machine with more than 32 cores to test.
@AndyAyersMS hit this issue while benchmarking .NET Core 3.0 on ARM.