Windows cpuSet for application has different performance for different textual representations of the same cpuSet #576

Open
RobertHenry6bev opened this issue Apr 20, 2023 · 0 comments

I run the dotnet/aspnet/teche/plaintext benchmark with the PlatformBenchmarks application on a modern Intel server machine, running modern Windows 11 2022. The load generator is on an adjacent server in the same rack; the network link is not a bottleneck. The server has 2 sockets per board, 64 cores per socket, with 2-way SMT enabled, for a total of 128 "logical processors". There are 2 NUMA domains.

The apparent local maximum in requests per second (rps) is reached by setting the cpuSet of the PlatformBenchmarks application to 26 cores in the same NUMA domain.

The cpuSet specification of "0-25" runs at 483 krps.
The cpuSet specification of "0-0,1-1,2-2,3-3,4-4,5-5,6-6,7-7,8-8,9-9,10-10,11-11,12-12,13-13,14-14,15-15,16-16,17-17,18-18,19-19,20-20,21-21,22-22,23-23,24-24,25-25" runs slower by roughly 10%, e.g. at 447 krps.

This is repeatable.

I would expect semantically identical specifications of the cpuSet to have equivalent behavior at runtime.
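To make "semantically identical" concrete, here is a minimal Python sketch that expands both spellings into a set of logical-processor indices and into an affinity bitmask. This is not the benchmark tooling's actual parser. The function names `parse_cpu_set` and `affinity_mask` are hypothetical, and the assumed syntax (comma-separated entries, each either a single index or an inclusive `first-last` range) is only my reading of the specifications quoted above.

```python
def parse_cpu_set(spec: str) -> frozenset:
    """Expand a spec such as "0-3,8" into a set of logical-processor indices.

    Hypothetical helper: assumes comma-separated entries, each either a
    single index ("8") or an inclusive range ("0-3").
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        lo, sep, hi = part.partition("-")
        first = int(lo)
        last = int(hi) if sep else first
        if last < first:
            raise ValueError(f"descending range: {part!r}")
        cpus.update(range(first, last + 1))
    return frozenset(cpus)


def affinity_mask(cpus) -> int:
    """Fold a set of processor indices into a single bitmask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


short_form = "0-25"
# Builds "0-0,1-1,...,25-25", the long form from the report above.
long_form = ",".join(f"{i}-{i}" for i in range(26))

same_set = parse_cpu_set(short_form) == parse_cpu_set(long_form)
short_mask = affinity_mask(parse_cpu_set(short_form))
long_mask = affinity_mask(parse_cpu_set(long_form))
```

Under these assumptions both forms expand to the same 26 processors and the same mask, so any difference in throughput would have to come from how the specification is applied, not from what it denotes.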

Speculation: the long form "0-0,1-1, ..." tells the kernel what the cpuset is incrementally. Perhaps the incremental change runs afoul of the NUMA domain?

I need cpuSet to "do the right thing" so I can experiment with NUMA splits, uncore routing, and more.
