WSL 2 uses half the number of cores on AMD Threadripper 3990X #5423

Open
AAlMutairi opened this issue Jun 16, 2020 · 164 comments

@AAlMutairi

AAlMutairi commented Jun 16, 2020

Environment

Windows build number: Microsoft Windows [Version 10.0.19041.329]
Your Distribution version: Ubuntu: 20.04
WSL 2

Steps to reproduce

I am using an AMD Threadripper 3990X in my PC. When I run lscpu, I get the following:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3990X 64-Core Processor
.
.
.

Also when I use the command nproc, I get 64.

However, when I run a parallel job with either Open MPI or MPICH, MPI uses only 32 cores (half of the physical cores). For this test I used the following code (copied from https://mpitutorial.com/tutorials/mpi-hello-world/):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

Expected behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 64 processors
Hello world from processor Ubuntu, rank 18 out of 64 processors
Hello world from processor Ubuntu, rank 23 out of 64 processors
.
.
.

Actual behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 32 processors
Hello world from processor Ubuntu, rank 18 out of 32 processors
Hello world from processor Ubuntu, rank 23 out of 32 processors
.
.
.
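
For reference, the topology fields in the lscpu output of this report can be multiplied out as a sanity check. The sketch below is only an illustration: the sample text is copied from the report, and the helper names parse_lscpu and logical_cpus are made up for this example. It shows that 1 socket x 32 cores x 2 threads accounts for the 64 CPUs the guest sees, which is half of the 128 hardware threads a 3990X provides.

```python
# Sketch: check that lscpu's topology fields multiply out to its CPU count.
# LSCPU_SAMPLE is copied from the report; parse_lscpu() and logical_cpus()
# are hypothetical helpers written for this illustration.

LSCPU_SAMPLE = """\
CPU(s):                          64
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       1
"""


def parse_lscpu(text):
    """Turn 'Key: value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def logical_cpus(fields):
    """Sockets x cores per socket x threads per core."""
    return (int(fields["Socket(s)"])
            * int(fields["Core(s) per socket"])
            * int(fields["Thread(s) per core"]))


if __name__ == "__main__":
    fields = parse_lscpu(LSCPU_SAMPLE)
    print("reported CPUs:", fields["CPU(s)"])
    print("from topology:", logical_cpus(fields))
```

The same parser should work on the full output of lscpu, since it only splits each line at the first colon.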
@AAlMutairi
Author

Not sure if it is relevant, but I am experiencing the same issue in Hyper-V too.

@WSLUser

WSLUser commented Jun 19, 2020

It's the kernel config. Look at https://github.com/microsoft/WSL2-Linux-Kernel/tree/master/Microsoft. In the config for x86_64 you will see it's set to 64. This is standard from the Linux kernel. What you can do is update the config to match the number of cores you have. Ideally WSL would do a check on the number of CPU cores and update the config appropriately in .wslconfig. For now this is a manual process.

@AAlMutairi
Author

@WSLUser, thanks for the answer. Just to confirm: did you mean updating config-wsl? I couldn't find .wslconfig. If so, I believe the part of interest is the following:

CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=512
CONFIG_NR_CPUS_DEFAULT=64
CONFIG_NR_CPUS=256

Are you suggesting that despite the range, WSL 2 uses the default value as a max value?

My apologies if I misunderstood your suggestion.
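
One way to read those four lines, as far as the x86 Kconfig goes: the RANGE_BEGIN, RANGE_END and DEFAULT symbols only bound and seed the choice, while CONFIG_NR_CPUS is the value the kernel is actually built with. The sketch below is a hedged illustration of that reading, not a statement about how WSL picks its CPU count. The helper names parse_config and effective_cpu_cap are hypothetical, and the config text is the excerpt quoted above.

```python
# Sketch: read the NR_CPUS settings from a kernel config fragment.
# CONFIG_TEXT is the excerpt quoted above; effective_cpu_cap() is a
# hypothetical helper, not something shipped with WSL or the kernel.

CONFIG_TEXT = """\
CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=512
CONFIG_NR_CPUS_DEFAULT=64
CONFIG_NR_CPUS=256
"""


def parse_config(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key] = value
    return options


def effective_cpu_cap(options):
    """CONFIG_NR_CPUS is the compile-time limit; the other symbols only
    describe the allowed range and the Kconfig default."""
    return int(options["CONFIG_NR_CPUS"])


if __name__ == "__main__":
    cap = effective_cpu_cap(parse_config(CONFIG_TEXT))
    for wanted in (64, 128):
        verdict = "fits" if wanted <= cap else "exceeds the cap"
        print(f"{wanted} logical CPUs: {verdict} (CONFIG_NR_CPUS={cap})")
```

Under this reading, a build with CONFIG_NR_CPUS=256 would not by itself stop the guest at 64 CPUs.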

@WSLUser

WSLUser commented Jun 19, 2020

You have to create .wslconfig. https://docs.microsoft.com/en-us/windows/wsl/release-notes#build-18945
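
For anyone unsure what the file should contain: .wslconfig is an INI-style file in the Windows user profile folder, and the WSL 2 VM settings go under a [wsl2] section with a processors key. The sketch below only renders such text. The function name render_wslconfig and the value 64 are assumptions for illustration, and the script deliberately does not write to any real path.

```python
import configparser
import io


def render_wslconfig(processors):
    """Return .wslconfig text that sets the WSL 2 VM's processor count."""
    config = configparser.ConfigParser()
    config["wsl2"] = {"processors": str(processors)}
    buffer = io.StringIO()
    # Emit 'key=value' without spaces around the delimiter.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # 64 is only an example value. Save the printed text as
    # %UserProfile%\.wslconfig on the Windows side, then run
    # 'wsl --shutdown' so the setting is re-read on the next start.
    print(render_wslconfig(64), end="")
```

The rendered text is a [wsl2] header followed by a single processors=64 line.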

@AAlMutairi
Author

AAlMutairi commented Jun 19, 2020

@WSLUser thank you again for the suggestion, and sorry for the misunderstanding on my part. I tried your method to change the number of processors. It works when I decrease the number of processors, but unfortunately it doesn't work past 64 processors (which is equivalent to 32 physical cores). It still seems to limit me to half the number of physical cores (64/2 = 32).

@sanastasiou

@WSLUser Does this work also for multiple CPUs? i.e. Dual Xeon setup?

@WSLUser

WSLUser commented Jun 19, 2020

Not sure. @craigloewen-msft would probably know better. In your case it appears the kernel config itself needs updating. You should be able to override the original value in .wslconfig as well. You should see the option in the release notes. And yes (sorry I didn't answer before), it's using the default value. So you'll overwrite it. I don't recommend going above 256.

@AAlMutairi
Author

@WSLUser Thanks for all the help, I guess I will wait for the kernel to be updated.
@benhillis @therealkenc, would you be able to let us know if such fix to the kernel will be added to the next build?
@sanastasiou Did you have the chance to try the .wslconfig method?

@sanastasiou

@AAlMutairi not yet. I'm not sure if it applies to dual-CPU setups as well. If it does, I'll try.

@AAlMutairi
Author

Any updates or fixes to test?

@mozram

mozram commented Jun 30, 2020

It also affects compiling when running make -j. Only half of the CPUs are used, whereas WSL 1 does not have this issue. Ryzen 2600, Ubuntu 20.04, WSL 2.
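
A quick way to see what a build tool would have to work with inside the guest is to compare the CPU count the kernel reports with the set of CPUs the current process is allowed to run on. This is a minimal sketch using only the Python standard library. The helper name cpu_summary is made up here, and the check only reflects the guest's view; it cannot show how the Windows host schedules the VM.

```python
import os


def cpu_summary():
    """Return (CPUs the kernel reports, CPUs this process may run on)."""
    reported = os.cpu_count() or 1
    try:
        usable = len(os.sched_getaffinity(0))  # available on Linux
    except AttributeError:
        usable = reported  # platform without sched_getaffinity
    return reported, usable


if __name__ == "__main__":
    reported, usable = cpu_summary()
    print(f"kernel reports {reported} CPUs, process may use {usable}")
    if usable < reported:
        print("the affinity mask is restricting this process")
```

If both numbers match the expected thread count but a build still loads only half of the host's CPUs, the limit is more likely on the host side than in the distribution.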

@sanastasiou

This basically blocks any usage of WSL 2. Even if I check out my repo there, I lose 50% of my processing power. That's simply a no-go.

@AAlMutairi
Author

@mozram, it is surprising that it works for you in WSL 1. Unfortunately, neither works for me.

@AAlMutairi
Author

@sanastasiou, hopefully any fix will work for both WSL 2 and Hyper-V, since the issue persists in both.

@AAlMutairi
Author

Interestingly, even when I used MPICH on Windows, it only sees 32 physical cores. I guess this issue isn't limited to WSL or Hyper-V.

@AAlMutairi
Author

Any updates?

@sanastasiou

Changing WSL config has 0 effect whatsoever. 2nd CPU is not recognized.

@AAlMutairi
Author

AAlMutairi commented Jul 13, 2020

I tried contacting AMD customer support about the issue to ask whether they have any fixes, but to no avail.

@onomatopellan

Ben said "I am already looking into this, AMD brought this to my attention as well."
So be patient.

@AAlMutairi
Author

Just to help narrow this down: the issue seems to affect the 3990X alone, since John from the AMD community tested WSL 2 on his 3970X and got the following results:
[image: pastedImage_1]

It shows that all 32 physical cores were detected (shown next to Core(s) per socket), along with all 64 logical cores (next to CPU(s)). Not sure how helpful it is, but I thought it might help.

@sanastasiou

Not quite true, I have a dual xeon setup and it only detects one of them. So it doesn't affect only 3990X

@AAlMutairi
Author

@sanastasiou, my apologies, I meant that within the AMD Threadripper line, only the 3990X is affected. By the way, did you test whether the same issue persists when you use Hyper-V? It does for me.

@ykim362
Member

ykim362 commented Jul 22, 2020

I have the same issue with Intel Xeon. I have two 6242R CPUs (2 sockets), and only 1 socket is available from WSL 2.

@AAlMutairi
Author

@ykim362 Which Windows version are you using? Do you have the same issue with Hyper-V?

@sanastasiou

@AAlMutairi how do I enable/how can I check this with Hyper-V?

@AAlMutairi
Author

@sanastasiou it is similar to WSL: you enable it through "Turn Windows features on or off", as shown here:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v#enable-the-hyper-v-role-through-settings

Then use "Hyper-V Quick Create" as shown here (the page is based on an older Windows version, but the steps are still the same):
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/quick-create-virtual-machine

I guess you can install an Ubuntu VM for now.

If you right-click the VM, you can open its settings and change the number of cores and sockets you want. Then you can run it and test.

@ykim362
Member

ykim362 commented Jul 22, 2020

@AAlMutairi I was able to configure the number of virtual cores (2 x physical cores) with Hyper-V (on Windows 10 Enterprise). But I am not sure whether it is really using all CPUs or just showing twice as many virtual CPUs. It was 40 logical cores (20 physical cores) by default, and even after I increased the number to 80 logical cores, it still shows only 1 socket.

lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 40
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz
Stepping: 7
CPU MHz: 3092.733
BogoMIPS: 6185.46
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-79

@nguyentrangiabao05

I’m sorry, I feel so dumb because I can’t find the Project.sln file in the build folder. :(( It would be great if you could help me with this. [image]

You need to run cmake under Windows, and you need to have Visual Studio installed. The WSL service is a Windows process.

Thank you for your help. I have been running cmake in the Windows terminal. However, I can't find how to generate an x64 release build from Project.sln. Would you mind giving more instructions on this step?

@xieyubo

xieyubo commented Feb 22, 2024

Thank you for your help. I have been running cmake in the Windows terminal. However, I can't find how to generate an x64 release build from Project.sln. Would you mind giving more instructions on this step?

You can open Project.sln in Visual Studio, choose Release and x64 on the toolbar, then select Build Solution from the Build menu.

image

@aramor

aramor commented Feb 29, 2024

Thank you for your help. I have been running cmake in the Windows terminal. However, I can't find how to generate an x64 release build from Project.sln. Would you mind giving more instructions on this step?

You can open Project.sln in Visual Studio, choose Release and x64 on the toolbar, then select Build Solution from the Build menu.

image

Can you please send the compiled DLL? I can't understand how to compile it myself.

@niltecedu

Hey guys, I've got a dual Intel Xeon Gold 6430 system with the same problem: only 64 of my 128 cores are utilised. Any chance this gets pushed upstream? We can't really compile from scratch since it's a corporate environment, and it's a somewhat bigger issue for us because a custom build won't be approved in a package push.

@alice-comfy

I'm hoping for some positive news given David Cutler mentioned in his interview with Dave's Garage that he's rocking a 96 core PC as his personal system. Given the need for either a Hyper-V change, or to override the scheduler (from root to core), I don't think this will happen before the next major version (25H2? or 12)

@niltecedu

It's not just overriding the scheduler that's the issue; you also need that custom DLL to capture all cores. Hoping they fix it soon.

@sirredbeard
Contributor

Can you please send the compiled DLL? I can't understand how to compile it myself.

I have compiled .dlls for x86 and arm64 under releases on my fork.

@Andymion97

Thank you for your help. I have been running cmake in the Windows terminal. However, I can't find how to generate an x64 release build from Project.sln. Would you mind giving more instructions on this step?

You can open Project.sln in Visual Studio, choose Release and x64 on the toolbar, then select Build Solution from the Build menu.

image

Hey, I'm pretty much a noob, so please excuse the dumb questions.
I need to do this because I run a system with two Xeon CPUs (cores are only one problem; I also only have half of the RAM available for Docker Desktop because the rest is linked to the second CPU socket).

Can I do the hack without redoing everything that is already built on top of WSL?
Also, I'm confused about the steps before "generate project". Do I need to copy and paste src/computecore.cpp somewhere?

Sorry, I'm currently trying to pull off a project server for my finals on LLMs, and I should've started by installing Ubuntu instead of Windows Server 2022. It's been a massive pain to even get Docker running. I'm now using Docker Desktop despite it not being supported, but if it works, it works...

@Andymion97

Never mind, I figured it out. Used threads went from 44 to 88, but it still only uses 128 GB of RAM instead of 256 GB.

And I can't get Open-Webui to run in CPU-only mode; it always uses the Titan Xp, which isn't that great for very large models.

@alice-comfy

You can change the RAM limit with .wslconfig.

@jinzzasol

jinzzasol commented Jun 12, 2024

Yes, I believe it can truly utilize all the cores, but you need to change the Hyper-V scheduler type to "Core". I added instructions on how to do it: xieyubo/WSL2@41511c9

@xieyubo I was trying to follow your instructions, but where should I create the Project.sln file? Windows? WSL 2? And how can I build it? Sorry, I'm just not familiar with it.

@Thernn88

Confirming this is still an issue with the latest kernel update. I had a small hope that we might see a fix as I saw some stuff about hyper-v fixes in the previous kernel update attempt.

@KYU49

KYU49 commented Aug 22, 2024

I think this is a WSL bug. I think that in wslservice.exe, it might invoke the GetSystemInfo() or GetLogicalProcessorInformation() API to get the number of cores and pass this number to the HcsCreateComputeSystem() API. GetSystemInfo() / GetLogicalProcessorInformation() can only get the number of cores in the current CPU group. In Windows, the maximum number of cores in a single CPU group is 64, so the VM created by HcsCreateComputeSystem() for WSL has at most 64 cores.

It's better to invoke GetLogicalProcessorInformationEx() to get the number of cores. This API can report all cores across CPU groups. I have a hack to resolve this issue: xieyubo/WSL2@9bdce81

Now all cores work:
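
To make the 64-processor group limit concrete, here is a small sketch. It is an illustration under stated assumptions, not the code from the linked hack: groups_needed is a made-up helper that only does the arithmetic (Windows assigns processors to groups based on topology, so the real layout can differ), and windows_counts uses GetActiveProcessorGroupCount and GetActiveProcessorCount from kernel32 as a simpler stand-in for GetLogicalProcessorInformationEx. The Windows query only runs on Windows and returns None elsewhere.

```python
import ctypes
import sys

MAX_GROUP_SIZE = 64            # logical processors per Windows processor group
ALL_PROCESSOR_GROUPS = 0xFFFF  # asks for the total across all groups


def groups_needed(logical_processors, group_size=MAX_GROUP_SIZE):
    """Minimum number of processor groups for a logical CPU count."""
    return -(-logical_processors // group_size)  # ceiling division


def windows_counts():
    """Return (group count, total logical processors) from kernel32,
    or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32")
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32
    groups = kernel32.GetActiveProcessorGroupCount()
    total = kernel32.GetActiveProcessorCount(ALL_PROCESSOR_GROUPS)
    return groups, total


if __name__ == "__main__":
    for count in (64, 128):
        print(f"{count} logical processors -> {groups_needed(count)} group(s)")
    print("live Windows counts:", windows_counts())
```

A 3990X with SMT enabled exposes 128 logical processors, so it needs at least two groups, and any API that reports a single group sees at most 64 of them.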

The issue mentioned in this comment is about WSL 2 not recognizing more than 64 cores. Unfortunately, the issue the author is facing is likely due to a problem with multiple CPUs, and it probably cannot be resolved with computecore.dll and the Core scheduler (at least, it didn't resolve the issue in my environment). In the case of multiple CPUs, even though nproc and lscpu display the correct number of processors, MPI only uses the cores of 1 CPU with the Root scheduler and half of the threads with the Core scheduler.

Root Scheduler

image
image

Core Scheduler

image
image

@muratyurdakul75

Hi

Unfortunately, there has been no progress on this issue. Requests about it keep coming in, but they are marked as duplicates and redirected to older issues. Is there any progress?

Actually, there are several expectations regarding this NUMA issue:
1> More than one NUMA node needs to be supported.
2> It needs to be able to work with more than 64 processors.
3> We need to be able to decide which NUMA node it runs on. (Some may want it to run on only one NUMA node.)

It would be really great if such a development could be made.

Thank you in advance for your efforts.

@benhillis
Member

@muratyurdakul75 - I have a fix in PR for this - stay tuned...

@benhillis benhillis self-assigned this Nov 29, 2024
@muratyurdakul75

@muratyurdakul75 - I have a fix in PR for this - stay tuned...

I can't wait. I hope there will be an improvement as soon as possible. :)

@pnthai88

pnthai88 commented Nov 30, 2024 via email

@muratyurdakul75

I’m almost retired to wait for the fix 😵

I'm in the same situation. :)

@pnthai88

pnthai88 commented Nov 30, 2024 via email

@muratyurdakul75

@muratyurdakul75 - I have a fix in PR for this - stay tuned...

Is there an estimated date?

@benhillis
Member

PR just got merged, hoping to tag a new prerelease build this week.

@muratyurdakul75

PR just got merged, hoping to tag a new prerelease build this week.

Excellent news. Thank you very much for your efforts. :)

@Thernn88

Thernn88 commented Dec 3, 2024

Thank you!!!

@botelhs

botelhs commented Dec 27, 2024

PR just got merged, hoping to tag a new prerelease build this week.

Hey, I tried https://github.com/microsoft/WSL/releases/tag/2.4.8
and WSL now sees the cores (htop shows them). When running multithreaded workloads, the VM appears to use all cores, but the Windows scheduler only schedules on the first processor group (the first 64 threads). I am using the same Ubuntu install as before. Does this require reinstalling the distro, or am I somehow not installing the prerelease correctly? (I installed it with the x64 MSI; it seemed to go smoothly and shows version 2.4.8.)

@botelhs

botelhs commented Dec 27, 2024

Ah OK, I was able to utilize all cores by switching the hypervisor scheduler from Root to Core, as described in the repo above:
https://github.com/xieyubo/WSL2

bcdedit /set hypervisorschedulertype Core
