Can't compile on Windows #33
Comments
I don't have a Windows PC to test on, so I'm not sure of the exact build steps. But it seems to be looking for Here's someone who got it running in WSL at least.
When I search for the file, cl.exe does seem to exist. It's pretty weird that it wouldn't be on the path. I will try WSL and see how it goes, but I'm also going to keep looking for a fix for pure Windows.
I've gotten it to build on Windows 10 without WSL. If you're not seeing cl.exe on the path, you probably need to run the developer console shell script to set the environment. I also needed to add If the cpp extension compilation gets into a bad place and you need to clean it, it gets stored somewhere under
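A quick way to confirm whether the compiler is visible to the current shell, before attempting the build, is to ask Python where it would find it. This is only a minimal sketch: the helper name `compiler_on_path` is made up for illustration and is not part of exllama.

```python
import shutil


def compiler_on_path(name: str = "cl.exe") -> bool:
    """Return True if `name` resolves to an executable on the current PATH."""
    # shutil.which searches the directories listed in PATH, like the shell does.
    return shutil.which(name) is not None


if __name__ == "__main__":
    if compiler_on_path():
        print("cl.exe found at", shutil.which("cl.exe"))
    else:
        # Typical when the shell was not started from a Visual Studio
        # developer console, so the MSVC environment was never set up.
        print("cl.exe is not on PATH")
```

If this reports that cl.exe is missing in a normal shell but finds it in the developer console, the environment setup is the problem rather than the Visual Studio install.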
Okay, managed to build the kernel with @allenbenz's suggestions and Visual Studio 2022. Visual Studio 2019 just refused to work.
So, on Windows with exllama (gs 16,19):
30B on a single 4090 does 30-35 tokens/s
65B on multi-GPU (2x4090) does 13-15 tokens/s
For comparison, with GPTQ on Windows:
30B on a single 4090 does 5-7 tokens/s
65B on multi-GPU (2x4090) does 2 tokens/s
A HUGE improvement.
But very important: this is with Hardware Accelerated GPU Scheduling disabled. Enabling it gives a really nice boost for multi-GPU use. I'm going to post the results after enabling it as soon as possible.
Okay, and as promised: after enabling Hardware Accelerated GPU Scheduling on Windows, 65B on multi-GPU (2x4090) does 22+ tokens/s. Unreal.
Model used is Aeala_VicUnlocked-alpaca-65b-4bit_128g. This also helped on a single card, where on 30B
I just can't believe it. The speeds are here if the author wants to add them to the main post.
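For a rough sense of scale, the tokens/s figures quoted in this thread can be turned into speedup ranges. This is just arithmetic on the numbers reported above, not a new measurement; `speedup_range` is a throwaway helper written for this illustration.

```python
def speedup_range(new, old):
    """Return (worst-case, best-case) speedup given (low, high) tokens/s ranges."""
    new_lo, new_hi = new
    old_lo, old_hi = old
    # Worst case: slowest new result vs. fastest old result, and vice versa.
    return new_lo / old_hi, new_hi / old_lo


if __name__ == "__main__":
    # 30B on a single 4090: exllama 30-35 tokens/s vs. GPTQ 5-7 tokens/s
    print("30B single GPU:", speedup_range((30, 35), (5, 7)))
    # 65B on 2x4090: exllama 13-15 tokens/s vs. GPTQ 2 tokens/s
    print("65B multi-GPU:", speedup_range((13, 15), (2, 2)))
    # 65B on 2x4090 with GPU scheduling enabled: 22+ tokens/s vs. GPTQ 2 tokens/s
    print("65B multi-GPU, scheduling on (lower bound):", 22 / 2)
```

That works out to roughly 4x to 7x on 30B, 6.5x to 7.5x on 65B, and at least 11x on 65B once Hardware Accelerated GPU Scheduling is enabled.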
What CPU are you using?
@turboderp A Ryzen 7 7800X3D and 64GB of RAM at 6200MHz.
So about as fast as mine (12900K) single threaded, bit slower but with somewhat faster RAM, and you're getting about the same speeds. More evidence of that damn CPU bottleneck. But I'm glad it's working so well. I think I need to look into Hardware Accelerated GPU Scheduling, to see if Linux already does something equivalent by default, because that's a big difference just from the scheduler. Would anyone be interested in doing a little writeup/howto for getting it running on Windows without WSL that I can link from the README.md?
Here's what works for me:
For step 5, it would probably be better to kludge the base repo here so it just works on both, without having to have users go in and fiddle with the scripts. Something like this should do:
@EyeDeck thanks for this, adding
is a must, else you would always have to open exllama from the VS2022 developer console. Also, I can confirm that it works with CUDA 12.1 (installed the nightly with cu121) and runs without issues.
Checking for I could also make it check if cl.exe is in the path, otherwise look for cl.exe in at least Although, I would much prefer a pull request from someone who could write this up and actually test it.
Hopefully #36 should do the trick.
Closing as PR #36 fixes the compilation issues.
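The fallback described here (use cl.exe from the path if it is there, otherwise look in known install directories and add the first match to the path) could be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: `ensure_tool_on_path` is a hypothetical helper, and the candidate directories are left as a parameter because the actual locations to search are not given above.

```python
import os
import shutil
from typing import Iterable, Optional


def ensure_tool_on_path(tool: str, candidate_dirs: Iterable[str]) -> Optional[str]:
    """Return the path to `tool`, extending PATH from `candidate_dirs` if needed.

    Returns None if the tool is neither on PATH nor in any candidate directory.
    """
    # First preference: whatever the environment already provides.
    found = shutil.which(tool)
    if found:
        return found

    # Otherwise take the first candidate directory that contains the tool
    # and prepend it to PATH for this process (and its child processes).
    for directory in candidate_dirs:
        full_path = os.path.join(directory, tool)
        if os.path.isfile(full_path):
            os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")
            return full_path

    return None
```

A caller would pass `"cl.exe"` and its own list of Visual Studio install directories. Because only the current process environment is modified, nothing persists after the script exits.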
Hi there, really amazing work that you're doing here.
I'm trying to run either the benchmark or the webui to test (I have 2x4090), but it seems it can't find the compiler or something similar?
The complete error is:
I have CUDA 11.8 and CUDA 12.1 on my system. I do specify which one to use when building for GPTQ, for example (with
$env:CUDA_PATH="CUDA_DIR"
), but here I'm not sure if it uses those or self built. Also, when specifying the CUDA version, it doesn't work either. Maybe I'm missing something here?
Python 3.10.10
Windows 11 Pro
RTX 4090 x2
AMD Ryzen 7 7800X3D
VS2019