Can't compile on Windows #33

Closed · Panchovix opened this issue Jun 5, 2023 · 13 comments
@Panchovix
Contributor

Panchovix commented Jun 5, 2023

Hi there, really amazing work that you're doing here.

I'm trying to run either the benchmark or the webui to test (I have 2x4090), but it seems it can't find the compiler or something similar?

The complete error is:

python .\webui\app.py
F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
INFO: Could not find files for the given pattern(s)
Traceback (most recent call last):
  File "F:\ChatIAs\exllama\webui\app.py", line 9, in <module>
    import model_init
  File "F:\ChatIAs\exllama\model_init.py", line 1, in <module>
    from model import ExLlama, ExLlamaCache, ExLlamaConfig
  File "F:\ChatIAs\exllama\model.py", line 5, in <module>
    import cuda_ext
  File "F:\ChatIAs\exllama\cuda_ext.py", line 14, in <module>
    exllama_ext = load(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1283, in load
    return _jit_compile(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1610, in _write_ninja_file_and_build_library
    _write_ninja_file_to_build_library(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2057, in _write_ninja_file_to_build_library
    _write_ninja_file(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2200, in _write_ninja_file
    cl_paths = subprocess.check_output(['where',
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

I have CUDA 11.8 and CUDA 12.1 on my system. I do specify the version when building GPTQ, for example (with $env:CUDA_PATH="CUDA_DIR"), but here I'm not sure if it uses those or builds its own. Also, when specifying the CUDA version, it doesn't work either.

Maybe I'm missing something here?

Python 3.10.10
Windows 11 Pro
RTX 4090 x2
AMD Ryzen 7 7800X3D
VS2019

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
@turboderp
Owner

I don't have a Windows PC to test on, so I'm not sure on the exact build steps. But it seems to be looking for cl.exe and not finding it. So it's not the CUDA stuff that's failing but plain C++. Since you have VS2019 installed, maybe it's just not in your path?

Here's someone who got it running in WSL at least.

@Panchovix
Contributor Author

> I don't have a Windows PC to test on, so I'm not sure on the exact build steps. But it seems to be looking for cl.exe and not finding it. So it's not the CUDA stuff that's failing but plain C++. Since you have VS2019 installed, maybe it's just not in your path?
>
> Here's someone who got it running in WSL at least.

At least searching for the file, it seems that cl.exe exists. Pretty weird that it wouldn't be on the path.

I will try WSL and see how it goes, but I'll also keep trying to find a fix for pure Windows.

@allenbenz
Contributor

I've gotten it to build on Windows 10 without WSL.

If you're not seeing cl.exe on the path, you probably need to run the developer console script to set up the environment.
Something like %comspec% /k "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"

I also needed to add extra_ldflags=['cublas.lib'], to the exllama_ext = load( block in cuda_ext.py. You can also add verbose=True to that block to help diagnose issues.
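
For illustration, here's roughly what that block ends up looking like with both changes (a sketch only; the name and sources values are placeholders, not the repo's exact ones):

from torch.utils.cpp_extension import load

exllama_ext = load(
    name = "exllama_ext",
    sources = ["exllama_ext/exllama_ext.cpp"],  # placeholder source list
    extra_ldflags = ["cublas.lib"],  # link cuBLAS explicitly, as suggested above
    verbose = True,  # print the full build log to help diagnose issues
)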

If the C++ extension build gets into a bad state and you need to clean it, the build artifacts are stored somewhere under %LOCALAPPDATA%\torch_extensions\torch_extensions\Cache
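
If you'd rather script that cleanup, a small sketch using the cache path mentioned above:

import os
import shutil

# Wipe the torch extension build cache; it is rebuilt on the next load().
cache = os.path.join(os.environ["LOCALAPPDATA"],
                     "torch_extensions", "torch_extensions", "Cache")
shutil.rmtree(cache, ignore_errors=True)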

@Panchovix
Contributor Author

Panchovix commented Jun 6, 2023

Okay, I managed to build the kernel with @allenbenz's suggestions and Visual Studio 2022. Visual Studio 2019 just refused to work.

So, on Windows with exllama (gs 16,19):

30B on a single 4090 does 30-35 tokens/s

65B on multigpu (2x4090) does 13-15 tokens/s

For comparison

30B on a single 4090 does 5-7 tokens/s with GPTQ on Windows

65B on multigpu (2x4090) does 2 tokens/s with GPTQ on Windows

A HUGE improvement

But very importantly, this is with Hardware-Accelerated GPU Scheduling disabled. Enabling it gives a really nice boost for multi-GPU use. I'm going to post the results after enabling it as soon as possible.

@Panchovix
Contributor Author

Panchovix commented Jun 6, 2023

Okay, as promised, after enabling Hardware-Accelerated GPU Scheduling on Windows:

65B on multigpu (2x4090) does 22+ tokens/s

Unreal

Response generated in 2.7 seconds, 59 tokens, 22.15 tokens/second

Response generated in 3.5 seconds, 86 tokens, 24.62 tokens/second

Response generated in 2.6 seconds, 64 tokens, 24.19 tokens/second

Model used is Aeala_VicUnlocked-alpaca-65b-4bit_128g.

This also helped on a single card, where on 30B:

Response generated in 1.6 seconds, 66 tokens, 42.21 tokens/second

Response generated in 1.8 seconds, 74 tokens, 41.69 tokens/second

I just can't believe it. The speeds are here if the author wants to add them to the main post.

@turboderp
Owner

What CPU are you using?

@Panchovix
Contributor Author

@turboderp A Ryzen 7 7800X3D and 64GB of RAM at 6200 MHz.

@turboderp
Owner

turboderp commented Jun 6, 2023

So about as fast as mine (a 12900K) single-threaded, a bit slower but with somewhat faster RAM, and you're getting about the same speeds. More evidence of that damn CPU bottleneck. But I'm glad it's working so well.

I think I need to look into Hardware Accelerated GPU Scheduling, to see if Linux already does something equivalent by default, because that's a big difference just from the scheduler.

Would anyone be interested in doing a little writeup/howto for getting it running on Windows without WSL that I can link from the README.md?

@EyeDeck
Contributor

EyeDeck commented Jun 6, 2023

Here's what works for me:

  1. Install MSVC 2022: https://visualstudio.microsoft.com/downloads/
  • You can install either the full Visual Studio 2022 IDE or just the Build Tools for Visual Studio 2022 package (make sure Desktop development with C++ is ticked in the installer); it doesn't really matter which.
  • Track down where cl.exe lives, and add that directory to your Path environment variable:
    • With the IDE, it'll probably be in:
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\<some-version-number>\bin\Hostx64\x64
    • With Build Tools, it'll probably be in:
      C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\<some-version-number>\bin\Hostx64\x64
    • Restart PowerShell/cmd if it's already running, and verify that you can run cl in the shell (see the snippet after this list). It should print the compiler version, not an error.
  2. Install the appropriate version of PyTorch: https://pytorch.org/get-started/locally/
  3. Install the CUDA Toolkit (11.7 and 11.8 both seem to work; just make sure to match PyTorch's Compute Platform version).
  4. Follow the directions in exllama's README to install dependencies, and clone the repo as normal.
  5. Open cuda_ext.py and find where exllama_ext = load( is called. You need to add this kwarg after the existing name and sources kwargs:
    extra_ldflags=['cublas.lib'],
  • It may be helpful to uncomment the # verbose = True, kwarg line while you're here, for a more detailed compiler log.
  • Without this step, you'd get a linker error about unresolved external symbol cublasHgemm, for some reason.
  6. Run the repo as normal: python test_chatbot.py [...], python webui/app.py [...], etc.
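
As a quick sanity check for the Path change in step 1 (my own illustration, not part of the original steps), you can ask Python whether it sees cl.exe:

import shutil

# Prints the resolved path of cl.exe, or a warning if it isn't on the PATH.
print(shutil.which("cl") or "cl.exe not found on PATH")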

For step 5, it would probably be better to kludge the base repo here so it just works on both platforms, without making users go in and fiddle with the scripts. Something like this should do:
extra_ldflags=['cublas.lib'] if os.name == 'nt' else [],
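
In context, that conditional flag would sit inside the same load(...) call (again a sketch; the source list is a placeholder):

import os
from torch.utils.cpp_extension import load

exllama_ext = load(
    name = "exllama_ext",
    sources = ["exllama_ext/exllama_ext.cpp"],  # placeholder source list
    # Per the thread, only Windows needs to link cublas.lib explicitly.
    extra_ldflags = ["cublas.lib"] if os.name == "nt" else [],
)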

@Panchovix
Contributor Author

Panchovix commented Jun 6, 2023

@EyeDeck thanks for this. Adding

C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\<some-version-number>\bin\Hostx64\x64

to the Path is a must; otherwise you would always have to launch exllama from the VS2022 developer console.

Also, I can confirm that it works with CUDA 12.1 as well (installed the PyTorch nightly with cu121) and it runs without issues.

@turboderp
Owner

Checking for os.name == "nt" makes sense, yes.

I could also make it check whether cl.exe is in the path and, if not, look for it under at least C:\Program Files\Microsoft Visual Studio\2022\ and C:\Program Files (x86)\Microsoft Visual Studio\2022\ and append the path accordingly.

Although, I would much prefer a pull request from someone who could write this up and actually test it.
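
For what it's worth, a rough, untested sketch of that detection logic (the directory layout under each root is an assumption about how MSVC installs):

import glob
import os
import shutil

def ensure_cl_on_path():
    # Do nothing on non-Windows, or if cl.exe is already reachable.
    if os.name != "nt" or shutil.which("cl"):
        return
    roots = [
        r"C:\Program Files\Microsoft Visual Studio\2022",
        r"C:\Program Files (x86)\Microsoft Visual Studio\2022",
    ]
    for root in roots:
        # Assumed layout: <root>\<edition>\VC\Tools\MSVC\<version>\bin\Hostx64\x64
        hits = glob.glob(os.path.join(root, "*", "VC", "Tools", "MSVC",
                                      "*", "bin", "Hostx64", "x64"))
        if hits:
            os.environ["PATH"] += os.pathsep + hits[0]
            return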

@EyeDeck
Contributor

EyeDeck commented Jun 6, 2023

Hopefully #36 should do the trick.

@Panchovix
Contributor Author

Closing as PR #36 fixes the compilation issues.
