Can't compile on Windows #33

Closed · Panchovix opened this issue Jun 5, 2023 · 13 comments
@Panchovix
Contributor

Panchovix commented Jun 5, 2023

Hi there, really amazing work that you're doing here.

I'm trying to run either the benchmark or the webui to test (I have 2x4090), but it seems it can't find the compiler or something similar?

The complete error is:

python .\webui\app.py
F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
INFO: Could not find files for the given pattern(s)
Traceback (most recent call last):
  File "F:\ChatIAs\exllama\webui\app.py", line 9, in <module>
    import model_init
  File "F:\ChatIAs\exllama\model_init.py", line 1, in <module>
    from model import ExLlama, ExLlamaCache, ExLlamaConfig
  File "F:\ChatIAs\exllama\model.py", line 5, in <module>
    import cuda_ext
  File "F:\ChatIAs\exllama\cuda_ext.py", line 14, in <module>
    exllama_ext = load(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1283, in load
    return _jit_compile(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1610, in _write_ninja_file_and_build_library
    _write_ninja_file_to_build_library(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2057, in _write_ninja_file_to_build_library
    _write_ninja_file(
  File "F:\ChatIAs\exllama\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2200, in _write_ninja_file
    cl_paths = subprocess.check_output(['where',
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

I have CUDA 11.8 and CUDA 12.1 on my system. I do specify the version when building GPTQ, for example (with $env:CUDA_PATH="CUDA_DIR"), but here I'm not sure if it uses those or builds its own. Also, when specifying the CUDA version, it doesn't work either.

Maybe I'm missing something here?

Python 3.10.10
Windows 11 Pro
RTX 4090 x2
AMD Ryzen 7 7800X3D
VS2019

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
@turboderp
Owner

I don't have a Windows PC to test on, so I'm not sure on the exact build steps. But it seems to be looking for cl.exe and not finding it. So it's not the CUDA stuff that's failing but plain C++. Since you have VS2019 installed, maybe it's just not in your path?

Here's someone who got it running in WSL at least.

@Panchovix
Contributor Author

> I don't have a Windows PC to test on, so I'm not sure on the exact build steps. But it seems to be looking for cl.exe and not finding it. So it's not the CUDA stuff that's failing but plain C++. Since you have VS2019 installed, maybe it's just not in your path?
>
> Here's someone who got it running in WSL at least.

At least searching for the file, it seems that cl.exe exists. Pretty weird that it wouldn't be on the path.

I will try WSL and see how it goes, but I'll also keep trying to find a fix for pure Windows.

@allenbenz
Contributor

I've gotten it to build on Windows 10 without WSL.

If you're not seeing cl.exe on the path, you probably need to run the developer console script to set up the environment.
Something like %comspec% /k "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"

I also needed to add extra_ldflags=['cublas.lib'], to the exllama_ext = load( block in cuda_ext.py. You can also add verbose=True to that block to help diagnose issues.
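
For illustration, here's roughly what that block ends up looking like with both changes (a sketch only; the name and sources values are placeholders, not the repo's exact ones):

from torch.utils.cpp_extension import load

exllama_ext = load(
    name = "exllama_ext",
    sources = ["exllama_ext/exllama_ext.cpp"],  # placeholder source list
    extra_ldflags = ["cublas.lib"],  # link cuBLAS explicitly, as suggested above
    verbose = True,  # print the full build log to help diagnose issues
)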

If the C++ extension build gets into a bad state and you need to clean it, the build artifacts are stored somewhere under %LOCALAPPDATA%\torch_extensions\torch_extensions\Cache
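
If you'd rather script that cleanup, a small sketch using the cache path mentioned above:

import os
import shutil

# Wipe the torch extension build cache; it is rebuilt on the next load().
cache = os.path.join(os.environ["LOCALAPPDATA"],
                     "torch_extensions", "torch_extensions", "Cache")
shutil.rmtree(cache, ignore_errors=True)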

@Panchovix
Contributor Author

Panchovix commented Jun 6, 2023

Okay, I managed to build the kernel with @allenbenz's suggestions and Visual Studio 2022. Visual Studio 2019 just refused to work.

So, on Windows with exllama (gs 16,19):

30B on a single 4090 does 30-35 tokens/s

65B on multigpu (2x4090) does 13-15 tokens/s

For comparison

30B on a single 4090 does 5-7 tokens/s with GPTQ on Windows

65B on multigpu (2x4090) does 2 tokens/s with GPTQ on Windows

A HUGE improvement

But very importantly, this is with Hardware-Accelerated GPU Scheduling disabled. Enabling it gives a really nice boost for multi-GPU use. I'm going to post the results after enabling it as soon as possible.

@Panchovix
Contributor Author

Panchovix commented Jun 6, 2023

Okay, as promised, after enabling Hardware-Accelerated GPU Scheduling on Windows:

65B on multigpu (2x4090) does 22+ tokens/s

Unreal

Response generated in 2.7 seconds, 59 tokens, 22.15 tokens/second

Response generated in 3.5 seconds, 86 tokens, 24.62 tokens/second

Response generated in 2.6 seconds, 64 tokens, 24.19 tokens/second

Model used is Aeala_VicUnlocked-alpaca-65b-4bit_128g.

This also helped on a single card, where on 30B:

Response generated in 1.6 seconds, 66 tokens, 42.21 tokens/second

Response generated in 1.8 seconds, 74 tokens, 41.69 tokens/second

I just can't believe it. The speeds are here if the author wants to add them to the main post.

@turboderp
Owner

What CPU are you using?

@Panchovix
Contributor Author

@turboderp A Ryzen 7 7800X3D and 64GB of RAM at 6200 MHz.

@turboderp
Owner

turboderp commented Jun 6, 2023

So about as fast as mine (a 12900K) single-threaded, a bit slower but with somewhat faster RAM, and you're getting about the same speeds. More evidence of that damn CPU bottleneck. But I'm glad it's working so well.

I think I need to look into Hardware Accelerated GPU Scheduling, to see if Linux already does something equivalent by default, because that's a big difference just from the scheduler.

Would anyone be interested in doing a little writeup/howto for getting it running on Windows without WSL that I can link from the README.md?

@EyeDeck
Contributor

EyeDeck commented Jun 6, 2023

Here's what works for me:

  1. Install MSVC 2022: https://visualstudio.microsoft.com/downloads/
  • You can install either the full Visual Studio 2022 IDE or just the Build Tools for Visual Studio 2022 package (make sure Desktop development with C++ is ticked in the installer); it doesn't really matter which.
  • Track down where cl.exe lives, and add that directory to your Path environment variable:
    • With the IDE, it'll probably be in:
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\<some-version-number>\bin\Hostx64\x64
    • With Build Tools, it'll probably be in:
      C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\<some-version-number>\bin\Hostx64\x64
    • Restart PowerShell/cmd if it's already running, and verify that you can run cl in the shell (see the snippet after this list). It should print the compiler version, not an error.
  2. Install the appropriate version of PyTorch: https://pytorch.org/get-started/locally/
  3. Install the CUDA Toolkit (11.7 and 11.8 both seem to work; just make sure to match PyTorch's Compute Platform version).
  4. Follow the directions in exllama's README to install dependencies, and clone the repo as normal.
  5. Open cuda_ext.py and find where exllama_ext = load( is called. You need to add this kwarg after the existing name and sources kwargs:
    extra_ldflags=['cublas.lib'],
  • It may be helpful to uncomment the # verbose = True, kwarg line while you're here, for a more detailed compiler log.
  • Without this step, you'd get a linker error about unresolved external symbol cublasHgemm, for some reason.
  6. Run the repo as normal: python test_chatbot.py [...], python webui/app.py [...], etc.
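
As a quick sanity check for the Path change in step 1 (my own illustration, not part of the original steps), you can ask Python whether it sees cl.exe:

import shutil

# Prints the resolved path of cl.exe, or a warning if it isn't on the PATH.
print(shutil.which("cl") or "cl.exe not found on PATH")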

For step 5, it would probably be better to kludge the base repo here so it just works on both platforms, without making users go in and fiddle with the scripts. Something like this should do:
extra_ldflags=['cublas.lib'] if os.name == 'nt' else [],
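
In context, that conditional flag would sit inside the same load(...) call (again a sketch; the source list is a placeholder):

import os
from torch.utils.cpp_extension import load

exllama_ext = load(
    name = "exllama_ext",
    sources = ["exllama_ext/exllama_ext.cpp"],  # placeholder source list
    # Per the thread, only Windows needs to link cublas.lib explicitly.
    extra_ldflags = ["cublas.lib"] if os.name == "nt" else [],
)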

@Panchovix
Contributor Author

Panchovix commented Jun 6, 2023

@EyeDeck thanks for this. Adding

C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\<some-version-number>\bin\Hostx64\x64

to the Path is a must; otherwise you would always have to launch exllama from the VS2022 developer console.

Also, I can confirm that it works with CUDA 12.1 as well (installed the PyTorch nightly with cu121) and it runs without issues.

@turboderp
Owner

Checking for os.name == "nt" makes sense, yes.

I could also make it check whether cl.exe is in the path and, if not, look for it under at least C:\Program Files\Microsoft Visual Studio\2022\ and C:\Program Files (x86)\Microsoft Visual Studio\2022\ and append the path accordingly.

Although, I would much prefer a pull request from someone who could write this up and actually test it.
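
For what it's worth, a rough, untested sketch of that detection logic (the directory layout under each root is an assumption about how MSVC installs):

import glob
import os
import shutil

def ensure_cl_on_path():
    # Do nothing on non-Windows, or if cl.exe is already reachable.
    if os.name != "nt" or shutil.which("cl"):
        return
    roots = [
        r"C:\Program Files\Microsoft Visual Studio\2022",
        r"C:\Program Files (x86)\Microsoft Visual Studio\2022",
    ]
    for root in roots:
        # Assumed layout: <root>\<edition>\VC\Tools\MSVC\<version>\bin\Hostx64\x64
        hits = glob.glob(os.path.join(root, "*", "VC", "Tools", "MSVC",
                                      "*", "bin", "Hostx64", "x64"))
        if hits:
            os.environ["PATH"] += os.pathsep + hits[0]
            return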

@EyeDeck
Contributor

EyeDeck commented Jun 6, 2023

Hopefully #36 should do the trick.

@Panchovix
Contributor Author

Closing as PR #36 fixes the compilation issues.
