How to build on windows? #103

Closed
Zerogoki00 opened this issue Mar 13, 2023 · 22 comments
Labels
documentation (Improvements or additions to documentation) · good first issue (Good for newcomers) · windows (Issues specific to Windows)

Comments

@Zerogoki00

Please give instructions. There is nothing in the README, but it says Windows is supported.

@ggerganov added the documentation (Improvements or additions to documentation) and good first issue (Good for newcomers) labels Mar 13, 2023
@cgcha

cgcha commented Mar 13, 2023

At this point, there's CMake support. The Python parts of the README should be basically the same on Windows. Once you install CMake, you can run

cmake -S . -B build/ -D CMAKE_BUILD_TYPE=Release

cmake --build build/ --config Release

I'm not actually sure if you need CMAKE_BUILD_TYPE=Release for the first command, but it ran for me.

Afterwards, the exe files should be in the build/Release folder, and you can call them in place of ./quantize and ./main

.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2

.\build\Release\llama.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128

The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.
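
For reference, a rough sketch of that manual flow from the older README (assuming the convert-pth-to-ggml.py script of that era; exact arguments may differ between revisions):

python convert-pth-to-ggml.py models/7B/ 1
.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2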

@cgcha

cgcha commented Mar 13, 2023

I usually run Linux, so I'm pretty unfamiliar with CMake, and there are probably better conventions for how to do this cleanly. I also tried everything in WSL and it seems to work fine.

@fgblanch

Probably over-engineered, but I just got it working on Windows using the gcc compiler included with Strawberry Perl and Make distributed with Chocolatey:

  • Install make distributed with chocolatey: choco install make

set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin 2
main.exe -m q4_0.bin -t 8 -n 128

@akshay-verma

akshay-verma commented Mar 14, 2023

I'd recommend using WSL2 on Windows; that's what I used and everything worked fine.
I followed the steps for running the model from here -
https://til.simonwillison.net/llms/llama-7b-m2

@YongeBai

Probably over engineered, I just got it working on windows by using gcc compiler included in Strawberry Perl and Make distributed with chocolatey:

set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe 
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin  2
main.exe -m q4_0.bin -t 8 -n 128 

Tried these steps, ran into this error. Any ideas?

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
/usr/bin/bash: cc: command not found
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX: g++.exe (i686-posix-dwarf, Built by strawberryperl.com project) 8.3.0

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
process_begin: CreateProcess(NULL, cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:186: ggml.o] Error 2

@fgblanch

Probably over engineered, I just got it working on windows by using gcc compiler included in Strawberry Perl and Make distributed with chocolatey:

  • Install make distributed with chocolatey: choco install make

set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin 2
main.exe -m q4_0.bin -t 8 -n 128

Tried these steps, ran into this error. Any ideas?

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
/usr/bin/bash: cc: command not found
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX: g++.exe (i686-posix-dwarf, Built by strawberryperl.com project) 8.3.0

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
process_begin: CreateProcess(NULL, cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:186: ggml.o] Error 2

It seems you forgot to set gcc as the CC command. Try running:

set CC=C:\Strawberry\c\bin\gcc.exe
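
(Note that set only affects the current cmd.exe session. If you're in PowerShell instead, the equivalent, as a sketch, is:

$env:CC = "C:\Strawberry\c\bin\gcc.exe"
$env:CXX = "C:\Strawberry\c\bin\g++.exe"

make picks both up from the environment either way.)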

@kaliber91

main: prompt: 'The first man on the moon was'
main: number of tokens in prompt = 8
     1 -> ''
  1576 -> 'The'
   937 -> ' first'
   767 -> ' man'
   373 -> ' on'
   278 -> ' the'
 18786 -> ' moon'
   471 -> ' was'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


The first man on the moon was a geologist, and he brought his hammer.
Inside Out is an amazing movie that will take you through all kinds of emotions in its 90 minute run time (and maybe even more during your afterthoughts). The film tells about Riley's journey when she moves from Minnesota to San Francisco for a new job opportunity and how her parents, boyfriend Oliver Tate (!) and friends help her cope with that.
The animation looks great as always in Pixar productions but even more importantly the characters feel believable – if you would have asked me before I watched Inside

main: mem per token = 14565444 bytes
main:     load time =  1157.11 ms
main:   sample time =   114.25 ms
main:  predict time = 19469.45 ms / 144.22 ms per token
main:    total time = 21031.82 ms

It works great on Windows using the CMake build, though -t 16 is no faster than -t 8 on a Ryzen 9 5950X. I regenerated the prompt a couple of times on 7B, and about half the time it gets it right.

@Christoph-Wagner

The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.

param([string]$modelPath, [switch]$removeF16) 

Get-ChildItem $modelPath -Filter ggml-model-f16.bin* | 
Foreach-Object {
    $newName = $_.FullName.Replace("f16","q4_0");
    Start-Process -FilePath ".\build\Release\quantize.exe" -ArgumentList $_.FullName, $newName, "2" -Wait 
    if ($removeF16) {
        Remove-Item $_.FullName
    }
}

Call it like this

.\quantize.ps1 -modelPath "C:\PathToModels\65B" or .\quantize.ps1 -modelPath "C:\PathToModels\65B" -removeF16

Just thought I’d share this quickly thrown together powershell script for the Windows version of quantize.sh

@akshay-verma

@kaliber91 7B was terrible for me as well. 13B was a bit better.

@Reelix

Reelix commented Mar 15, 2023

Solving some common issues people might come across when installing the requirements on the latest version of Python.

This is specifically here because Windows installs of Python have compatibility issues with the chosen packages.

python -m pip install numpy
pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html (About a 3GB download)
pip install .\sentencepiece-0.1.97-cp311-cp311-win_amd64.whl

The sentencepiece-0.1.97-cp311-cp311-win_amd64.whl file is from here inside the wheelhouse folder.
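
A quick sanity check that all three packages import cleanly (a minimal addition, assuming the installs above succeeded):

python -c "import numpy, torch, sentencepiece; print(numpy.__version__, torch.__version__, sentencepiece.__version__)"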

@kassane
Contributor

kassane commented Mar 15, 2023

If you're running WSL2, it requires the creation or modification of a .wslconfig file in your user folder.

%USERPROFILE%\.wslconfig:

[wsl2]
memory=12GB
processors=6
swap=4GB
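
(The new limits only take effect after the WSL VM restarts; from PowerShell or cmd run:

wsl --shutdown

and then reopen your distro.)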

My Setup

  • RAM: 16GB DDR4
  • CPU: Ryzen 7 7500G
  • SSD: 480GB
  • OS: Windows 11

With this configuration I succeeded in making the model conversions. However, when running main it is still slow reading the model and it continuously consumes a lot of memory.



@Brawlence

I've manually built it using g++ via cmake and make from the MSYS2 distro.

@Nephistos

Probably over engineered, I just got it working on windows by using gcc compiler included in Strawberry Perl and Make distributed with chocolatey:

set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe 
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin  2
main.exe -m q4_0.bin -t 8 -n 128 

Tried these steps, ran into this error. Any ideas?

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
/usr/bin/bash: cc: command not found
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX: g++.exe (i686-posix-dwarf, Built by strawberryperl.com project) 8.3.0

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
process_begin: CreateProcess(NULL, cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:186: ggml.o] Error 2

I got the same error ("The system cannot find the file specified") while trying to start the build with CMake, even though I put the following at the beginning of my CMakeLists.txt file:
set( CMAKE_CXX_COMPILER "C:/MinGW/bin/g++.exe" )
set( CMAKE_C_COMPILER "C:/MinGW/bin/gcc.exe" )

Also, when I run g++ --version, I can see that I'm on 6.3.0, so my MinGW is installed correctly. Any idea what could be going wrong? :(
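
One likely culprit, offered as a guess rather than a confirmed fix: CMake ignores compiler variables set inside CMakeLists.txt after the project() call, so they are usually passed on the command line together with a MinGW generator instead:

cmake -S . -B build -G "MinGW Makefiles" -D CMAKE_C_COMPILER=C:/MinGW/bin/gcc.exe -D CMAKE_CXX_COMPILER=C:/MinGW/bin/g++.exe
cmake --build build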

@CoderRC

CoderRC commented Mar 29, 2023

There is a very easy way to build on Windows using mingw32 compilation in MSYS2.

  1. Download msys2-x86_64-20230318 from https://www.msys2.org/
  2. Run the installer, click Next through the prompts, wait for the install to complete, then press Finish
  3. Run C:\msys64\mingw64.exe
  4. Run the commands to install the required packages:
    pacman -S git
    pacman -S mingw-w64-x86_64-gcc
    pacman -S make
  5. Clone library for POSIX functions that llama.cpp needs:
    git clone https://github.com/CoderRC/libmingw32_extended.git
    cd libmingw32_extended
  6. Build the library:
    mkdir build
    cd build
    ../configure
    make
  7. Install the library:
    make install
  8. Change directory:
    cd ~
  9. Clone llama.cpp:
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
  10. Build llama.cpp:
    make LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'
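
After step 10, a quick sanity check (a sketch reusing the invocation from earlier in the thread; it assumes you have already converted and quantized a model into ./models/7B):

gcc --version
make --version
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128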

@12lxr

12lxr commented Apr 4, 2023

At this point, there's support for CMake. The Python segments of the README should basically be the same. Once you install it, you can run

cmake -S . -B build/ -D CMAKE_BUILD_TYPE=Release

cmake --build build/ --config Release

I'm not actually sure if you need CMAKE_BUILD_TYPE=Release for the first command, but it ran for me.

Afterwards, the exe files should be in the build/Release folder, and you can call them in place of ./quantize and ./main

.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2

.\build\Release\llama.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128

The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.

Hello, I can't find quantize.exe or llama.exe; there is only llama.lib in \build\Release.
Why?

@TortueSandwich

. . .

hello, i can't find quantize.exe and llama.exe. only llama.lib in \build\Release why?

Hey, all the .exe files will be located in /llama.cpp/build/bin/ after running the cmake commands. You just need to copy and paste them into the /llama.cpp/ directory.
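
For example, from the repository root in cmd.exe (assuming the default layout described above):

copy .\build\bin\*.exe .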

@iMountTai

Probably over engineered, I just got it working on windows by using gcc compiler included in Strawberry Perl and Make distributed with chocolatey:

set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe 
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin  2
main.exe -m q4_0.bin -t 8 -n 128 

@fgblanch Looking forward to your help, thank you!

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -march=native -mtune=native
I LDFLAGS:
I CC: gcc.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
I CXX: g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
…
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: too many arguments for format [-Wformat-extra-args]
llama.cpp: In instantiation of 'T checked_mul(T, T) [with T = unsigned int]':
llama.cpp:363:72: required from here
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: too many arguments for format [-Wformat-extra-args]
make: *** [Makefile:146: llama.o] Error 1

@arcadiancomp

@CoderRC You saved me hours! Thank you so much.

I expanded on your make command just a little to include OpenCL support:

make LLAMA_CLBLAST=1 LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended -lclblast -lOpenCL' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -I./common -I/mingw64/include/CL -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'

Extra packages I needed: mingw-w64-x86_64-clblast, mingw-w64-x86_64-opencl-headers, mingw-w64-x86_64-opencl-icd
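
In the MSYS2 mingw64 shell that amounts to (same package names as above):

pacman -S mingw-w64-x86_64-clblast mingw-w64-x86_64-opencl-headers mingw-w64-x86_64-opencl-icd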

ldd on quantize.exe after a successful build:

Admin@nidhogg MINGW64 ~/llama.cpp
$ ldd ./quantize.exe
ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ff81e190000)
KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ff81caa0000)
KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ff81b700000)
msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ff81cd40000)
libgcc_s_seh-1.dll => /mingw64/bin/libgcc_s_seh-1.dll (0x7ff80d3b0000)
OpenCL.dll => /c/WINDOWS/SYSTEM32/OpenCL.dll (0x7fffec660000)
libclblast.dll => /mingw64/bin/libclblast.dll (0x7fff87d00000)
combase.dll => /c/WINDOWS/System32/combase.dll (0x7ff81dd40000)
libwinpthread-1.dll => /mingw64/bin/libwinpthread-1.dll (0x7ff817580000)
ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ff81bab0000)
RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ff81cb70000)
libstdc++-6.dll => /mingw64/bin/libstdc++-6.dll (0x26f583d0000)
ADVAPI32.dll => /c/WINDOWS/System32/ADVAPI32.dll (0x7ff81c9f0000)
libstdc++-6.dll => /mingw64/bin/libstdc++-6.dll (0x7fffcfea0000)
sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7ff81da30000)
ole32.dll => /c/WINDOWS/System32/ole32.dll (0x7ff81c770000)
msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ff81b660000)
CFGMGR32.dll => /c/WINDOWS/SYSTEM32/CFGMGR32.dll (0x7ff81b230000)
GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ff81e120000)
win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ff81b5b0000)
gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ff81bbd0000)
USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ff81cdf0000)

Exciting times in open source these days!


@0wwafa

0wwafa commented May 20, 2024

This works.

git clone --recurse-submodules https://github.com/ggerganov/llama.cpp
cd llama.cpp
export CC=gcc
export CXX=g++
export LDFLAGS='-D_POSIX_MAPPED_FILES -DLLAMA_NATIVE=ON -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON -DLLMODEL_CUDA=OFF -static'
git reset --hard
git clean -fd
git pull
mingw32-make.exe -j 6


@zjhken

zjhken commented Feb 8, 2025

At this point, there's support for CMake. The Python segments of the README should basically be the same. Once you install it, you can run

cmake -S . -B build/ -D CMAKE_BUILD_TYPE=Release

cmake --build build/ --config Release

I'm not actually sure if you need CMAKE_BUILD_TYPE=Release for the first command, but it ran for me.

Afterwards, the exe files should be in the build/Release folder, and you can call them in place of ./quantize and ./main

.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2

.\build\Release\llama.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128

The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.

Thank you @cgcha. My CPU is very old and only supports AVX, not AVX2, so I had to compile it myself on my Windows PC.

I tested it today; CMAKE_BUILD_TYPE is not necessary. Instead, if you want to use your NVIDIA GPU like me, do the following:

  • Install CUDA 12, and use nvcc --version to check whether it installed successfully.
  • Add a parameter to cmake so it knows you want to compile with the CUDA feature:
cmake -S . -B build/ -D LLAMA_CUBLAS=on
cmake --build build/ --config Release

This took much longer to compile than the CPU-only build.
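
Note that a cuBLAS build still runs on the CPU unless layers are offloaded at run time with -ngl (--n-gpu-layers); a usage sketch, reusing the model path from the quoted commands (executable names and paths vary between llama.cpp versions):

.\build\Release\main.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128 -ngl 32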
