
Bug: Fail to compile after commit 202084d31d4247764fc6d6d40d2e2bda0c89a73a #9554

Closed · AntonioLucibello opened this issue Sep 19, 2024 · 5 comments · Fixed by #9562
Labels: bug-unconfirmed, high severity

Comments


AntonioLucibello commented Sep 19, 2024

What happened?

Compilation fails with CUDA 11 on every version at or after commit 202084d, which I tracked down via git bisect.
In case it may be useful, this is the output of nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

Operating system is Pop!_OS jammy 22.04 x86_64

The build command used is:

make -j GGML_CUDA=1 GGML_CUDA_MMV_Y=2 GGML_DISABLE_LOGS=1 CUDA_DOCKER_ARCH=sm_86

from a clean directory. Compilation fails regardless of whether GGML_CUDA_MMV_Y=2 is set.
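
For anyone trying to isolate this outside the llama.cpp build, here is a minimal hypothetical reproducer (not part of my original bisect). The errors below point at nvcc rejecting GCC 11's bits/std_function.h while merely parsing it, so the assumption is that including <functional> in any .cu translation unit is enough to trigger the failure on this toolchain:

    // repro.cu — hypothetical minimal reproducer for the nvcc 11.x / libstdc++ 11
    // incompatibility; if the assumption holds, the "parameter packs not expanded
    // with '...'" error fires while parsing the header, before main() matters.
    // Build sketch: nvcc -std=c++11 repro.cu -o repro
    #include <functional>

    int main() {
        // Never reached if bits/std_function.h itself fails to parse:
        std::function<int(int)> inc = [](int x) { return x + 1; };
        return inc(1) == 2 ? 0 : 1;
    }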

Name and Version

version: 3694 (202084d)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

No response

Relevant log output

c++ -std=c++11 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -I/usr/local/cuda/include -I/usr/local/cuda/targets/x86_64-linux/include -DGGML_CUDA_USE_GRAPHS  examples/deprecation-warning/deprecation-warning.o -o server -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/usr/lib64 -L/usr/local/cuda/targets/x86_64-linux/lib -L/usr/local/cuda/lib64/stubs -L/usr/lib/wsl/lib
ggml/src/ggml-cuda.cu(2444): warning #177-D: function "set_ggml_graph_node_properties" was declared but never referenced

ggml/src/ggml-cuda.cu(2456): warning #177-D: function "ggml_graph_node_has_matching_properties" was declared but never referenced

NOTICE: The 'server' binary is deprecated. Please use 'llama-server' instead.
NOTICE: The 'main' binary is deprecated. Please use 'llama-cli' instead.
ggml/src/ggml-cuda.cu: In function ‘bool ggml_backend_cuda_register_host_buffer(void*, size_t)’:
ggml/src/ggml-cuda.cu:3089:51: warning: unused parameter ‘buffer’ [-Wunused-parameter]
 3089 | GGML_CALL bool ggml_backend_cuda_register_host_buffer(void * buffer, size_t size) {
      |                                             ~~~~~~^~~~~~
ggml/src/ggml-cuda.cu:3089:66: warning: unused parameter ‘size’ [-Wunused-parameter]
 3089 | GGML_CALL bool ggml_backend_cuda_register_host_buffer(void * buffer, size_t size) {
      |                                                           ~~~~~~~^~~~
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
make: *** [Makefile:738: ggml/src/ggml-cuda/sum.o] Error 1
make: *** Waiting for unfinished jobs....
AntonioLucibello added the bug-unconfirmed and high severity labels on Sep 19, 2024
max-krasnyansky (Collaborator) commented

+1 here. Thanks for creating the issue.
I ran into this exact error yesterday and was going to create one as well.

llama.cpp-master$ cmake -G Ninja -B build-cuda-tp -D GGML_CUDA=ON -D GGML_OPENMP=OFF
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Using llamafile
-- Found CUDAToolkit: /usr/include (found version "11.5.119") 
-- CUDA found
-- Using CUDA architectures: 52;61;70;75
-- The CUDA compiler identification is NVIDIA 11.5.119
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 11.4.0
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (3.4s)
-- Generating done (0.1s)

 $ cmake --build build-cuda-tp/
[1/80] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sum.cu.o
FAILED: ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sum.cu.o 
ccache /usr/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BUILD -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/home/maxk/src/llama.cpp-master/ggml/src/../include -I/home/maxk/src/llama.cpp-master/ggml/src/. -O3 -DNDEBUG -std=c++11 --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sum.cu.o -MF ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sum.cu.o.d -x cu -c /home/maxk/src/llama.cpp-master/ggml/src/ggml-cuda/sum.cu -o ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sum.cu.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’

slaren (Collaborator) commented Sep 19, 2024

The easiest solution would be to use a more recent version of CUDA. The oldest version tested in the CI is 11.7; older versions are not actively supported, but patches to improve compatibility are always welcome.
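
For reference, one plausible shape for such a compatibility patch (a hypothetical sketch, not necessarily what the eventual fix does) is a preprocessor guard on CUDART_VERSION, which cuda_runtime_api.h defines per toolkit (e.g. 11050 for CUDA 11.5), so that code old nvcc/libstdc++ combinations cannot parse is only compiled on tested toolkits:

    // compat_sketch.cu — hypothetical illustration of a CUDART_VERSION guard;
    // not the actual contents of any llama.cpp patch.
    #include <cuda_runtime.h>

    #if CUDART_VERSION >= 11700
    // Toolkits the CI tests: safe to pull in headers such as <functional>.
    #include <functional>
    #endif

    __global__ void sum_naive(const float * x, float * out, int n) {
        // Single-thread fallback reduction that compiles on any toolkit; a real
        // patch would keep a faster path under the guard and fall back to this.
        float acc = 0.0f;
        for (int i = 0; i < n; ++i) {
            acc += x[i];
        }
        *out = acc;
    }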

max-krasnyansky (Collaborator) commented

Sounds good to me.
I'm going to upgrade to Ubuntu 24.04 (the above is 22.04 with the latest packages from the apt repos), bump all versions, and retest.

JohannesGaessler (Collaborator) commented

Please confirm whether or not this fix works: #9562

AntonioLucibello (Author) commented Sep 20, 2024

(Edit: my bad, I initially hadn't checked out the fix branch.)

Yes, it compiles properly with the fix.
