cuBLAS fails on same model - terminate called after throwing an instance of 'std::runtime_error' what(): unexpectedly reached end of file #1620
Comments
If you got it from the source whose name starts with an H, the first thing to check is whether the SHA256 matches the file you downloaded. It's possible the file you have is truncated or corrupt.
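For example (a sketch; assumes the download page publishes a SHA256 checksum, and uses the path from the report below):
sha256sum ./models/ggml-vicuna-13b-1.1-q4_1.bin
ls -l ./models/ggml-vicuna-13b-1.1-q4_1.bin   # the byte size should also match the upload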
I have my doubts about the error report. I'd like to see a freshly compiled CPU version that "works fine" with your model.
I also had this error and resolved it with the help of CRD716 on Discord. Essentially, the model is too old. Yes, I mean you from six weeks ago.
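If an outdated quantization format is the cause, one fix (a sketch; assumes you still have the f16 ggml file to start from, and the paths are illustrative) is to regenerate the quantized file with the bundled tool:
./quantize ./models/ggml-model-f16.bin ./models/ggml-model-q4_1.bin q4_1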
I get the same error even on CPU. I picked up earlier code from April and it works fine, but the new code does not. @asctime, which ggml model should we use then? q4_0 doesn't work? In that case, the readme front page needs to change.
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Hello, I tried to see if anyone else had this issue, but the closest was #1596, and my situation seems different.
Crashes only for the GPU cuBLAS build:
make LLAMA_CUBLAS=1
Makefile edited: NVCCFLAGS = --forward-unknown-to-host-compiler --gpu-architecture=sm_86
./main -m ./models/ggml-vicuna-13b-1.1-q4_1.bin -p "Building a website can be done in 10 simple steps:" -n 512
main: build = 0 (unknown)
main: seed = 1685251133
llama.cpp: loading model from ./models/ggml-vicuna-13b-1.1-q4_1.bin
terminate called after throwing an instance of 'std::runtime_error'
what(): unexpectedly reached end of file
Aborted (core dumped)
root@netaisyslog:~/app/app/netai_llm#
But with everything else the same, after a make clean and a plain make (no GPU), it works fine.
Any guidance on how to go about figuring out what is going on here?
Thanks
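One way to narrow it down: this error is thrown when a read comes up short of what the loader expected, so tracing the file I/O shows how far into the file it got before aborting. A sketch (strace prints to stderr):
strace -e trace=openat,read ./main -m ./models/ggml-vicuna-13b-1.1-q4_1.bin -p "test" -n 1 2>&1 | tail -n 40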
Background
nvidia-smi
Sun May 28 05:24:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05 Driver Version: 525.85.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A40-4Q On | 00000000:04:00.0 Off | 0 |
| N/A N/A P8 N/A / N/A | 0MiB / 4096MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@netaisyslog:~/app/app/netai_llm#
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
root@netaisyslog:~/app/app/netai_llm#
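Note the version skew: the driver reports CUDA 12.0 while nvcc is 11.5. The stock Makefile below passes -arch=native to nvcc, which nvcc 11.5 may not accept (native support arrived in a later CUDA release, if memory serves), presumably why NVCCFLAGS was pinned to sm_86 above. To confirm the right sm_XX value for a given GPU on reasonably recent drivers:
nvidia-smi --query-gpu=compute_cap --format=csv,noheader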
Makefile
cat Makefile
# Define the default target now so that it is always the first target
BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot
ifdef LLAMA_BUILD_SERVER
BUILD_TARGETS += server
endif
default: $(BUILD_TARGETS)
ifndef UNAME_S
UNAME_S := $(shell uname -s)
endif
ifndef UNAME_P
UNAME_P := $(shell uname -p)
endif
ifndef UNAME_M
UNAME_M := $(shell uname -m)
endif
CCV := $(shell $(CC) --version | head -n 1)
CXXV := $(shell $(CXX) --version | head -n 1)
# Mac OS + Arm can report x86_64
# ref: https://github.com/ggerganov/whisper.cpp/issues/66#issuecomment-1282546789
ifeq ($(UNAME_S),Darwin)
ifneq ($(UNAME_P),arm)
SYSCTL_M := $(shell sysctl -n hw.optional.arm64 2>/dev/null)
ifeq ($(SYSCTL_M),1)
# UNAME_P := arm
# UNAME_M := arm64
warn := $(warning Your arch is announced as x86_64, but it seems to actually be ARM64. Not fixing that can lead to bad performance. For more info see: https://github.com/ggerganov/whisper.cpp/issues/66\#issuecomment-1282546789)
endif
endif
endif
# Compile flags
# keep standard at C11 and C++11
CFLAGS = -I. -O3 -std=c11 -fPIC
CXXFLAGS = -I. -I./examples -O3 -std=c++11 -fPIC
LDFLAGS =
ifndef LLAMA_DEBUG
CFLAGS += -DNDEBUG
CXXFLAGS += -DNDEBUG
endif
# warnings
CFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith
CXXFLAGS += -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar
# OS specific
# TODO: support Windows
ifeq ($(UNAME_S),Linux)
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
ifeq ($(UNAME_S),Darwin)
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
ifeq ($(UNAME_S),FreeBSD)
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
ifeq ($(UNAME_S),NetBSD)
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
ifeq ($(UNAME_S),OpenBSD)
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
ifeq ($(UNAME_S),Haiku)
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
ifdef LLAMA_GPROF
CFLAGS += -pg
CXXFLAGS += -pg
endif
ifdef LLAMA_PERF
CFLAGS += -DGGML_PERF
CXXFLAGS += -DGGML_PERF
endif
# Architecture specific
# TODO: probably these flags need to be tweaked on some architectures
# feel free to update the Makefile for your architecture and send a pull request or issue
ifeq ($(UNAME_M),$(filter $(UNAME_M),x86_64 i686))
# Use all CPU extensions that are available:
CFLAGS += -march=native -mtune=native
CXXFLAGS += -march=native -mtune=native
endif
ifneq ($(filter ppc64%,$(UNAME_M)),)
POWER9_M := $(shell grep "POWER9" /proc/cpuinfo)
ifneq (,$(findstring POWER9,$(POWER9_M)))
CFLAGS += -mcpu=power9
CXXFLAGS += -mcpu=power9
endif
# Require c++23's std::byteswap for big-endian support.
ifeq ($(UNAME_M),ppc64)
CXXFLAGS += -std=c++23 -DGGML_BIG_ENDIAN
endif
endif
ifndef LLAMA_NO_ACCELERATE
# Mac M1 - include Accelerate framework.
# `-framework Accelerate` works on Mac Intel as well, with negligible performance boost (as of the predict time).
ifeq ($(UNAME_S),Darwin)
CFLAGS += -DGGML_USE_ACCELERATE
LDFLAGS += -framework Accelerate
endif
endif
ifdef LLAMA_OPENBLAS
CFLAGS += -DGGML_USE_OPENBLAS -I/usr/local/include/openblas -I/usr/include/openblas
ifneq ($(shell grep -e "Arch Linux" -e "ID_LIKE=arch" /etc/os-release 2>/dev/null),)
LDFLAGS += -lopenblas -lcblas
else
LDFLAGS += -lopenblas
endif
endif
ifdef LLAMA_BLIS
CFLAGS += -DGGML_USE_OPENBLAS -I/usr/local/include/blis -I/usr/include/blis
LDFLAGS += -lblis -L/usr/local/lib
endif
ifdef LLAMA_CUBLAS
CFLAGS += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
CXXFLAGS += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
LDFLAGS += -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/x86_64-linux/lib
OBJS += ggml-cuda.o
NVCC = nvcc
NVCCFLAGS = --forward-unknown-to-host-compiler -arch=native
ifdef LLAMA_CUDA_DMMV_X
NVCCFLAGS += -DGGML_CUDA_DMMV_X=$(LLAMA_CUDA_DMMV_X)
else
NVCCFLAGS += -DGGML_CUDA_DMMV_X=32
endif # LLAMA_CUDA_DMMV_X
ifdef LLAMA_CUDA_DMMV_Y
NVCCFLAGS += -DGGML_CUDA_DMMV_Y=$(LLAMA_CUDA_DMMV_Y)
else
NVCCFLAGS += -DGGML_CUDA_DMMV_Y=1
endif # LLAMA_CUDA_DMMV_Y
ggml-cuda.o: ggml-cuda.cu ggml-cuda.h
	$(NVCC) $(NVCCFLAGS) $(CXXFLAGS) -Wno-pedantic -c $< -o $@
endif # LLAMA_CUBLAS
ifdef LLAMA_CLBLAST
CFLAGS += -DGGML_USE_CLBLAST
CXXFLAGS += -DGGML_USE_CLBLAST
# Mac provides OpenCL as a framework
ifeq ($(UNAME_S),Darwin)
LDFLAGS += -lclblast -framework OpenCL
else
LDFLAGS += -lclblast -lOpenCL
endif
OBJS += ggml-opencl.o
ggml-opencl.o: ggml-opencl.cpp ggml-opencl.h
	$(CXX) $(CXXFLAGS) -c $< -o $@
endif
ifneq ($(filter aarch64%,$(UNAME_M)),)
# Apple M1, M2, etc.
# Raspberry Pi 3, 4, Zero 2 (64-bit)
CFLAGS += -mcpu=native
CXXFLAGS += -mcpu=native
endif
ifneq ($(filter armv6%,$(UNAME_M)),)
# Raspberry Pi 1, Zero
CFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access
endif
ifneq ($(filter armv7%,$(UNAME_M)),)
# Raspberry Pi 2
CFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations
endif
ifneq ($(filter armv8%,$(UNAME_M)),)
# Raspberry Pi 3, 4, Zero 2 (32-bit)
CFLAGS += -mfp16-format=ieee -mno-unaligned-access
endif
# Print build information
$(info I llama.cpp build info: )
$(info I UNAME_S:  $(UNAME_S))
$(info I UNAME_P:  $(UNAME_P))
$(info I UNAME_M:  $(UNAME_M))
$(info I CFLAGS:   $(CFLAGS))
$(info I CXXFLAGS: $(CXXFLAGS))
$(info I LDFLAGS:  $(LDFLAGS))
$(info I CC:       $(CCV))
$(info I CXX:      $(CXXV))
$(info )
# Build library
ggml.o: ggml.c ggml.h ggml-cuda.h
	$(CC)  $(CFLAGS)   -c $< -o $@
llama.o: llama.cpp ggml.h ggml-cuda.h llama.h llama-util.h
	$(CXX) $(CXXFLAGS) -c $< -o $@
common.o: examples/common.cpp examples/common.h
	$(CXX) $(CXXFLAGS) -c $< -o $@
libllama.so: llama.o ggml.o $(OBJS)
	$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)
clean:
	rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-matmult save-load-state server vdot build-info.h
# Examples
main: examples/main/main.cpp build-info.h ggml.o llama.o common.o $(OBJS)
	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
	@echo
	@echo '====  Run ./main -h for help.  ===='
	@echo
quantize: examples/quantize/quantize.cpp build-info.h ggml.o llama.o $(OBJS)
	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
quantize-stats: examples/quantize-stats/quantize-stats.cpp build-info.h ggml.o llama.o $(OBJS)
	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
perplexity: examples/perplexity/perplexity.cpp build-info.h ggml.o llama.o common.o $(OBJS)
	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
embedding: examples/embedding/embedding.cpp build-info.h ggml.o llama.o common.o $(OBJS)
	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
save-load-state: examples/save-load-state/save-load-state.cpp build-info.h ggml.o llama.o common.o $(OBJS)
	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
server: examples/server/server.cpp examples/server/httplib.h examples/server/json.hpp build-info.h ggml.o llama.o common.o $(OBJS)
	$(CXX) $(CXXFLAGS) -Iexamples/server $(filter-out %.h,$(filter-out %.hpp,$^)) -o $@ $(LDFLAGS)
build-info.h: $(wildcard .git/index) scripts/build-info.sh
	@sh scripts/build-info.sh > $@.tmp
	@if ! cmp -s $@.tmp $@; then \
		mv $@.tmp $@; \
	else \
		rm $@.tmp; \
	fi
# Tests
benchmark-matmult: examples/benchmark/benchmark-matmult.cpp build-info.h ggml.o $(OBJS)
	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
	./$@
vdot: pocs/vdot/vdot.cpp ggml.o $(OBJS)
	$(CXX) $(CXXFLAGS) $^ -o $@ $(LDFLAGS)
.PHONY: tests clean
tests:
	bash ./tests/run-tests.sh
root@netaisyslog:~/app/app/netai_llm#
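As an aside, the sm_86 change doesn't require editing the Makefile: GNU make lets variables set on the command line override ordinary assignments in the file, so something like the following (a sketch) should behave the same as the edit described above:
make clean
make LLAMA_CUBLAS=1 NVCCFLAGS="--forward-unknown-to-host-compiler --gpu-architecture=sm_86"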