Getting SIGSEGV with llama backend #973

Closed
iamjackg opened this issue Aug 29, 2023 · 15 comments · Fixed by #2232
Labels
bug Something isn't working

Comments

@iamjackg

iamjackg commented Aug 29, 2023

LocalAI version:
v1.25.0

Environment, CPU architecture, OS, and Version:
Linux hostname 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  • 13th Gen Intel(R) Core(TM) i7-13700KF
  • 3070 Ti

Describe the bug
Trying to run any GGUF model with the llama backend results in a SIGSEGV as soon as the model starts to load (output in the Logs section below).

Note that the main binary of llama.cpp from LocalAI/go-llama/build/bin/ runs totally fine, e.g.

./main -t 6 --low-vram -m ~/gits/llama.cpp/models/phind-codellama-34b-v1.Q4_K_M.gguf --temp 0 -ngl 14 --color --rope-freq-base 1e6 -p $'# this python function determines whether an object is JSON-serializable or not, without using json.dumps\ndef is_json_serializable(thing):'

To Reproduce

Any request seems to trigger it. I tried with both codellama-13b-python.Q4_K_S.gguf and phind-codellama-34b-v1.Q4_K_M.gguf for good measure; both work when running llama.cpp directly. A sample request is sketched below.
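For completeness, this is roughly the kind of request I'm sending — a sketch assuming the default port 8080 and the OpenAI-compatible completions endpoint; any request that causes the model to load crashes the same way:

# sketch: any completion request against the llama backend segfaults as the model loads
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codellama-13b-python.Q4_K_S.gguf",
    "prompt": "def is_json_serializable(thing):",
    "temperature": 0
  }'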

Expected behavior

Logs

11:50PM DBG Loading GRPC Model llama: {backendString:llama model:codellama-13b-python.Q4_K_S.gguf threads:6 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0002dc000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
11:50PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
11:50PM DBG GRPC Service for codellama-13b-python.Q4_K_S.gguf will be running at: '127.0.0.1:38915'
11:50PM DBG GRPC Service state dir: /tmp/go-processmanager2204529450
11:50PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38915: connect: connection refused"
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr 2023/08/28 23:50:18 gRPC Server listening at 127.0.0.1:38915
11:50PM DBG GRPC Service Ready
11:50PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:codellama-13b-python.Q4_K_S.gguf ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:14 MainGPU: TensorSplit: Threads:6 LibrarySearchPath: RopeFreqBase:1e+06 RopeFreqScale:1 RMSNormEps:0 NGQA:0 ModelFile:/home/jack/gits/llama.cpp/models/codellama-13b-python.Q4_K_S.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr create_gpt_params: loading model /home/jack/gits/llama.cpp/models/codellama-13b-python.Q4_K_S.gguf
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr SIGSEGV: segmentation violation
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr PC=0x7f5937be9fbd m=5 sigcode=1
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr signal arrived during cgo execution
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr 
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr goroutine 34 [syscall]:
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr runtime.cgocall(0x81cbd0, 0xc0001815f0)
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr 	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0001815c8 sp=0xc000181590 pc=0x4161cb
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x7f58b4000b70, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xe, 0x200, ...)
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr 	_cgo_gotypes.go:266 +0x4c fp=0xc0001815f0 sp=0xc0001815c8 pc=0x8131ac
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc0002a8000, 0x41}, {0xc000110700, 0x8, 0x926a60?})
11:50PM DBG GRPC(codellama-13b-python.Q4_K_S.gguf-127.0.0.1:38915): stderr 	/home/jack/gits/LocalAI/go-llama/llama.go:39 +0x3aa fp=0xc0001817f0 sp=0xc0001815f0 pc=0x813a6a
[...]
@iamjackg iamjackg added the bug Something isn't working label Aug 29, 2023
@coreywagehoft

I was going to submit an issue for this as well. I am getting the same error with similar CodeLlama models in GGUF format on version 1.25.0.

@racerxdl

Same here :(

@SVerkuil

I have similar problems with all GGUF files at the moment.

Below are some additional logs in case they are of help.

9:16AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:phind-codellama-34b-v2.Q5_K_M.gguf ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:true Embeddings:false NUMA:false NGPULayers:32 MainGPU: TensorSplit: Threads:14 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/phind-codellama-34b-v2.Q5_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false}
9:16AM DBG GRPC(phind-codellama-34b-v2.Q5_K_M.gguf-127.0.0.1:43215): stderr create_gpt_params: loading model /models/phind-codellama-34b-v2.Q5_K_M.gguf
9:16AM DBG GRPC(phind-codellama-34b-v2.Q5_K_M.gguf-127.0.0.1:43215): stderr SIGSEGV: segmentation violation
9:16AM DBG GRPC(phind-codellama-34b-v2.Q5_K_M.gguf-127.0.0.1:43215): stderr PC=0x7fc7452f8789 m=0 sigcode=128
9:16AM DBG GRPC(phind-codellama-34b-v2.Q5_K_M.gguf-127.0.0.1:43215): stderr signal arrived during cgo execution

@guidevops

Changing the backend from llama to llama-stable works for me.
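Roughly what I mean, as a sketch — the file name, model name, and exact fields here are just an example of my setup, not authoritative:

# sketch: point the model definition at the llama-stable backend instead of llama
cat > models/codellama.yaml <<'EOF'
name: codellama
backend: llama-stable
context_size: 4096
parameters:
  model: codellama-13b-python.Q4_K_S.gguf
EOF

Then request the model by its name field ("codellama") instead of the raw file name.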

@iamjackg
Author

The llama-stable backend doesn't support GGUF models though, does it?

@johndpope

I suspect this PR may fix things. llama.cpp needs a bump to work with GGUF, so the Go bindings would be behind. #977

@iamjackg
Author

iamjackg commented Aug 31, 2023

Everything had already been bumped for v1.25, which split the llama backend into llama (GGUF support) and llama-stable (GGML support). That PR is just an automated bump to the latest version, and is unrelated to this issue.

@kratosok

kratosok commented Sep 7, 2023

Your mileage may vary, but I ran into the SIGSEGV issue with the current (as of last night) Docker image. Building locally, or rebuilding the container image, has taken care of the problem in my case.
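Roughly the steps I used, as a sketch — this assumes a CUDA build; adjust BUILD_TYPE (or drop it for CPU-only) to match your hardware:

# sketch: rebuild LocalAI from source instead of using the prebuilt image
git clone https://github.com/go-skynet/LocalAI.git
cd LocalAI
make BUILD_TYPE=cublas build

# or rebuild the container image locally
docker build -t localai:local .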

@iamjackg
Author

iamjackg commented Sep 7, 2023

Unfortunately I'm also building locally.

@jadams

jadams commented Sep 8, 2023

Having the same problem here with wizardcoder-python-34b-v1.0.Q4_K_M.gguf.

I tried the prebuilt Docker tags v1.25.0 and master, as well as rebuilding locally.

Log snippet:

[...]
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr create_gpt_params: loading model /models/wizardcoder-python-34b-v1.0.Q4_K_M.gguf
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr SIGSEGV: segmentation violation
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr PC=0x7fc1bb667789 m=5 sigcode=1
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr signal arrived during cgo execution
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr goroutine 35 [syscall]:
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr runtime.cgocall(0x820470, 0xc0002ad580)
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr 	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0002ad558 sp=0xc0002ad520 pc=0x417e6b
6:27PM DBG GRPC(wizardcoder-python-34b-v1.0.Q4_K_M.gguf-127.0.0.1:38501): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x7fc134000b60, 0x800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x200, ...)
[...]

llama.cpp works perfectly fine when run directly:

/build/go-llama/build/bin/main -t 10 -ngl 32 -m /models/wizardcoder-python-34b-v1.0.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"

@sandros94

sandros94 commented Sep 9, 2023

@jadams could you share the output of your nvidia-smi and nvcc --version?

I'm facing this issue too, but I'm getting a segmentation fault even when using llama.cpp directly.

While checking nvidia-smi and nvcc --version, I noticed that the former reports CUDA 12.2 while the latter is using 12.1. I'm almost sure this shouldn't be the problem, but I'm starting to consider everything.

I'm testing the v1.25.0 and master-cuda12 containerized images on a Windows machine with a GTX 1660 Ti.

For reference, after rebuilding I tested with:

/build/go-llama/build/bin/main -t 8 -ngl 1 -lv -m /models/orca_mini_v3_7b.Q6_K.gguf --color -c 512 --temp 0.7 -p "### Instruction: Write a story about llamas\n### Response:"

@jadams

jadams commented Sep 9, 2023

nvidia-smi:

user@machine:~$ kubectl exec -it localai-local-ai-5c745dfdbd-7n7ls -- nvidia-smi
Defaulted container "localai-local-ai" out of: localai-local-ai, download-model (init)
Sat Sep  9 15:33:00 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100S-PCIE-32GB          On  | 00000000:B0:00.0 Off |                    0 |
| N/A   24C    P0              22W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

nvcc --version

user@machine:~$ kubectl exec -it localai-local-ai-5c745dfdbd-7n7ls -- nvcc --version
Defaulted container "localai-local-ai" out of: localai-local-ai, download-model (init)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

Seems to be the same as you: CUDA 12.2 from nvidia-smi and CUDA 12.1 from nvcc.

@flotos

flotos commented Sep 12, 2023

I am getting the same issue. Orca-mini 3B works but other models don't; I'm doing inference on CPU only.

@Llamatron2112

My nvcc and nvidia-smi CUDA versions match, but I get similar SIGSEGV output when trying to load a GGUF model.

@Dbone29

Dbone29 commented Oct 17, 2023

I think the problem was related to the GGUF v2 format. Can you test it with LocalAI 1.30? Maybe it is fixed.
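If it helps, something along these lines should pull it — the image name and tag are my assumption of the usual scheme, so check the releases page for the exact tag:

# sketch: pull the v1.30.0 image to re-test (tag assumed, verify before use)
docker pull quay.io/go-skynet/local-ai:v1.30.0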
