
Bug: cannot create std::vector larger than max_size() #9391

Open
imhoffman opened this issue Sep 9, 2024 · 10 comments
Labels: bug (Something isn't working), medium severity (Used to report medium severity bugs in llama.cpp, e.g. malfunctioning features but still usable)


imhoffman commented Sep 9, 2024

What happened?

My usual build recipe and run scripts stopped working after b3680. Something changed in b3681, but I don't know what.
I see the same failure across models and CLI flags, so it seems deeper than any single feature choice, which is why I have omitted the launch script here.

This is the actual error:

...
terminate called after throwing an instance of 'std::length_error'
  what():  cannot create std::vector larger than max_size()
<launch script name> Aborted                 (core dumped)

Here is what the binary reports at runtime:

system_info: n_threads = 24 (n_threads_batch = 24) / 48 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
main: interactive mode on.

Here is how I configure the build:

cmake -DGGML_AVX=ON -DGGML_AVX2=ON -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON -DGGML_CUDA_F16=ON -DGGML_F16C=ON -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 -DCMAKE_CUDA_FLAGS='-ccbin=gcc-12' -DCMAKE_INSTALL_PREFIX=/opt/llama ..

and some other system info:

$ lscpu | grep "Model name:"
Model name:                           Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
$ uname -srv
Linux 6.10.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 19 Aug 2024 17:02:39 +0000
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.107.02  Wed Jul 24 23:53:00 UTC 2024
GCC version:  gcc version 14.2.1 20240805 (GCC) 
$ gcc-12 --version
gcc-12 (GCC) 12.3.0

Name and Version

$ /opt/llama/bin/llama-cli --version
version: 3681 (df270ef)
built with gcc-12 (GCC) 12.3.0 for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

imhoffman added the bug-unconfirmed and medium severity labels on Sep 9, 2024
ggerganov (Member) commented:

It's likely something related to the sampling, but without the actual command or a stack trace it's hard to say what's wrong.

imhoffman (Author) commented:

This fails the same way for a variety of input models and CLI options, but I can certainly provide one of them in detail.
Also, how would you like me to produce the stack trace?

imhoffman (Author) commented:

Here is the launch script:

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/llama/lib CUDA_VISIBLE_DEVICES=0,1,2 \
OMP_NUM_THREADS=48 OMP_PROC_BIND=spread OMP_PLACES=cores \
/opt/llama/bin/llama-cli \
  --color \
  --threads 48 \
  --n-predict -1 \
  --ctx-size 8192 \
  --batch-size 32 --cont-batching \
  --parallel 48 --sequences 48 \
  --temp 0.95 --dynatemp-range 0.175 \
  --gpu-layers 77 \
  --repeat-last-n -1 --repeat-penalty 1.10 \
  --model /opt/llama/models/Meta-Llama-3.1-70B-Instruct-Q5_K_S.gguf \
  --conversation \
  --file <local prompt text file> \
  --keep -1 \
  --reverse-prompt "Prompter:" \
  --log-enable

The resulting llama.log is an empty file.

imhoffman (Author) commented:

Here is all that I can get so far out of the core dump from gdb:

...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/opt/llama/bin/llama-cli --color --threads 48 --n-predict -1 --ctx-size 8192 --'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000771d2a4a53f4 in ?? () from /usr/lib/libc.so.6

imhoffman (Author) commented:

And, yes, here is the failure in the sampler:

...
#5  0x0000771d2a69752a in std::terminate () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
No locals.
#6  0x0000771d2a6ae2b6 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x771d2a876da8 <typeinfo for std::length_error>, dest=0x771d2a6c57c0 <std::length_error::~length_error()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
        globals = <optimized out>
        header = 0x55e43cf6ce10
#7  0x0000771d2a69b247 in std::__throw_length_error (__s=0x771d42ab32b0 "cannot create std::vector larger than max_size()") at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/functexcept.cc:82
No locals.
#8  0x0000771d42a7f7a9 in llama_sampler_init_penalties () from /opt/llama/lib/libllama.so
No symbol table info available.
#9  0x000055e40eb4e9fd in gpt_sampler_init(llama_model const*, gpt_sampler_params const&) ()
No symbol table info available.
#10 0x000055e40eafd029 in main ()
No symbol table info available.
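
For what it's worth, the same exception can be reproduced outside of llama.cpp in a few lines of C++. This is only a minimal sketch of the failure mode, assuming the -1 from --repeat-last-n eventually gets used as an unsigned vector size inside llama_sampler_init_penalties; the variable names below are illustrative, not taken from the source.

// repro.cpp -- minimal sketch, not llama.cpp code: shows how a signed -1
// used as a vector size wraps to SIZE_MAX and triggers the same exception.
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>

int main() {
    int penalty_last_n = -1;  // the "unlimited" sentinel from --repeat-last-n -1
    try {
        // the signed-to-unsigned conversion turns -1 into SIZE_MAX, which
        // exceeds max_size() and makes the vector constructor throw
        std::vector<float> prev_tokens(static_cast<std::size_t>(penalty_last_n));
    } catch (const std::length_error & e) {
        // with libstdc++ this prints "cannot create std::vector larger than max_size()"
        std::printf("caught: %s\n", e.what());
    }
    return 0;
}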

slaren (Member) commented Sep 9, 2024

It's a bug. In the meantime, you can replace --repeat-last-n -1 with --repeat-last-n 0.

slaren added the bug label and removed the medium severity label on Sep 9, 2024
Gryphe commented Sep 9, 2024

It's a bug. In the meantime, you can replace --repeat-last-n -1 with --repeat-last-n 0.

I can confirm this fixes the crash, but it appears samplers no longer function on llama-server. Every time I regenerate a response, it's exactly the same.

slaren (Member) commented Sep 9, 2024

@Gryphe please create a new issue and provide instructions to reproduce this (ideally using curl as the client).

slaren (Member) commented Sep 9, 2024

@ggerganov Maybe that is caused by the reset function of the dist sampler? I see there is a gpt_sampler_reset in update_slots. Possibly related to #8971 as well.

ggerganov (Member) commented:

@ggerganov Maybe that is caused by the reset function of the dist sampler? I see there is a gpt_sampler_reset in update_slots. Possibly related to #8971 as well.

It looks like it's because of passing the -1 value for the penalty_last_n argument. #9398 seems to resolve it by clamping the value to 0.

To fix this issue, we should update gpt_sampler_init to pass the context size when params.penalty_last_n == -1.
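
Roughly along these lines (a sketch of the intent only; the helper name and call site are illustrative rather than taken from the actual gpt_sampler_init code, and how the context size is obtained there is glossed over):

// sketch only: resolve the "last n" sentinel before it is ever used as a size,
// so the -1 never reaches the size_t conversion in llama_sampler_init_penalties.
#include <cstdint>
#include <cstdio>

static int32_t resolve_penalty_last_n(int32_t penalty_last_n, int32_t n_ctx) {
    return penalty_last_n < 0 ? n_ctx : penalty_last_n;  // -1 means "whole context"
}

int main() {
    std::printf("%d\n", resolve_penalty_last_n(-1, 8192));  // 8192, the --ctx-size from the launch script above
    std::printf("%d\n", resolve_penalty_last_n(64, 8192));  // explicit values pass through unchanged
    return 0;
}

gpt_sampler_init would then hand the resolved, never-negative value to llama_sampler_init_penalties.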

slaren added the medium severity label and removed the bug-unconfirmed label on Sep 21, 2024