starcoder -- not enough space in the context's memory pool #158
Interesting find! Thank you for raising this. Two questions:
Just tried the example code I used to test santacoder (note, this isn't running the ggml executable directly, but going through a Python REPL):

```
Python 3.10.11 (main, Apr 12 2023, 14:46:22) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lambdaprompt as lp
>>> import os
>>> os.environ['LAMBDAPROMPT_BACKEND'] = 'SantaCoderGGML'
>>> comp = lp.Completion("# Some code to print fibonacci numbers\n"*100, max_new_tokens=100)
>>> comp()
Fetching 0 files: 0it [00:00, ?it/s]
Fetching 1 files: 100%|████| 1/1 [00:00<00:00, 25575.02it/s]
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268617232, available 268435456)
Segmentation fault (core dumped)
```

I did one other test with:

```
>>> len(lp.backends.backends['completion'].model.tokenize("# Some code to print fibonacci numbers\n"*60))
720
>>> len(lp.backends.backends['completion'].model.tokenize("# Some code to print fibonacci numbers\n"*100))
1200
```

I'll try out the …
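For reference on the numbers in that error: 268435456 bytes is exactly 256 MiB (256 × 1024 × 1024), so the failing allocation overshoots the fixed pool by 268617232 − 268435456 = 181776 bytes, i.e. only about 178 KiB.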
Seems someone else has run into this on the starcoder.cpp side as well: bigcode-project/starcoder.cpp#3
I tried looking into this, but the Python script from the example fails to download the model on macOS:

```
$ python3 examples/starcoder/convert-hf-to-ggml.py bigcode/gpt_bigcode-santacoder
Loading model: bigcode/gpt_bigcode-santacoder
Traceback (most recent call last):
  File "/Users/ggerganov/development/github/ggml/examples/starcoder/convert-hf-to-ggml.py", line 56, in <module>
    config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 766, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 473, in __getitem__
    raise KeyError(key)
KeyError: 'gpt_bigcode'
```

Any ideas how to fix this?
@ggerganov I think you're on an old version of transformers: judging by the traceback, your installed version doesn't have the 'gpt_bigcode' model type registered yet.
@ggerganov I've been trying to increase the context's memory pool by modifying this part of the code:

```cpp
ctx_size += 10 * 1024 * 1024; // TODO: tune this
printf("%s: ggml ctx size = %6.2f MB\n", __func__, ctx_size/(1024.0*1024.0));
```

but it doesn't seem to affect the error. Any idea how to increase the pool?
The problem is in the "eval" context:

ggml/examples/starcoder/main.cpp, lines 415 to 431 @ c2fab8a

Currently, it starts with a 256 MB buffer and is grown based on the measured memory use per token. Here I tried to improve this using scratch buffers: #176. Please give it a try and let me know if your tests still crash with this version.
I am observing a similar issue with the Python wrapper llama-cpp-python.
Hi, I was trying the GPT4All 1.3 groovy model and I faced the same issue. I am not able to understand why this is happening. Can anybody provide me with a solution?
@eshaanagarwal the only "solution" that I found was a reboot. Since rebooting is not an option, I had to switch to different models. For me, all 30B/33B models eventually develop this error once the input context approaches its upper limit. This does not affect the 65B models. I do not know about any other patterns, as this is my use case.
@ggerganov can the memory leak or the underlying issue be fixed? Or can you suggest a possible direction for fixing it? I really need this model to work.
@eshaanagarwal If you are using the latest version of the code and the issue still occurs, please provide more details about the model that you are using, your system information, and the parameters with which you trigger the error.
I'm getting errors with starcoder models whenever I include any non-trivial number of tokens. I'm getting this with both my raw model (direct .bin) and my quantized model, regardless of version (both pre and post the Q4/Q5 quantization changes).
Relevant error:
Example:
```
./build/bin/starcoder -m /workspaces/research/models/starcoder/starcoder-ggml.bin -p "def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test " --top_k 0 --top_p 0.95 --temp 0.2
```
will cause the error
(Here's another output from the quantized model)
The closest match I can find from the past is ggerganov/llama.cpp#29. Maybe that was fixed for the llama models, but the problem has returned for starcoder?
Based on: #146
Specifically hoping that @NouamaneTazi might have some clarity on why this is happening.
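(In the meantime, a crude and untested stopgap, assuming the eval-pool pattern sketched earlier in the thread, is to enlarge the initial pool in examples/starcoder/main.cpp and rebuild.)

```cpp
// Hypothetical stopgap, not a proper fix: raise the initial eval pool.
// The variable name is assumed from the ggml example pattern above.
static size_t buf_size = 512u*1024*1024; // was: 256u*1024*1024
```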