Prevent user from setting a context size that is too big #266
I think there might be a limit to the size of the context. Which context size did you set for your test?
The limit is 2048; anything above behaves very badly.
I have set it to 8096
That explains it; there should be a warning in llama.cpp when using anything above that.
A PR would be appreciated if you’re interested in adding it :) The arguments are currently parsed here:
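For illustration, here is a minimal sketch of what such a warning could look like. It assumes a `gpt_params`-style struct with an `n_ctx` field, a `-c` flag, and a hard-coded 2048-token training context; the actual parsing code, names, and flags in llama.cpp may differ:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for llama.cpp's parameter struct; the real
// definition lives in the argument-parsing code linked above.
struct gpt_params {
    int n_ctx = 512; // context size requested by the user
};

// Assumed training-time context window of the LLaMA models.
constexpr int MAX_CONTEXT = 2048;

void parse_args(int argc, char **argv, gpt_params &params) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-c") == 0 && i + 1 < argc) {
            params.n_ctx = std::atoi(argv[++i]);
        }
    }
    // The warning this issue asks for: flag context sizes beyond what
    // the model was trained on instead of degrading silently.
    if (params.n_ctx > MAX_CONTEXT) {
        std::fprintf(stderr,
                     "warning: model was trained with a context of %d tokens; "
                     "n_ctx = %d may produce poor results\n",
                     MAX_CONTEXT, params.n_ctx);
    }
}

int main(int argc, char **argv) {
    gpt_params params;
    parse_args(argc, argv, params);
    std::printf("using n_ctx = %d\n", params.n_ctx);
    return 0;
}
```

With this in place, an invocation along the lines of `./main -m model.bin -c 4096` would print the warning but still proceed, leaving the choice to the user.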
Is this proven? My outputs are still coherent with a 4096 context (and I was running 8192 and even 12288 for other experiments). Sample (13B model): [Begin chat] […]
I've also disabled the fixed token output limit here (so it only stops at end of text): `while (true) { // remaining_tokens > 0) {`
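As a rough sketch of the change being described (hypothetical stand-ins, not the actual llama.cpp source): the fixed token budget is commented out of the loop condition, so an end-of-text token becomes the only stop condition.

```cpp
#include <cstdio>
#include <cstdlib>

constexpr int TOKEN_EOS = 2; // assumed end-of-text token id

// Hypothetical stand-in for the real sampler: emits random token ids,
// occasionally producing end-of-text.
int sample_next_token() {
    return (std::rand() % 100 == 0) ? TOKEN_EOS : 3 + std::rand() % 97;
}

int main() {
    // Original form (fixed budget):
    //   int remaining_tokens = 128;
    //   while (remaining_tokens-- > 0) { ... }
    //
    // Modified form, as described above:
    while (true) { // remaining_tokens > 0) {
        const int token = sample_next_token();
        if (token == TOKEN_EOS) break; // the only remaining stop condition
        std::printf("%d ", token);
    }
    std::printf("\n[end of text]\n");
    return 0;
}
```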
Feel like mileage may vary here. Since session output is finite and tied to context size, I personally run upwards of a 16k context on 7B and an 8k context on 13B for small chatbot experiments, and only once or twice has generation gone completely off the rails for me, and that was after dozens and dozens of responses. I suspect this may be less valid for larger models.
edit: never mind, I was just hallucinating; it is used earlier.
I'd consider this fixed by #274. |
Don't mean to keep this issue active, but I was a little confused by this. Does this mean if your […]
Yes, that was what I observed for token predictions past the 2048.
Hey!
I tasked the 30B model to write a little story... it worked really well until some point where it went off the rails from one line to the next, suddenly talking about some girl and stuff that has nothing to do with the rest:
The model is quantized (q4_0) and I am on Linux (x86_64) with 64 GB of RAM.