Currently, n_ctx is locked to 2048, but with people starting to experiment with ALiBi models (BluemoonRP, and MPT once that gets sorted out properly), RedPajama talking about Hyena, and StableLM potentially aiming for 4k context, the ability to raise the context size for llama.cpp models is going to be very useful going forward, especially since most of those models are likely to be run on CPU by people on consumer hardware.
I also think the expected behavior is that whatever context limit I set in the UI should be passed through to the inference backend. Requiring a model reload to change the setting is fine, but the value should be passed through whenever a ggml model is loaded. A "reload model on context size change" option could be nice to have if there's a clean spot for it, assuming it would be useful for more than just ggml files. Maybe instead of a checkbox, a convenient button that pops up after the value changes to cue people to reload the model, since knowing when the user is done adjusting the context size is hard and reloading is fairly heavy.
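For the loader side, something like this rough sketch is what I have in mind (the function name and plumbing are made up; the only thing I'm relying on is that llama-cpp-python's `Llama` constructor already accepts an `n_ctx` argument):

```python
# Hypothetical sketch: pass the UI/flag context size through at load time
# instead of hardcoding 2048. The function name and wiring are invented;
# llama_cpp.Llama genuinely takes n_ctx.
from llama_cpp import Llama

def load_ggml_model(model_path: str, n_ctx: int = 2048) -> Llama:
    # n_ctx would come from the UI setting or a --n_ctx flag at the
    # moment the ggml model is (re)loaded.
    return Llama(model_path=model_path, n_ctx=n_ctx)
```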
Likewise, I think --n_ctx should be exposed as a command-line flag for people who want to automate loading larger-context models from .sh/.bat scripts.
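For the flag itself, a rough argparse sketch of what I mean (the flag name and how it would hook into the webui's existing argument parsing are just for illustration):

```python
# Hypothetical sketch of the flag; wiring into the webui's real argument
# parser would differ.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--n_ctx", type=int, default=2048,
    help="Context size to use when loading ggml (llama.cpp) models.",
)
args = parser.parse_args()
# args.n_ctx would then be handed to the ggml loader at model load time.
```

That way a .sh/.bat script could do something like `python server.py --model <ggml model> --n_ctx 4096` without touching the UI.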
Correct me if I'm wrong, but doesn't textgen specifically use llama.cpp via llama-cpp-python, which is meant for LLaMA models?
Maybe it could be useful to be able to change this value anyway though.
I'm pretty sure there was recently a merge for llama.cpp that added loading for GPT NeoX models, though I may have misread that somewhere and it might not be in yet. Either way, it's clear that supporting GPT NeoX in llama.cpp isn't being treated as a separate-project sort of thing.
And BluemoonRP 13B is a llama model with ALiBi support baked in (or however that should be phrased), so it's loadable and usable today in base llama.cpp without changes.