
Set n_ctx for llama.cpp models when loading/reloading #1872

Closed
digiwombat opened this issue May 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@digiwombat

digiwombat commented May 7, 2023

Currently, n_ctx is locked to 2048, but with people starting to experiment with ALiBi models (BluemoonRP, MPT whenever that gets sorted out properly), RedPajama talking about Hyena, and StableLM potentially aiming for 4k context, the ability to bump the context size for llama.cpp models is going to be very useful going forward. Especially since most of those models are likely to be run on CPU by people on consumer hardware.

I also think the expected behavior is that whatever context limit I set in the UI should be passed through to the inference backend. Requiring a model reload to change the setting is fine, but the value should be passed through when a ggml model is loaded. A "reload model on context size change" setting could be nice to have if there's a clean spot for it, assuming it would be useful for more than just ggml files. Maybe instead of a checkbox, just a convenient button that pops up after the value is changed to cue people to reload the model, since knowing when the user is done adjusting the context size is hard and reloading is fairly heavy.
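To be concrete, here's a minimal sketch of the idea, assuming the ggml path goes through llama-cpp-python's Llama class (n_ctx is a real constructor parameter there; the function and setting names below are placeholders, not the webui's actual code):

```python
# Minimal sketch, not the webui's actual loader.
# Assumes ggml models are loaded through llama-cpp-python.
from llama_cpp import Llama

def load_ggml_model(model_path: str, ui_context_length: int = 2048) -> Llama:
    # Pass the context limit chosen in the UI through to the backend
    # instead of hard-coding n_ctx=2048, so larger-context models
    # (ALiBi variants, 4k-context models) get the window the user asked for.
    return Llama(model_path=model_path, n_ctx=ui_context_length)
```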

Likewise, I think --n_ctx should be a flag that can be set for people who want to automate sh/bat loading of larger-context models.
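For illustration only, such a flag might look something like this (a sketch, not the project's actual argument parser; the flag name just mirrors the llama.cpp parameter):

```python
# Sketch of a --n_ctx launch flag; not the webui's real argument parser.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--n_ctx", type=int, default=2048,
                    help="Context size passed to llama.cpp when loading a ggml model.")
args = parser.parse_args()

# A .sh/.bat launcher could then pass it along, e.g.:
#   python server.py --n_ctx 4096
```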

@digiwombat added the enhancement label on May 7, 2023
@LaaZa
Contributor

LaaZa commented May 7, 2023

Correct me if I'm wrong, but doesn't textgen use llama.cpp specifically via llama-cpp-python, which is for LLaMA models?
Maybe it could be useful to be able to change this value anyway, though.

@digiwombat
Author

I'm pretty sure there was recently a merge in llama.cpp that added loading for GPT-NeoX models, though I may have misread that somewhere and it might not be in yet. Either way, it's clear that supporting GPT-NeoX in llama.cpp isn't being treated as a different-project sort of thing.

Discussion here: ggerganov/llama.cpp#1063
And MPT here: ggerganov/llama.cpp#1333

And BluemoonRP 13B is a llama model with ALiBi support baked in (or however that should be phrased), so it is loadable and usable today in base llama.cpp without changes.
