[BUG] ollama models context size not properly imported/reflected #309

Closed
XReyRobert opened this issue Dec 27, 2023 · 5 comments
@XReyRobert

Describe the bug
The context size of Ollama models is not properly imported or reflected in the UI.

To Reproduce
Import a 128K Ollama model (e.g. Yarn-mistral 7b-128k), then open the model details and check the max model tokens shown in the UI.

Expected behavior
The model details should reflect the model's full 128K context size.

Screenshots / context

[Screenshot taken 2023-12-27 at 15:01:21]

@XReyRobert XReyRobert changed the title from "[BUG]" to "[BUG] ollama models context size not properly imported/reflected" Dec 27, 2023
@enricoros
Owner

Thanks @XReyRobert. Unfortunately, Ollama does not usually provide the context size, so it's assumed to be 4K across the board.

The /models API does not provide it, and neither did the models list.

In your particular case, the model name includes the context size, but that's rare.

What's the best way to deal with this, or to get context sizes for all models?
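One possible fallback, sketched under loose assumptions: when a model's tag happens to encode its window (as in a tag like `yarn-mistral:7b-128k`), parse the trailing `<N>k` and multiply by 1024; otherwise keep the 4K default mentioned above. The function name, the regex, and the default constant are hypothetical, and since most tags carry no such hint, this is only a heuristic.

```python
import re

# Hypothetical default, mirroring the "4K across the board" assumption.
DEFAULT_CONTEXT = 4096

# Heuristic: a tag ending in "<digits>k", either on its own ("32k")
# or after a dash ("7b-128k"). Case-insensitive, so "128K" also matches.
_CTX_SUFFIX = re.compile(r"(?:^|[-:])(\d+)k$", re.IGNORECASE)


def context_from_model_name(name: str, default: int = DEFAULT_CONTEXT) -> int:
    """Guess a context window from an Ollama model name such as 'yarn-mistral:7b-128k'."""
    # Only look at the tag after the colon; bare names have no tag.
    tag = name.split(":", 1)[1] if ":" in name else ""
    match = _CTX_SUFFIX.search(tag)
    if match is None:
        return default
    return int(match.group(1)) * 1024
```

For `yarn-mistral:7b-128k` this yields 131072, which agrees with the `num_ctx 131072` that `/api/show` reports for that model. Tags with extra suffixes after the size (for example a quantization label) would not match and fall back to the default.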

@XReyRobert
Author

XReyRobert commented Dec 27, 2023

Hi @enricoros,

There's a "show" endpoint that returns additional parameters when they are available.
For example, for mistrallite:latest and yarn-mistral:7b-128k it reports a "num_ctx" parameter:

curl http://localhost:11434/api/show -d '{
  "name": "mistrallite:latest"
}' | jq
{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM mistrallite:latest\n\nFROM /usr/share/ollama/.ollama/models/blobs/sha256:fcfc737faf6b2bb5050752602ca341e92ec4d8208f2b5762bd656d447be9910e\nTEMPLATE \"\"\"<|prompter|>{{ .System }} {{ .Prompt }}</s><|assistant|>\n\"\"\"\nPARAMETER num_ctx 32768\nPARAMETER stop \"<|prompter|>\"\nPARAMETER stop \"<|assistant|>\"\nPARAMETER stop \"</s>\"",
  "parameters": "num_ctx                        32768\nstop                           <|prompter|>\nstop                           <|assistant|>\nstop                           </s>",
  "template": "<|prompter|>{{ .System }} {{ .Prompt }}</s><|assistant|>\n",
  "details": {
    "format": "gguf",
    "family": "llama",
    "families": null,
    "parameter_size": "7B",
    "quantization_level": "Q4_0"
  }
}
curl http://localhost:11434/api/show -d '{
  "name": "yarn-mistral:7b-128k"
}' | jq
{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM yarn-mistral:7b-128k\n\nFROM /usr/share/ollama/.ollama/models/blobs/sha256:14f2e225961b80d791d14c88def05fca31abc44ab1a7a12ba8e8f2365442e6e6\nTEMPLATE \"\"\"{{ .Prompt }}\"\"\"\nPARAMETER num_ctx 131072",
  "parameters": "num_ctx                        131072",
  "template": "{{ .Prompt }}",
  "details": {
    "format": "gguf",
    "family": "llama",
    "families": null,
    "parameter_size": "7B",
    "quantization_level": "Q4_0"
  }
}
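A minimal sketch of reading that value programmatically, assuming the request and response shapes shown in the curl calls above (a `{"name": ...}` POST to `/api/show` and a whitespace-aligned `parameters` string). The helper names and the 4096 fallback are hypothetical; newer Ollama versions may expect a different request field, so treat the request body as an assumption.

```python
import json
import urllib.request
from typing import Optional

DEFAULT_CONTEXT = 4096  # hypothetical fallback when no num_ctx is reported


def parse_num_ctx(parameters: str) -> Optional[int]:
    """Extract num_ctx from the 'parameters' string returned by /api/show."""
    for line in parameters.splitlines():
        # Each line is "<key><spaces><value>", e.g. "num_ctx      32768".
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "num_ctx":
            try:
                return int(parts[1].strip())
            except ValueError:
                return None
    return None


def fetch_show(name: str, base_url: str = "http://localhost:11434") -> dict:
    """POST {"name": ...} to /api/show, mirroring the curl calls above."""
    body = json.dumps({"name": name}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def context_size(show_response: dict, default: int = DEFAULT_CONTEXT) -> int:
    """Use num_ctx when present; otherwise fall back to the default."""
    num_ctx = parse_num_ctx(show_response.get("parameters") or "")
    return num_ctx if num_ctx is not None else default
```

Given the responses above, `context_size(fetch_show("yarn-mistral:7b-128k"))` should come out as 131072 and `mistrallite:latest` as 32768, while a model whose `parameters` lack `num_ctx` (or are missing entirely) keeps the default.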

@enricoros enricoros added this to the 1.10.0 milestone Dec 28, 2023
@enricoros enricoros added the good first issue Good for newcomers label Dec 28, 2023
@enricoros enricoros removed this from the 1.10.0 milestone Jan 6, 2024
@peperunas

I confirm the bug. Also, for what it's worth, this Ollama release changelog specifies how to pass a 32k context window to Mixtral (and I suppose other models as well). https://github.com/jmorganca/ollama/releases/tag/v0.1.19

@enricoros
Owner

Thanks! I'll prioritize this issue. I can quickly fix the part about detecting the context size.

As for the "32k Mixtral" case, the odd part is that it shouldn't be up to the developer to tell the API what the context window is; it should be the other way around. APIs commonly accept a "max_tokens" parameter as a hard limit on the response length. I'm sure the Ollama folks will make the API more standard; their recent /chat endpoint shows that they're on a good path.

Prioritized.
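To illustrate the split described in the max_tokens point above (the server reports the window, the client caps the reply), here is a sketch that clamps a requested response length so prompt plus response fit in a known context window. The function name and the clamping policy are hypothetical, not how big-AGI or Ollama necessarily do it.

```python
def response_token_budget(context_window: int, prompt_tokens: int, requested_max: int) -> int:
    """Cap a requested max_tokens so that prompt + response fit in the context window."""
    if context_window <= 0 or prompt_tokens < 0 or requested_max < 0:
        raise ValueError("window must be positive and token counts non-negative")
    # Tokens left after the prompt; never negative.
    available = max(context_window - prompt_tokens, 0)
    return min(requested_max, available)
```

With the 4K default, a 3,000-token prompt leaves at most 1,096 tokens for a 2,048-token request, while with the 131072 window reported for yarn-mistral:7b-128k the full 2,048 fits.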

@enricoros
Owner

@XReyRobert Implemented; releasing in 3 hours in 1.12.0. The context size is now inferred from num_ctx where available and set accordingly. Please refer to Ollama / Jeffrey's post (https://github.com/jmorganca/ollama/releases/tag/v0.1.19) for how to change it in your Ollama Modelfiles.
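On the Ollama side, one route is a Modelfile that derives from an existing model and overrides `num_ctx`, using the same `FROM` and `PARAMETER num_ctx` lines that `ollama show` printed in the outputs above. The helper below is hypothetical and only renders that text; `mixtral` and 32768 are example values, and the release notes linked above may describe other ways to do this. You would save the output as `Modelfile` and build it with `ollama create <new-name> -f Modelfile`.

```python
def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Render a minimal Modelfile that inherits from base_model and overrides num_ctx."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be positive")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


if __name__ == "__main__":
    # Example only: a 32K-context variant of a Mixtral model.
    print(render_modelfile("mixtral", 32768), end="")
```

Once such a model is created, its `/api/show` output should list the new `num_ctx`, which is what the 1.12.0 change reads.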

@enricoros enricoros removed the good first issue Good for newcomers label Jan 26, 2024
@enricoros enricoros mentioned this issue Jan 26, 2024
jimjonesbabyfreshout pushed a commit to jimjonesbabyfreshout/big-AGI that referenced this issue Feb 19, 2024
Note that in testing, only yarn-mistral had a num_ctx value other than 4096;
some models have no parameters, some have no 'num_ctx' value to parse,
and others have it set to 4096.