feat(provider): auto-detect Ollama context limits #10758
felipemadero wants to merge 2 commits into anomalyco:dev from
Conversation
Query the Ollama API to get the model context limit (num_ctx) for proper context-percentage display in the status bar. Detects Ollama servers by checking whether the root endpoint returns "Ollama is running", then fetches model info via /api/show. Falls back to a 4096 default.
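The sketch below illustrates the detection flow described above, not the PR's actual code. The root-endpoint probe, /api/show, num_ctx, and the 4096 default come from the description; the function name, the baseUrl/model parameters, and the response parsing are assumptions.

```ts
// Hedged sketch: probe for an Ollama server, then read num_ctx from the
// Modelfile parameters returned by /api/show. Falls back to 4096.
async function detectOllamaContextLimit(baseUrl: string, model: string): Promise<number> {
  const DEFAULT_NUM_CTX = 4096

  // Ollama's root endpoint answers with plain text "Ollama is running".
  const root = await fetch(baseUrl).then((r) => r.text()).catch(() => "")
  if (!root.includes("Ollama is running")) return DEFAULT_NUM_CTX

  // /api/show returns model metadata, including the Modelfile parameters.
  const res = await fetch(new URL("/api/show", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  }).catch(() => undefined)
  if (!res || !res.ok) return DEFAULT_NUM_CTX

  const info = (await res.json()) as { parameters?: string }
  // Modelfile parameters arrive as lines such as "num_ctx 16384".
  const match = info.parameters?.match(/num_ctx\s+(\d+)/)
  return match ? parseInt(match[1], 10) : DEFAULT_NUM_CTX
}
```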
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details.
The following comment was made by an LLM, it may be inaccurate: Based on my search, I found the following potentially related PRs:
These are the closest matches, but none appear to be exact duplicates of PR #10758. You should review PRs #3726 and #8359 for any overlap in Ollama auto-detection implementation logic.
When limit.output is 0, fall back to reserving 20% of the context window (capped at OUTPUT_TOKEN_MAX) instead of the hardcoded 32000. This fixes compaction triggering immediately on small-context models, such as 16k Ollama models.
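A minimal sketch of that fallback, assuming the 20% reservation and OUTPUT_TOKEN_MAX cap described above; the function and constant names are illustrative, not the PR's actual identifiers.

```ts
// Assumed cap on reserved output tokens; the real value lives in the codebase.
const OUTPUT_TOKEN_MAX = 32_000

function resolveOutputLimit(contextLimit: number, outputLimit: number): number {
  if (outputLimit > 0) return outputLimit
  // Reserve 20% of the context window, but never more than OUTPUT_TOKEN_MAX.
  return Math.min(Math.floor(contextLimit * 0.2), OUTPUT_TOKEN_MAX)
}

// Example: a 16k (16384-token) Ollama model reserves 3276 tokens for output
// instead of 32000, so compaction no longer triggers immediately.
```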
Force-pushed: f083bcc → 10aa64b, b8ed4d9 → 10aa64b, 00637c0 → 71e0ba2, f1ae801 → 08fa7f7
Fixes #10759
Summary
- GET / returns "Ollama is running"
- POST /api/show to get num_ctx from Modelfile parameters
- limit.context takes priority if set by user (see the sketch after this list)
- limit.output: reserve 10% of context instead of hardcoded 32000

This enables context percentage display in the status bar for Ollama models without manual configuration, and fixes compaction triggering immediately on small-context models.
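A hedged sketch of the resolution order from the summary: a user-set limit.context takes priority, then the detected num_ctx, then the 4096 default. The type and function names here are illustrative assumptions, not the PR's identifiers.

```ts
interface ModelLimit {
  context: number // 0 means "not configured by the user"
  output: number
}

function resolveContextLimit(limit: ModelLimit, detectedNumCtx?: number): number {
  if (limit.context > 0) return limit.context // explicit user config wins
  if (detectedNumCtx && detectedNumCtx > 0) return detectedNumCtx // value from /api/show
  return 4096 // default when detection fails or the server is not Ollama
}
```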