Use Canonical Models to set context window sizes #6723
Conversation
```rust
let max_tokens = model_config.max_output_tokens();

let mut payload = json!({
    "model": model_config.model_name,
    "messages": anthropic_messages,
    "max_tokens": max_tokens,
});
```
Switching Anthropic to model_config.max_output_tokens() changes the fallback from the previous Claude-specific defaults (often 64k/32k) to the global 4096 when canonical output limits are missing, which can severely truncate responses for models not in the canonical registry. Consider preserving the prior Anthropic heuristic as a fallback when max_tokens is unset and no canonical output limit is available.
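A minimal sketch of that fallback ordering, assuming a hypothetical helper name, an illustrative model-name check, and the 64k/32k defaults mentioned above (none of this is code from the PR):

```rust
/// Hypothetical fallback preserving the old Anthropic heuristic: explicit
/// override first, then the canonical output limit, then a Claude-style
/// default. The model-name check and the 64k/32k split are assumptions.
fn anthropic_max_tokens_fallback(
    model_name: &str,
    explicit: Option<i32>,
    canonical: Option<i32>,
) -> i32 {
    explicit.or(canonical).unwrap_or_else(|| {
        if model_name.starts_with("claude") {
            64_000
        } else {
            32_000
        }
    })
}
```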
```diff
-        assert_eq!(gpt4_limit.unwrap().context_limit, 128_000);
-    }
-}
+mod tests {}
```
The previous config route tests were removed and replaced with an empty mod tests {}, even though this PR adds a new /config/canonical-model-info endpoint. Add coverage for at least the 200 path (known model) and the 404 path (unknown model) to prevent regressions.
Suggested change:
```diff
-mod tests {}
+mod tests {
+    use super::*;
+    use std::sync::Arc;
+
+    use axum::body::Body;
+    use axum::http::{Request, StatusCode};
+    use serde_json::json;
+    use tower::ServiceExt; // for `oneshot`
+
+    fn test_app() -> Router {
+        // Construct a test instance of the app router with a default AppState.
+        let state = Arc::new(AppState::default());
+        routes(state)
+    }
+
+    #[tokio::test]
+    async fn canonical_model_info_known_model_returns_200() {
+        let app = test_app();
+        // Use a model name that is expected to be treated as a known/valid model.
+        let body = serde_json::to_vec(&json!({ "model": "known-model" })).unwrap();
+        let request = Request::builder()
+            .method("POST")
+            .uri("/config/canonical-model-info")
+            .header("content-type", "application/json")
+            .body(Body::from(body))
+            .unwrap();
+        let response = app.oneshot(request).await.unwrap();
+        assert_eq!(response.status(), StatusCode::OK);
+    }
+
+    #[tokio::test]
+    async fn canonical_model_info_unknown_model_returns_404() {
+        let app = test_app();
+        // Use a clearly invalid/unknown model name to trigger the 404 path.
+        let body = serde_json::to_vec(&json!({ "model": "this-model-should-not-exist" })).unwrap();
+        let request = Request::builder()
+            .method("POST")
+            .uri("/config/canonical-model-info")
+            .header("content-type", "application/json")
+            .body(Body::from(body))
+            .unwrap();
+        let response = app.oneshot(request).await.unwrap();
+        assert_eq!(response.status(), StatusCode::NOT_FOUND);
+    }
+}
```
```rust
pub fn with_fast(
    mut self,
    fast_model_name: &str,
    provider_name: &str,
) -> Result<Self, ConfigError> {
    // Create a full ModelConfig for the fast model with proper canonical lookup
    let fast_config = ModelConfig::new(fast_model_name)?.with_canonical_limits(provider_name);
    self.fast_model_config = Some(Box::new(fast_config));
    Ok(self)
}
```
with_fast() builds the fast model config via ModelConfig::new(...), so any settings already applied to the main config (e.g., temperature, toolshim, toolshim_model, request_params, max_tokens overrides) won't carry over to the fast model when complete_fast() calls use_fast_model(). Consider deriving the fast config from self.clone() and only overriding the model identity/limits, or merging fast-model limits into a cloned base config, as in the sketch below.
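A sketch of that clone-and-override approach, assuming ModelConfig: Clone and the field names quoted elsewhere in this review (treat both as assumptions):

```rust
// Sketch only: derive the fast config from the fully-configured base so
// temperature, toolshim, and request params carry over, then reset the
// limit fields so the canonical lookup repopulates them for the fast model.
pub fn with_fast(
    mut self,
    fast_model_name: &str,
    provider_name: &str,
) -> Result<Self, ConfigError> {
    let mut fast_config = self.clone();
    fast_config.model_name = fast_model_name.to_string();
    fast_config.context_limit = None;
    fast_config.max_tokens = None;
    let fast_config = fast_config.with_canonical_limits(provider_name);
    self.fast_model_config = Some(Box::new(fast_config));
    Ok(self)
}
```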
```diff
-            config.model_name = fast_model.clone();
-            config
+        if let Some(fast_config) = &self.fast_model_config {
+            *fast_config.clone()
```
use_fast_model() returns the nested fast_model_config as-is, which can drop runtime/session configuration from the primary ModelConfig (e.g., temperature, toolshim, request params) and cause fast completions to behave differently beyond just the model name/limits. Consider returning a clone of self with only the model name and canonical limits replaced, or explicitly merging the fast config into the base config.
Suggested change:
```diff
-            *fast_config.clone()
+            // Start from the current configuration and override only the model identity
+            // and canonical limits from the fast model, preserving runtime/session settings.
+            let mut merged = self.clone();
+            merged.model_name = fast_config.model_name.clone();
+            merged.context_limit = fast_config.context_limit;
+            merged.max_tokens = fast_config.max_tokens;
+            merged
```
DOsinga left a comment:
have a look at some of what copilot says, but let's get this in
crates/goose-acp/src/server.rs (Outdated)
```diff
-    goose::model::ModelConfig::new(&model_id)?
+    let provider_name = config
+        .get_goose_provider()
+        .map_err(|_| anyhow::anyhow!("No provider configured"))?;
```
here and elsewhere do we need to map the error? we're just bubbling the other things up
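For reference, a sketch of plain bubbling, assuming the surrounding function returns anyhow::Result and the config error type implements std::error::Error + Send + Sync:

```rust
// `?` converts any std::error::Error + Send + Sync into anyhow::Error via
// anyhow's blanket From impl, so map_err is only needed when we want to
// substitute a message like "No provider configured".
let provider_name = config.get_goose_provider()?;
```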
```diff
 #[utoipa::path(
     post,
-    path = "/config/pricing",
-    request_body = PricingQuery,
+    path = "/config/canonical-model-info",
+    request_body = ModelInfoQuery,
     responses(
-        (status = 200, description = "Model pricing data retrieved successfully", body = PricingResponse)
+        (status = 200, description = "Model information retrieved successfully", body = ModelInfoResponse),
+        (status = 404, description = "Model not found in canonical registry")
     )
 )]
```
yeah, let's remove the 404 handler
```ts
let data: any;
switch (parseAs) {
  case 'arrayBuffer':
  case 'blob':
  case 'formData':
  case 'json':
  case 'text':
    data = await response[parseAs]();
    break;
```
response.json() will throw if the server returns HTTP 200 with an empty body and no Content-Length: 0 header, a case the removed fallback previously handled, turning a successful response into a client-side error. Consider restoring the empty-body-safe JSON parsing path (read as text, parse if non-empty, else {}), or broadening the empty-body detection to also handle empty text bodies.
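A sketch of the empty-body-safe path the comment describes (the function name is illustrative, not from the PR):

```ts
// Read the body as text first; parse only if non-empty, so an HTTP 200
// with an empty body yields {} instead of a JSON.parse error.
async function parseJsonSafe(response: Response): Promise<unknown> {
  const text = await response.text();
  return text.length > 0 ? JSON.parse(text) : {};
}
```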
```diff
 const prevInputCost =
-  (sessionInputTokens || localInputTokens) * (prevCostInfo.input_token_cost || 0);
+  ((sessionInputTokens || localInputTokens) * (prevCostInfo.input_token_cost || 0)) /
+  1_000_000;
 const prevOutputCost =
-  (sessionOutputTokens || localOutputTokens) * (prevCostInfo.output_token_cost || 0);
+  ((sessionOutputTokens || localOutputTokens) * (prevCostInfo.output_token_cost || 0)) /
+  1_000_000;
```
Using sessionInputTokens || localInputTokens (and the similar output-token assignments) treats 0 as "missing" and will incorrectly fall back when the true token count is zero. Use ?? (if the values can be null/undefined) or an explicit check (e.g., sessionInputTokens > 0 ? ... : ...) so zero is preserved.
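The difference in miniature (values are illustrative):

```ts
const sessionInputTokens: number | undefined = 0; // a real zero, not "missing"
const localInputTokens = 500;
const withOr = sessionInputTokens || localInputTokens; // 500 — the real zero is dropped
const withNullish = sessionInputTokens ?? localInputTokens; // 0 — the zero is preserved
```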
```diff
+    pub fn with_canonical_limits(mut self, provider_name: &str) -> Self {
+        if self.context_limit.is_none() || self.max_tokens.is_none() {
+            if let Some(canonical) = crate::providers::canonical::maybe_get_canonical_model(
+                provider_name,
+                &self.model_name,
+            ) {
+                if self.context_limit.is_none() {
+                    self.context_limit = Some(canonical.limit.context);
+                }
+                if self.max_tokens.is_none() {
+                    self.max_tokens = canonical.limit.output.map(|o| o as i32);
+                }
+            }
+        }
-        if let Ok(val) = std::env::var("GOOSE_CONTEXT_LIMIT") {
-            return Self::validate_context_limit(&val, "GOOSE_CONTEXT_LIMIT").map(Some);
-        }
-
-        // Get the model's limit
-        let model_limit = Self::get_model_specific_limit(model_name);
-
-        // If there's a fast_model, get its limit and use the minimum
-        if let Some(fast_model_name) = fast_model {
-            let fast_model_limit = Self::get_model_specific_limit(fast_model_name);
-
-            // Return the minimum of both limits (if both exist)
-            match (model_limit, fast_model_limit) {
-                (Some(m), Some(f)) => Ok(Some(m.min(f))),
-                (Some(m), None) => Ok(Some(m)),
-                (None, Some(f)) => Ok(Some(f)),
-                (None, None) => Ok(None),
-            }
+        // Try filling remaining gaps from predefined models
+        if self.context_limit.is_none() {
+            if let Some(pm) = find_predefined_model(&self.model_name) {
+                self.context_limit = pm.context_limit;
+            }
+        }
-        } else {
-            Ok(model_limit)
-        }
+
+        self
+    }
```
with_canonical_limits and max_output_tokens() now affect request shaping (e.g., the provider payload's max_tokens), but there are no unit tests covering the canonical-registry fill behavior. Add a focused test that uses a known bundled canonical model to assert that context/output limits get populated as expected.
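A sketch of such a test, assuming the "anthropic"/"claude-sonnet-4" pair exists in the bundled canonical registry and that the limit fields are visible to the test module (adjust both to a real bundled entry):

```rust
#[test]
fn with_canonical_limits_fills_missing_limits() {
    // The provider/model pair is an assumption; pick one known to be bundled.
    let config = ModelConfig::new("claude-sonnet-4")
        .expect("valid model name")
        .with_canonical_limits("anthropic");
    assert!(config.context_limit.is_some(), "context limit should be filled");
    assert!(config.max_tokens.is_some(), "output limit should be filled");
}
```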
```rust
.map(|&model_name| ModelInfo {
    name: model_name.to_string(),
    context_limit: ModelConfig::new_or_fail(model_name)
        .with_canonical_limits(name)
```
The canonical limits lookup is being called with the provider name (name) instead of the model name (model_name). This will always fail to find canonical model data because it's passing the provider name as both the provider and model parameters to with_canonical_limits.
Suggested change:
```diff
-        .with_canonical_limits(name)
+        .with_canonical_limits(model_name)
```
Summary
Replace context_limits across all providers with data from the canonical model registry when available.
Going to do one more push down this path afterwards and try to nuke the KNOWN_MODELS concept completely from both the server and client.