Use Canonical Models to set context window sizes#6723

Merged
katzdave merged 82 commits into main from dkatz/canonical-context on Feb 17, 2026

Conversation


@katzdave (Collaborator) commented Jan 26, 2026

Summary

Replace context_limits across all providers with data from the canonical model, when available.

  • The API change of adding the provider to the model config touches a lot of files.
  • The fast model config is now nested, so we actually use the correct context limits for it; there's already a fallback in complete_fast to use the regular model if the fast one fails.
  • The pricing API is replaced with a more general canonical-model fetch; more info will be layered in there as the client needs it.

After this lands, I'm planning one more push down this path to remove the KNOWN_MODELS concept entirely from both the server and client.
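
For reference, a rough sketch of how a client might call the new endpoint; the request/response field names here are guesses based on the ModelInfoQuery/ModelInfoResponse types, not the shipped client code:

// Hypothetical client call to the endpoint that replaces /config/pricing.
const res = await fetch('/config/canonical-model-info', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ provider: 'anthropic', model: 'claude-sonnet-4' }),
});
if (res.ok) {
  // Assumed shape: context/output limits plus whatever pricing data gets layered in later.
  const info = await res.json();
  console.log(info);
}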

…ovider

* 'main' of github.com:block/goose:
  increase worker threads for ci (#6614)
  docs: todo tutorial update (#6613)
  Added goose doc map md file for goose agent to find relevant doc easily. (#6598)
  add back goose branding to home (#6617)
  fix: actually set the working dir for extensions from session (#6612)
  Multi chat (#6428)
  Lifei/fixed accumulated token count (#6587)
  Dont show MCP UI/Apps until tool is approved (#6492)
  docs: max tokens config (#6596)
  User configurable templates (#6420)
  docs: http proxy environment variables (#6594)
  feat: exclude subagent tool from code_execution filtering (#6531)
…ovider

* 'main' of github.com:block/goose:
  PR Code Review (#6043)
  fix(docs): use dynamic import for globby ESM module (#6636)
  chore: trigger CI
  Document tab completion (#6635)
  Install goose-mcp crate dependencies (#6632)
  feat(goose): standardize agent-session-id for session correlation (#6626)
  chore: tweak release docs (#6571)
  fix(goose): propagate session_id across providers and MCP (#6584)
…ovider

* 'main' of github.com:block/goose:
  fix: Manual compaction does not update context window. (#6682)
  Removed the Acceptable Usage Policy (#6204)
  Document spellcheck toggle (#6721)
  fix: docs workflow cleanup and prevent cancellations (#6713)
  Docs: file bug directly (#6718)
  fix: dispatch ADD_ACTIVE_SESSION event before navigating from "View All" (#6679)
  Speed up Databricks provider init by removing fetch of supported models (#6616)
  fix: correct typos in documentation and Justfile (#6686)
  docs: frameDomains and baseUriDomains for mcp apps (#6684)
…o dkatz/canonical-context

* 'dkatz/canonical-provider' of github.com:block/goose: (27 commits)
  docs: add Remotion video creation tutorial (#6675)
  docs: export recipe and copy yaml (#6680)
  Test against fastmcp (#6666)
  docs: mid-session changes (#6672)
  Fix MCP elicitation deadlock and improve UX (#6650)
  chore: upgrade to rmcp 0.14.0 (#6674)
  [docs] add MCP-UI to MCP Apps blog (#6664)
  ACP get working dir from args.cwd (#6653)
  Optimise load config in UI (#6662)
  Fix GCP Vertex AI global endpoint support for Gemini 3 models (#6187)
  fix: macOS keychain infinite prompt loop (#6620)
  chore: reduce duplicate or unused cargo deps (#6630)
  feat: codex subscription support (#6600)
  smoke test allow pass for flaky providers (#6638)
  feat: Add built-in skill for goose documentation reference (#6534)
  Native images (#6619)
  docs: ml-based prompt injection detection (#6627)
  Strip the audience for compacting (#6646)
  chore(release): release version 1.21.0 (minor) (#6634)
  add collapsable chat nav (#6649)
  ...
@katzdave changed the title from "Dkatz/canonical context" to "Use Canonical Models to set context window sizes" on Jan 26, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 51 out of 52 changed files in this pull request and generated 4 comments.

Comment on lines +402 to 407
let max_tokens = model_config.max_output_tokens();
let mut payload = json!({
    "model": model_config.model_name,
    "messages": anthropic_messages,
    "max_tokens": max_tokens,
});

Copilot AI Feb 12, 2026


Switching Anthropic to model_config.max_output_tokens() changes the fallback from the previous Claude-specific defaults (often 64k/32k) to the global 4096 when canonical output limits are missing, which can severely truncate responses for models not in the canonical registry; consider preserving the prior Anthropic heuristic as a fallback when max_tokens is unset and canonical output is unavailable.
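
A minimal sketch of the suggested fallback chain, using placeholder types and illustrative limit values (none of these names are goose's actual API):

// Placeholder config type; goose's real ModelConfig has more fields.
struct ModelConfig {
    model_name: String,
    max_tokens: Option<i32>,
}

// Stand-in for the old Claude-specific heuristic (values illustrative).
fn claude_default_max_tokens(model: &str) -> Option<i32> {
    if model.contains("claude-sonnet") {
        Some(64_000)
    } else if model.contains("claude") {
        Some(32_000)
    } else {
        None
    }
}

// Prefer the explicit override, then canonical data, then the old heuristic,
// and only then the global 4096 default.
fn resolve_max_tokens(cfg: &ModelConfig, canonical_output: Option<i32>) -> i32 {
    cfg.max_tokens
        .or(canonical_output)
        .or_else(|| claude_default_max_tokens(&cfg.model_name))
        .unwrap_or(4096)
}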

-        assert_eq!(gpt4_limit.unwrap().context_limit, 128_000);
-    }
-}
+mod tests {}

Copilot AI Feb 12, 2026


The previous config route tests were removed and replaced with an empty mod tests {} even though this PR adds a new /config/canonical-model-info endpoint; add coverage for at least the 200 path (known model) and 404 path (unknown model) to prevent regressions.

Suggested change
-mod tests {}
+mod tests {
+    use super::*;
+    use std::sync::Arc;
+
+    use axum::body::Body;
+    use axum::http::{Request, StatusCode};
+    use serde_json::json;
+    use tower::ServiceExt; // for `oneshot`
+
+    fn test_app() -> Router {
+        // Construct a test instance of the app router with a default AppState.
+        let state = Arc::new(AppState::default());
+        routes(state)
+    }
+
+    #[tokio::test]
+    async fn canonical_model_info_known_model_returns_200() {
+        let app = test_app();
+        // Use a model name that is expected to be treated as a known/valid model.
+        let body = serde_json::to_vec(&json!({ "model": "known-model" })).unwrap();
+        let request = Request::builder()
+            .method("POST")
+            .uri("/config/canonical-model-info")
+            .header("content-type", "application/json")
+            .body(Body::from(body))
+            .unwrap();
+        let response = app.oneshot(request).await.unwrap();
+        assert_eq!(response.status(), StatusCode::OK);
+    }
+
+    #[tokio::test]
+    async fn canonical_model_info_unknown_model_returns_404() {
+        let app = test_app();
+        // Use a clearly invalid/unknown model name to trigger the 404 path.
+        let body = serde_json::to_vec(&json!({ "model": "this-model-should-not-exist" })).unwrap();
+        let request = Request::builder()
+            .method("POST")
+            .uri("/config/canonical-model-info")
+            .header("content-type", "application/json")
+            .body(Body::from(body))
+            .unwrap();
+        let response = app.oneshot(request).await.unwrap();
+        assert_eq!(response.status(), StatusCode::NOT_FOUND);
+    }
+}

Comment on lines +250 to +258
pub fn with_fast(
    mut self,
    fast_model_name: &str,
    provider_name: &str,
) -> Result<Self, ConfigError> {
    // Create a full ModelConfig for the fast model with proper canonical lookup
    let fast_config = ModelConfig::new(fast_model_name)?.with_canonical_limits(provider_name);
    self.fast_model_config = Some(Box::new(fast_config));
    Ok(self)

Copilot AI Feb 12, 2026


with_fast() builds the fast model config via ModelConfig::new(...), so any settings already applied to the main config (e.g., temperature, toolshim, toolshim_model, request_params, max_tokens overrides) won't carry over to the fast model when complete_fast() calls use_fast_model(); consider deriving the fast config from self.clone() and only overriding the model identity/limits (or merging fast-model limits into a cloned base config).
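
A sketch of the clone-and-override approach being suggested, assuming ModelConfig implements Clone and that the field names match the diff above (they are illustrative, not verified):

pub fn with_fast(
    mut self,
    fast_model_name: &str,
    provider_name: &str,
) -> Result<Self, ConfigError> {
    // Clone the fully-configured base so temperature, toolshim settings,
    // request_params, etc. carry over to the fast model.
    let mut fast_config = self.clone();
    fast_config.model_name = fast_model_name.to_string();
    fast_config.fast_model_config = None; // don't nest a fast config inside itself
    // Clear the limits so the canonical lookup refills them for the fast model.
    // (Whether an explicit max_tokens override should carry over instead is a
    // judgment call for the real implementation.)
    fast_config.context_limit = None;
    fast_config.max_tokens = None;
    let fast_config = fast_config.with_canonical_limits(provider_name);
    self.fast_model_config = Some(Box::new(fast_config));
    Ok(self)
}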

-        config.model_name = fast_model.clone();
-        config
+        if let Some(fast_config) = &self.fast_model_config {
+            *fast_config.clone()

Copilot AI Feb 12, 2026


use_fast_model() returns the nested fast_model_config as-is, which can drop runtime/session configuration from the primary ModelConfig (e.g., temperature/toolshim/request params) and cause fast completions to behave differently beyond just model name/limits; consider returning a clone of self with only the model name + canonical limits replaced, or explicitly merging the fast config into the base config.

Suggested change
-            *fast_config.clone()
+            // Start from the current configuration and override only the model identity
+            // and canonical limits from the fast model, preserving runtime/session settings.
+            let mut merged = self.clone();
+            merged.model = fast_config.model.clone();
+            merged.context_limit = fast_config.context_limit;
+            merged.max_tokens = fast_config.max_tokens;
+            merged


@DOsinga (Collaborator) left a comment


Have a look at some of what Copilot says, but let's get this in.

goose::model::ModelConfig::new(&model_id)?
let provider_name = config
    .get_goose_provider()
    .map_err(|_| anyhow::anyhow!("No provider configured"))?;
Collaborator


Here and elsewhere, do we need to map the error? We're just bubbling the other ones up.
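
For illustration: if the surrounding function returns anyhow::Result and the source error converts into anyhow::Error, it can bubble with plain `?`; anyhow's Context trait keeps the hint without discarding the cause:

use anyhow::Context;

// Bubble the original error up unchanged:
let provider_name = config.get_goose_provider()?;

// Or keep the "No provider configured" hint without losing the source error:
let provider_name = config
    .get_goose_provider()
    .context("No provider configured")?;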

Comment on lines 461 to 469
 #[utoipa::path(
     post,
-    path = "/config/pricing",
-    request_body = PricingQuery,
+    path = "/config/canonical-model-info",
+    request_body = ModelInfoQuery,
     responses(
-        (status = 200, description = "Model pricing data retrieved successfully", body = PricingResponse)
+        (status = 200, description = "Model information retrieved successfully", body = ModelInfoResponse),
+        (status = 404, description = "Model not found in canonical registry")
     )
 )]
Collaborator


Yeah, let's remove the 404 handler.

…ntext

* 'main' of github.com:block/goose:
  feat: add onFallbackRequest handler to McpAppRenderer (#7208)
  feat: add streaming support for Claude Code CLI provider (#6833)
  fix: The detected filetype is PLAIN_TEXT, but the provided filetype was HTML (#6885)
  Add prompts (#7212)
  Add testing instructions for speech to text (#7185)
  Diagnostic files copying (#7209)
  fix: allow concurrent tool execution within the same MCP extension (#7202)
  fix: handle missing arguments in MCP tool calls to prevent GUI crash (#7143)
  Filter Apps page to only show standalone Goose Apps (#6811)
  opt: use static for Regex (#7205)
  nit: show dir in title, and less... jank (#7138)
  feat(gemini-cli): use stream-json output and re-use session (#7118)
  chore(deps): bump qs from 6.14.1 to 6.14.2 in /documentation (#7191)
  Switch jsonwebtoken to use aws-lc-rs (already used by rustls) (#7189)
  chore(deps): bump qs from 6.14.1 to 6.14.2 in /evals/open-model-gym/mcp-harness (#7184)
  Add SLSA build provenance attestations to release workflows (#7097)
  fix save and run recipe not working (#7186)
  Upgraded npm packages for latest security updates (#7183)
  docs: reasoning effort levels for Codex provider (#6798)
Copilot AI review requested due to automatic review settings February 13, 2026 19:39

Copilot AI left a comment


Pull request overview

Copilot reviewed 64 out of 65 changed files in this pull request and generated 3 comments.

Comment on lines 180 to 188
let data: any;
switch (parseAs) {
  case 'arrayBuffer':
  case 'blob':
  case 'formData':
  case 'json':
  case 'text':
    data = await response[parseAs]();
    break;

Copilot AI Feb 13, 2026


response.json() will throw if the server returns HTTP 200 with an empty body and no Content-Length: 0 header (a case the removed fallback previously handled), which can turn a successful response into a client-side error; consider restoring the empty-body-safe JSON parsing path (read as text, parse if non-empty, else {}) or broadening the empty-body detection to also handle empty text bodies.
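
A small sketch of the empty-body-safe path described above (the function name is illustrative):

// Read the body as text first; JSON.parse only runs when there is content,
// so an HTTP 200 with an empty body yields {} instead of a parse error.
async function parseJsonSafe(response: Response): Promise<unknown> {
  const text = await response.text();
  return text.length > 0 ? JSON.parse(text) : {};
}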

Comment on lines 51 to +56
 const prevInputCost =
-    (sessionInputTokens || localInputTokens) * (prevCostInfo.input_token_cost || 0);
+    ((sessionInputTokens || localInputTokens) * (prevCostInfo.input_token_cost || 0)) /
+    1_000_000;
 const prevOutputCost =
-    (sessionOutputTokens || localOutputTokens) * (prevCostInfo.output_token_cost || 0);
+    ((sessionOutputTokens || localOutputTokens) * (prevCostInfo.output_token_cost || 0)) /
+    1_000_000;

Copilot AI Feb 13, 2026


Using sessionInputTokens || localInputTokens (and the similar output/token assignments) treats 0 as “missing” and will incorrectly fall back when the true token count is zero; use ?? (if values can be null/undefined) or an explicit check (e.g., sessionInputTokens > 0 ? ... : ...) so zero is preserved.
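
To make the difference concrete (values illustrative):

// `||` treats 0 as missing; `??` only falls back on null/undefined.
const sessionInputTokens: number | undefined = 0; // a real zero from the session
const localInputTokens = 42;

const withOr = sessionInputTokens || localInputTokens; // 42 (the real zero is lost)
const withNullish = sessionInputTokens ?? localInputTokens; // 0 (the zero is preserved)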

Comment on lines +110 to 133
+    pub fn with_canonical_limits(mut self, provider_name: &str) -> Self {
+        if self.context_limit.is_none() || self.max_tokens.is_none() {
+            if let Some(canonical) = crate::providers::canonical::maybe_get_canonical_model(
+                provider_name,
+                &self.model_name,
+            ) {
+                if self.context_limit.is_none() {
+                    self.context_limit = Some(canonical.limit.context);
+                }
+                if self.max_tokens.is_none() {
+                    self.max_tokens = canonical.limit.output.map(|o| o as i32);
+                }
+            }
+        }
-        if let Ok(val) = std::env::var("GOOSE_CONTEXT_LIMIT") {
-            return Self::validate_context_limit(&val, "GOOSE_CONTEXT_LIMIT").map(Some);
-        }
-
-        // Get the model's limit
-        let model_limit = Self::get_model_specific_limit(model_name);
-
-        // If there's a fast_model, get its limit and use the minimum
-        if let Some(fast_model_name) = fast_model {
-            let fast_model_limit = Self::get_model_specific_limit(fast_model_name);
-
-            // Return the minimum of both limits (if both exist)
-            match (model_limit, fast_model_limit) {
-                (Some(m), Some(f)) => Ok(Some(m.min(f))),
-                (Some(m), None) => Ok(Some(m)),
-                (None, Some(f)) => Ok(Some(f)),
-                (None, None) => Ok(None),
+        // Try filling remaining gaps from predefined models
+        if self.context_limit.is_none() {
+            if let Some(pm) = find_predefined_model(&self.model_name) {
+                self.context_limit = pm.context_limit;
+            }
-        } else {
-            Ok(model_limit)
-        }
+
+        self
+    }

Copilot AI Feb 13, 2026


with_canonical_limits and max_output_tokens() now affect request shaping (e.g., provider payload max_tokens) but there are no unit tests covering the canonical-registry fill behavior; add a focused test that uses a known bundled canonical model to assert context/output limits get populated as expected.
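
A sketch of the kind of focused test being asked for; the provider/model names are placeholders (a real test would pick an entry known to exist in the bundled canonical registry, and use accessors if the fields are private):

#[cfg(test)]
mod canonical_limit_tests {
    use super::*;

    #[test]
    fn canonical_limits_fill_unset_fields() {
        // Placeholder names: substitute a provider/model pair that is
        // actually present in the bundled registry.
        let config = ModelConfig::new("some-registry-model")
            .unwrap()
            .with_canonical_limits("some-provider");

        // Fields left unset by the user should be filled from the registry.
        assert!(config.context_limit.is_some());
        assert!(config.max_tokens.is_some());
    }
}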

…ntext

* 'main' of github.com:block/goose:
  Move platform extensions into their own folder (#7210)
  fix: ignore deprecated skills extension (#7139)
  Add a goosed over HTTP integration test, and test the developer tool PATH (#7178)
Copilot AI review requested due to automatic review settings February 13, 2026 22:58

Copilot AI left a comment


Pull request overview

Copilot reviewed 52 out of 53 changed files in this pull request and generated 1 comment.

.map(|&model_name| ModelInfo {
    name: model_name.to_string(),
    context_limit: ModelConfig::new_or_fail(model_name)
        .with_canonical_limits(name)

Copilot AI Feb 13, 2026


The canonical limits lookup is being called with the provider name (name) instead of the model name (model_name). This will always fail to find canonical model data because it's passing the provider name as both the provider and model parameters to with_canonical_limits.

Suggested change
-        .with_canonical_limits(name)
+        .with_canonical_limits(model_name)

Copilot AI review requested due to automatic review settings February 17, 2026 16:32

Copilot AI left a comment


Pull request overview

Copilot reviewed 54 out of 55 changed files in this pull request and generated no new comments.

@katzdave added this pull request to the merge queue on Feb 17, 2026
Merged via the queue into main with commit 3959805 on Feb 17, 2026
30 of 31 checks passed
@katzdave deleted the dkatz/canonical-context branch on February 17, 2026 16:48
zanesq added a commit that referenced this pull request Feb 17, 2026
…dd-extension

* 'main' of github.com:block/goose:
  Use Canonical Models to set context window sizes (#6723)
katzdave added a commit to YusukeShimizu/goose that referenced this pull request Feb 17, 2026
* origin/main: (263 commits)
  working_dir usage more clear in add_extension (block#6958)
  Use Canonical Models to set context window sizes (block#6723)
  Set up direnv and update flake inputs (block#6526)
  fix: restore subagent tool call notifications after summon refactor (block#7243)
  fix(ui): preserve server config values on partial provider config save (block#7248)
  fix(claude-code): allow goose to run inside a Claude Code session (block#7232)
  fix(openai): route gpt-5 codex via responses and map base paths (block#7254)
  feat: add GoosePlatform to AgentConfig and MCP initialization (block#6931)
  Fix copied over (block#7270)
  feat(gemini-cli): add streaming support via stream-json events (block#7244)
  fix: filter models without tool support from recommended list (block#7198)
  fix(google): handle more thoughtSignature vagaries during streaming (block#7204)
  docs: playwright CLI skill tutorial (block#7261)
  install node in goose dir (block#7220)
  fix: relax test_basic_response assertion for providers returning reasoning_content (block#7249)
  fix: handle reasoning_content for Kimi/thinking models (block#7252)
  feat: sandboxing for macos (block#7197)
  fix(otel): use monotonic_counter prefix and support temporality env var (block#7234)
  Streaming markdown (block#7233)
  Improve compaction messages to enable better post-compaction agent behavior (block#7259)
  ...

# Conflicts:
#	crates/goose/src/providers/openai.rs
zanesq added a commit that referenced this pull request Feb 17, 2026
…ions-fallback

* 'main' of github.com:block/goose: (43 commits)
  Added cmd to validate bundled extensions json (#7217)
  working_dir usage more clear in add_extension (#6958)
  Use Canonical Models to set context window sizes (#6723)
  Set up direnv and update flake inputs (#6526)
  fix: restore subagent tool call notifications after summon refactor (#7243)
  fix(ui): preserve server config values on partial provider config save (#7248)
  fix(claude-code): allow goose to run inside a Claude Code session (#7232)
  fix(openai): route gpt-5 codex via responses and map base paths (#7254)
  feat: add GoosePlatform to AgentConfig and MCP initialization (#6931)
  Fix copied over (#7270)
  feat(gemini-cli): add streaming support via stream-json events (#7244)
  fix: filter models without tool support from recommended list (#7198)
  fix(google): handle more thoughtSignature vagaries during streaming (#7204)
  docs: playwright CLI skill tutorial (#7261)
  install node in goose dir (#7220)
  fix: relax test_basic_response assertion for providers returning reasoning_content (#7249)
  fix: handle reasoning_content for Kimi/thinking models (#7252)
  feat: sandboxing for macos (#7197)
  fix(otel): use monotonic_counter prefix and support temporality env var (#7234)
  Streaming markdown (#7233)
  ...

# Conflicts:
#	crates/goose/src/config/extensions.rs
jh-block added a commit that referenced this pull request Feb 18, 2026
* origin/main: (49 commits)
  chore: show important keys for provider configuration (#7265)
  fix: subrecipe relative path with summon (#7295)
  fix extension selector not displaying the correct enabled extensions (#7290)
  Use the working dir from the session (#7285)
  Fix: Minor logging uplift for debugging of prompt injection mitigation (#7195)
  feat(otel): make otel logging level configurable (#7271)
  docs: add documentation for Top Of Mind extension (#7283)
  Document gemini 3 thinking levels (#7282)
  docs: stream subagent tool calls (#7280)
  Docs: delete custom provider in desktop (#7279)
  Everything is streaming (#7247)
  openai: responses models and hardens event streaming handling (#6831)
  docs: disable ai session naming (#7194)
  Added cmd to validate bundled extensions json (#7217)
  working_dir usage more clear in add_extension (#6958)
  Use Canonical Models to set context window sizes (#6723)
  Set up direnv and update flake inputs (#6526)
  fix: restore subagent tool call notifications after summon refactor (#7243)
  fix(ui): preserve server config values on partial provider config save (#7248)
  fix(claude-code): allow goose to run inside a Claude Code session (#7232)
  ...
