Commit d54b37e

Merge branch 'main' into cte/core-package
2 parents b45c26e + 78dc344 commit d54b37e

237 files changed

+13686
-3189
lines changed

CHANGELOG.md

Lines changed: 92 additions & 0 deletions
@@ -1,5 +1,97 @@
 # Roo Code Changelog

+## [3.36.16] - 2025-12-19
+
+- Fix: Normalize tool schemas for VS Code LM API to resolve error 400 when using VS Code Language Model API providers (PR #10221 by @hannesrudolph)
+
+## [3.36.15] - 2025-12-19
+
+![3.36.15 Release - 1M Context Window Support](/releases/3.36.15-release.png)
+
+- Add 1M context window beta support for Claude Sonnet 4 on Vertex AI, enabling significantly larger context for complex tasks (PR #10209 by @hannesrudolph)
+- Add native tool calling support for LM Studio and Qwen-Code providers, improving compatibility with local models (PR #10208 by @hannesrudolph)
+- Add native tool call defaults for OpenAI-compatible providers, expanding native function calling across more configurations (PR #10213 by @hannesrudolph)
+- Enable native tool calls for Requesty provider (PR #10211 by @daniel-lxs)
+- Improve API error handling and visibility with clearer error messages and better user feedback (PR #10204 by @brunobergher)
+- Add downloadable error diagnostics from chat errors, making it easier to troubleshoot and report issues (PR #10188 by @brunobergher)
+- Fix refresh models button not properly flushing the cache, ensuring model lists update correctly (#9682 by @tl-hbk, PR #9870 by @pdecat)
+- Fix additionalProperties handling for strict mode compatibility, resolving schema validation issues with certain providers (PR #10210 by @daniel-lxs)
+
+## [3.36.14] - 2025-12-18
+
+![3.36.14 Release - Native Tool Calling for Claude on Vertex AI](/releases/3.36.14-release.png)
+
+- Add native tool calling support for Claude models on Vertex AI, enabling more efficient and reliable tool interactions (PR #10197 by @hannesrudolph)
+- Fix JSON Schema format value stripping for OpenAI compatibility, resolving issues with unsupported format values (PR #10198 by @daniel-lxs)
+- Improve "no tools used" error handling with graceful retry mechanism for better reliability when tools fail to execute (PR #10196 by @hannesrudolph)
+
+## [3.36.13] - 2025-12-18
+
+![3.36.13 Release - Native Tool Protocol](/releases/3.36.13-release.png)
+
+- Change default tool protocol from XML to native for improved reliability and performance (PR #10186 by @mrubens)
+- Add native tool support for VS Code Language Model API providers (PR #10191 by @daniel-lxs)
+- Lock task tool protocol for consistent task resumption, ensuring tasks resume with the same protocol they started with (PR #10192 by @daniel-lxs)
+- Replace edit_file tool alias with actual edit_file tool for improved diff editing capabilities (PR #9983 by @hannesrudolph)
+- Fix LiteLLM router models by merging default model info for native tool calling support (PR #10187 by @daniel-lxs)
+- Add PostHog exception tracking for consecutive mistake errors to improve error monitoring (PR #10193 by @daniel-lxs)
+
+## [3.36.12] - 2025-12-18
+
+![3.36.12 Release - Better telemetry and Bedrock fixes](/releases/3.36.12-release.png)
+
+- Fix: Add userAgentAppId to Bedrock embedder for code indexing (#10165 by @jackrein, PR #10166 by @roomote)
+- Update OpenAI and Gemini tool preferences for improved model behavior (PR #10170 by @hannesrudolph)
+- Extract error messages from JSON payloads for better PostHog error grouping (PR #10163 by @daniel-lxs)
+
+## [3.36.11] - 2025-12-17
+
+![3.36.11 Release - Native Tool Calling Enhancements](/releases/3.36.11-release.png)
+
+- Add support for Claude Code Provider native tool calling, improving tool execution performance and reliability (PR #10077 by @hannesrudolph)
+- Enable native tool calling by default for Z.ai models for better model compatibility (PR #10158 by @app/roomote)
+- Enable native tools by default for OpenAI compatible provider to improve tool calling support (PR #10159 by @daniel-lxs)
+- Fix: Normalize MCP tool schemas for Bedrock and OpenAI strict mode to ensure proper tool compatibility (PR #10148 by @daniel-lxs)
+- Fix: Remove dots and colons from MCP tool names for Bedrock compatibility (PR #10152 by @daniel-lxs)
+- Fix: Convert tool_result to XML text when native tools disabled for Bedrock (PR #10155 by @daniel-lxs)
+- Fix: Refresh Roo models cache with session token on auth state change to resolve model list refresh issues (PR #10156 by @daniel-lxs)
+- Fix: Support AWS GovCloud and China region ARNs in Bedrock provider for expanded regional support (PR #10157 by @app/roomote)
+
+## [3.36.10] - 2025-12-17
+
+![3.36.10 Release - Gemini 3 Flash Preview](/releases/3.36.10-release.png)
+
+- Add support for Gemini 3 Flash Preview model in the Gemini provider (PR #10151 by @hannesrudolph)
+- Implement interleaved thinking mode for DeepSeek Reasoner, enabling streaming reasoning output (PR #9969 by @hannesrudolph)
+- Fix: Preserve reasoning_content during tool call sequences in DeepSeek (PR #10141 by @hannesrudolph)
+- Fix: Correct token counting for context truncation display (PR #9961 by @hannesrudolph)
+- Update Next.js dependency to ~15.2.8 (PR #10140 by @jr)
+
+## [3.36.9] - 2025-12-15
+
+![3.36.9 Release - Cross-Provider Compatibility](/releases/3.36.9-release.png)
+
+- Fix: Normalize tool call IDs for cross-provider compatibility via OpenRouter, ensuring consistent handling across different AI providers (PR #10102 by @daniel-lxs)
+- Fix: Add additionalProperties: false to nested MCP tool schemas, improving schema validation and preventing unexpected properties (PR #10109 by @daniel-lxs)
+- Fix: Validate tool_result IDs in delegation resume flow, preventing errors when resuming delegated tasks (PR #10135 by @daniel-lxs)
+- Feat: Add full error details to streaming failure dialog, providing more comprehensive information for debugging streaming issues (PR #10131 by @roomote)
+- Feat: Improve evals UI with tool groups and duration fix, enhancing the evaluation interface organization and timing accuracy (PR #10133 by @hannesrudolph)
+
+## [3.36.8] - 2025-12-16
+
+![3.36.8 Release - Native Tools Enabled by Default](/releases/3.36.8-release.png)
+
+- Implement incremental token-budgeted file reading for smarter, more efficient file content retrieval (PR #10052 by @jr)
+- Enable native tools by default for multiple providers including OpenAI, Azure, Google, Vertex, and more (PR #10059 by @daniel-lxs)
+- Enable native tools by default for Anthropic and add telemetry tracking for tool format usage (PR #10021 by @daniel-lxs)
+- Fix: Prevent race condition from deleting wrong API messages during streaming (PR #10113 by @hannesrudolph)
+- Fix: Prevent duplicate MCP tools error by deduplicating servers at source (PR #10096 by @daniel-lxs)
+- Remove strict ARN validation for Bedrock custom ARN users allowing more flexibility (#10108 by @wisestmumbler, PR #10110 by @roomote)
+- Add metadata to error details dialog for improved debugging (PR #10050 by @roomote)
+- Add configuration to control public sharing feature (PR #10105 by @mrubens)
+- Remove description from Bedrock service tiers for cleaner UI (PR #10118 by @mrubens)
+- Fix: Correct link to provider pricing page on web (PR #10107 by @brunobergher)
+
 ## [3.36.7] - 2025-12-15

 - Improve tool configuration for OpenAI models in OpenRouter (PR #10082 by @hannesrudolph)

apps/web-evals/package.json

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@
 		"cmdk": "^1.1.0",
 		"fuzzysort": "^3.1.0",
 		"lucide-react": "^0.518.0",
-		"next": "~15.2.6",
+		"next": "~15.2.8",
 		"next-themes": "^0.4.6",
 		"p-map": "^7.0.3",
 		"react": "^18.3.1",

apps/web-evals/src/app/runs/[id]/run.tsx

Lines changed: 40 additions & 3 deletions
@@ -321,6 +321,15 @@ export function Run({ run }: { run: Run }) {
 		void usageUpdatedAt
 		const metrics: Record<number, TaskMetrics> = {}

+		// Helper to calculate duration from database timestamps when streaming duration
+		// is unavailable (e.g., page was loaded after TaskStarted event was published)
+		const calculateDurationFromTimestamps = (task: TaskWithMetrics): number => {
+			if (!task.startedAt) return 0
+			const startTime = new Date(task.startedAt).getTime()
+			const endTime = task.finishedAt ? new Date(task.finishedAt).getTime() : Date.now()
+			return endTime - startTime
+		}
+
 		tasks?.forEach((task) => {
 			const streamingUsage = tokenUsage.get(task.id)
 			const dbMetrics = task.taskMetrics
@@ -331,26 +340,54 @@ export function Run({ run }: { run: Run }) {
 				// Check if DB metrics have meaningful values (not just default/empty)
 				const dbHasData = dbMetrics && (dbMetrics.tokensIn > 0 || dbMetrics.tokensOut > 0 || dbMetrics.cost > 0)
 				if (dbHasData) {
-					metrics[task.id] = dbMetrics
+					// If DB duration is 0 but we have timestamps, calculate from timestamps
+					const duration = dbMetrics.duration || calculateDurationFromTimestamps(task)
+					metrics[task.id] = { ...dbMetrics, duration }
 				} else if (streamingUsage) {
 					// Fall back to streaming values if DB is empty/stale
+					// Use streaming duration, or calculate from timestamps if not available
+					const duration = streamingUsage.duration || calculateDurationFromTimestamps(task)
 					metrics[task.id] = {
 						tokensIn: streamingUsage.totalTokensIn,
 						tokensOut: streamingUsage.totalTokensOut,
 						tokensContext: streamingUsage.contextTokens,
-						duration: streamingUsage.duration ?? 0,
+						duration,
 						cost: streamingUsage.totalCost,
 					}
+				} else {
+					// Task finished but no DB metrics and no streaming data
+					// (e.g., page loaded after task completed, metrics not persisted)
+					// Still provide duration calculated from timestamps
+					metrics[task.id] = {
+						tokensIn: 0,
+						tokensOut: 0,
+						tokensContext: 0,
+						duration: calculateDurationFromTimestamps(task),
+						cost: 0,
+					}
 				}
 			} else if (streamingUsage) {
 				// For running tasks, use streaming values
+				// Use streaming duration, or calculate from task.startedAt if not available
+				// (happens when page loads after TaskStarted event was already published)
+				const duration = streamingUsage.duration || calculateDurationFromTimestamps(task)
 				metrics[task.id] = {
 					tokensIn: streamingUsage.totalTokensIn,
 					tokensOut: streamingUsage.totalTokensOut,
 					tokensContext: streamingUsage.contextTokens,
-					duration: streamingUsage.duration ?? 0,
+					duration,
 					cost: streamingUsage.totalCost,
 				}
+			} else if (task.startedAt) {
+				// Task has started (has startedAt in DB) but no streaming data yet
+				// This can happen when page loads after TaskStarted but before TokenUsageUpdated
+				metrics[task.id] = {
+					tokensIn: 0,
+					tokensOut: 0,
+					tokensContext: 0,
+					duration: calculateDurationFromTimestamps(task),
+					cost: 0,
+				}
 			}
 		})

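The duration change in this file can be read as one fallback rule: prefer the recorded duration, and when it is missing or zero, derive it from the task's timestamps. The sketch below isolates that rule so it can be exercised outside the component. `TaskTimestamps`, `resolveDuration`, and the injectable `now` parameter are hypothetical names introduced for illustration and are not part of the commit; only the body of `calculateDurationFromTimestamps` mirrors the diff.

```typescript
// Hypothetical, trimmed-down shape; the real TaskWithMetrics type in the commit has more fields.
type TaskTimestamps = {
	startedAt?: string | Date | null
	finishedAt?: string | Date | null
}

// Same logic as the helper in the diff, plus an injectable clock (an assumption, for testability).
// The result is in milliseconds because Date.prototype.getTime() returns milliseconds.
function calculateDurationFromTimestamps(task: TaskTimestamps, now: number = Date.now()): number {
	if (!task.startedAt) return 0
	const startTime = new Date(task.startedAt).getTime()
	const endTime = task.finishedAt ? new Date(task.finishedAt).getTime() : now
	return endTime - startTime
}

// Uses `||` rather than `??`, as the diff does: a recorded duration of 0 is falsy,
// so it falls back to the timestamps, whereas `??` would only catch null/undefined.
function resolveDuration(recorded: number | null | undefined, task: TaskTimestamps, now?: number): number {
	return recorded || calculateDurationFromTimestamps(task, now)
}
```

Under these assumptions, a finished task with a stored duration of 0 reports the gap between `startedAt` and `finishedAt`, while a still-running task with no streaming duration reports the time elapsed since `startedAt`.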

apps/web-evals/src/components/home/run.tsx

Lines changed: 71 additions & 63 deletions
@@ -44,14 +44,22 @@ import {
 	ScrollArea,
 } from "@/components/ui"

+// Tool group type (same as in runs.tsx)
+type ToolGroup = {
+	id: string
+	name: string
+	icon: string
+	tools: string[]
+}
+
 type RunProps = {
 	run: EvalsRun
 	taskMetrics: EvalsTaskMetrics | null
 	toolColumns: ToolName[]
-	consolidatedToolColumns: string[]
+	toolGroups: ToolGroup[]
 }

-export function Run({ run, taskMetrics, toolColumns, consolidatedToolColumns }: RunProps) {
+export function Run({ run, taskMetrics, toolColumns, toolGroups }: RunProps) {
 	const router = useRouter()
 	const [deleteRunId, setDeleteRunId] = useState<number>()
 	const [showSettings, setShowSettings] = useState(false)
@@ -143,6 +151,62 @@ export function Run({ run, taskMetrics, toolColumns, consolidatedToolColumns }:
 		[router, run.id],
 	)

+	// Helper to render a tool group cell
+	const renderToolGroupCell = (group: ToolGroup) => {
+		if (!taskMetrics?.toolUsage) {
+			return <span className="text-muted-foreground">-</span>
+		}
+
+		let totalAttempts = 0
+		let totalFailures = 0
+		const breakdown: Array<{ tool: string; attempts: number; rate: string }> = []
+
+		for (const toolName of group.tools) {
+			const usage = taskMetrics.toolUsage[toolName as ToolName]
+			if (usage) {
+				totalAttempts += usage.attempts
+				totalFailures += usage.failures
+				const rate =
+					usage.attempts > 0
+						? `${Math.round(((usage.attempts - usage.failures) / usage.attempts) * 100)}%`
+						: "0%"
+				breakdown.push({ tool: toolName, attempts: usage.attempts, rate })
+			}
+		}
+
+		if (totalAttempts === 0) {
+			return <span className="text-muted-foreground">-</span>
+		}
+
+		const successRate = ((totalAttempts - totalFailures) / totalAttempts) * 100
+		const rateColor =
+			successRate === 100 ? "text-muted-foreground" : successRate >= 80 ? "text-yellow-500" : "text-red-500"
+
+		return (
+			<Tooltip>
+				<TooltipTrigger>
+					<div className="flex flex-col items-center">
+						<span className="font-medium">{totalAttempts}</span>
+						<span className={rateColor}>{Math.round(successRate)}%</span>
+					</div>
+				</TooltipTrigger>
+				<TooltipContent>
+					<div className="text-xs">
+						<div className="font-semibold mb-1">{group.name}</div>
+						{breakdown.map(({ tool, attempts, rate }) => (
+							<div key={tool} className="flex justify-between gap-4">
+								<span>{tool}:</span>
+								<span>
+									{attempts} ({rate})
+								</span>
+							</div>
+						))}
+					</div>
+				</TooltipContent>
+			</Tooltip>
+		)
+	}
+
 	return (
 		<>
 			<TableRow className="cursor-pointer hover:bg-muted/50" onClick={handleRowClick}>
@@ -170,68 +234,12 @@ export function Run({ run, taskMetrics, toolColumns, consolidatedToolColumns }:
 						</div>
 					)}
 				</TableCell>
-				{consolidatedToolColumns.length > 0 && (
-					<TableCell className="text-xs text-center">
-						{taskMetrics?.toolUsage ? (
-							(() => {
-								// Calculate aggregated stats for consolidated tools
-								let totalAttempts = 0
-								let totalFailures = 0
-								const breakdown: Array<{ tool: string; attempts: number; rate: string }> = []
-
-								for (const toolName of consolidatedToolColumns) {
-									const usage = taskMetrics.toolUsage[toolName as ToolName]
-									if (usage) {
-										totalAttempts += usage.attempts
-										totalFailures += usage.failures
-										const rate =
-											usage.attempts > 0
-												? `${Math.round(((usage.attempts - usage.failures) / usage.attempts) * 100)}%`
-												: "0%"
-										breakdown.push({ tool: toolName, attempts: usage.attempts, rate })
-									}
-								}
-
-								const consolidatedRate =
-									totalAttempts > 0 ? ((totalAttempts - totalFailures) / totalAttempts) * 100 : 100
-								const rateColor =
-									consolidatedRate === 100
-										? "text-muted-foreground"
-										: consolidatedRate >= 80
-											? "text-yellow-500"
-											: "text-red-500"
-
-								return totalAttempts > 0 ? (
-									<Tooltip>
-										<TooltipTrigger>
-											<div className="flex flex-col items-center">
-												<span className="font-medium">{totalAttempts}</span>
-												<span className={rateColor}>{Math.round(consolidatedRate)}%</span>
-											</div>
-										</TooltipTrigger>
-										<TooltipContent>
-											<div className="text-xs">
-												<div className="font-semibold mb-1">Consolidated Tools:</div>
-												{breakdown.map(({ tool, attempts, rate }) => (
-													<div key={tool} className="flex justify-between gap-4">
-														<span>{tool}:</span>
-														<span>
-															{attempts} ({rate})
-														</span>
-													</div>
-												))}
-											</div>
-										</TooltipContent>
-									</Tooltip>
-								) : (
-									<span className="text-muted-foreground">-</span>
-								)
-							})()
-						) : (
-							<span className="text-muted-foreground">-</span>
-						)}
+				{/* Tool Group Columns */}
+				{toolGroups.map((group) => (
+					<TableCell key={group.id} className="text-xs text-center">
+						{renderToolGroupCell(group)}
 					</TableCell>
-				)}
+				))}
 				{toolColumns.map((toolName) => {
 					const usage = taskMetrics?.toolUsage?.[toolName]
 					const successRate =