Support for new Gemini Computer Use Models #1110

miguelg719 · 2025-10-07T16:27:29Z

why

Adding support for Gemini's new Computer Use model

what changed

We partnered with Google DeepMind to help integrate and test their new Computer Use models.

The new model tag gemini-2.5-pro-computer-use-preview-10-2025 is available for Stagehand Agent. You can try it today with the cua-example.ts example.

To learn more, check out the blog post: https://www.browserbase.com/blog/evaluating-browser-agents

changeset-bot · 2025-10-07T16:27:33Z

🦋 Changeset detected

Latest commit: b60bd28

The changes in this PR will be included in the next version bump.

greptile-apps

Greptile Overview

Summary

This PR introduces comprehensive support for Google's Computer Use Assistant (CUA) API as a third agent provider alongside existing OpenAI and Anthropic support. The changes enable users to leverage Google's Gemini models (specifically `computer-use-preview-10-2025`) for browser automation within the Stagehand framework.

The integration follows the established pattern used for other providers:

- A new GoogleCUAClient class that implements the AgentClient interface
- Google-specific prompt generation via buildGoogleCUASystemPrompt
- Type system updates to include 'google' as a valid provider
- Extended image compression utilities to handle Google's unique message format
- Updated AgentProvider factory to instantiate Google clients
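
The factory change in the last item can be pictured roughly as follows. This is a minimal sketch, not the code from AgentProvider.ts: `modelToProvider`, `GoogleCUAClientStub`, and this simplified `getClient` signature are hypothetical stand-ins, and only the model tag named in this review is listed.

```typescript
// Hypothetical sketch; the real AgentProvider.ts may be organized differently.
type AgentProviderType = "openai" | "anthropic" | "google";

// Only the Google model tag named in this PR is listed; entries for the
// other providers are omitted rather than guessed.
const modelToProvider: Record<string, AgentProviderType> = {
  "computer-use-preview-10-2025": "google",
};

interface AgentClient {
  readonly provider: AgentProviderType;
  readonly modelName: string;
}

// Stand-in for the real client class, which takes more constructor arguments.
class GoogleCUAClientStub implements AgentClient {
  readonly provider: AgentProviderType = "google";
  readonly modelName: string;

  constructor(modelName: string) {
    this.modelName = modelName;
  }
}

// Resolve a model name to a client, failing loudly for unknown models.
function getClient(modelName: string): AgentClient {
  const provider = modelToProvider[modelName];
  switch (provider) {
    case "google":
      return new GoogleCUAClientStub(modelName);
    default:
      throw new Error(`Unsupported agent model: ${modelName}`);
  }
}
```

Keeping the model-to-provider lookup in one table means adding a provider touches the table and one `case`, which is presumably why the existing pattern was extended rather than replaced.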

Additionally, the PR standardizes viewport dimensions across all CUA clients to 1288x711 (from 1024x768) and upgrades the @google/genai dependency to version 1.22.0 to use Google's unified SDK. The cuaAgentHandler has been optimized by removing visual feedback features (cursor animations) to improve performance, and the CUA example has been updated to demonstrate Google integration.

The implementation includes sophisticated coordinate normalization (Google uses 0-1000 range vs actual pixels), retry logic with exponential backoff, conversation history management with image compression, and comprehensive function call mapping to handle Google's specific CUA API structure.
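
To make the coordinate normalization and retry behavior concrete, here is a minimal sketch. It assumes the 0-1000 grid maps linearly onto the 1288x711 viewport and that retry delays double on each attempt; `denormalize`, `backoffDelayMs`, `withRetry`, and their default values are hypothetical and not taken from GoogleCUAClient.ts.

```typescript
// Illustrative sketch only; names and defaults here are assumptions.
interface Viewport {
  width: number;
  height: number;
}

// Viewport size the summary says all CUA clients now share.
const DEFAULT_VIEWPORT: Viewport = { width: 1288, height: 711 };

// Map a coordinate on the model's 0-1000 grid to a pixel position.
// Assumption: the mapping is linear and rounds to the nearest pixel.
function denormalize(
  x: number,
  y: number,
  viewport: Viewport = DEFAULT_VIEWPORT,
): { x: number; y: number } {
  const clamp = (v: number): number => Math.min(1000, Math.max(0, v));
  return {
    x: Math.round((clamp(x) / 1000) * viewport.width),
    y: Math.round((clamp(y) / 1000) * viewport.height),
  };
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retry an async call, sleeping with exponential backoff between failures.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Under these assumptions a model coordinate of (500, 500) lands at pixel (644, 356). Rounding and clamping are exactly the kind of detail where the "subtle bugs" flagged in the confidence score could hide, so they are worth checking against the real implementation.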

PR Description Notes:

The PR description is incomplete, containing only empty section headers for "why", "what changed", and "test plan"

Important Files Changed

Changed Files

| Filename | Score | Overview |
| --- | --- | --- |
| lib/agent/GoogleCUAClient.ts | 4/5 | New Google CUA client implementation with coordinate normalization and retry logic |
| lib/agent/AgentProvider.ts | 4/5 | Added Google provider support to agent factory with model mapping |
| lib/agent/utils/imageCompression.ts | 4/5 | Extended image compression to support OpenAI and Google message formats |
| lib/handlers/cuaAgentHandler.ts | 4/5 | Removed visual feedback features and optimized timing for better performance |
| lib/prompt.ts | 5/5 | Added Google-specific system prompt generation function |
| package.json | 3/5 | Major dependency upgrade from @google/genai ^0.8.0 to ^1.22.0 |
| lib/index.ts | 4/5 | Updated default viewport and added GEMINI_API_KEY fallback support |
| types/agent.ts | 5/5 | Added 'google' to agent provider type definitions |
| lib/agent/OpenAICUAClient.ts | 4/5 | Standardized viewport dimensions to match other clients |
| lib/agent/AnthropicCUAClient.ts | 5/5 | Updated viewport dimensions for consistency across providers |
| examples/cua-example.ts | 4/5 | Updated example to demonstrate Google CUA integration |

Confidence score: 3/5

- This PR requires careful review due to the major dependency upgrade and new provider integration
- Score reflects the significant @google/genai version jump and complex coordinate normalization logic that could introduce subtle bugs
- Pay close attention to package.json, GoogleCUAClient.ts, and imageCompression.ts for potential compatibility issues

Sequence Diagram

sequenceDiagram
    participant User
    participant CuaExample as cua-example.ts
    participant Stagehand
    participant AgentProvider
    participant GoogleCUAClient as GoogleCUAClient
    participant Page as Browser Page
    
    User->>CuaExample: "Run CUA example"
    CuaExample->>Stagehand: "new Stagehand(StagehandConfig)"
    CuaExample->>Stagehand: "await stagehand.init()"
    Stagehand-->>CuaExample: "Stagehand instance ready"
    
    CuaExample->>Stagehand: "stagehand.agent({provider: 'google', model: 'computer-use-preview-10-2025', ...})"
    Stagehand->>AgentProvider: "getClient(modelName, clientOptions, instructions)"
    AgentProvider->>GoogleCUAClient: "new GoogleCUAClient(type, modelName, instructions, clientOptions)"
    GoogleCUAClient-->>AgentProvider: "Client instance"
    AgentProvider-->>Stagehand: "Agent client"
    Stagehand-->>CuaExample: "Agent instance with execute method"
    
    CuaExample->>Page: "await page.goto('https://www.browserbase.com/careers')"
    Page-->>CuaExample: "Navigation complete"
    
    CuaExample->>Stagehand: "await agent.execute({instruction: 'Apply for the first engineer position...', maxSteps: 20})"
    Stagehand->>GoogleCUAClient: "execute(executionOptions)"
    
    loop Agent Execution Steps (up to maxSteps)
        GoogleCUAClient->>GoogleCUAClient: "executeStep(logger)"
        GoogleCUAClient->>GoogleCUAClient: "Generate content request to Google API"
        GoogleCUAClient->>GoogleCUAClient: "processResponse(response)"
        GoogleCUAClient->>GoogleCUAClient: "convertFunctionCallToAction(functionCall)"
        GoogleCUAClient->>Page: "Execute action (click, type, scroll, etc.)"
        Page-->>GoogleCUAClient: "Action completed"
        GoogleCUAClient->>Page: "Take screenshot"
        Page-->>GoogleCUAClient: "Screenshot captured"
        GoogleCUAClient->>GoogleCUAClient: "Add function response to history"
    end
    
    GoogleCUAClient-->>Stagehand: "AgentResult with actions, usage, completion status"
    Stagehand-->>CuaExample: "Execution result"
    CuaExample->>CuaExample: "Log result to console"
    CuaExample->>Stagehand: "await stagehand.close()"
    Stagehand-->>CuaExample: "Browser closed"

11 files reviewed, 6 comments

lib/prompt.ts

greptile-apps · 2025-10-07T16:29:02Z

lib/agent/AgentProvider.ts

+            modelName,
+            userProvidedInstructions,
+            clientOptions,
+          );


logic: GoogleCUAClient constructor is missing the tools parameter that OpenAI and Anthropic clients receive

greptile-apps · 2025-10-07T16:29:03Z

lib/agent/utils/imageCompression.ts

+          const data = part.functionResponse.response
+            .data as FunctionResponseData[];
+          const hasImage = data.some((dataItem) =>
+            dataItem.inlineData?.mimeType?.startsWith("image/"),
+          );


style: Complex nested type assertion pattern is repeated. Consider extracting to a type guard function for better maintainability.
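
One way to act on this suggestion is sketched below. The interfaces are simplified guesses at the shapes used in imageCompression.ts, and `isFunctionResponseDataArray` and `hasInlineImage` are hypothetical names, so treat this as an illustration of the type-guard pattern rather than a drop-in patch.

```typescript
// Hypothetical shapes mirroring the names in the reviewed snippet;
// the real definitions in imageCompression.ts may differ.
interface FunctionResponseData {
  inlineData?: { mimeType?: string; data?: string };
}

interface FunctionResponsePart {
  functionResponse?: { response?: { data?: unknown } };
}

// Narrow `unknown` to FunctionResponseData[] once, instead of casting
// with `as FunctionResponseData[]` at every call site.
function isFunctionResponseDataArray(
  value: unknown,
): value is FunctionResponseData[] {
  return (
    Array.isArray(value) &&
    value.every((item) => typeof item === "object" && item !== null)
  );
}

// True when the part carries at least one inline image.
function hasInlineImage(part: FunctionResponsePart): boolean {
  const data = part.functionResponse?.response?.data;
  return (
    isFunctionResponseDataArray(data) &&
    data.some(
      (item) => item.inlineData?.mimeType?.startsWith("image/") === true,
    )
  );
}
```

With the guard in place, a malformed `response.data` (for example a string) makes `hasInlineImage` return `false` instead of throwing at `data.some`.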

greptile-apps · 2025-10-07T16:29:04Z

lib/agent/utils/imageCompression.ts

+              ...part,
+              functionResponse: {
+                ...part.functionResponse,
+                data: [] as FunctionResponseData[],


logic: Setting data to empty array while adding 'compressed' property creates inconsistent state - the original data structure is modified in unexpected ways.
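
A possible way to avoid the inconsistent state is to build a new part in which the stripped `data` and the `compressed` flag live on the same object, leaving the input untouched. The sketch below assumes simplified shapes and a hypothetical `compressPart` helper; the actual structure in imageCompression.ts may differ.

```typescript
// Simplified, assumed shapes; not copied from imageCompression.ts.
interface FunctionResponseData {
  inlineData?: { mimeType?: string; data?: string };
}

interface FunctionResponsePart {
  functionResponse: {
    name: string;
    response: { data: FunctionResponseData[]; compressed?: boolean };
  };
}

// Return a copy whose image payloads are dropped; the input is not mutated.
function compressPart(part: FunctionResponsePart): FunctionResponsePart {
  const { response } = part.functionResponse;
  const kept = response.data.filter(
    (item) => !item.inlineData?.mimeType?.startsWith("image/"),
  );
  if (kept.length === response.data.length) {
    return part; // nothing to strip, so reuse the original object
  }
  return {
    ...part,
    functionResponse: {
      ...part.functionResponse,
      // Data and flag are updated together on the same `response` object.
      response: { ...response, data: kept, compressed: true },
    },
  };
}
```

Because the flag and the data change in one place, a reader of the history can tell from `compressed` alone whether `data` is complete.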

greptile-apps · 2025-10-07T16:29:04Z

lib/agent/GoogleCUAClient.ts

+    let currentStep = 0;
+    let completed = false;
+    const actions: AgentAction[] = [];
+    const messageList: string[] = [];


style: messageList is declared but never used - consider removing

Suggested change

-    const messageList: string[] = [];
+    const actions: AgentAction[] = [];
+    let finalMessage = "";

greptile-apps · 2025-10-07T16:29:05Z

lib/agent/GoogleCUAClient.ts

+      }
+
+      default:
+        console.warn(`Unsupported Google CUA function: ${name}`);


style: Using console.warn instead of the logger parameter - should use logger for consistency
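
A rough sketch of routing the warning through an injected logger is shown below. The `LogLine` shape, the `"agent"` category, and the `reportUnsupportedFunction` helper are assumptions for illustration; Stagehand's actual logger type and the surrounding `switch` may look different.

```typescript
// Hypothetical log-line shape; Stagehand's actual logger type may differ.
interface LogLine {
  category: string;
  message: string;
  level: 0 | 1 | 2;
}

type Logger = (line: LogLine) => void;

// Body of the `default:` branch, using the injected logger so that
// output respects whatever verbosity and sinks the caller configured.
function reportUnsupportedFunction(name: string, logger: Logger): null {
  logger({
    category: "agent",
    message: `Unsupported Google CUA function: ${name}`,
    level: 1,
  });
  return null;
}
```

In a test, the logger can be a function that pushes into an array, which makes the warning assertable without patching `console`.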

…to cheetah

@tkattkat

This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/stagehand@2.5.1

### Patch Changes

- [#1082](#1082) [`8c0fd01`](8c0fd01) Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object to agent instead of stagehand page
- [#1104](#1104) [`a1ad06c`](a1ad06c) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for stagehand agent
- [#1066](#1066) [`9daa584`](9daa584) Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright arguments to agent execute response
- [#1077](#1077) [`7f38b3a`](7f38b3a) Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for stagehand agent in the api
- [#1032](#1032) [`bf2d0e7`](bf2d0e7) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer dependency support
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator handler with base of new agent
- [#1089](#1089) [`536f366`](536f366) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs on api session create
- [#1103](#1103) [`889cb6c`](889cb6c) Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool support in anthropic cua client
- [#1056](#1056) [`6a002b2`](6a002b2) Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for duplicate project id if already passed to Stagehand
- [#1090](#1090) [`8ff5c5a`](8ff5c5a) Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed act error logs
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator agent with scaffold for new stagehand agent
- [#1107](#1107) [`3ccf335`](3ccf335) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url extraction not working inside an array
- [#1102](#1102) [`a99aa48`](a99aa48) Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page and date context to agent
- [#1110](#1110) [`dda52f1`](dda52f1) Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for new Gemini Computer Use models

## @browserbasehq/stagehand-evals@1.1.0

### Minor Changes

- [#1057](#1057) [`b7be89e`](b7be89e) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added web voyager ground truth (optional), added web bench, and subset of OSWorld evals which run on a browser

### Patch Changes

- [#1072](#1072) [`dc2d420`](dc2d420) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve evals screenshot service - add img hashing diff to add screenshots and change to screenshot intercepts from the agent
- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/stagehand@2.5.1

## @browserbasehq/stagehand-examples@1.0.10

### Patch Changes

- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/stagehand@2.5.1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

cheeta updates 1

3ded78e

greptile-apps bot reviewed Oct 7, 2025

View reviewed changes

tkattkat and others added 9 commits October 7, 2025 10:33

paramaterize mouse cursor

e42d3aa

add more agent logs to openai cua client

4f6d96c

fix env type

2fd73ed

update missing api key error message

451bfb8

update model name

664b0f3

updates

b5e1749

full text response in anthropic cua client

4204000

Update cua-example.ts

e3ee7dd

Update stagehand.config.ts

02f89a0

sameelarif approved these changes Oct 7, 2025

View reviewed changes

miguelg719 changed the title ~~cheeta updates 1~~ Support for new Gemini Computer Use Models Oct 7, 2025

miguelg719 and others added 3 commits October 7, 2025 13:11

changeset

a68f69c

Update prompt.ts

4b519fa

Merge branch 'cheetah' of https://github.com/browserbase/stagehand in…

b60bd28

…to cheetah

seanmcguire12 approved these changes Oct 7, 2025

View reviewed changes

miguelg719 merged commit dda52f1 into main Oct 7, 2025
20 of 28 checks passed

github-actions bot mentioned this pull request Oct 7, 2025

Version Packages #1062

Merged

github-actions bot mentioned this pull request Oct 7, 2025

Version Packages erickirt/stagehand#72

Open
