
@miguelg719 (Collaborator) commented Oct 7, 2025

why

Adding support for Gemini's new Computer Use model

what changed

We partnered with Google DeepMind to help integrate and test their new Computer Use models.


The new model tag `gemini-2.5-pro-computer-use-preview-10-2025` is available for Stagehand Agent. You can try it today with the example in `cua-example.ts`.

To learn more, check out the blog post https://www.browserbase.com/blog/evaluating-browser-agents

changeset-bot (bot) commented Oct 7, 2025

🦋 Changeset detected

Latest commit: b60bd28

The changes in this PR will be included in the next version bump.


@greptile-apps bot (Contributor) left a comment

Greptile Overview

Summary

This PR adds support for Google's Computer Use API as a third computer-use agent (CUA) provider, alongside the existing OpenAI and Anthropic support. The changes let users drive browser automation in the Stagehand framework with Google's Gemini models (specifically `computer-use-preview-10-2025`).

The integration follows the established pattern used for other providers:

  • A new GoogleCUAClient class that implements the AgentClient interface
  • Google-specific prompt generation via buildGoogleCUASystemPrompt
  • Type system updates to include 'google' as a valid provider
  • Extended image compression utilities to handle Google's unique message format
  • Updated AgentProvider factory to instantiate Google clients

Additionally, the PR standardizes the viewport dimensions of all CUA clients at 1288x711 (previously 1024x768) and upgrades the @google/genai dependency to version 1.22.0, Google's unified SDK. The cuaAgentHandler was also streamlined by removing visual feedback features (cursor animations) to improve performance, and the CUA example now demonstrates the Google integration.

The implementation includes coordinate normalization (Google's model returns coordinates on a 0-1000 scale, which must be converted to actual pixels), retry logic with exponential backoff, conversation history management with image compression, and a function-call mapping layer that translates Google's CUA function calls into Stagehand actions.
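A minimal sketch of the coordinate conversion, assuming the 0-1000 range described above maps linearly onto the viewport. The helper names (`toPixels`, `denormalize`) are hypothetical, and the rounding and clamping choices are assumptions rather than what `GoogleCUAClient` actually does.

```typescript
// Hypothetical helpers; GoogleCUAClient may scale or round differently.
interface Viewport {
  width: number;
  height: number;
}

// Assumed upper bound of the model's normalized coordinate grid.
const NORMALIZED_MAX = 1000;

// Scale one normalized value to pixels and clamp it inside [0, size - 1].
function denormalize(value: number, size: number): number {
  const px = Math.round((value / NORMALIZED_MAX) * size);
  return Math.min(Math.max(px, 0), size - 1);
}

function toPixels(x: number, y: number, viewport: Viewport): { x: number; y: number } {
  return { x: denormalize(x, viewport.width), y: denormalize(y, viewport.height) };
}

const viewport: Viewport = { width: 1288, height: 711 };
console.log(toPixels(500, 500, viewport)); // { x: 644, y: 356 }
```

With the 1288x711 viewport, a model click at (500, 500) lands at pixel (644, 356). The clamp keeps a value of 1000 from pointing one pixel past the right or bottom edge.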

PR Description Notes:

  • The PR description is incomplete, containing only empty section headers for "why", "what changed", and "test plan".

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| lib/agent/GoogleCUAClient.ts | 4/5 | New Google CUA client implementation with coordinate normalization and retry logic |
| lib/agent/AgentProvider.ts | 4/5 | Added Google provider support to agent factory with model mapping |
| lib/agent/utils/imageCompression.ts | 4/5 | Extended image compression to support OpenAI and Google message formats |
| lib/handlers/cuaAgentHandler.ts | 4/5 | Removed visual feedback features and optimized timing for better performance |
| lib/prompt.ts | 5/5 | Added Google-specific system prompt generation function |
| package.json | 3/5 | Major dependency upgrade from @google/genai ^0.8.0 to ^1.22.0 |
| lib/index.ts | 4/5 | Updated default viewport and added GEMINI_API_KEY fallback support |
| types/agent.ts | 5/5 | Added 'google' to agent provider type definitions |
| lib/agent/OpenAICUAClient.ts | 4/5 | Standardized viewport dimensions to match other clients |
| lib/agent/AnthropicCUAClient.ts | 5/5 | Updated viewport dimensions for consistency across providers |
| examples/cua-example.ts | 4/5 | Updated example to demonstrate Google CUA integration |

Confidence score: 3/5

  • This PR requires careful review due to the major dependency upgrade and new provider integration
  • Score reflects the significant @google/genai version jump and complex coordinate normalization logic that could introduce subtle bugs
  • Pay close attention to package.json, GoogleCUAClient.ts, and imageCompression.ts for potential compatibility issues

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CuaExample as cua-example.ts
    participant Stagehand
    participant AgentProvider
    participant GoogleCUAClient as GoogleCUAClient
    participant Page as Browser Page

    User->>CuaExample: "Run CUA example"
    CuaExample->>Stagehand: "new Stagehand(StagehandConfig)"
    CuaExample->>Stagehand: "await stagehand.init()"
    Stagehand-->>CuaExample: "Stagehand instance ready"

    CuaExample->>Stagehand: "stagehand.agent({provider: 'google', model: 'computer-use-preview-10-2025', ...})"
    Stagehand->>AgentProvider: "getClient(modelName, clientOptions, instructions)"
    AgentProvider->>GoogleCUAClient: "new GoogleCUAClient(type, modelName, instructions, clientOptions)"
    GoogleCUAClient-->>AgentProvider: "Client instance"
    AgentProvider-->>Stagehand: "Agent client"
    Stagehand-->>CuaExample: "Agent instance with execute method"

    CuaExample->>Page: "await page.goto('https://www.browserbase.com/careers')"
    Page-->>CuaExample: "Navigation complete"

    CuaExample->>Stagehand: "await agent.execute({instruction: 'Apply for the first engineer position...', maxSteps: 20})"
    Stagehand->>GoogleCUAClient: "execute(executionOptions)"

    loop Agent Execution Steps (up to maxSteps)
        GoogleCUAClient->>GoogleCUAClient: "executeStep(logger)"
        GoogleCUAClient->>GoogleCUAClient: "Generate content request to Google API"
        GoogleCUAClient->>GoogleCUAClient: "processResponse(response)"
        GoogleCUAClient->>GoogleCUAClient: "convertFunctionCallToAction(functionCall)"
        GoogleCUAClient->>Page: "Execute action (click, type, scroll, etc.)"
        Page-->>GoogleCUAClient: "Action completed"
        GoogleCUAClient->>Page: "Take screenshot"
        Page-->>GoogleCUAClient: "Screenshot captured"
        GoogleCUAClient->>GoogleCUAClient: "Add function response to history"
    end

    GoogleCUAClient-->>Stagehand: "AgentResult with actions, usage, completion status"
    Stagehand-->>CuaExample: "Execution result"
    CuaExample->>CuaExample: "Log result to console"
    CuaExample->>Stagehand: "await stagehand.close()"
    Stagehand-->>CuaExample: "Browser closed"
```
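The step loop in the diagram, combined with the retry-with-exponential-backoff behavior mentioned in the summary, might look roughly like the sketch below. It is hypothetical: `runAgent`, `withRetry`, `StepResult`, the retry count, and the delays are illustrative and are not taken from the PR.

```typescript
// Hypothetical shapes; Stagehand's real AgentAction / AgentResult types carry more fields.
interface StepResult {
  actions: string[];
  completed: boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async call with exponential backoff: waits baseDelayMs, then 2x, 4x, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}

// Run steps until the model reports completion or maxSteps is reached.
async function runAgent(
  executeStep: (step: number) => Promise<StepResult>,
  maxSteps: number,
  baseDelayMs = 200,
): Promise<{ actions: string[]; completed: boolean; steps: number }> {
  const actions: string[] = [];
  let completed = false;
  let currentStep = 0;
  while (!completed && currentStep < maxSteps) {
    const step = currentStep;
    const result = await withRetry(() => executeStep(step), 3, baseDelayMs);
    actions.push(...result.actions);
    completed = result.completed;
    currentStep++;
  }
  return { actions, completed, steps: currentStep };
}
```

For brevity the sketch retries a whole step. Retrying only the Google API request is safer in practice, since replaying a browser action such as a click is not idempotent.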

11 files reviewed, 6 comments


```ts
    modelName,
    userProvidedInstructions,
    clientOptions,
  );
```

**logic:** The `GoogleCUAClient` constructor is missing the `tools` parameter that the OpenAI and Anthropic clients receive.

Path: `lib/agent/AgentProvider.ts`, line 75
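One way to address this comment, sketched with hypothetical names (`GoogleCUAClientSketch`, `createGoogleClient`, `ToolSet`); the real constructor and tool types in Stagehand may differ. The idea is simply that the factory forwards `tools` instead of dropping it, with an empty default so existing callers keep working.

```typescript
// Hypothetical, simplified shapes; not Stagehand's actual types.
type ToolSet = Record<string, { description: string }>;

interface ClientOptions {
  apiKey?: string;
}

class GoogleCUAClientSketch {
  readonly modelName: string;
  readonly userProvidedInstructions: string | undefined;
  readonly clientOptions: ClientOptions;
  readonly tools: ToolSet;

  constructor(
    modelName: string,
    userProvidedInstructions: string | undefined,
    clientOptions: ClientOptions,
    tools: ToolSet = {}, // accept custom tools like the other providers' clients
  ) {
    this.modelName = modelName;
    this.userProvidedInstructions = userProvidedInstructions;
    this.clientOptions = clientOptions;
    this.tools = tools;
  }

  toolNames(): string[] {
    return Object.keys(this.tools);
  }
}

// Factory branch for the Google provider: forward `tools` rather than omitting it.
function createGoogleClient(
  modelName: string,
  userProvidedInstructions: string | undefined,
  clientOptions: ClientOptions,
  tools?: ToolSet,
): GoogleCUAClientSketch {
  return new GoogleCUAClientSketch(modelName, userProvidedInstructions, clientOptions, tools);
}
```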

Comment on lines +245 to +249
```ts
const data = part.functionResponse.response
  .data as FunctionResponseData[];
const hasImage = data.some((dataItem) =>
  dataItem.inlineData?.mimeType?.startsWith("image/"),
);
```

**style:** This complex nested type-assertion pattern is repeated. Consider extracting it into a type guard function for better maintainability.

Path: `lib/agent/utils/imageCompression.ts`, lines 245-249
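A possible type guard along the lines the comment suggests. The `FunctionResponseData` and `FunctionResponsePart` shapes below are assumptions modeled on the snippet (`part.functionResponse.response.data`, `inlineData.mimeType`); they are not the actual types from `imageCompression.ts`.

```typescript
// Assumed shapes, modeled on the snippet above.
interface FunctionResponseData {
  inlineData?: { mimeType?: string; data?: string };
}

interface FunctionResponsePart {
  functionResponse: {
    name?: string;
    response: { data: FunctionResponseData[]; [key: string]: unknown };
  };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrow an unknown message part to one whose function response carries a data array.
function isFunctionResponsePart(part: unknown): part is FunctionResponsePart {
  if (!isRecord(part)) return false;
  const functionResponse = part.functionResponse;
  if (!isRecord(functionResponse)) return false;
  const response = functionResponse.response;
  return isRecord(response) && Array.isArray(response.data);
}

// True when the part's function response includes at least one inline image.
function hasImageData(part: unknown): boolean {
  return (
    isFunctionResponsePart(part) &&
    part.functionResponse.response.data.some(
      (item) => item.inlineData?.mimeType?.startsWith("image/") ?? false,
    )
  );
}
```

With the guard in place, call sites can write `if (hasImageData(part))` instead of repeating the cast.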

```ts
...part,
functionResponse: {
  ...part.functionResponse,
  data: [] as FunctionResponseData[],
```

**logic:** Setting `data` to an empty array while adding a `compressed` property creates an inconsistent state: the original data structure is modified in unexpected ways.

Path: `lib/agent/utils/imageCompression.ts`, line 255
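A sketch of a more consistent approach. The snippet above appears to write `functionResponse.data`, while the snippet at lines 245-249 reads images from `functionResponse.response.data`. Assuming the latter is the canonical location, the sketch copies the part and swaps each image entry for a text placeholder in that same array, with no extra flag. The `text` field and the placeholder string are hypothetical.

```typescript
// Assumed shapes; the `text` field on FunctionResponseData is hypothetical.
interface FunctionResponseData {
  inlineData?: { mimeType?: string; data?: string };
  text?: string;
}

interface FunctionResponsePart {
  functionResponse: {
    name?: string;
    response: { data?: FunctionResponseData[]; [key: string]: unknown };
  };
}

const IMAGE_PLACEHOLDER = "[screenshot omitted]"; // hypothetical placeholder text

// Return a copy with images replaced in place; the input part is not mutated.
function compressFunctionResponse(part: FunctionResponsePart): FunctionResponsePart {
  const data = part.functionResponse.response.data;
  if (!data) return part;
  return {
    ...part,
    functionResponse: {
      ...part.functionResponse,
      response: {
        ...part.functionResponse.response,
        // Same location and array type: images become placeholders, other entries are kept.
        data: data.map((item): FunctionResponseData =>
          item.inlineData?.mimeType?.startsWith("image/") ? { text: IMAGE_PLACEHOLDER } : item,
        ),
      },
    },
  };
}
```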

```ts
let currentStep = 0;
let completed = false;
const actions: AgentAction[] = [];
const messageList: string[] = [];
```

**style:** `messageList` is declared but never used; consider removing it.

Suggested change

```diff
-    const messageList: string[] = [];
+    const actions: AgentAction[] = [];
+    let finalMessage = "";
```
Path: `lib/agent/GoogleCUAClient.ts`, line 121

```ts
}

default:
  console.warn(`Unsupported Google CUA function: ${name}`);
```

**style:** Uses `console.warn` instead of the `logger` parameter; use the logger for consistency.

Path: `lib/agent/GoogleCUAClient.ts`, line 778
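A minimal sketch of the suggested fix. It uses a hypothetical `Logger` callback and `LogLine` shape (Stagehand's actual logger signature may differ), a hypothetical function name `mapFunctionCall`, and `click_at` as an assumed function name for illustration.

```typescript
// Hypothetical log-line shape; the real logger passed to GoogleCUAClient may differ.
interface LogLine {
  category: string;
  message: string;
  level: "info" | "warn" | "error";
}
type Logger = (line: LogLine) => void;

type Action = { type: "click"; x: number; y: number };

function mapFunctionCall(
  name: string,
  args: { x?: number; y?: number },
  logger: Logger,
): Action | null {
  switch (name) {
    case "click_at": // assumed function name, for illustration only
      return { type: "click", x: args.x ?? 0, y: args.y ?? 0 };
    default:
      // Route the warning through the injected logger instead of console.warn.
      logger({
        category: "agent",
        message: `Unsupported Google CUA function: ${name}`,
        level: "warn",
      });
      return null;
  }
}
```

Injecting the logger lets callers capture, filter, or silence these warnings the same way as the rest of the agent's output.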

@miguelg719 changed the title from "cheeta updates 1" to "Support for new Gemini Computer Use Models" on Oct 7, 2025
@miguelg719 merged commit dda52f1 into main on Oct 7, 2025
20 of 28 checks passed
@github-actions (bot) mentioned this pull request on Oct 7, 2025
miguelg719 pushed a commit that referenced this pull request Oct 7, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/stagehand@2.5.1

### Patch Changes

- [#1082](#1082)
[`8c0fd01`](8c0fd01)
Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object
to agent instead of stagehand page

- [#1104](#1104)
[`a1ad06c`](a1ad06c)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for
stagehand agent

- [#1066](#1066)
[`9daa584`](9daa584)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright
arguments to agent execute response

- [#1077](#1077)
[`7f38b3a`](7f38b3a)
Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for
stagehand agent in the api

- [#1032](#1032)
[`bf2d0e7`](bf2d0e7)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer
dependency support

- [#1014](#1014)
[`6966201`](6966201)
Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator
handler with base of new agent

- [#1089](#1089)
[`536f366`](536f366)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs
on api session create

- [#1103](#1103)
[`889cb6c`](889cb6c)
Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool
support in anthropic cua client

- [#1056](#1056)
[`6a002b2`](6a002b2)
Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for
duplicate project id if already passed to Stagehand

- [#1090](#1090)
[`8ff5c5a`](8ff5c5a)
Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed
act error logs

- [#1014](#1014)
[`6966201`](6966201)
Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator
agent with scaffold for new stagehand agent

- [#1107](#1107)
[`3ccf335`](3ccf335)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url
extraction not working inside an array

- [#1102](#1102)
[`a99aa48`](a99aa48)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page
and date context to agent

- [#1110](#1110)
[`dda52f1`](dda52f1)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for
new Gemini Computer Use models

## @browserbasehq/stagehand-evals@1.1.0

### Minor Changes

- [#1057](#1057)
[`b7be89e`](b7be89e)
Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added
web voyager ground truth (optional), added web bench, and subset of
OSWorld evals which run on a browser

### Patch Changes

- [#1072](#1072)
[`dc2d420`](dc2d420)
Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve
evals screenshot service - add img hashing diff to add screenshots and
change to screenshot intercepts from the agent

- Updated dependencies
\[[`8c0fd01`](8c0fd01),
[`a1ad06c`](a1ad06c),
[`9daa584`](9daa584),
[`7f38b3a`](7f38b3a),
[`bf2d0e7`](bf2d0e7),
[`6966201`](6966201),
[`536f366`](536f366),
[`889cb6c`](889cb6c),
[`6a002b2`](6a002b2),
[`8ff5c5a`](8ff5c5a),
[`6966201`](6966201),
[`3ccf335`](3ccf335),
[`a99aa48`](a99aa48),
[`dda52f1`](dda52f1)]:
    -   @browserbasehq/stagehand@2.5.1

## @browserbasehq/stagehand-examples@1.0.10

### Patch Changes

- Updated dependencies
\[[`8c0fd01`](8c0fd01),
[`a1ad06c`](a1ad06c),
[`9daa584`](9daa584),
[`7f38b3a`](7f38b3a),
[`bf2d0e7`](bf2d0e7),
[`6966201`](6966201),
[`536f366`](536f366),
[`889cb6c`](889cb6c),
[`6a002b2`](6a002b2),
[`8ff5c5a`](8ff5c5a),
[`6966201`](6966201),
[`3ccf335`](3ccf335),
[`a99aa48`](a99aa48),
[`dda52f1`](dda52f1)]:
    -   @browserbasehq/stagehand@2.5.1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Labels: None yet

Projects: None yet

6 participants