Support for new Gemini Computer Use Models #1110
🦋 Changeset detected. Latest commit: b60bd28. The changes in this PR will be included in the next version bump.
Greptile Overview
Summary
This PR introduces comprehensive support for Google's Computer Use Assistant (CUA) API as a third agent provider alongside existing OpenAI and Anthropic support. The changes enable users to leverage Google's Gemini models (specifically `computer-use-preview-10-2025`) for browser automation within the Stagehand framework.
The integration follows the established pattern used for other providers:
- A new `GoogleCUAClient` class that implements the `AgentClient` interface
- Google-specific prompt generation via `buildGoogleCUASystemPrompt`
- Type system updates to include 'google' as a valid provider
- Extended image compression utilities to handle Google's unique message format
- Updated `AgentProvider` factory to instantiate Google clients
Additionally, the PR standardizes viewport dimensions across all CUA clients to 1288x711 (from 1024x768) and upgrades the @google/genai dependency to version 1.22.0 to use Google's unified SDK. The cuaAgentHandler has been optimized by removing visual feedback features (cursor animations) to improve performance, and the CUA example has been updated to demonstrate Google integration.
The implementation includes sophisticated coordinate normalization (Google uses 0-1000 range vs actual pixels), retry logic with exponential backoff, conversation history management with image compression, and comprehensive function call mapping to handle Google's specific CUA API structure.
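The coordinate normalization described here can be illustrated with a minimal sketch. It assumes the model returns `x` and `y` in a 0-1000 range and that the viewport is the 1288x711 size this PR standardizes on. The name `denormalizeCoordinates` and the clamping and rounding choices are illustrative assumptions, not the code in `GoogleCUAClient.ts`.

```typescript
// Hypothetical helper: maps model coordinates in a 0-1000 range to viewport pixels.
interface Viewport {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Upper bound of the normalized range described in the summary.
const NORMALIZED_MAX = 1000;

function denormalizeCoordinates(point: Point, viewport: Viewport): Point {
  // Clamp first so slightly out-of-range model output cannot land off-screen.
  const clamp = (v: number): number => Math.min(Math.max(v, 0), NORMALIZED_MAX);
  return {
    x: Math.round((clamp(point.x) / NORMALIZED_MAX) * viewport.width),
    y: Math.round((clamp(point.y) / NORMALIZED_MAX) * viewport.height),
  };
}

const viewport: Viewport = { width: 1288, height: 711 };
// The normalized center maps to the pixel center of the viewport.
console.log(denormalizeCoordinates({ x: 500, y: 500 }, viewport)); // { x: 644, y: 356 }
```

Clamping before scaling is one possible design choice; a real client might instead reject out-of-range values so that model errors are surfaced rather than silently corrected.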
PR Description Notes:
- The PR description is incomplete, containing only empty section headers for "why", "what changed", and "test plan"
Important Files Changed
Changed Files
| Filename | Score | Overview |
|---|---|---|
| lib/agent/GoogleCUAClient.ts | 4/5 | New Google CUA client implementation with coordinate normalization and retry logic |
| lib/agent/AgentProvider.ts | 4/5 | Added Google provider support to agent factory with model mapping |
| lib/agent/utils/imageCompression.ts | 4/5 | Extended image compression to support OpenAI and Google message formats |
| lib/handlers/cuaAgentHandler.ts | 4/5 | Removed visual feedback features and optimized timing for better performance |
| lib/prompt.ts | 5/5 | Added Google-specific system prompt generation function |
| package.json | 3/5 | Major dependency upgrade from @google/genai ^0.8.0 to ^1.22.0 |
| lib/index.ts | 4/5 | Updated default viewport and added GEMINI_API_KEY fallback support |
| types/agent.ts | 5/5 | Added 'google' to agent provider type definitions |
| lib/agent/OpenAICUAClient.ts | 4/5 | Standardized viewport dimensions to match other clients |
| lib/agent/AnthropicCUAClient.ts | 5/5 | Updated viewport dimensions for consistency across providers |
| examples/cua-example.ts | 4/5 | Updated example to demonstrate Google CUA integration |
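The retry logic noted for `lib/agent/GoogleCUAClient.ts` in the table above is not shown on this page. As a rough sketch of the general technique only, assuming a base delay that doubles on each attempt and a fixed attempt cap (the names `withRetry` and `backoffDelayMs` and every numeric value are hypothetical):

```typescript
// Hypothetical sketch of retry with exponential backoff; all values are illustrative.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  // attempt is 0-based: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Do not sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A call such as `withRetry(() => sendRequest())` would wrap a single API request, where `sendRequest` stands in for whatever function issues the call. Production code often adds random jitter to the delay so that concurrent clients do not retry in lockstep.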
Confidence score: 3/5
- This PR requires careful review due to the major dependency upgrade and new provider integration
- Score reflects the significant @google/genai version jump and complex coordinate normalization logic that could introduce subtle bugs
- Pay close attention to package.json, GoogleCUAClient.ts, and imageCompression.ts for potential compatibility issues
Sequence Diagram
sequenceDiagram
participant User
participant CuaExample as cua-example.ts
participant Stagehand
participant AgentProvider
participant GoogleCUAClient as GoogleCUAClient
participant Page as Browser Page
User->>CuaExample: "Run CUA example"
CuaExample->>Stagehand: "new Stagehand(StagehandConfig)"
CuaExample->>Stagehand: "await stagehand.init()"
Stagehand-->>CuaExample: "Stagehand instance ready"
CuaExample->>Stagehand: "stagehand.agent({provider: 'google', model: 'computer-use-preview-10-2025', ...})"
Stagehand->>AgentProvider: "getClient(modelName, clientOptions, instructions)"
AgentProvider->>GoogleCUAClient: "new GoogleCUAClient(type, modelName, instructions, clientOptions)"
GoogleCUAClient-->>AgentProvider: "Client instance"
AgentProvider-->>Stagehand: "Agent client"
Stagehand-->>CuaExample: "Agent instance with execute method"
CuaExample->>Page: "await page.goto('https://www.browserbase.com/careers')"
Page-->>CuaExample: "Navigation complete"
CuaExample->>Stagehand: "await agent.execute({instruction: 'Apply for the first engineer position...', maxSteps: 20})"
Stagehand->>GoogleCUAClient: "execute(executionOptions)"
loop Agent Execution Steps (up to maxSteps)
GoogleCUAClient->>GoogleCUAClient: "executeStep(logger)"
GoogleCUAClient->>GoogleCUAClient: "Generate content request to Google API"
GoogleCUAClient->>GoogleCUAClient: "processResponse(response)"
GoogleCUAClient->>GoogleCUAClient: "convertFunctionCallToAction(functionCall)"
GoogleCUAClient->>Page: "Execute action (click, type, scroll, etc.)"
Page-->>GoogleCUAClient: "Action completed"
GoogleCUAClient->>Page: "Take screenshot"
Page-->>GoogleCUAClient: "Screenshot captured"
GoogleCUAClient->>GoogleCUAClient: "Add function response to history"
end
GoogleCUAClient-->>Stagehand: "AgentResult with actions, usage, completion status"
Stagehand-->>CuaExample: "Execution result"
CuaExample->>CuaExample: "Log result to console"
CuaExample->>Stagehand: "await stagehand.close()"
Stagehand-->>CuaExample: "Browser closed"
11 files reviewed, 6 comments
    modelName,
    userProvidedInstructions,
    clientOptions,
    );
logic: GoogleCUAClient constructor is missing the tools parameter that OpenAI and Anthropic clients receive
Path: lib/agent/AgentProvider.ts, Line: 75:75

    const data = part.functionResponse.response
      .data as FunctionResponseData[];
    const hasImage = data.some((dataItem) =>
      dataItem.inlineData?.mimeType?.startsWith("image/"),
    );
style: Complex nested type assertion pattern is repeated. Consider extracting to a type guard function for better maintainability.
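One way to act on this suggestion is a small type guard. The sketch below makes assumptions about the shapes involved: `FunctionResponseData` is redeclared locally with only the fields visible in the snippet above, and `FunctionResponsePart`, `isFunctionResponseDataArray`, and `partHasImage` are hypothetical names, not functions from `imageCompression.ts`.

```typescript
// Local stand-in for the type used in the reviewed snippet (fields assumed).
interface FunctionResponseData {
  inlineData?: { mimeType?: string; data?: string };
}

// Hypothetical shape of a message part carrying a function response.
interface FunctionResponsePart {
  functionResponse?: { response?: { data?: unknown } };
}

// Type guard: narrows an unknown value to FunctionResponseData[].
function isFunctionResponseDataArray(
  value: unknown,
): value is FunctionResponseData[] {
  return (
    Array.isArray(value) &&
    value.every((item) => typeof item === "object" && item !== null)
  );
}

// Returns true when the part carries at least one inline image.
function partHasImage(part: FunctionResponsePart): boolean {
  const data = part.functionResponse?.response?.data;
  if (!isFunctionResponseDataArray(data)) return false;
  return data.some(
    (item) => item.inlineData?.mimeType?.startsWith("image/") ?? false,
  );
}
```

With a guard like this, the repeated `as FunctionResponseData[]` assertions could collapse into a single call, and malformed data would be skipped instead of being trusted.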
Path: lib/agent/utils/imageCompression.ts, Line: 245:249

    ...part,
    functionResponse: {
      ...part.functionResponse,
      data: [] as FunctionResponseData[],
logic: Setting data to empty array while adding 'compressed' property creates inconsistent state - the original data structure is modified in unexpected ways.
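A possible way to avoid the inconsistent state is to replace each image with a placeholder instead of emptying the array and attaching a separate flag. This is only a sketch under stated assumptions: the types are local stand-ins, the `text` field, the placeholder string, and the name `compressImages` are hypothetical, and the nesting under `response.data` is inferred from the earlier snippet rather than confirmed.

```typescript
// Local stand-ins for the types in the reviewed snippet (fields assumed).
interface FunctionResponseData {
  inlineData?: { mimeType?: string; data?: string };
  text?: string;
}

interface FunctionResponsePart {
  functionResponse: {
    name: string;
    response: { data: FunctionResponseData[] };
  };
}

// Hypothetical alternative: swap each image for a text placeholder so the
// array keeps its length and element type, and no extra flag is needed.
function compressImages(part: FunctionResponsePart): FunctionResponsePart {
  const data = part.functionResponse.response.data.map((item) =>
    item.inlineData?.mimeType?.startsWith("image/")
      ? { text: "[screenshot removed]" }
      : item,
  );
  // Build a new object at every level so the input part is left untouched.
  return {
    ...part,
    functionResponse: {
      ...part.functionResponse,
      response: { ...part.functionResponse.response, data },
    },
  };
}
```

Because the function returns a new object, callers that still hold the original part keep seeing the uncompressed data.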
Path: lib/agent/utils/imageCompression.ts, Line: 255:255

    let currentStep = 0;
    let completed = false;
    const actions: AgentAction[] = [];
    const messageList: string[] = [];
style: messageList is declared but never used - consider removing
```suggestion
const actions: AgentAction[] = [];
let finalMessage = "";
```
Path: lib/agent/GoogleCUAClient.ts, Line: 121:121

    }

    default:
      console.warn(`Unsupported Google CUA function: ${name}`);
style: Using console.warn instead of the logger parameter - should use logger for consistency
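A minimal sketch of this suggestion, assuming the logger is a callback that accepts a structured log line. The `LogLine` fields, the `Logger` type, and `warnUnsupportedFunction` are illustrative names declared locally, not the actual signature used in `GoogleCUAClient.ts`.

```typescript
// Hypothetical logger shape; declared locally so the sketch is self-contained.
interface LogLine {
  category: string;
  message: string;
  level: 0 | 1 | 2;
}

type Logger = (line: LogLine) => void;

// Routes the unsupported-function warning through the injected logger
// rather than writing to the console directly.
function warnUnsupportedFunction(name: string, logger: Logger): null {
  logger({
    category: "agent",
    message: `Unsupported Google CUA function: ${name}`,
    level: 1,
  });
  // Signals to the caller that no action was produced for this function call.
  return null;
}
```

Routing the warning through the logger means it respects whatever verbosity and output destination the caller has configured.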
Path: lib/agent/GoogleCUAClient.ts, Line: 778:778

This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/stagehand@2.5.1

### Patch Changes

- [#1082](#1082) [`8c0fd01`](8c0fd01) Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object to agent instead of stagehand page
- [#1104](#1104) [`a1ad06c`](a1ad06c) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for stagehand agent
- [#1066](#1066) [`9daa584`](9daa584) Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright arguments to agent execute response
- [#1077](#1077) [`7f38b3a`](7f38b3a) Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for stagehand agent in the api
- [#1032](#1032) [`bf2d0e7`](bf2d0e7) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer dependency support
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator handler with base of new agent
- [#1089](#1089) [`536f366`](536f366) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs on api session create
- [#1103](#1103) [`889cb6c`](889cb6c) Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool support in anthropic cua client
- [#1056](#1056) [`6a002b2`](6a002b2) Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for duplicate project id if already passed to Stagehand
- [#1090](#1090) [`8ff5c5a`](8ff5c5a) Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed act error logs
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator agent with scaffold for new stagehand agent
- [#1107](#1107) [`3ccf335`](3ccf335) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url extraction not working inside an array
- [#1102](#1102) [`a99aa48`](a99aa48) Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page and date context to agent
- [#1110](#1110) [`dda52f1`](dda52f1) Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for new Gemini Computer Use models

## @browserbasehq/stagehand-evals@1.1.0

### Minor Changes

- [#1057](#1057) [`b7be89e`](b7be89e) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added web voyager ground truth (optional), added web bench, and subset of OSWorld evals which run on a browser

### Patch Changes

- [#1072](#1072) [`dc2d420`](dc2d420) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve evals screenshot service - add img hashing diff to add screenshots and change to screenshot intercepts from the agent
- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/stagehand@2.5.1

## @browserbasehq/stagehand-examples@1.0.10

### Patch Changes

- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/stagehand@2.5.1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
why
Adding support for Gemini's new Computer Use model
what changed
We partnered with Google DeepMind to help integrate and test their new Computer Use models.
The new model tag `gemini-2.5-pro-computer-use-preview-10-2025` is available for Stagehand Agent. You can try it today with the example `cua-example.ts`.
To learn more, check out the blog post: https://www.browserbase.com/blog/evaluating-browser-agents