replace operator agent with base of new agent #1014
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
🦋 Changeset detected
Latest commit: ed42209
The changes in this PR will be included in the next version bump.
Greptile Summary
This PR implements a major architectural refactor of the agent system, replacing the entire "operator agent" implementation with a new AI SDK-based agent architecture. The changes include:
Core Architecture Changes:
- Removed `StagehandOperatorHandler` and `types/operator.ts` entirely
- Introduced `StagehandAgentHandler` as the new default agent implementation
- Renamed the existing agent handler to `cuaAgentHandler` (Computer Use Agent) for provider-specific execution
- Updated the main library exports to use the new handlers while maintaining API compatibility
New Tool System:
The PR introduces a complete tool ecosystem under lib/agent/tools/ with 11 standardized tools that wrap existing Stagehand functionality:
- `act.ts` - Web element interaction with observe-then-act pattern
- `ariaTree.ts` - Accessibility tree extraction for page context
- `close.ts` - Task completion signaling
- `extract.ts` - Data extraction from pages
- `fillform.ts` - Optimized multi-field form filling
- `goto.ts` - URL navigation
- `navback.ts` - Browser history navigation
- `screenshot.ts` - JPEG screenshot capture with compression
- `scroll.ts` - Page scrolling functionality
- `wait.ts` - Time-based delays
- `index.ts` - Centralized tool factory function
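The factory pattern described above can be illustrated with a minimal, dependency-free sketch. This is not the PR's code: the PR builds its tools with the AI SDK, whereas the `PageLike` and `AgentTool` interfaces and the hand-written `parseUrlInput` validator below are hypothetical stand-ins chosen so the example runs without any packages.

```typescript
// Hypothetical sketch of a centralized tool factory (not the PR's implementation).

// Assumed minimal shape of the page object that the tools wrap.
interface PageLike {
  goto(url: string): Promise<void>;
  goBack(): Promise<void>;
}

// A tool pairs a description with an input validator and an executor.
interface AgentTool<I> {
  description: string;
  parse(input: unknown): I; // throws when the input is invalid
  execute(input: I): Promise<{ success: boolean; detail?: string }>;
}

// Hand-rolled validation standing in for a schema library.
function parseUrlInput(input: unknown): { url: string } {
  if (typeof input === "object" && input !== null) {
    const candidate = (input as { url?: unknown }).url;
    if (typeof candidate === "string") {
      return { url: candidate };
    }
  }
  throw new Error("goto expects { url: string }");
}

// The factory closes over the page so each tool shares the same browser context.
function createAgentTools(page: PageLike) {
  const goto: AgentTool<{ url: string }> = {
    description: "Navigate to a URL",
    parse: parseUrlInput,
    execute: async ({ url }) => {
      await page.goto(url);
      return { success: true, detail: url };
    },
  };

  const navback: AgentTool<Record<string, never>> = {
    description: "Go back one entry in browser history",
    parse: () => ({}),
    execute: async () => {
      await page.goBack();
      return { success: true };
    },
  };

  return { goto, navback };
}
```

Keeping construction in one factory means a caller only has to supply the page once, and every tool validates its input before touching the browser.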
Implementation Details:
- All tools use AI SDK's `tool()` function with Zod schema validation
- The new `StagehandAgentHandler` leverages AI SDK's `generateText` with built-in tool calling
- Added message processing utilities in `messageProcessing.ts` for context compression
- Updated LLM client interface with `getLanguageModel()` getter for AI SDK integration
- Fixed minor issues like grammar corrections in evaluation tasks
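The summary does not say how `messageProcessing.ts` compresses context. One plausible approach, shown here purely as an assumption, is to keep only the most recent bulky tool result (such as a screenshot or accessibility tree) and replace older ones with a short placeholder. The `Message` type, `BULKY_TOOLS` set, and `compressMessages` function are hypothetical names, not taken from the PR.

```typescript
// Hypothetical sketch of context compression for an agent's message history.

// Assumed message shape; the real AI SDK message types are richer.
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolName: string; content: string };

// Tools whose results tend to be large (assumption for illustration).
const BULKY_TOOLS = new Set(["screenshot", "ariaTree"]);

// Keep the newest `keepLatest` results per bulky tool; replace older ones
// with a placeholder so the history stays short.
function compressMessages(messages: Message[], keepLatest = 1): Message[] {
  const seen = new Map<string, number>();
  // Walk newest-first so the count reflects how recent each result is.
  const newestFirst = [...messages].reverse().map((m): Message => {
    if (m.role !== "tool" || !BULKY_TOOLS.has(m.toolName)) {
      return m;
    }
    const count = seen.get(m.toolName) ?? 0;
    seen.set(m.toolName, count + 1);
    return count < keepLatest
      ? m
      : { ...m, content: `[${m.toolName} result omitted]` };
  });
  // Restore chronological order.
  return newestFirst.reverse();
}
```

The input array is not mutated, so the caller can keep the full history for logging while sending the compressed copy to the model.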
The refactor maintains backward compatibility through the same public API while completely overhauling the internal agent execution model from custom schema-based responses to standardized AI SDK tool calling patterns.
Confidence score: 2/5
- This PR introduces significant architectural changes that could destabilize the agent system due to the complete replacement of core functionality
- Score reflects the massive scope of changes, removal of entire systems, and potential integration issues with the new AI SDK dependency requirements
- Pay close attention to `lib/handlers/stagehandAgentHandler.ts`, `lib/agent/tools/act.ts`, `lib/agent/tools/fillform.ts`, and `lib/handlers/cuaAgentHandler.ts`
20 files reviewed, 12 comments
Greptile Summary
This review covers only the changes made since the last review (commit 80cb25b), not the entire PR.
The latest changes implement the final pieces of the agent architecture refactor, completing the replacement of the operator agent with a new dual-agent system. The key additions include:
- Agent Tool Interface Standardization: The `createAgentTools` function now accepts an optional `AgentToolOptions` interface with an `executionModel` parameter, providing a unified way to configure tool behavior across the agent system.
- Execution Model Support: Multiple tool files (`act.ts`, `extract.ts`, `fillform.ts`) now support an optional `executionModel` parameter that allows different models to be used for tool execution versus agent reasoning. When provided, this model is passed to `page.observe()` and `page.extract()` operations.
- Type System Enhancement: The `AgentConfig` interface in `types/stagehand.ts` now includes an optional `executionModel` field with clear documentation about its format ("provider/model") and purpose for tool execution optimization.
- Agent Handler Architecture: Two new handler classes have been introduced:
  - `StagehandAgentHandler`: A new AI SDK-based agent handler that serves as the default agent implementation with comprehensive error handling and step tracking
  - `CuaAgentHandler`: A Computer Use Agent handler for advanced visual browser automation with providers like OpenAI and Anthropic
- Main Library Integration: The `lib/index.ts` file has been updated to use class-based agent handlers instead of function-based ones, with the new `StagehandAgentHandler` becoming the default while maintaining `CuaAgentHandler` for advanced use cases.
This refactor enables more flexible model selection where users can specify different models for high-level reasoning versus tool execution, potentially optimizing for cost and performance by using faster models for routine operations while reserving powerful models for complex tasks.
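A minimal sketch of the dual-model selection described above follows. Only the "provider/model" format of `executionModel` comes from the review; the `AgentModelConfig` shape, the assumption that the reasoning model uses the same format, and the fallback to the reasoning model when `executionModel` is omitted are illustrative guesses, and the model names in the usage note are placeholders.

```typescript
// Hypothetical sketch: choosing separate models for reasoning and tool execution.

interface ModelRef {
  provider: string;
  model: string;
}

// Split "provider/model" on the first slash so model names may contain slashes.
function parseModelRef(spec: string): ModelRef {
  const slash = spec.indexOf("/");
  if (slash <= 0 || slash === spec.length - 1) {
    throw new Error(`Expected "provider/model", got "${spec}"`);
  }
  return { provider: spec.slice(0, slash), model: spec.slice(slash + 1) };
}

// Assumed config shape; the real AgentConfig has more fields.
interface AgentModelConfig {
  model: string; // used for high-level reasoning
  executionModel?: string; // optional, used for tool execution
}

// Fall back to the reasoning model when no execution model is given (assumption).
function resolveModels(config: AgentModelConfig): {
  reasoning: ModelRef;
  execution: ModelRef;
} {
  const reasoning = parseModelRef(config.model);
  const execution = config.executionModel
    ? parseModelRef(config.executionModel)
    : reasoning;
  return { reasoning, execution };
}
```

For example, `resolveModels({ model: "example-provider/large", executionModel: "example-provider/small" })` would route tool calls to the smaller model while the agent loop keeps the larger one.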
Confidence score: 3/5
- This PR introduces significant architectural changes that require careful testing to ensure compatibility
- Score reflects the complexity of the agent system refactor and potential for integration issues
- Pay close attention to the dynamic schema evaluation in extract.ts and error handling patterns across tool files
Context used:
Rule - Use camelCase naming convention for TypeScript code and snake_case naming convention for Python code in documentation examples. (link)
Context - We enforce linting and prettier at the CI level, so no code style comments that aren't obvious. (link)
10 files reviewed, no comments
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/stagehand@2.5.1

### Patch Changes

- [#1082](#1082) [`8c0fd01`](8c0fd01) Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object to agent instead of stagehand page
- [#1104](#1104) [`a1ad06c`](a1ad06c) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for stagehand agent
- [#1066](#1066) [`9daa584`](9daa584) Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright arguments to agent execute response
- [#1077](#1077) [`7f38b3a`](7f38b3a) Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for stagehand agent in the api
- [#1032](#1032) [`bf2d0e7`](bf2d0e7) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer dependency support
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator handler with base of new agent
- [#1089](#1089) [`536f366`](536f366) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs on api session create
- [#1103](#1103) [`889cb6c`](889cb6c) Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool support in anthropic cua client
- [#1056](#1056) [`6a002b2`](6a002b2) Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for duplicate project id if already passed to Stagehand
- [#1090](#1090) [`8ff5c5a`](8ff5c5a) Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed act error logs
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator agent with scaffold for new stagehand agent
- [#1107](#1107) [`3ccf335`](3ccf335) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url extraction not working inside an array
- [#1102](#1102) [`a99aa48`](a99aa48) Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page and date context to agent
- [#1110](#1110) [`dda52f1`](dda52f1) Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for new Gemini Computer Use models

## @browserbasehq/stagehand-evals@1.1.0

### Minor Changes

- [#1057](#1057) [`b7be89e`](b7be89e) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added web voyager ground truth (optional), added web bench, and subset of OSWorld evals which run on a browser

### Patch Changes

- [#1072](#1072) [`dc2d420`](dc2d420) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve evals screenshot service - add img hashing diff to add screenshots and change to screenshot intercepts from the agent
- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/stagehand@2.5.1

## @browserbasehq/stagehand-examples@1.0.10

### Patch Changes

- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/stagehand@2.5.1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Why
Replace operator agent with new agent handler
The operator agent was an older implementation that did not use tool calling and used a single model for both high-level reasoning and low-level action execution.
What Changed
- Removed operator agent (`StagehandOperatorHandler`)
- Added new agent handler (`StagehandAgentHandler`)
- `executionModel` option for dual-model architecture

ExecutionModel feature:
- `act()` and `extract()`

Test Plan