Computer Use Agent: Critical failures - invalid keys, crashes, false success reports #1122

@illia-fliplet

Environment Information

Stagehand:

  • Language/SDK: TypeScript
  • Stagehand version: Latest

AI Provider:

  • Provider: OpenAI, Google
  • Model: computer-use-preview-2025-03-11, gemini-2.5-computer-use-preview-10-2025

Additional Environment:

  • Node.js: Latest LTS
  • Operating System: Windows 11
  • Browser: Chromium (local and Browserbase cloud)

Issue Description

The Computer Use Agent in Stagehand has critical failures that make it unusable:

  1. Invalid Key Names: Generates keyboard key names that crash Playwright
  2. Browser Crashes: Closes browser/context during execution
  3. False Success Reports: Reports success for actions that fail
  4. Ignores Instructions: Disregards explicit instructions about keyboard usage

Steps to Reproduce

  1. Create a Computer Use Agent with explicit instructions
  2. Execute a simple task (onboarding/login or Google search)
  3. Observe that the agent reports success while the browser shows crashes or invalid key errors
  4. Check console for invalid key name errors

Minimal Reproduction Code

import { test, expect } from '@playwright/test';
import { Stagehand } from '@browserbasehq/stagehand';

test('should complete onboarding and login process', async () => {
  const stagehand = new Stagehand({
    env: "LOCAL",
    modelName: "gpt-4o",
    modelClientOptions: { apiKey: process.env.OPENAI_API_KEY }
  });
  
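  // Fail fast if the target URL is missing, since goto() requires a string
  const baseUrl = process.env.BASE_URL;
  if (!baseUrl) throw new Error('BASE_URL is not set');

  await stagehand.init();
  await stagehand.page.goto(baseUrl);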
  
  const agent = stagehand.agent({
    provider: "openai",
    model: "computer-use-preview-2025-03-11",
    instructions: "You are a web automation assistant. NEVER use keyboard navigation keys like ArrowRight, ArrowLeft, Tab, or Enter. ONLY use mouse clicks and typing in input fields.",
    options: { apiKey: process.env.OPENAI_API_KEY }
  });
  
  const task = `
1. Click "Explore More" button
2. Click "Continue" button  
3. Click "Continue" button
4. Click "Let's get started!" button
5. Enter email: ${process.env.ADMIN_EMAIL}
6. Enter password: ${process.env.ADMIN_PASSWORD}
7. Click login button

CRITICAL: NEVER use keyboard keys like ArrowRight, ArrowLeft, Tab, Enter. ONLY use mouse clicks and typing text.`;

  try {
    const result = await agent.execute({
      instruction: task,
      maxSteps: 10,
      autoScreenshot: false
    });

    console.log('Agent result:', result); // Shows success: false with invalid key errors
    await expect(stagehand.page).toHaveURL(/.*dashboard|.*home|.*main/);
  } finally {
    // Close the browser even when the assertion above fails
    await stagehand.close();
  }
});

Error Messages / Log trace

Onboarding/Login Test Results:

Starting Computer Use Agent execution...
Agent execution completed: {
  success: false,
  actions: [
    { type: 'screenshot' },
    { type: 'click', button: 'left', x: 159, y: 571 },
    { type: 'wait' },
    { type: 'click', button: 'left', x: 162, y: 573 },
    { type: 'wait' },
    { type: 'click', button: 'left', x: 1797, y: 496 },
    { type: 'wait' },
    { type: 'click', button: 'left', x: 125, y: 569 },
    { type: 'wait' },
    { type: 'wait' }
  ],
  message: '',
  completed: false,
  usage: { input_tokens: 41254, output_tokens: 402, inference_time_ms: 57316 }
}
[2025-10-10 14:36:46.002 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWDOWN"

Expected Key Names: ArrowRight, ArrowLeft, Tab, Enter, ArrowDown
Actual Key Names: ARROWRIGHT, ARROWLEFT, TAB, ENTER, ARROWDOWN

Complete List of Invalid Key Errors:

[2025-10-10 14:36:46.002 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWDOWN"
[2025-10-10 10:17:20.988 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWRIGHT"
[2025-10-10 10:17:20.989 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWLEFT"
[2025-10-10 10:17:20.990 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "TAB"
[2025-10-10 10:17:20.991 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ENTER"
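
For illustration, the mismatch above could be worked around with a small normalization step before the key reaches Playwright. This is only a sketch: `normalizeKey`, `PLAYWRIGHT_KEYS`, and `KEY_LOOKUP` are hypothetical names, not part of Stagehand or Playwright, and the key list covers only a handful of common named keys.

```typescript
// Hypothetical helper, not part of Stagehand or Playwright.
// Playwright key names are case-sensitive ("ArrowDown", not "ARROWDOWN").
const PLAYWRIGHT_KEYS: readonly string[] = [
  "ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight",
  "Enter", "Tab", "Escape", "Backspace", "Delete",
  "Home", "End", "PageUp", "PageDown",
];

// Lookup table from the upper-cased spelling to the canonical spelling.
const KEY_LOOKUP = new Map<string, string>(
  PLAYWRIGHT_KEYS.map((key): [string, string] => [key.toUpperCase(), key]),
);

// Returns the canonical Playwright name for a known key, ignoring case.
// Unknown input is returned trimmed but otherwise unchanged.
function normalizeKey(raw: string): string {
  const trimmed = raw.trim();
  return KEY_LOOKUP.get(trimmed.toUpperCase()) ?? trimmed;
}
```

With something like this in place, a call such as `page.keyboard.press(normalizeKey("ARROWDOWN"))` would receive `"ArrowDown"`. Where Stagehand actually dispatches the keypress is not shown in this report, so the right place to apply it is an open question.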

Browser Reality: The agent executes some clicks, then crashes on invalid key names, leaving the task incomplete.

Agent Behavior: Executes some actions, then crashes on invalid key names. In this run it reports the failure correctly (success: false, completed: false).

Related Issues

Are there any related issues or PRs?

  • Related to: Computer Use Agent functionality
  • Blocks: Any automation using Computer Use Agent

Additional Context

What Works (Regular Stagehand Methods):

// This works perfectly - use regular Stagehand instead
await stagehand.page.act("Type 'Stagehand Computer Use' in the search box");
await stagehand.page.act("Press Enter or click the search button");
await stagehand.page.act("Click the first search result");

Manual Navigation Works:

// This works reliably
await stagehand.page.goto('https://www.google.com');
await stagehand.page.waitForLoadState('networkidle');

Impact:

  • Computer Use Agent is unusable for production automation tasks
  • Wastes development time trying to debug non-functional features
  • Misleading success reports make debugging impossible
  • Forces developers to use regular Stagehand methods instead of advertised Computer Use Agent

Requested Fixes:

  1. Fix keyboard key names - Use proper Playwright key names (ArrowDown not ARROWDOWN)
  2. Fix browser stability - Stop causing crashes during navigation
  3. Respect instructions - Agent should follow explicit instructions about keyboard usage
  4. Fix action execution - Actions should actually complete successfully
  5. Add proper error handling - Don't report success for failed actions
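
Until point 5 is addressed, a test can guard against a misleading result on its own side. The sketch below assumes the result has the shape printed in the "Agent execution completed" log above; `AgentResultLike` and `assertAgentSucceeded` are hypothetical names, and the real Stagehand result type may differ.

```typescript
// Shape inferred from the log output in this report (an assumption).
interface AgentResultLike {
  success: boolean;
  completed: boolean;
  message: string;
  actions: Array<{ type: string }>;
}

// Hypothetical guard: throws unless the agent reports both success and completion,
// and includes the executed action types to make the failure easier to trace.
function assertAgentSucceeded(result: AgentResultLike): void {
  if (!result.success || !result.completed) {
    const types = result.actions.map((action) => action.type).join(", ");
    throw new Error(
      `Agent did not finish (success=${result.success}, completed=${result.completed}); ` +
        `actions: ${types || "<none>"}; message: ${result.message || "<empty>"}`,
    );
  }
}
```

Calling `assertAgentSucceeded(result)` right after `agent.execute(...)` would fail the test early on the run logged above. It does not replace checking the page state, because a result that claims success still has to be verified against the browser.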

Additional Information:

  • API Keys: Valid OpenAI and Google API keys with Computer Use model access
  • Permissions: Confirmed access to Computer Use models
  • Rate Limits: Not hitting rate limits
  • Network: Stable internet connection
  • Reproducible: Issues occur consistently across multiple test runs

Contact

If you need additional information, logs, or test cases, please contact me at ikolosov@fliplet.com


This bug report represents critical failures of the Computer Use Agent feature that make it unusable for production automation tasks.
