Computer Use Agent: Critical failures - invalid keys, crashes, false success reports


**Before submitting an issue, please:**

- [x] Check the [documentation](https://docs.stagehand.dev/) for relevant information
- [x] Search existing [issues](https://github.com/browserbase/stagehand/issues) to avoid duplicates

## Environment Information

**Stagehand:**
- Language/SDK: TypeScript
- Stagehand version: Latest

**AI Provider:**
- Provider: OpenAI, Google
- Model: `computer-use-preview-2025-03-11`, `gemini-2.5-computer-use-preview-10-2025`

**Additional Environment:**
- Node.js: Latest LTS
- Operating System: Windows 11
- Browser: Chromium (local and Browserbase cloud)

## Issue Description

The Computer Use Agent in Stagehand has critical failures that make it unusable:

1. **Invalid Key Names**: Generates upper-cased key names (e.g. `ARROWDOWN`) that Playwright's `keyboard.press` rejects with `Unknown key`
2. **Browser Crashes**: Closes browser/context during execution
3. **False Success Reports**: Reports success for actions that fail
4. **Ignores Instructions**: Disregards explicit instructions about keyboard usage

### Steps to Reproduce

1. Create a Computer Use Agent with explicit instructions
2. Execute a simple task (onboarding/login or Google search)
3. Observe that the agent reports success while the browser shows crashes or invalid key errors
4. Check the console for invalid key name errors

### Minimal Reproduction Code

```typescript
import { test, expect } from '@playwright/test';
import { Stagehand } from '@browserbasehq/stagehand';

test('should complete onboarding and login process', async () => {
  // Fail fast with a clear message if the target URL is not configured
  const baseUrl = process.env.BASE_URL;
  if (!baseUrl) throw new Error('BASE_URL is not set');

  const stagehand = new Stagehand({
    env: "LOCAL",
    modelName: "gpt-4o",
    modelClientOptions: { apiKey: process.env.OPENAI_API_KEY }
  });

  await stagehand.init();
  try {
    await stagehand.page.goto(baseUrl);

    const agent = stagehand.agent({
      provider: "openai",
      model: "computer-use-preview-2025-03-11",
      instructions: "You are a web automation assistant. NEVER use keyboard navigation keys like ArrowRight, ArrowLeft, Tab, or Enter. ONLY use mouse clicks and typing in input fields.",
      options: { apiKey: process.env.OPENAI_API_KEY }
    });

    const task = `
1. Click "Explore More" button
2. Click "Continue" button
3. Click "Continue" button
4. Click "Let's get started!" button
5. Enter email: ${process.env.ADMIN_EMAIL}
6. Enter password: ${process.env.ADMIN_PASSWORD}
7. Click login button

CRITICAL: NEVER use keyboard keys like ArrowRight, ArrowLeft, Tab, Enter. ONLY use mouse clicks and typing text.`;

    const result = await agent.execute({
      instruction: task,
      maxSteps: 10,
      autoScreenshot: false
    });

    console.log('Agent result:', result); // Shows success: false with invalid key errors
    await expect(stagehand.page).toHaveURL(/.*dashboard|.*home|.*main/);
  } finally {
    // Always close the browser, even if the agent run or the URL assertion fails
    await stagehand.close();
  }
});
```

### Error Messages / Log trace

**Onboarding/Login Test Results:**
```
Starting Computer Use Agent execution...
Agent execution completed: {
  success: false,
  actions: [
    { type: 'screenshot' },
    { type: 'click', button: 'left', x: 159, y: 571 },
    { type: 'wait' },
    { type: 'click', button: 'left', x: 162, y: 573 },
    { type: 'wait' },
    { type: 'click', button: 'left', x: 1797, y: 496 },
    { type: 'wait' },
    { type: 'click', button: 'left', x: 125, y: 569 },
    { type: 'wait' },
    { type: 'wait' }
  ],
  message: '',
  completed: false,
  usage: { input_tokens: 41254, output_tokens: 402, inference_time_ms: 57316 }
}
[2025-10-10 14:36:46.002 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWDOWN"
```


- **Expected Key Names**: `ArrowRight`, `ArrowLeft`, `Tab`, `Enter`, `ArrowDown`
- **Actual Key Names**: `ARROWRIGHT`, `ARROWLEFT`, `TAB`, `ENTER`, `ARROWDOWN`

**Complete List of Invalid Key Errors:**
```
[2025-10-10 14:36:46.002 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWDOWN"
[2025-10-10 10:17:20.988 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWRIGHT"
[2025-10-10 10:17:20.989 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ARROWLEFT"
[2025-10-10 10:17:20.990 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "TAB"
[2025-10-10 10:17:20.991 +0300] ERROR: Error executing action keypress: keyboard.press: Unknown key: "ENTER"
```
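The errors above suggest the agent emits upper-cased key names that Playwright's `keyboard.press` does not recognize. As a minimal sketch of the kind of fix listed under Requested Fixes (not Stagehand's actual code), a hypothetical `normalizeKeyName` helper could map those names onto Playwright's key names before the keypress is dispatched. The mapping table is an assumption covering the keys seen in the logs plus a few common ones; unknown names pass through unchanged so Playwright still reports them.

```typescript
// Hypothetical mapping from upper-cased key names (as seen in the logs)
// to key names that Playwright's keyboard.press() accepts. The table is an
// assumption, not an exhaustive or official list.
const PLAYWRIGHT_KEY_NAMES: Record<string, string> = {
  ARROWUP: "ArrowUp",
  ARROWDOWN: "ArrowDown",
  ARROWLEFT: "ArrowLeft",
  ARROWRIGHT: "ArrowRight",
  TAB: "Tab",
  ENTER: "Enter",
  RETURN: "Enter",
  ESCAPE: "Escape",
  ESC: "Escape",
  BACKSPACE: "Backspace",
  DELETE: "Delete",
  HOME: "Home",
  END: "End",
  PAGEUP: "PageUp",
  PAGEDOWN: "PageDown",
  SHIFT: "Shift",
  CTRL: "Control",
  CONTROL: "Control",
  ALT: "Alt",
  META: "Meta",
  CMD: "Meta",
};

// Hypothetical helper: normalize a single key name before keyboard.press().
function normalizeKeyName(key: string): string {
  // Single characters such as "a" or "1" are already valid key names.
  if (key.length === 1) return key;
  // Case-insensitive lookup; unknown names are returned unchanged.
  return PLAYWRIGHT_KEY_NAMES[key.toUpperCase()] ?? key;
}
```

For example, `normalizeKeyName("ARROWDOWN")` returns `"ArrowDown"`, and an already-correct name such as `"ArrowLeft"` is left as-is.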

**Browser Reality:** The agent executes some clicks, then errors out on an invalid key name and leaves the task incomplete.

**Agent Behavior:** In this run the agent executes some actions, fails on invalid key names, and correctly reports `success: false`.

### Related Issues

- Related to: Computer Use Agent functionality
- Blocks: Any automation using Computer Use Agent

## Additional Context

**What Works (Regular Stagehand Methods):**
```typescript
// This works perfectly - use regular Stagehand instead
await stagehand.page.act("Type 'Stagehand Computer Use' in the search box");
await stagehand.page.act("Press Enter or click the search button");
await stagehand.page.act("Click the first search result");
```

**Manual Navigation Works:**
```typescript
// This works reliably
await stagehand.page.goto('https://www.google.com');
await stagehand.page.waitForLoadState('networkidle');
```

**Impact:**
- Computer Use Agent is unusable for production automation tasks
- Wastes development time trying to debug non-functional features  
- Misleading success reports make debugging impossible
- Forces developers to use regular Stagehand methods instead of advertised Computer Use Agent

**Requested Fixes:**
1. Fix keyboard key names - Use proper Playwright key names (`ArrowDown` not `ARROWDOWN`)
2. Fix browser stability - Stop causing crashes during navigation
3. Respect instructions - Agent should follow explicit instructions about keyboard usage
4. Fix action execution - Actions should actually complete successfully
5. Add proper error handling - Don't report success for failed actions
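Until the agent's own status can be trusted, a test can check the returned fields explicitly rather than relying only on a later URL assertion. The sketch below assumes the result object has the `success`, `completed`, `message`, and `actions` fields printed in the logs above; `LoggedAgentResult` and `assertAgentSucceeded` are hypothetical names, not part of Stagehand's API.

```typescript
// Local type mirroring the fields printed in the agent log (an assumption,
// not Stagehand's exported result type).
interface LoggedAgentResult {
  success: boolean;
  completed: boolean;
  message: string;
  actions: Array<{ type: string; [key: string]: unknown }>;
}

// Hypothetical guard: treat a run as successful only if the agent both
// reported success and marked the task as completed.
function assertAgentSucceeded(result: LoggedAgentResult): void {
  if (!result.success || !result.completed) {
    // Include the action types so the failure shows how far the agent got.
    const steps = result.actions.map((a) => a.type).join(", ");
    throw new Error(
      `Agent run failed (success=${result.success}, completed=${result.completed}); ` +
        `actions: [${steps}]; message: "${result.message}"`,
    );
  }
}
```

In the reproduction test, calling `assertAgentSucceeded(result)` right after `agent.execute(...)` would fail with the recorded action types (e.g. `screenshot, click, wait, ...`) instead of surfacing only as a URL mismatch.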

**Additional Information:**
- API Keys: Valid OpenAI and Google API keys with Computer Use model access
- Permissions: Confirmed access to Computer Use models
- Rate Limits: Not hitting rate limits
- Network: Stable internet connection  
- Reproducible: Issues occur consistently across multiple test runs

## Contact
If you need additional information, logs, or test cases, please contact me at ikolosov@fliplet.com

---
**This bug report represents critical failures of the Computer Use Agent feature that make it unusable for production automation tasks.**


Computer Use Agent: Critical failures - invalid keys, crashes, false success reports #1122
