Visual Agent — Coordinate-Based Delegation

# Visual Agent — Coordinate-Based Delegation

## Summary

Implement a visual sub-loop that uses the Gemini Computer Use model for screenshot-based interaction.

## Description

When the semantic agent can't accomplish a task through the accessibility (AX) tree, it delegates to the visual agent:

```typescript
delegate_to_visual_agent({ instruction: 'Click the blue submit button' });
```

The visual sub-loop:

1. Capture screenshot via Playwright
2. Send to `gemini-2.5-computer-use-preview` model
3. Execute visual tools: `click_at(x, y)`, `type_text_at`, `drag_and_drop`, `scroll_document`
4. Capture new screenshot for feedback
5. Repeat until complete (max 5 steps)
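The five steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `VisualAction`, `ModelTurn`, `VisualLoopDeps`, and `runVisualSubLoop` are hypothetical names, and the screenshot, model, and tool calls are injected as plain functions so the loop itself does not depend on Playwright or on any particular model client.

```typescript
// Hypothetical shape of one tool call returned by the model.
interface VisualAction {
  name: string;
  args: Record<string, unknown>;
}

// The model either reports completion or asks for one more action.
type ModelTurn =
  | { done: true; summary: string }
  | { done: false; action: VisualAction };

// Injected dependencies (assumed names, not from the issue).
interface VisualLoopDeps {
  captureScreenshot(): Promise<Uint8Array>;
  requestNextTurn(instruction: string, screenshot: Uint8Array): Promise<ModelTurn>;
  executeAction(action: VisualAction): Promise<void>;
}

interface VisualLoopResult {
  completed: boolean;
  steps: number; // number of visual actions actually executed
  summary?: string;
}

const MAX_VISUAL_STEPS = 5; // "max 5 steps" from the description

async function runVisualSubLoop(
  instruction: string,
  deps: VisualLoopDeps,
  maxSteps: number = MAX_VISUAL_STEPS,
): Promise<VisualLoopResult> {
  // Step 1: initial screenshot.
  let screenshot = await deps.captureScreenshot();

  for (let step = 0; step < maxSteps; step++) {
    // Step 2: send the instruction and the latest screenshot to the model.
    const turn = await deps.requestNextTurn(instruction, screenshot);
    if (turn.done) {
      return { completed: true, steps: step, summary: turn.summary };
    }
    // Step 3: execute the requested visual tool.
    await deps.executeAction(turn.action);
    // Step 4: capture a fresh screenshot as feedback for the next turn.
    screenshot = await deps.captureScreenshot();
  }

  // Step 5: the step budget ran out before the model reported completion.
  return { completed: false, steps: maxSteps };
}
```

In this sketch, `requestNextTurn` is where the separate model conversation would live, for example as a closure that keeps its own message history apart from the semantic agent's.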

Visual tools use Playwright's `page.mouse` and `page.keyboard` APIs directly, not MCP.
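One way the tool dispatch might look is sketched below. The argument shapes in `VisualToolCall`, the `SCROLL_STEP_PX` value, and the `executeVisualTool` name are assumptions for illustration; the issue only lists the tool names. `PageLike` is a hypothetical minimal interface covering the `page.mouse` and `page.keyboard` methods used here, which a Playwright `Page` should satisfy structurally. Coordinates are assumed to already be in viewport pixels.

```typescript
// Minimal subset of Playwright's Page used by the visual tools (hypothetical interface).
interface PageLike {
  mouse: {
    click(x: number, y: number): Promise<void>;
    move(x: number, y: number): Promise<void>;
    down(): Promise<void>;
    up(): Promise<void>;
    wheel(deltaX: number, deltaY: number): Promise<void>;
  };
  keyboard: {
    type(text: string): Promise<void>;
  };
}

// Assumed argument shapes; the issue names the tools but not their payloads.
type VisualToolCall =
  | { name: 'click_at'; x: number; y: number }
  | { name: 'type_text_at'; x: number; y: number; text: string }
  | { name: 'drag_and_drop'; x: number; y: number; destinationX: number; destinationY: number }
  | { name: 'scroll_document'; direction: 'up' | 'down' };

const SCROLL_STEP_PX = 600; // assumed scroll distance per call

async function executeVisualTool(page: PageLike, call: VisualToolCall): Promise<void> {
  switch (call.name) {
    case 'click_at':
      await page.mouse.click(call.x, call.y);
      return;
    case 'type_text_at':
      // Click first so the target element has focus, then type.
      await page.mouse.click(call.x, call.y);
      await page.keyboard.type(call.text);
      return;
    case 'drag_and_drop':
      // Press at the source, move to the destination, release.
      await page.mouse.move(call.x, call.y);
      await page.mouse.down();
      await page.mouse.move(call.destinationX, call.destinationY);
      await page.mouse.up();
      return;
    case 'scroll_document':
      // Positive deltaY scrolls down, negative scrolls up.
      await page.mouse.wheel(0, call.direction === 'down' ? SCROLL_STEP_PX : -SCROLL_STEP_PX);
      return;
  }
}
```

Because the function depends only on `PageLike`, it can be exercised with a recording fake in unit tests and handed a real Playwright page at runtime.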

## Acceptance Criteria

- [ ] Visual sub-loop with separate model conversation
- [ ] Screenshot capture with consistent coordinate system
- [ ] Visual tool execution via Playwright
- [ ] MCP cache invalidation after visual actions (UIDs become stale)
- [ ] Max steps limit
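For the coordinate-system and cache-invalidation criteria, a minimal sketch might look like the following. It assumes the model reports coordinates on a normalized 0-999 grid that must be scaled to the viewport, which the issue does not state and should be checked against the model documentation. `toViewportPixels`, `UidCache`, and `withCacheInvalidation` are hypothetical names, and the cache here is a stand-in for whatever the MCP layer actually stores.

```typescript
const NORMALIZED_RANGE = 1000; // assumption: model coordinates lie in 0..999

interface ViewportSize {
  width: number;
  height: number;
}

// Scale a normalized model coordinate to viewport pixels, clamping out-of-range input.
function toViewportPixels(
  normX: number,
  normY: number,
  viewport: ViewportSize,
): { x: number; y: number } {
  const clamp = (v: number): number => Math.min(Math.max(v, 0), NORMALIZED_RANGE - 1);
  return {
    x: Math.round((clamp(normX) / NORMALIZED_RANGE) * viewport.width),
    y: Math.round((clamp(normY) / NORMALIZED_RANGE) * viewport.height),
  };
}

// Stand-in for the MCP-side UID cache (hypothetical).
class UidCache {
  private readonly entries = new Map<string, string>();

  set(uid: string, description: string): void {
    this.entries.set(uid, description);
  }

  get(uid: string): string | undefined {
    return this.entries.get(uid);
  }

  get size(): number {
    return this.entries.size;
  }

  // Drop every cached UID; the page may have changed underneath them.
  invalidate(): void {
    this.entries.clear();
  }
}

// Run a visual action and invalidate the cache afterwards, even if the action throws.
async function withCacheInvalidation<T>(
  cache: UidCache,
  visualAction: () => Promise<T>,
): Promise<T> {
  try {
    return await visualAction();
  } finally {
    cache.invalidate();
  }
}
```

With these assumptions, `toViewportPixels(500, 500, { width: 1440, height: 900 })` yields `{ x: 720, y: 450 }`. Keeping the viewport size fixed between the screenshot and the click is what makes the mapping consistent.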


Visual Agent — Coordinate-Based Delegation #15962
