@seanmcguire12

why

  • before this change, a z.string().url() inside a z.array() was converted to an ID for the LLM but was never converted back into a URL
  • this meant that if you defined a schema like this:
schema: z.object({
  records: z.array(z.string().url()),
})

you would receive an array like this:

{
  records: [
    '0-302', '0-309',
    '0-316', '0-323',
    '0-330', '0-337',
    '0-344', '0-351',
    '0-358', '0-365'
  ]
}
  • with this change, you will now receive the actual URLs, i.e.:
{
  records: [
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10003-10041.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10143%20(C06932208).pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10143.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10156.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10213.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10005-10321.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10006-10247.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10007-10345.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10009-10021.pdf',
    'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10009-10222.pdf'
  ]
}

what changed

  • updated the injectUrls function so that when it hits an array and there is no deeper path, it loops through the array and replaces each ID with its URL
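
The terminal-array case described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in lib/utils.ts: the name `injectUrlsSketch`, the `Map`-based mapping, and the example.com URLs are all assumptions made for the example.

```typescript
// Hypothetical sketch; not the actual Stagehand source.
// A path segment of "*" stands for "every element of this array".
type PathSegment = string | number;

// Walk `obj` along `path` and replace ID strings with URLs from `idToUrl`.
function injectUrlsSketch(
  obj: unknown,
  path: PathSegment[],
  idToUrl: Map<string, string>,
): void {
  if (obj === null || typeof obj !== "object") return;
  const [key, ...rest] = path;
  if (key === undefined) return;

  if (key === "*") {
    if (!Array.isArray(obj)) return;
    if (rest.length === 0) {
      // Terminal wildcard: the array elements themselves are the IDs.
      for (let i = 0; i < obj.length; i++) {
        const item = obj[i];
        const url = typeof item === "string" ? idToUrl.get(item) : undefined;
        if (url !== undefined) obj[i] = url;
      }
    } else {
      // Deeper path: recurse into each element.
      for (const item of obj) injectUrlsSketch(item, rest, idToUrl);
    }
    return;
  }

  const record = obj as Record<string, unknown>;
  const value = record[key];
  if (rest.length === 0) {
    const url = typeof value === "string" ? idToUrl.get(value) : undefined;
    if (url !== undefined) record[key] = url;
  } else {
    injectUrlsSketch(value, rest, idToUrl);
  }
}

// Usage with made-up data: two known IDs and one value that is not in the mapping.
const extracted = { records: ["0-302", "0-309", "not-an-id"] };
const idToUrlMapping = new Map<string, string>([
  ["0-302", "https://example.com/a.pdf"],
  ["0-309", "https://example.com/b.pdf"],
]);
injectUrlsSketch(extracted, ["records", "*"], idToUrlMapping);
```

Values that do not appear in the mapping are left untouched in this sketch, so an unexpected string in the array passes through unchanged rather than being dropped.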

test plan

  • evals

@changeset-bot
changeset-bot bot commented Oct 6, 2025

🦋 Changeset detected

Latest commit: 6d6c7aa

The changes in this PR will be included in the next version bump.

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Summary

This PR fixes a critical bug in Stagehand's URL extraction system for array schemas. The issue occurred when using `z.array(z.string().url())` in extraction schemas: instead of returning actual URLs, users received internal element IDs such as '0-302' and '0-309'.

The bug was in the injectUrls function in lib/utils.ts, which handles converting numeric IDs back to URLs after LLM extraction. Stagehand's extraction pipeline works by first transforming URL fields into numeric IDs (making them easier for LLMs to process), then converting those IDs back to actual URLs using a mapping. While the schema transformation correctly identified URL fields in arrays and created appropriate path segments like ['*'], the injectUrls function wasn't handling the terminal case where the path ends at an array wildcard.
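
To illustrate how a path such as ['records', '*'] might be produced, here is a small sketch that walks a simplified schema tree and records where URL fields live. The `SchemaNode` type and `collectUrlPaths` function are hypothetical stand-ins invented for this example; the real transformSchema operates on Zod schemas and also rewrites them.

```typescript
// Hypothetical, simplified stand-in for a Zod schema tree (not Zod itself).
type SchemaNode =
  | { kind: "url" }
  | { kind: "string" }
  | { kind: "array"; item: SchemaNode }
  | { kind: "object"; shape: Record<string, SchemaNode> };

type UrlPath = Array<string | number>;

// Collect the path to every URL field; "*" marks "each element of an array".
function collectUrlPaths(node: SchemaNode, current: UrlPath = []): UrlPath[] {
  switch (node.kind) {
    case "url":
      return [current];
    case "array":
      return collectUrlPaths(node.item, [...current, "*"]);
    case "object": {
      const paths: UrlPath[] = [];
      for (const key of Object.keys(node.shape)) {
        const child = node.shape[key];
        if (child) paths.push(...collectUrlPaths(child, [...current, key]));
      }
      return paths;
    }
    default:
      return [];
  }
}

// Mirrors the shape z.object({ records: z.array(z.string().url()) }) plus a plain string field.
const exampleSchema: SchemaNode = {
  kind: "object",
  shape: {
    title: { kind: "string" },
    records: { kind: "array", item: { kind: "url" } },
  },
};
const urlPaths = collectUrlPaths(exampleSchema);
```

For this example schema the only collected path ends in the wildcard segment, which is the terminal case the injection step has to handle.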

The fix adds logic to handle this scenario: when the path terminates at an array (rest.length === 0), the function now loops through each array element and converts any valid IDs back to their corresponding URLs. Additionally, a toId helper function was extracted to standardize the ID detection logic that was previously duplicated.

This change integrates seamlessly with Stagehand's existing URL extraction architecture and ensures that array-based URL schemas work as expected, maintaining consistency with non-array URL field behavior.
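
As a rough idea of what a shared ID-detection helper could look like, the sketch below checks whether a value has the '0-302' shape seen in this PR before looking it up. The pattern, the function names `toIdSketch` and `resolveUrl`, and the example.com URL are assumptions for illustration; the actual toId helper in lib/utils.ts may differ.

```typescript
// Assumed ID shape, based only on the '0-302' style values shown in this PR.
const ID_PATTERN = /^\d+-\d+$/;

// Hypothetical helper: return the value as an ID string, or undefined if it is not one.
function toIdSketch(value: unknown): string | undefined {
  if (typeof value === "string" && ID_PATTERN.test(value)) return value;
  return undefined;
}

// Swap an ID for its URL when the mapping knows it; otherwise return the value unchanged.
function resolveUrl(value: unknown, idToUrl: Map<string, string>): unknown {
  const id = toIdSketch(value);
  if (id === undefined) return value;
  return idToUrl.get(id) ?? value;
}

// Usage with made-up data: a known ID, an unknown ID, a plain string, and a number.
const knownUrls = new Map<string, string>([["0-302", "https://example.com/a.pdf"]]);
const resolved = ["0-302", "0-999", "hello", 42].map((v) => resolveUrl(v, knownUrls));
```

Centralizing the check in one helper means the single-field and array branches of the injection step apply the same rule for what counts as an ID.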

Important Files Changed

Changed Files
Filename Score Overview
lib/utils.ts 4/5 Fixed URL injection for array schemas by adding array element processing logic and extracting a toId helper function

Confidence score: 4/5

  • This PR is safe to merge with minimal risk as it fixes a clear bug without affecting existing functionality
  • Score reflects focused bug fix with good code organization and clear understanding of the existing system
  • No files require special attention as the change is contained and well-implemented

Sequence Diagram

sequenceDiagram
    participant User
    participant ExtractHandler as "StagehandExtractHandler"
    participant Utils as "utils.ts"
    participant Schema as "Zod Schema"
    participant LLM as "LLM Client"

    User->>ExtractHandler: "extract() with array schema containing z.string().url()"
    ExtractHandler->>ExtractHandler: "domExtract()"
    ExtractHandler->>Utils: "transformSchema(schema, [])"
    
    Note over Utils: Transform z.string().url() to z.number() in arrays
    Utils->>Utils: "isKind(schema, Kind.ZodArray)"
    Utils->>Utils: "transformSchema(itemType, [...currentPath, '*'])"
    Utils->>Utils: "Replace z.string().url() with z.number()"
    Utils-->>ExtractHandler: "[transformedSchema, urlPaths]"
    
    ExtractHandler->>LLM: "extract data using transformed schema"
    LLM-->>ExtractHandler: "extracted data with numeric IDs"
    
    Note over ExtractHandler: Data contains numeric IDs like ['0-302', '0-309']
    
    ExtractHandler->>Utils: "injectUrls(extractedData, urlPaths, idToUrlMapping)"
    Utils->>Utils: "Check if path[0] === '*' and rest.length === 0"
    
    Note over Utils: NEW: Handle arrays directly when no deeper path
    Utils->>Utils: "Loop through array items"
    Utils->>Utils: "Replace numeric IDs with actual URLs"
    
    Utils-->>ExtractHandler: "Data with URLs injected"
    ExtractHandler-->>User: "Array with actual URLs instead of numeric IDs"

1 file reviewed, no comments

@seanmcguire12 seanmcguire12 added the extract These changes pertain to the extract function label Oct 6, 2025
@seanmcguire12 seanmcguire12 merged commit 3ccf335 into main Oct 6, 2025
17 of 29 checks passed
miguelg719 pushed a commit that referenced this pull request Oct 7, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/stagehand@2.5.1

### Patch Changes

- [#1082](#1082)
[`8c0fd01`](8c0fd01)
Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object
to agent instead of stagehand page

- [#1104](#1104)
[`a1ad06c`](a1ad06c)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for
stagehand agent

- [#1066](#1066)
[`9daa584`](9daa584)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright
arguments to agent execute response

- [#1077](#1077)
[`7f38b3a`](7f38b3a)
Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for
stagehand agent in the api

- [#1032](#1032)
[`bf2d0e7`](bf2d0e7)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer
dependency support

- [#1014](#1014)
[`6966201`](6966201)
Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator
handler with base of new agent

- [#1089](#1089)
[`536f366`](536f366)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs
on api session create

- [#1103](#1103)
[`889cb6c`](889cb6c)
Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool
support in anthropic cua client

- [#1056](#1056)
[`6a002b2`](6a002b2)
Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for
duplicate project id if already passed to Stagehand

- [#1090](#1090)
[`8ff5c5a`](8ff5c5a)
Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed
act error logs

- [#1014](#1014)
[`6966201`](6966201)
Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator
agent with scaffold for new stagehand agent

- [#1107](#1107)
[`3ccf335`](3ccf335)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url
extraction not working inside an array

- [#1102](#1102)
[`a99aa48`](a99aa48)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page
and date context to agent

- [#1110](#1110)
[`dda52f1`](dda52f1)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for
new Gemini Computer Use models

## @browserbasehq/stagehand-evals@1.1.0

### Minor Changes

- [#1057](#1057)
[`b7be89e`](b7be89e)
Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added
web voyager ground truth (optional), added web bench, and subset of
OSWorld evals which run on a browser

### Patch Changes

- [#1072](#1072)
[`dc2d420`](dc2d420)
Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve
evals screenshot service - add img hashing diff to add screenshots and
change to screenshot intercepts from the agent

- Updated dependencies
\[[`8c0fd01`](8c0fd01),
[`a1ad06c`](a1ad06c),
[`9daa584`](9daa584),
[`7f38b3a`](7f38b3a),
[`bf2d0e7`](bf2d0e7),
[`6966201`](6966201),
[`536f366`](536f366),
[`889cb6c`](889cb6c),
[`6a002b2`](6a002b2),
[`8ff5c5a`](8ff5c5a),
[`6966201`](6966201),
[`3ccf335`](3ccf335),
[`a99aa48`](a99aa48),
[`dda52f1`](dda52f1)]:
    -   @browserbasehq/stagehand@2.5.1

## @browserbasehq/stagehand-examples@1.0.10

### Patch Changes

- Updated dependencies
\[[`8c0fd01`](8c0fd01),
[`a1ad06c`](a1ad06c),
[`9daa584`](9daa584),
[`7f38b3a`](7f38b3a),
[`bf2d0e7`](bf2d0e7),
[`6966201`](6966201),
[`536f366`](536f366),
[`889cb6c`](889cb6c),
[`6a002b2`](6a002b2),
[`8ff5c5a`](8ff5c5a),
[`6966201`](6966201),
[`3ccf335`](3ccf335),
[`a99aa48`](a99aa48),
[`dda52f1`](dda52f1)]:
    -   @browserbasehq/stagehand@2.5.1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Labels

extract These changes pertain to the extract function

3 participants