Skip to content

Conversation

@ammar-agent
Copy link
Collaborator

Summary

Adds comprehensive integration tests for OpenAI's web_search tool to prevent regressions of the reasoning item ID bug.

Tests Added

Three concurrent integration tests in tests/ipcMain/openai-web-search.test.ts:

  1. Basic web_search execution - Verifies web_search tool is called and completes successfully
  2. Multiple web_search calls - Tests sequential web searches in a single conversation
  3. Reasoning + web_search - Regression test for the itemId bug where reasoning and web_search interact

Key Features

  • Uses gpt-5-codex which supports web_search tool
  • Tests verify no errors occur during multi-step execution
  • Validates proper event streaming (stream-start, tool-call-start, stream-end)
  • Third test specifically targets the reasoning + web_search scenario that exposed the original bug

Testing

These are integration tests that require OPENAI_API_KEY:

TEST_INTEGRATION=1 bun x jest tests/ipcMain/openai-web-search.test.ts

Tests run concurrently with appropriate timeouts (45-75 seconds) for real API calls.

Related

  • Part of ongoing work to ensure OpenAI reasoning middleware correctly strips itemIds
  • Complements the fix in the openai-fix branch
  • Provides regression coverage for future changes

Generated with cmux

- Test basic web_search tool execution
- Test multiple web_search calls in sequence
- Test reasoning + web_search combination (regression test for item ID bug)
- All tests use gpt-5-codex which supports web_search
- Tests verify no errors occur during multi-step execution

_Generated with `cmux`_
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

ammar-agent and others added 3 commits October 8, 2025 14:17
- Reduce to single focused test with constrained prompt
- Explicit instruction to use web_search + simple factual query
- Increase timeout to 90s for reasoning model processing
- Key test: verifies no reasoning itemId errors occur
Use gold price + Collatz computation prompt that:
- Always triggers web_search (pricing data needs search)
- Requires reasoning for mathematical computation
- Sets thinkingLevel: 'high' to ensure reasoning is used
- This combination reliably exposes the itemId bug on main

Test now validates both reasoning and web_search occurred,
making it a true regression test for the itemId bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants