
ammar-agent
Collaborator

Problem

The following integration test was flaking: "OpenAI auto truncation integration > should include full file_edit diff in UI/history but redact it from the next provider request"

Failure mode: The stream completes its tool calls but never emits a stream-end event, so the test times out waiting for it.

Root cause: AI models can complete tool execution without generating text output. This behavior is non-deterministic: sometimes the model responds with text after its tool calls, and sometimes it doesn't.
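To make the failure mode concrete, here is a minimal sketch of how a test helper might wait for stream-end with a deadline. The helper name `waitForEvent`, the use of Node's `EventEmitter`, and the event names are assumptions for illustration; the real test harness may be structured differently.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: resolves true if `name` fires before the deadline,
// false otherwise. A tool-only response never fires "stream-end", so a test
// built on a wait like this runs into its timeout.
function waitForEvent(
  emitter: EventEmitter,
  name: string,
  timeoutMs: number,
): Promise<boolean> {
  return new Promise((resolve) => {
    const onEvent = () => {
      clearTimeout(timer);
      resolve(true);
    };
    const timer = setTimeout(() => {
      emitter.off(name, onEvent); // stop listening once the deadline passes
      resolve(false);
    }, timeoutMs);
    emitter.once(name, onEvent);
  });
}
```

With a helper like this, a stream that only emits tool events resolves to `false` after the deadline, which matches the timeout seen in the flaky runs.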

See full analysis in E2E_FLAKE_ANALYSIS.md.

Solution

Modified test prompt to explicitly request confirmation after tool execution:

- "Open and replace 'line2' with 'LINE2' using file_edit_replace"
+ "Open and replace 'line2' with 'LINE2' using file_edit_replace, then confirm the change was successfully applied."

This encourages the AI to generate text output after completing tools, ensuring the stream finishes properly.

Trade-offs

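Short-term fix: The prompt change makes a trailing text response more likely, which reduces flakiness but does not eliminate it, since the model can still end its turn on a tool call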
Long-term fix: Stream manager should detect tool-only responses and auto-emit stream-end (tracked for future work)
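As a rough illustration of the long-term fix, the sketch below wraps an event stream so that consumers always see a stream-end, even when the model stops after its last tool call. The `StreamEvent` shape, the `synthesized` flag, and the `ensureStreamEnd` name are all hypothetical and not taken from the actual stream manager.

```typescript
// Hypothetical event shapes; the real stream manager's types may differ.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "tool-call-end"; toolName: string }
  | { type: "stream-end"; synthesized: boolean };

// Passes events through unchanged and appends a synthesized "stream-end"
// if the source finished without emitting one (a tool-only response).
async function* ensureStreamEnd(
  source: AsyncIterable<StreamEvent>,
): AsyncGenerator<StreamEvent> {
  let sawEnd = false;
  for await (const event of source) {
    if (event.type === "stream-end") {
      sawEnd = true;
    }
    yield event;
  }
  if (!sawEnd) {
    yield { type: "stream-end", synthesized: true };
  }
}
```

Marking the synthesized event lets downstream code distinguish a provider-issued end from one added by the wrapper, which may help when debugging; whether that distinction is needed in practice is an open design question.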

Testing

  • Integration tests should be more stable
  • Test still validates the actual truncation/redaction behavior
  • No changes to production code

Generated with cmux

@ammario ammario enabled auto-merge October 14, 2025 19:23
@ammario ammario added this pull request to the merge queue Oct 14, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 14, 2025
The integration test was flaky because AI models sometimes complete
tool calls without generating text output, causing the stream to never
emit stream-end.

Fix: Modified test prompt to request confirmation after tool execution.
This encourages the AI to generate text output, ensuring the stream
completes properly.

Added analysis document explaining the root cause and potential solutions.
@ammar-agent ammar-agent force-pushed the investigate-e2e-flake branch from 42c093d to d87478d Compare October 14, 2025 20:04