🔍 Smoke Test Investigation - Run #57
Summary
The Smoke GenAIScript workflow failed because the create_issue and detection jobs could not find the expected agent_output.json artifact. The GenAIScript agent completed successfully but did not use the safe-outputs MCP tools despite being instructed to create an issue, leaving no output artifact for downstream jobs.
Failure Details
- Run: #18788162015
- Run Number: 57
- Commit: 417ca9d
- Branch: main
- Trigger: schedule
- Duration: 3.3 minutes
- Failed Jobs: detection (1.0m), create_issue (5s)
- Workflow: Smoke GenAIScript
Root Cause Analysis
Primary Issue
The GenAIScript agent received the prompt: "Review the last 5 merged pull requests in this repository and post summary in an issue."
Despite clear instructions to create an issue, the agent:
- ✅ Completed successfully (14.4s runtime, cost $0.0332)
- ✅ Had access to the safe-outputs MCP tools (`safe_outputs_create_issue`, `safe_outputs_missing_tool`)
- ✅ Successfully used GitHub MCP tools to fetch pull request data
- ❌ Did NOT use the `create_issue` tool from the safe-outputs MCP server
- ❌ Did NOT create `/tmp/gh-aw/safe-outputs/outputs.jsonl`
- ❌ Left no output artifact for downstream jobs
The agent's final output was:
> The remaining pull requests are beyond my current data access. I can analyze the details one by one for the remaining PRs to compile a complete summary. Let me know if you'd like to proceed this way!
This suggests that the agent treated the task as an interactive dialogue instead of recognizing that it had to call the safe-outputs tool to create the issue.
MCP Configuration Status
✅ The safe-outputs MCP server was initialized correctly:

    [safe-outputs-mcp-server] v1.0.0 ready on stdio
    output file: /tmp/gh-aw/safe-outputs/outputs.jsonl
    config: {"create_issue":{"max":1,"min":1},"missing_tool":{}}
    tools: create_issue, missing_tool

✅ The tools were registered and available to the agent:

    mcp safe_outputs: tools: [ 'safe_outputs_create_issue', 'safe_outputs_missing_tool' ]
Workflow Configuration
- Staged Mode: `true` (`GH_AW_SAFE_OUTPUTS_STAGED=true`)
- Expected Outputs: `create_issue` (min: 1, max: 1)
- Model: `openai:gpt-4o-2024-08-06`
- Agent Execution: Successful (3 turns, 13.8k input tokens, 316 output tokens)
Failed Jobs and Errors
Job Sequence
- ✅ activation - succeeded (4s)
- ✅ agent - succeeded (1.6m)
- ❌ detection - failed (1.0m)
- ❌ create_issue - failed (5s)
- ⏭️ missing_tool - skipped
Error Details
Detection Job Error:

    Unable to download artifact(s): Artifact not found for name: agent_output.json
    Please ensure that your artifact is not expired and the artifact was uploaded using a compatible version of toolkit/upload-artifact.

Create Issue Job Error:

    Error reading agent output file: ENOENT: no such file or directory, open '/tmp/gh-aw/safe-outputs/agent_output.json'
Both jobs failed because they depend on an artifact that was never created, since the agent did not call any safe-outputs MCP tool.
Investigation Findings
Why Did This Happen?
Possible Reasons:
- Agent Interpretation Issue: The GenAIScript agent may have read "post summary in an issue" as a natural-language request instead of recognizing the explicit requirement to call the `safe_outputs_create_issue` tool.
- Prompt Ambiguity: The prompt states:

  > **IMPORTANT**: To do the actions mentioned in the header of this section, use the **safe-outputs** tools, do NOT attempt to use `gh`, do NOT attempt to use the GitHub API. To create an issue, use the create-issue tool from the safe-outputs MCP

  However, the agent may not have connected this instruction with the main task.
- Interactive Mode Behavior: The agent's response ("Let me know if you'd like to proceed this way!") suggests it entered a conversational mode rather than a task-completion mode.
- Staged Mode Effect: With `GH_AW_SAFE_OUTPUTS_STAGED=true` the agent may behave differently, although staged mode should not prevent tool usage.
Historical Context
This is a NEW pattern for GenAIScript but similar to issue #2143 (OpenCode):
| Engine | Pattern | Status | Issue |
|---|---|---|---|
| OpenCode | Agent doesn't use safe-outputs | Closed | #2143 |
| GenAIScript | Agent doesn't use safe-outputs | NEW | This issue |
| Claude | Generally reliable with safe-outputs | - | - |
| Copilot | Permission denied errors | Recurring | #2288 |
Comparison with Other Engines
- Claude: More reliable at following MCP tool usage instructions
- OpenCode: Same issue (documented in [smoke-detector] 🔍 Smoke Test Investigation - Smoke OpenCode Run #18722224746: Agent Does Not Use Safe-Outputs MCP Tools #2143)
- Copilot: Different issue (permission denied, not failure to use tools)
Recommended Actions
High Priority
- Make the prompt more explicit about tool usage:

      Review the last 5 merged pull requests in this repository.
      MANDATORY: You MUST use the 'safe_outputs_create_issue' tool to create a GitHub issue with your summary.
      Do not provide the summary in any other form.
      The create_issue tool call is REQUIRED to complete this task successfully.

- Add validation that safe-outputs tools are called:
  - Modify the GenAIScript workflow to check whether `outputs.jsonl` was created
  - Fail early if the agent completes without using the required tools
- Test with explicit tool forcing (if GenAIScript supports it):
  - Some AI frameworks allow marking tools as "required"; for example, the OpenAI Chat Completions API accepts `tool_choice: "required"` or a specific named function
  - Check whether GenAIScript has a similar capability
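
The "fail early" validation could be a short script run as the last step of the agent job, where `/tmp/gh-aw/safe-outputs/` is still on disk, checking the file against the `create_issue` min/max limits the safe-outputs server logged. A minimal Python sketch; it assumes each JSONL line is an object with a `type` field naming the output kind (this report does not confirm the file's schema), and all helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Limits as logged by the safe-outputs MCP server in run #57.
SAFE_OUTPUTS_CONFIG = {"create_issue": {"max": 1, "min": 1}, "missing_tool": {}}
DEFAULT_PATH = Path("/tmp/gh-aw/safe-outputs/outputs.jsonl")


def count_outputs(path: Path) -> Counter:
    """Count entries per output type in a safe-outputs JSONL file.

    ASSUMPTION: each non-empty line is a JSON object whose "type" field
    names the output kind ("create_issue" or "create-issue").
    A missing file counts as zero entries of every kind.
    """
    counts: Counter = Counter()
    if not path.exists():
        return counts
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # Map hyphenated spellings onto the underscore keys used in the config.
        counts[str(entry.get("type", "")).replace("-", "_")] += 1
    return counts


def check_limits(counts: Counter, config: dict) -> list:
    """Return one message per violated min/max limit (empty list means OK)."""
    errors = []
    for kind, limits in config.items():
        n = counts[kind]
        if "min" in limits and n < limits["min"]:
            errors.append(f"{kind}: expected at least {limits['min']}, got {n}")
        if "max" in limits and n > limits["max"]:
            errors.append(f"{kind}: expected at most {limits['max']}, got {n}")
    return errors


def main(path: Path = DEFAULT_PATH) -> int:
    """Print GitHub Actions error annotations and return a process exit code."""
    errors = check_limits(count_outputs(path), SAFE_OUTPUTS_CONFIG)
    for err in errors:
        print(f"::error::{err}")
    return 1 if errors else 0
```

Ending the script with `sys.exit(main())` (after `import sys`) would make the step fail. In this run the file was never written, so the check would report `create_issue: expected at least 1, got 0` and return 1 before any downstream job started.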
Medium Priority
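- Make downstream jobs conditional. `hashFiles()` is not available in a job-level `if:` and only matches files under `GITHUB_WORKSPACE`; moreover, each job runs on its own runner, so the agent job's `/tmp/gh-aw/safe-outputs/` directory is not visible to `detection`. Gate on a job output instead, such as the one produced by the `validate_outputs` job below:

      detection:
        needs: [agent, validate_outputs]
        if: needs.validate_outputs.outputs.has_safe_outputs == 'true'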
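- Add an intermediate validation job. It runs on a separate runner, so it has to download the agent's artifact before checking for it. This assumes that the `agent_output.json` artifact, when it exists, contains `agent_output.json`, the file the `create_issue` job tries to read:

      validate_outputs:
        needs: agent
        runs-on: ubuntu-latest
        outputs:
          has_safe_outputs: ${{ steps.check.outputs.has_outputs }}
        steps:
          - name: Download agent output
            uses: actions/download-artifact@v4
            continue-on-error: true
            with:
              name: agent_output.json
              path: /tmp/gh-aw/safe-outputs/
          - name: Check for safe-outputs
            id: check
            run: |
              if [ -f "/tmp/gh-aw/safe-outputs/agent_output.json" ]; then
                echo "has_outputs=true" >> "$GITHUB_OUTPUT"
              else
                echo "has_outputs=false" >> "$GITHUB_OUTPUT"
                echo "::warning::Agent completed but did not use safe-outputs tools"
              fi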
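- Add debug logging in the agent job:

      - name: Verify safe-outputs tools loaded
        run: |
          echo "Checking available MCP tools..."
          # Log GenAIScript tool availability
          # List the safe-outputs directory so a missing outputs.jsonl is visible in the log
          ls -la /tmp/gh-aw/safe-outputs/ || echo "::warning::safe-outputs directory not found"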
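
The "Log GenAIScript tool availability" placeholder could become a concrete check against the tool list GenAIScript logged in this run (`mcp safe_outputs: tools: [...]`). A Python sketch, assuming the log line keeps that format and that each tool is named `safe_outputs_` plus its config key, as both were here; how the step obtains the log line is left open, and the function names are hypothetical.

```python
import re

# Line printed by GenAIScript in this run's agent job.
LOG_LINE = "mcp safe_outputs: tools: [ 'safe_outputs_create_issue', 'safe_outputs_missing_tool' ]"
# Config printed by the safe-outputs MCP server in the same run.
CONFIG = {"create_issue": {"max": 1, "min": 1}, "missing_tool": {}}


def registered_tools(log_line: str) -> set:
    """Extract the single-quoted tool names from an 'mcp <server>: tools: [...]' line."""
    return set(re.findall(r"'([^']+)'", log_line))


def missing_registrations(log_line: str, config: dict, prefix: str = "safe_outputs_") -> set:
    """Return expected tool names (prefix + config key) that were not registered.

    ASSUMPTION: tools are named prefix + config key, as observed in this run.
    """
    expected = {prefix + key for key in config}
    return expected - registered_tools(log_line)
```

A non-empty result would point to a loading problem rather than an agent-behavior problem. For this run's log line the result is an empty set, which matches the finding that both tools were available but unused.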
Low Priority
- Create a fallback issue when safe-outputs are not used:
  - Detect when `outputs.jsonl` is missing
  - Create the issue directly from a GitHub Actions workflow step
  - Include the agent's text output and a warning about tool usage
- Enhance error messages:
  - Provide clearer feedback when the agent completes without using the expected tools
  - Add troubleshooting steps to workflow failure messages
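
A fallback step could assemble the issue from the agent's text output, but only when the safe-outputs file is missing or empty. A minimal Python sketch; the function names and the title format are hypothetical, and the step would run outside the agent.

```python
from pathlib import Path


def needs_fallback(outputs_path: Path) -> bool:
    """True when the safe-outputs file is missing or empty."""
    return not outputs_path.exists() or outputs_path.stat().st_size == 0


def build_fallback_issue(agent_text: str, workflow: str, run_id: str) -> dict:
    """Assemble a title and Markdown body for an issue opened by the workflow itself."""
    warning = (
        f"**Warning:** the agent in {workflow} (run {run_id}) finished without "
        "calling the safe-outputs `create_issue` tool, so this issue was created "
        "by a fallback step."
    )
    # Quote the agent's text line by line so it is visibly separate from the warning.
    quoted = ["> " + line for line in agent_text.splitlines()]
    body = "\n".join([warning, "", "Agent text output:", "", *quoted])
    return {"title": f"[fallback] {workflow}: agent did not use safe-outputs", "body": body}
```

The returned title and body could then be passed to `gh issue create --title "$TITLE" --body "$BODY"` from an ordinary workflow step whose token has `issues: write`.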
Prevention Strategies
- Explicit Required Tool Instructions:

      TASK: Review the last 5 merged pull requests.
      REQUIRED TOOL USAGE:
      1. Use github_list_pull_requests to fetch PR data
      2. Use safe_outputs_create_issue to create the issue
      SUCCESS CRITERIA: Task is only complete when create_issue tool has been called.

- Validation Layer: Add a job to verify safe-outputs before proceeding. It runs on its own runner, so it downloads the agent's artifact first (assuming the artifact contains `agent_output.json`, the file the `create_issue` job reads):

      validation:
        needs: agent
        runs-on: ubuntu-latest
        steps:
          - name: Download agent output
            uses: actions/download-artifact@v4
            continue-on-error: true
            with:
              name: agent_output.json
              path: /tmp/gh-aw/safe-outputs/
          - name: Validate safe-outputs
            run: |
              if [ ! -f "/tmp/gh-aw/safe-outputs/agent_output.json" ]; then
                echo "::error::Agent did not create safe-outputs"
                exit 1
              fi

- Tool Usage Tracking: Log all tool calls during agent execution for debugging.
- Agent Behavior Tests: Create test workflows that verify agents use the safe-outputs tools correctly.
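
Tool usage tracking could be as simple as wrapping every tool handler so that each call is recorded, then comparing the record with the tools the task requires. The Python sketch below is framework-agnostic: `ToolCallLog` is a hypothetical helper, not a GenAIScript or MCP API, and it assumes handlers are called with keyword arguments (MCP passes tool arguments as a named object).

```python
import json
import time
from typing import Any, Callable, Dict, List, Set


class ToolCallLog:
    """Hypothetical recorder of tool calls made during one agent run."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def wrap(self, name: str, handler: Callable[..., Any]) -> Callable[..., Any]:
        """Return a handler that records the tool name and keyword arguments, then runs the original."""

        def recorded(**kwargs: Any) -> Any:
            self.calls.append({"tool": name, "arguments": kwargs, "ts": time.time()})
            return handler(**kwargs)

        return recorded

    def to_jsonl(self) -> str:
        """Serialise the record, one JSON object per line, e.g. for upload as a debug artifact."""
        return "\n".join(json.dumps(call) for call in self.calls)

    def missing(self, required: Set[str]) -> Set[str]:
        """Required tools that were never called."""
        return required - {call["tool"] for call in self.calls}
```

Replaying this run's calls, only `github_list_pull_requests` would be recorded, so `missing()` would return `{"safe_outputs_create_issue"}` and the job could fail before reporting success.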
Technical Details
Agent Execution Summary
Model: openai:gpt-4o-2024-08-06
Duration: 14.4s
Turns: 3
Cost: $0.0332
Tokens: ↑13.8k input, ↓316 output
Result: success (finish reason: stop)
MCP Servers Loaded
- ✅ github (docker-based, v0.19.1)
- ✅ safe_outputs (node-based, v1.0.0)
Tools Available
- `github_list_pull_requests` ✅ (used)
- `safe_outputs_create_issue` ⚠️ (available but not used)
- `safe_outputs_missing_tool` ⚠️ (available but not used)
Files Expected But Not Created
- ❌ `/tmp/gh-aw/safe-outputs/outputs.jsonl`
- ❌ `agent_output.json` artifact
Pattern Information
- Pattern ID: `GENAISCRIPT_NO_SAFE_OUTPUTS`
- Category: Agent Behavior - Safe-Outputs Not Used
- Severity: Medium
- Flakiness: Not flaky - consistent behavior
- Recurring: First occurrence for GenAIScript
- Related Pattern: `OPENCODE_NO_SAFE_OUTPUTS` (issue [smoke-detector] 🔍 Smoke Test Investigation - Smoke OpenCode Run #18722224746: Agent Does Not Use Safe-Outputs MCP Tools #2143)
Related Issues
- [smoke-detector] 🔍 Smoke Test Investigation - Smoke OpenCode Run #18722224746: Agent Does Not Use Safe-Outputs MCP Tools #2143 - OpenCode agent doesn't use safe-outputs (closed)
Investigation Metadata:
- Investigator: Smoke Detector (Outpost Agent)
- Investigation Run: #18788241604
- Pattern Database: `/tmp/gh-aw/cache-memory/patterns/genaiscript_no_safe_outputs.json`
- Investigation Record: `/tmp/gh-aw/cache-memory/investigations/2025-10-24-18788162015.json`
AI generated by Smoke Detector - Smoke Test Failure Investigator