
[smoke-detector] 🔍 Smoke Test Investigation - Smoke GenAIScript Run #57: Agent Does Not Use Safe-Outputs MCP Tools #2307

@github-actions


🔍 Smoke Test Investigation - Run #57

Summary

The Smoke GenAIScript workflow failed because the create_issue and detection jobs could not find the expected agent_output.json artifact. The GenAIScript agent completed successfully but did not use the safe-outputs MCP tools despite being instructed to create an issue, leaving no output artifact for downstream jobs.

Failure Details

  • Run: #18788162015
  • Run Number: 57
  • Commit: 417ca9d
  • Branch: main
  • Trigger: schedule
  • Duration: 3.3 minutes
  • Failed Jobs: detection (1.0m), create_issue (5s)
  • Workflow: Smoke GenAIScript

Root Cause Analysis

Primary Issue

The GenAIScript agent received the prompt: "Review the last 5 merged pull requests in this repository and post summary in an issue."

Despite clear instructions to create an issue, the agent:

  1. ✅ Completed successfully (14.4s runtime, cost $0.0332)
  2. ✅ Had access to safe-outputs MCP tools (safe_outputs_create_issue, safe_outputs_missing_tool)
  3. ✅ Successfully used GitHub MCP tools to fetch pull request data
  4. ❌ Did NOT use the create_issue tool from the safe-outputs MCP
  5. ❌ Did NOT create /tmp/gh-aw/safe-outputs/outputs.jsonl
  6. ❌ Left no output artifact for downstream jobs

The agent's final output was:

The remaining pull requests are beyond my current data access. I can analyze the details one by one for the remaining PRs to compile a complete summary. Let me know if you'd like to proceed this way!

This suggests the agent treated the task as an interactive dialogue and did not recognize that it had to call the safe-outputs create_issue tool to complete it.

MCP Configuration Status

✅ Safe-outputs MCP server was properly initialized:

[safe-outputs-mcp-server] v1.0.0 ready on stdio
  output file: /tmp/gh-aw/safe-outputs/outputs.jsonl
  config: {"create_issue":{"max":1,"min":1},"missing_tool":{}}
  tools: create_issue, missing_tool

✅ Tools were registered and available to the agent:

mcp safe_outputs: tools: [ 'safe_outputs_create_issue', 'safe_outputs_missing_tool' ]
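The server config above requires exactly one create_issue entry (min 1, max 1). The following minimal sketch shows how that min/max constraint could be checked against outputs.jsonl. It assumes, hypothetically, that each JSONL line records its tool in a "type" field such as "create_issue"; the real entry schema is not shown in this report.

```shell
#!/usr/bin/env bash
# Sketch: check that outputs.jsonl holds between MIN and MAX create_issue entries.
# ASSUMPTION: each JSONL line carries a "type":"create_issue" field; the real schema may differ.
set -eu

check_create_issue_count() {
  local file="$1" min="$2" max="$3" count
  if [ ! -f "$file" ]; then
    # A missing file means the agent never called any safe-outputs tool
    count=0
  else
    # grep -c prints 0 but exits 1 when nothing matches; '|| true' keeps set -e from aborting
    count=$(grep -c '"type"[[:space:]]*:[[:space:]]*"create_issue"' "$file" || true)
  fi
  if [ "$count" -lt "$min" ] || [ "$count" -gt "$max" ]; then
    echo "create_issue count $count outside [$min, $max]" >&2
    return 1
  fi
  echo "create_issue count $count OK"
}
```

For run #57, `check_create_issue_count /tmp/gh-aw/safe-outputs/outputs.jsonl 1 1` would fail with a count of 0, because the file was never created.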

Workflow Configuration

  • Staged Mode: true (GH_AW_SAFE_OUTPUTS_STAGED=true)
  • Expected Outputs: create_issue (min: 1, max: 1)
  • Model: openai:gpt-4o-2024-08-06
  • Agent Execution: Successful (3 turns, 13.8k input tokens, 316 output tokens)

Failed Jobs and Errors

Job Sequence

  1. activation - succeeded (4s)
  2. agent - succeeded (1.6m)
  3. detection - failed (1.0m)
  4. create_issue - failed (5s)
  5. ⏭️ missing_tool - skipped

Error Details

Detection Job Error:

Unable to download artifact(s): Artifact not found for name: agent_output.json
Please ensure that your artifact is not expired and the artifact was uploaded using a compatible version of toolkit/upload-artifact.

Create Issue Job Error:

Error reading agent output file: ENOENT: no such file or directory, open '/tmp/gh-aw/safe-outputs/agent_output.json'

Both jobs failed because they depend on agent output that was never produced: the agent never called a safe-outputs tool, so no agent_output.json was created or uploaded.

Investigation Findings

Why Did This Happen?

Possible Reasons:

  1. Agent Interpretation Issue: The GenAIScript agent may have treated "post summary in an issue" as a plain natural-language request rather than as an explicit requirement to call the safe_outputs_create_issue tool

  2. Prompt Ambiguity: The prompt states:

    **IMPORTANT**: To do the actions mentioned in the header of this section, use the **safe-outputs** tools, 
    do NOT attempt to use `gh`, do NOT attempt to use the GitHub API.
    
    To create an issue, use the create-issue tool from the safe-outputs MCP
    

    However, the agent may not have connected this instruction with the main task

  3. Interactive Mode Behavior: Agent's response ("Let me know if you'd like to proceed this way!") suggests it entered interactive/conversational mode rather than task completion mode

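  4. Staged Mode Effect: GH_AW_SAFE_OUTPUTS_STAGED=true should only change how recorded safe outputs are processed downstream (previewed instead of applied), so it is unlikely to explain why the agent never called the tools, but it has not been ruled out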

Historical Context

This is a NEW pattern for GenAIScript but similar to issue #2143 (OpenCode):

| Engine | Pattern | Status | Issue |
|---|---|---|---|
| OpenCode | Agent doesn't use safe-outputs | Closed | #2143 |
| GenAIScript | Agent doesn't use safe-outputs | NEW | This issue |
| Claude | Generally reliable with safe-outputs | - | - |
| Copilot | Permission denied errors | Recurring | #2288 |

Comparison with Other Engines

Recommended Actions

High Priority

  • Make prompt more explicit about tool usage

    Review the last 5 merged pull requests in this repository.
    
    MANDATORY: You MUST use the 'safe_outputs_create_issue' tool to create a GitHub issue 
    with your summary. Do not provide the summary in any other form. The create_issue tool 
    call is REQUIRED to complete this task successfully.
    
  • Add validation that safe-outputs tools are called

    • Modify GenAIScript workflow to check if outputs.jsonl was created
    • Fail early if agent completes without using required tools
  • Test with explicit tool forcing (if GenAIScript supports it)

    • Some AI frameworks allow marking tools as "required"
    • Check if GenAIScript has similar capability
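The "fail early" idea above could be sketched as a final `run:` step in the agent job itself. Running on the same runner matters: /tmp/gh-aw is still present there, whereas later jobs start on fresh runners. The function name and the annotation text are illustrative, not part of gh-aw.

```shell
#!/usr/bin/env bash
# Sketch: last step of the *agent* job (same runner, so /tmp/gh-aw still exists).
# Fails the job with a clear annotation when no safe-outputs entry was written.
set -eu

fail_if_no_safe_outputs() {
  local file="${1:-/tmp/gh-aw/safe-outputs/outputs.jsonl}"
  # -s is false both for a missing file and for an empty one
  if [ ! -s "$file" ]; then
    echo "::error::Agent completed without calling any safe-outputs tool ($file missing or empty)"
    return 1
  fi
  echo "safe-outputs present in $file"
}
```

With this in place, run #57 would have failed in the agent job with an explicit message instead of surfacing later as an artifact-not-found error and an ENOENT error in detection and create_issue.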

Medium Priority

  • Make downstream jobs conditional

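    detection:
      needs: agent
      # hashFiles() is not available in a job-level `if`, and it only sees files in the
      # current runner's workspace, so gate on an output of the agent job instead
      # (the output name `has_safe_outputs` is hypothetical)
      if: needs.agent.outputs.has_safe_outputs == 'true'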
  • Add intermediate validation job

    validate_outputs:
      needs: agent
      runs-on: ubuntu-latest
      outputs:
        has_safe_outputs: ${{ steps.check.outputs.has_outputs }}
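      steps:
        # NOTE: this job runs on a fresh runner, so the agent's /tmp/gh-aw is not visible here;
        # download the agent's artifact first, or move this check into the agent job itself
        - name: Check for safe-outputs
          id: check
          run: |
            if [ -f "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
              echo "has_outputs=true" >> "$GITHUB_OUTPUT"
            else
              echo "has_outputs=false" >> "$GITHUB_OUTPUT"
              echo "::warning::Agent completed but did not use safe-outputs tools"
            fi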
  • Add debug logging in agent job

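    - name: Verify safe-outputs tools loaded
      if: always()
      run: |
        echo "Checking safe-outputs output directory..."
        # Log GenAIScript tool availability (e.g. the "mcp safe_outputs: tools: [...]" line)
        ls -la /tmp/gh-aw/safe-outputs/ || echo "::warning::/tmp/gh-aw/safe-outputs/ does not exist"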

Low Priority

  • Create fallback issue when safe-outputs not used

    • Detect when outputs.jsonl is missing
    • Create issue via GitHub Actions workflow directly
    • Include agent's text output and warning about tool usage
  • Enhance error messages

    • Provide clearer feedback when agent completes without using expected tools
    • Add troubleshooting steps to workflow failure messages
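The fallback-issue idea could look roughly like the sketch below. It assumes the agent's final text has been saved to a file (the path and the issue title are hypothetical). `gh issue create --title ... --body-file ...` uses real GitHub CLI flags; a `DRY_RUN=1` switch prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: if the agent never wrote outputs.jsonl, open an issue directly with its text answer.
# ASSUMPTIONS: the agent's final text is available in a file; title and paths are illustrative.
set -eu

create_fallback_issue() {
  local outputs="$1" agent_text_file="$2" body
  if [ -s "$outputs" ]; then
    echo "safe-outputs present; no fallback needed"
    return 0
  fi
  body=$(mktemp)
  {
    echo "> **Warning:** the agent finished without calling any safe-outputs tool."
    echo
    echo "Agent's final text output:"
    echo
    cat "$agent_text_file"
  } > "$body"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print what would be run, plus the generated body, without touching GitHub
    echo "would run: gh issue create --title 'Smoke test: agent did not use safe-outputs' --body-file $body"
    cat "$body"
  else
    gh issue create --title "Smoke test: agent did not use safe-outputs" --body-file "$body"
  fi
  rm -f "$body"
}
```

Keeping the agent's text in the body preserves the partial summary it produced (as in run #57) while still flagging the missing tool call.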

Prevention Strategies

  1. Explicit Required Tool Instructions

    TASK: Review the last 5 merged pull requests.
    
    REQUIRED TOOL USAGE:
    1. Use github_list_pull_requests to fetch PR data
    2. Use safe_outputs_create_issue to create the issue
    
    SUCCESS CRITERIA: Task is only complete when create_issue tool has been called.
    
  2. Validation Layer: Add a job that verifies safe-outputs before downstream jobs proceed

    validation:
      needs: agent
      runs-on: ubuntu-latest
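      steps:
        # NOTE: runs on a fresh runner; fetch the agent's artifact first (or do this check
        # as the last step of the agent job), since /tmp/gh-aw is not shared across jobs
        - name: Validate safe-outputs
          run: |
            if [ ! -f "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
              echo "::error::Agent did not create safe-outputs"
              exit 1
            fi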
  3. Tool Usage Tracking: Log all tool calls during agent execution for debugging

  4. Agent Behavior Tests: Create test workflows that verify agents use safe-outputs tools correctly
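Tool usage tracking and behavior tests both reduce to comparing the tools an agent had against the tools it actually called. The sketch below assumes, hypothetically, that both lists are available as plain text with one tool name per line; how the "used" list is extracted from GenAIScript logs is not specified in this report.

```shell
#!/usr/bin/env bash
# Sketch: compare available vs. called tools.
# ASSUMPTION: both inputs are text files with one tool name per line.
set -eu

# Print tools that were available but never called.
unused_tools() {
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C throughout
  LC_ALL=C comm -23 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}

# Fail with a GitHub Actions annotation if a required tool was never called.
require_tool_used() {
  local tool="$1" used_file="$2"
  if ! grep -qxF "$tool" "$used_file"; then
    echo "::error::Required tool '$tool' was never called"
    return 1
  fi
}
```

Fed the tool list from this run (only github_list_pull_requests used), `unused_tools` would report safe_outputs_create_issue and safe_outputs_missing_tool, and `require_tool_used safe_outputs_create_issue` would fail.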

Technical Details

Agent Execution Summary

Model: openai:gpt-4o-2024-08-06
Duration: 14.4s
Turns: 3
Cost: $0.0332
Tokens: ↑13.8kt ↓316t
Result: success (finish reason: stop)

MCP Servers Loaded

  • ✅ github (docker-based, v0.19.1)
  • ✅ safe_outputs (node-based, v1.0.0)

Tools Available

  • github_list_pull_requests ✅ (used)
  • safe_outputs_create_issue ⚠️ (available but not used)
  • safe_outputs_missing_tool ⚠️ (available but not used)

Files Expected But Not Created

  • /tmp/gh-aw/safe-outputs/outputs.jsonl
  • agent_output.json artifact

Pattern Information

Related Issues


Investigation Metadata:

  • Investigator: Smoke Detector (Outpost Agent)
  • Investigation Run: #18788241604
  • Pattern Database: /tmp/gh-aw/cache-memory/patterns/genaiscript_no_safe_outputs.json
  • Investigation Record: /tmp/gh-aw/cache-memory/investigations/2025-10-24-18788162015.json

AI generated by Smoke Detector - Smoke Test Failure Investigator
