[smoke-detector] 🔍 Smoke Test Investigation - Smoke GenAIScript Run #57: Agent Does Not Use Safe-Outputs MCP Tools #2307

# 🔍 Smoke Test Investigation - Run #57

## Summary
The Smoke GenAIScript workflow failed because the `create_issue` and `detection` jobs could not find the expected `agent_output.json` artifact. The GenAIScript agent completed successfully but did not use the safe-outputs MCP tools despite being instructed to create an issue, leaving no output artifact for downstream jobs.

## Failure Details
- **Run**: [#18788162015](https://github.com/githubnext/gh-aw/actions/runs/18788162015)
- **Run Number**: 57
- **Commit**: 417ca9d9dafdf90b8dfec1c462aa54fe0b63f8c6
- **Branch**: main
- **Trigger**: schedule
- **Duration**: 3.3 minutes
- **Failed Jobs**: detection (1.0m), create_issue (5s)
- **Workflow**: Smoke GenAIScript

## Root Cause Analysis

### Primary Issue
The GenAIScript agent received the prompt: **"Review the last 5 merged pull requests in this repository and post summary in an issue."**

Despite clear instructions to create an issue, the agent:
1. ✅ Completed successfully (14.4s runtime, cost $0.0332)
2. ✅ Had access to safe-outputs MCP tools (`safe_outputs_create_issue`, `safe_outputs_missing_tool`)
3. ✅ Successfully used GitHub MCP tools to fetch pull request data
4. ❌ Did NOT use the `create_issue` tool from the safe-outputs MCP
5. ❌ Did NOT create `/tmp/gh-aw/safe-outputs/outputs.jsonl`
6. ❌ Left no output artifact for downstream jobs

The agent's final output was:
```
The remaining pull requests are beyond my current data access. I can analyze the details one by one for the remaining PRs to compile a complete summary. Let me know if you'd like to proceed this way!
```

This indicates the agent interpreted the task as an interactive dialogue rather than recognizing it needed to use the safe-outputs tool to create the issue.
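One inexpensive way to surface this failure mode in the logs is to scan the agent's final message for phrases that hand control back to a user. The sketch below is a heuristic under stated assumptions: the phrase list is illustrative, and `looks_conversational` is a hypothetical helper that is not part of gh-aw or GenAIScript.

```shell
# Hypothetical helper (not part of gh-aw): returns 0 when TEXT contains a phrase
# that hands control back to a user instead of finishing the task.
# The phrase list is illustrative only; tune it against real agent transcripts.
# -w requires whole-word matches, so e.g. "marshall is" does not match "shall i".
looks_conversational() {
  local text="$1"
  printf '%s\n' "$text" | grep -qiwE \
    "let me know|would you like|shall i|do you want me to|if you'd like"
}
```

For example, `looks_conversational "$final_message" && echo "::warning::Agent asked for confirmation instead of calling a safe-outputs tool"`, where `$final_message` stands for however the workflow captures the agent's last reply (an assumption; this report does not show where that text is stored).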

### MCP Configuration Status
✅ Safe-outputs MCP server was properly initialized:
```
[safe-outputs-mcp-server] v1.0.0 ready on stdio
  output file: /tmp/gh-aw/safe-outputs/outputs.jsonl
  config: {"create_issue":{"max":1,"min":1},"missing_tool":{}}
  tools: create_issue, missing_tool
```

✅ Tools were registered and available to the agent:
```
mcp safe_outputs: tools: [ 'safe_outputs_create_issue', 'safe_outputs_missing_tool' ]
```

### Workflow Configuration
- **Staged Mode**: `true` (GH_AW_SAFE_OUTPUTS_STAGED=true)
- **Expected Outputs**: create_issue (min: 1, max: 1)
- **Model**: openai:gpt-4o-2024-08-06
- **Agent Execution**: Successful (3 turns, 13.8k input tokens, 316 output tokens)

## Failed Jobs and Errors

### Job Sequence
1. ✅ **activation** - succeeded (4s)
2. ✅ **agent** - succeeded (1.6m)
3. ❌ **detection** - failed (1.0m)
4. ❌ **create_issue** - failed (5s)
5. ⏭️ **missing_tool** - skipped

### Error Details

**Detection Job Error:**
```
Unable to download artifact(s): Artifact not found for name: agent_output.json
Please ensure that your artifact is not expired and the artifact was uploaded using a compatible version of toolkit/upload-artifact.
```

**Create Issue Job Error:**
```
Error reading agent output file: ENOENT: no such file or directory, open '/tmp/gh-aw/safe-outputs/agent_output.json'
```

Both jobs depend on the `agent_output.json` artifact, which was never created because the agent never called a safe-outputs MCP tool.
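Because both errors only report that a file is missing, a small check at the end of the agent job (the only job that can see its `/tmp`) could say *which* stage is missing before the artifact upload. The sketch assumes the `/tmp/gh-aw/safe-outputs/` layout shown in this report; `diagnose_safe_outputs` and its hint wording are hypothetical.

```shell
# Hypothetical helper: explains which part of the safe-outputs pipeline is
# missing instead of surfacing a bare "file not found".
#   $1 = safe-outputs directory (e.g. /tmp/gh-aw/safe-outputs)
# Prints a GitHub Actions annotation; returns 1 when no outputs were recorded.
diagnose_safe_outputs() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "::error::${dir} does not exist; check that the safe-outputs setup ran before the agent started."
    return 1
  fi
  if [ ! -s "$dir/outputs.jsonl" ]; then
    echo "::error::The agent finished without calling a safe-outputs tool (${dir}/outputs.jsonl is missing or empty). Check its final message and whether the prompt names the required tool."
    return 1
  fi
  echo "::notice::$(wc -l < "$dir/outputs.jsonl") safe-output line(s) recorded in ${dir}/outputs.jsonl"
}
```

Run as `diagnose_safe_outputs /tmp/gh-aw/safe-outputs` in a final `if: always()` step of the agent job, this would have turned run #57's two artifact errors into a single explicit message.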

## Investigation Findings

### Why Did This Happen?

**Possible Reasons:**

1. **Agent Interpretation Issue**: The GenAIScript agent may have read "post summary in an issue" as a plain natural-language request rather than as a requirement to call the `safe_outputs_create_issue` tool.

2. **Prompt Ambiguity**: The prompt states:
   ```
   **IMPORTANT**: To do the actions mentioned in the header of this section, use the **safe-outputs** tools, 
   do NOT attempt to use `gh`, do NOT attempt to use the GitHub API.
   
   To create an issue, use the create-issue tool from the safe-outputs MCP
   ```
   
   However, the agent may not have connected this instruction with the main task.

3. **Interactive Mode Behavior**: The agent's closing line ("Let me know if you'd like to proceed this way!") suggests it switched to a conversational mode instead of completing the task.

4. **Staged Mode Effect**: With `GH_AW_SAFE_OUTPUTS_STAGED=true`, the agent may behave differently, though staged mode should not prevent tool calls.

### Historical Context
This is a **NEW pattern for GenAIScript** but similar to issue #2143 (OpenCode):

| Engine | Pattern | Status | Issue |
|--------|---------|--------|-------|
| OpenCode | Agent doesn't use safe-outputs | Closed | #2143 |
| **GenAIScript** | **Agent doesn't use safe-outputs** | **NEW** | **This issue** |
| Claude | Generally reliable with safe-outputs | - | - |
| Copilot | Permission denied errors | Recurring | #2288 |

### Comparison with Other Engines
- **Claude**: More reliable at following MCP tool usage instructions
- **OpenCode**: Same issue (documented in #2143)
- **Copilot**: Different issue (permission denied, not failure to use tools)

## Recommended Actions

### High Priority
- [ ] **Make prompt more explicit about tool usage**
  ```
  Review the last 5 merged pull requests in this repository.
  
  MANDATORY: You MUST use the 'safe_outputs_create_issue' tool to create a GitHub issue 
  with your summary. Do not provide the summary in any other form. The create_issue tool 
  call is REQUIRED to complete this task successfully.
  ```

- [ ] **Add validation that safe-outputs tools are called**
  - Modify GenAIScript workflow to check if outputs.jsonl was created
  - Fail early if agent completes without using required tools

- [ ] **Test with explicit tool forcing** (if GenAIScript supports it)
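  - Some AI frameworks allow marking tools as "required" (for example, the OpenAI Chat Completions API accepts `tool_choice: "required"`, and this run used an OpenAI model)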
  - Check if GenAIScript has similar capability
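The validation item above could compare the number of recorded outputs against the `min`/`max` bounds from the safe-outputs config (`{"create_issue":{"max":1,"min":1}}` in this run). A minimal sketch, assuming `outputs.jsonl` holds one JSON object per line with a `"type"` field naming the output, spelled `create_issue` or `create-issue` (the exact spelling is an assumption). It is a `grep`-based heuristic rather than a JSON parser, and both function names are hypothetical.

```shell
# Hypothetical helpers (not part of gh-aw). Assumes one JSON object per line
# with a "type" field; matching is a grep heuristic, not real JSON parsing.

# count_outputs FILE TYPE_REGEX: prints how many lines have a matching "type"
# (prints 0 when FILE does not exist).
count_outputs() {
  local file="$1" type_re="$2"
  if [ ! -f "$file" ]; then
    echo 0
    return 0
  fi
  # grep -c prints 0 and exits 1 when nothing matches; keep the exit status 0
  grep -cE "\"type\"[[:space:]]*:[[:space:]]*\"${type_re}\"" "$file" || true
}

# require_safe_output FILE TYPE_REGEX MIN MAX: returns 1 (with an ::error::
# annotation) unless MIN <= count <= MAX.
require_safe_output() {
  local file="$1" type_re="$2" min="$3" max="$4" n
  n=$(count_outputs "$file" "$type_re")
  if [ "$n" -lt "$min" ] || [ "$n" -gt "$max" ]; then
    echo "::error::Expected ${min}..${max} '${type_re}' entries in ${file}, found ${n}"
    return 1
  fi
}
```

As the last step of the agent job, `require_safe_output /tmp/gh-aw/safe-outputs/outputs.jsonl 'create[-_]issue' 1 1` would fail the run early, before `detection` and `create_issue` try to download an artifact that does not exist.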

### Medium Priority
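- [ ] **Make downstream jobs conditional** on an output exported by the agent job (`hashFiles()` is not available in a job-level `if`, and `/tmp` is not shared between jobs)
  ```yaml
  agent:
    outputs:
      # Hypothetical step id: a step in the agent job writes has_outputs=true/false
      has_safe_outputs: ${{ steps.check_outputs.outputs.has_outputs }}
  detection:
    needs: agent
    if: needs.agent.outputs.has_safe_outputs == 'true'
  ```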

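- [ ] **Add intermediate validation job** (it runs on a fresh runner, so it must download the agent's artifact instead of reading the agent job's `/tmp`)
  ```yaml
  validate_outputs:
    needs: agent
    runs-on: ubuntu-latest
    outputs:
      has_safe_outputs: ${{ steps.check.outputs.has_outputs }}
    steps:
      - name: Download agent output
        uses: actions/download-artifact@v4
        continue-on-error: true  # a missing artifact is exactly the case to report
        with:
          name: agent_output.json
          path: /tmp/gh-aw/safe-outputs/
      - name: Check for safe-outputs
        id: check
        run: |
          if [ -f "/tmp/gh-aw/safe-outputs/agent_output.json" ]; then
            echo "has_outputs=true" >> "$GITHUB_OUTPUT"
          else
            echo "has_outputs=false" >> "$GITHUB_OUTPUT"
            echo "::warning::Agent completed but did not use safe-outputs tools"
          fi
  ```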

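- [ ] **Add debug logging in agent job**
  ```yaml
  - name: Verify safe-outputs tools loaded
    if: always()
    run: |
      echo "Checking safe-outputs state..."
      # GenAIScript tool listing is engine-specific; at minimum, record what the agent wrote
      ls -la /tmp/gh-aw/safe-outputs/ || echo "::warning::safe-outputs directory is missing"
      wc -l /tmp/gh-aw/safe-outputs/outputs.jsonl || echo "::warning::outputs.jsonl was not created"
  ```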

### Low Priority
- [ ] **Create fallback issue when safe-outputs not used**
  - Detect when outputs.jsonl is missing
  - Create issue via GitHub Actions workflow directly
  - Include agent's text output and warning about tool usage

- [ ] **Enhance error messages**
  - Provide clearer feedback when agent completes without using expected tools
  - Add troubleshooting steps to workflow failure messages
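The fallback issue could be assembled with a few lines of shell and posted from a workflow step (not by the agent) with `gh issue create --title … --body-file …`. A sketch, assuming the agent's final text has been saved to a file; that file and the `build_fallback_body` helper are hypothetical.

```shell
# Hypothetical helper: prints a Markdown issue body for the case where the agent
# finished without calling any safe-outputs tool.
#   $1 = file holding the agent's final text output (may be missing or empty)
#   $2 = URL of the workflow run, for traceability
build_fallback_body() {
  local text_file="$1" run_url="$2"
  printf '**Warning:** the agent completed without calling any safe-outputs tool,\n'
  printf 'so `outputs.jsonl` was not created.\n\n'
  printf 'Workflow run: %s\n\n' "$run_url"
  printf '## Agent text output\n\n'
  if [ -s "$text_file" ]; then
    # Fence the agent text so stray Markdown in it is shown literally
    printf '~~~\n'
    cat "$text_file"
    printf '\n~~~\n'
  else
    printf '_No text output was captured._\n'
  fi
}
```

For example, `build_fallback_body agent_text.txt "$GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID" > body.md` followed by `gh issue create --title "Smoke GenAIScript: agent did not use safe-outputs" --body-file body.md` (with `GH_TOKEN` set) would carry the agent's text and the run link into the issue.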

## Prevention Strategies

1. **Explicit Required Tool Instructions**
   ```
   TASK: Review the last 5 merged pull requests.
   
   REQUIRED TOOL USAGE:
   1. Use github_list_pull_requests to fetch PR data
   2. Use safe_outputs_create_issue to create the issue
   
   SUCCESS CRITERIA: Task is only complete when create_issue tool has been called.
   ```

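2. **Validation Layer**: Add a job that verifies safe-outputs before downstream jobs proceed (it must download the artifact, since `/tmp` is not shared between jobs)
   ```yaml
   validation:
     needs: agent
     runs-on: ubuntu-latest
     steps:
       - name: Download agent output
         uses: actions/download-artifact@v4
         continue-on-error: true
         with:
           name: agent_output.json
           path: /tmp/gh-aw/safe-outputs/
       - name: Validate safe-outputs
         run: |
           if [ ! -f "/tmp/gh-aw/safe-outputs/agent_output.json" ]; then
             echo "::error::Agent did not create safe-outputs"
             exit 1
           fi
   ```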

3. **Tool Usage Tracking**: Log all tool calls during agent execution for debugging

4. **Agent Behavior Tests**: Create test workflows that verify agents use safe-outputs tools correctly

## Technical Details

### Agent Execution Summary
```
Model: openai:gpt-4o-2024-08-06
Duration: 14.4s
Turns: 3
Cost: $0.0332
Tokens: ↑13.8kt ↓316t
Result: success (finish reason: stop)
```

### MCP Servers Loaded
- ✅ github (docker-based, v0.19.1)
- ✅ safe_outputs (node-based, v1.0.0)

### Tools Available
- github_list_pull_requests ✅ (used)
- safe_outputs_create_issue ⚠️ (available but not used)
- safe_outputs_missing_tool ⚠️ (available but not used)

### Files Expected But Not Created
- ❌ `/tmp/gh-aw/safe-outputs/outputs.jsonl`
- ❌ `agent_output.json` artifact

## Pattern Information

- **Pattern ID**: `GENAISCRIPT_NO_SAFE_OUTPUTS`
- **Category**: Agent Behavior - Safe-Outputs Not Used
- **Severity**: Medium
- **Flakiness**: Not flaky - consistent behavior
- **Recurring**: First occurrence for GenAIScript
- **Related Pattern**: `OPENCODE_NO_SAFE_OUTPUTS` (issue #2143)

## Related Issues

- #2143 - OpenCode agent doesn't use safe-outputs (closed)

---

**Investigation Metadata:**
- **Investigator**: Smoke Detector (Outpost Agent)
- **Investigation Run**: [#18788241604](https://github.com/githubnext/gh-aw/actions/runs/18788241604)
- **Pattern Database**: `/tmp/gh-aw/cache-memory/patterns/genaiscript_no_safe_outputs.json`
- **Investigation Record**: `/tmp/gh-aw/cache-memory/investigations/2025-10-24-18788162015.json`




> AI generated by [Smoke Detector - Smoke Test Failure Investigator](https://github.com/githubnext/gh-aw/actions/runs/18788241604)
