[smoke-detector] 🔄 Smoke GenAIScript Recurring Failure - Agent Does Not Use Safe-Outputs (Run #58)

# 🔄 Recurring Failure Alert - Run #58

## Summary
The Smoke GenAIScript workflow failed **AGAIN** with the same pattern documented in issue #2307 (now closed as "not_planned"). The GenAIScript agent completes successfully but does not use the safe-outputs MCP tools, causing the detection job to crash with a TypeError. This is the **2nd occurrence** of this pattern in roughly 6 hours (two consecutive scheduled runs).

## Failure Details
- **Run**: [#18795287355](https://github.com/githubnext/gh-aw/actions/runs/18795287355)
- **Run Number**: 58
- **Commit**: 60e85eb0a6bd4c4e8da3d1a5578987cf49e02d62 - "Fix firewall log parser rejecting invalid domains from Squid error messages (#2330)"
- **Branch**: main
- **Trigger**: schedule (automated smoke test)
- **Duration**: 3.2 minutes
- **Status**: ❌ FAILED

## Recurrence Timeline

| Occurrence | Run # | Run ID | Timestamp | Issue | Status |
|------------|-------|--------|-----------|-------|--------|
| 1st | 57 | [18788162015](https://github.com/githubnext/gh-aw/actions/runs/18788162015) | 2025-10-24 18:06 UTC | #2307 | Closed as "not_planned" |
| 2nd | **58** | **[18795287355](https://github.com/githubnext/gh-aw/actions/runs/18795287355)** | **2025-10-25 00:19 UTC** | **This issue** | **Open** |

**Time Between Occurrences**: ~6 hours (scheduled smoke test interval)

## Root Cause Analysis

### Identical Pattern to Issue #2307

The failure pattern is **exactly the same**:

1. ✅ Agent job completes successfully (1.6m runtime)
2. ✅ Safe-outputs MCP server properly initialized
3. ✅ Tools available: `safe_outputs_create_issue`, `safe_outputs_missing_tool`
4. ✅ Agent receives prompt: "Review the last 5 merged pull requests and post summary in an issue"
5. ❌ Agent generates text summary but **does NOT invoke** `create_issue` tool
6. ❌ No `outputs.jsonl` file created
7. ❌ Detection job fails with TypeError

### Error Details

**Detection Job Failure**:
```
2025-10-25T00:22:11.4422911Z Failed to load MCP configuration: MCP configuration file not found: /tmp/gh-aw/mcp-config/mcp-servers.json
2025-10-25T00:22:11.4888208Z 2025-10-25T00:22:11.488Z genaiscript:error {
2025-10-25T00:22:11.4888781Z   name: 'TypeError',
2025-10-25T00:22:11.4889324Z   message: "Cannot read properties of undefined (reading 'text')",
2025-10-25T00:22:11.4890106Z   stack: "TypeError: Cannot read properties of undefined (reading 'text')\n" +
2025-10-25T00:22:11.4891146Z     '    at githubActionSetOutputs ((redacted))\n' +
2025-10-25T00:22:11.4892394Z     '    at async Command.runScriptWithExitCode ((redacted))'
2025-10-25T00:22:11.4893217Z }
2025-10-25T00:22:11.4893584Z Cannot read properties of undefined (reading 'text')
```

The error appears at lines 886-896 of `/tmp/gh-aw/aw-mcp/logs/run-18795287355/workflow-logs/2_detection.txt`.
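
The stack trace is consistent with `githubActionSetOutputs` reading `.text` from a result that was never populated. Below is a minimal TypeScript sketch of the kind of guard that would avoid the crash. The `RunResult` shape and the `safeOutputText` helper are hypothetical and are not taken from the GenAIScript source.

```typescript
// Hypothetical shape of a script run result; NOT the actual GenAIScript type.
interface RunResult {
  text?: string;
}

// Returns the output text, or a fallback when the run produced no result
// (hypothetical helper, for illustration only).
function safeOutputText(result: RunResult | undefined, fallback: string = ""): string {
  // Optional chaining avoids "Cannot read properties of undefined (reading 'text')".
  return result?.text ?? fallback;
}
```

With a guard like this, an agent run that yields no result would produce an empty output instead of an unhandled exception, so the detection job could report the missing output explicitly.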

### What the Agent Did

**Agent Output** (from `outputs.jsonl`):
```json
{
  "title": "Summary of Recently Merged Pull Requests",
  "body": "### Recent Merged Pull Requests Summary:\n\n1. **[WIP] Update logs command to run firewall log parser**\n   - Status: Closed without merging\n   \n2. **Fix firewall log parser rejecting invalid domains from Squid error messages**\n   - Status: Merged successfully\n   - Link: [PR #2330](https://github.com/githubnext/gh-aw/pull/2330)\n\n[... 2 more PRs ...]",
  "type": "create_issue"
}
```

The agent **generated the correct content** but delivered it as a text response instead of invoking the `safe_outputs_create_issue` MCP tool.

## Failed Jobs and Errors

### Job Sequence
1. ✅ **activation** - succeeded (4s)
2. ✅ **agent** - succeeded (1.6m) - Agent completed successfully
3. ❌ **detection** - **FAILED** (52s) - GenAIScript crashed with TypeError
4. ✅ **create_issue** - succeeded (5s) - Created this issue
5. ⏭️ **missing_tool** - skipped

## Why This Keeps Happening

### Core Problem
The GenAIScript agent does not recognize that **tool usage is MANDATORY**. The prompt says:

> "Review the last 5 merged pull requests in this repository and post summary in an issue."

And includes instructions:

> **IMPORTANT**: To create an issue, use the **safe-outputs** tools. Use the create-issue tool from the safe-outputs MCP.

However, the agent interprets this as guidance rather than a requirement, completing the task by generating text output instead of invoking the MCP tool.

### Similar Patterns Across Engines

| Engine | Pattern | Status | Issue |
|--------|---------|--------|-------|
| **GenAIScript** | **Agent doesn't use safe-outputs** | **Recurring** | **#2307 (closed), this issue** |
| OpenCode | Agent doesn't use safe-outputs | Fixed | #2143 (closed), #2164 (implemented fix) |
| Claude | Generally reliable with safe-outputs | - | - |
| Copilot | Different issues (JSON config) | Various | Multiple |

**Note**: OpenCode had the same issue (#2143), which was **fixed** by #2164. GenAIScript continues to have this problem.

## Recommended Actions

### High Priority

- [ ] **Make prompt more explicit about MANDATORY tool usage**
  ```
  TASK: Review the last 5 merged pull requests in this repository.
  
  MANDATORY REQUIREMENT: You MUST use the 'safe_outputs_create_issue' tool to create a GitHub 
  issue with your summary. Do not provide the summary in any other form. 
  
  SUCCESS CRITERIA: Task is ONLY complete when the create_issue tool has been called successfully.
  ```

- [ ] **Fix GenAIScript error handling** (Upstream bug)
  - Repository: https://github.com/microsoft/genaiscript
  - Issue: `githubActionSetOutputs` function doesn't handle undefined results
  - Location: `dist/src/githubaction.js:12:27`
  - Fix: Add null/undefined checks before accessing `.text` property

- [ ] **Add validation that safe-outputs tools are called**
  ```yaml
  - name: Validate safe-outputs
    run: |
      if [ ! -f "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
        echo "::error::Agent completed but did not use safe-outputs tools"
        exit 1
      fi
  ```
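
File existence alone does not show that the agent actually requested an issue. A possible tightening, sketched here in TypeScript, parses the JSONL content and looks for an entry whose `type` is `create_issue`, the shape shown in the sample agent output earlier in this issue. The function names are hypothetical, and the assumption that every entry carries a string `type` field is inferred from that single sample.

```typescript
// Entry shape inferred from the sample agent output in this issue (assumption).
interface SafeOutputEntry {
  type: string;
  title?: string;
  body?: string;
}

// Parses JSONL text, skipping blank lines and lines that are not valid entries.
function parseSafeOutputs(jsonl: string): SafeOutputEntry[] {
  const entries: SafeOutputEntry[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { type?: unknown }).type === "string"
      ) {
        entries.push(parsed as SafeOutputEntry);
      }
    } catch {
      // Ignore lines that are not valid JSON.
    }
  }
  return entries;
}

// True when at least one entry has the expected type (hypothetical helper).
function hasOutputOfType(jsonl: string, expectedType: string): boolean {
  return parseSafeOutputs(jsonl).some((entry) => entry.type === expectedType);
}
```

A validation step could read `outputs.jsonl`, call `hasOutputOfType(content, "create_issue")`, and fail the job when it returns `false`.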

### Medium Priority

- [ ] **Make detection job conditional on outputs.jsonl existence**
  ```yaml
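  detection:
    needs: agent
    # hashFiles() only sees files under GITHUB_WORKSPACE and is not available
    # in a job-level `if`; gate on a job output instead. This assumes the
    # agent job exports a `has_outputs` output (hypothetical name).
    if: needs.agent.outputs.has_outputs == 'true'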
  ```

- [ ] **Investigate GenAIScript tool forcing**
  - Check if GenAIScript supports marking tools as "required"
  - Similar to OpenAI-style function calling with `tool_choice: {"type": "function", "function": {"name": "create_issue"}}`

- [ ] **Add intermediate validation job**
  ```yaml
  validate_outputs:
    needs: agent
    runs-on: ubuntu-latest
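    steps:
      # Jobs run on separate runners, so outputs.jsonl must first be restored
      # here (for example from an artifact uploaded by the agent job).
      - name: Check for safe-outputs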
        run: |
          if [ ! -f "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
            echo "::warning::Agent did not use safe-outputs tools"
          fi
  ```
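
For the tool-forcing item above, the following TypeScript sketch shows what a forced tool call looks like in an OpenAI-style Chat Completions request body. It is an illustration of the request format only. Whether GenAIScript exposes an equivalent setting is exactly what needs investigating, and `buildForcedToolRequest`, the tool description, and the parameter schema are hypothetical.

```typescript
// OpenAI-style forced tool choice; this follows the Chat Completions request
// format and is NOT a GenAIScript API.
interface ToolChoice {
  type: "function";
  function: { name: string };
}

interface ToolDefinition {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  tools: ToolDefinition[];
  tool_choice: ToolChoice;
}

// Builds a request body in which the model must call `toolName`
// (hypothetical helper; the schema below is an assumed example).
function buildForcedToolRequest(model: string, prompt: string, toolName: string): ChatRequest {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    tools: [
      {
        type: "function",
        function: {
          name: toolName,
          description: "Create a GitHub issue",
          parameters: {
            type: "object",
            properties: { title: { type: "string" }, body: { type: "string" } },
            required: ["title", "body"],
          },
        },
      },
    ],
    // Naming the function here removes the option of answering with plain text.
    tool_choice: { type: "function", function: { name: toolName } },
  };
}
```

The name passed as `toolName` has to match the name under which the tool is registered, for example `safe_outputs_create_issue` as listed in the agent logs.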

### Low Priority

- [ ] **Learn from OpenCode fix** (#2164)
  - Review how OpenCode was fixed to use safe-outputs
  - Apply similar approach to GenAIScript if applicable

- [ ] **Add debug logging**
  - Log available MCP tools during agent execution
  - Track which tools are invoked
  - Verify safe-outputs MCP initialization

## Impact Assessment

**Severity**: 🟡 **MEDIUM** (raised from previous assessment)
- GenAIScript smoke tests failing on every scheduled run
- Pattern is recurring after issue was closed as "not_planned"
- Will continue to fail indefinitely without intervention

**Urgency**: 🟡 **MODERATE**
- Not blocking critical functionality
- Persistent smoke test failures add noise that can hide real regressions
- Wasting CI minutes on recurring failures

**Frequency**: Every ~6 hours (scheduled smoke test runs)

## Historical Context

### Pattern Information
- **Pattern ID**: `GENAISCRIPT_NO_SAFE_OUTPUTS`
- **First Detected**: 2025-10-24 18:06:54 UTC
- **Total Occurrences**: 2
- **Failure Rate**: 100% of GenAIScript smoke test runs since first detection (2 of 2)
- **Related Pattern**: `OPENCODE_NO_SAFE_OUTPUTS` (fixed in #2164)

### Investigation Data
- **Investigation Record**: `/tmp/gh-aw/cache-memory/investigations/2025-10-25-18795287355.json`
- **Pattern Record**: `/tmp/gh-aw/cache-memory/patterns/genaiscript_no_safe_outputs.json`
- **Previous Issue**: #2307 (closed as "not_planned" on 2025-10-24 21:15:42Z)

## Related Issues

- #2307 - First occurrence (closed as "not_planned")
- #2143 - OpenCode same issue (closed, fixed)
- #2164 - OpenCode fix implementation (completed)

---

## Request for Action

Since issue #2307 was closed as "not_planned" but the failure continues on every scheduled run:

**Option 1**: Fix the issue (recommended)
- Enhance prompt to make tool usage mandatory
- Add validation to catch agent behavior issues early

**Option 2**: Disable scheduled GenAIScript smoke tests
- If not actively maintaining, disable scheduled runs
- Prevents recurring failed runs and investigation overhead

**Option 3**: Accept recurring failures
- Document that this is expected behavior
- Update smoke detector to skip creating issues for this pattern
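
If Option 3 is chosen, the skip logic could be as small as a lookup against a list of accepted pattern IDs. This TypeScript sketch is a hypothetical illustration and does not reflect how the smoke detector is currently implemented.

```typescript
// Returns false when the pattern has been accepted as expected behavior,
// so the detector skips filing a new issue for it (hypothetical helper).
function shouldCreateIssue(patternId: string, acceptedPatterns: ReadonlySet<string>): boolean {
  return !acceptedPatterns.has(patternId);
}
```

Adding `GENAISCRIPT_NO_SAFE_OUTPUTS` to the accepted set would stop new issues for this pattern while leaving all other patterns unaffected.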

Please advise on the preferred approach.

---

**Investigation Metadata:**
- **Investigator**: Smoke Detector (Failure Investigation Agent)
- **Investigation Run**: [#18795331289](https://github.com/githubnext/gh-aw/actions/runs/18795331289)
- **Pattern**: `GENAISCRIPT_NO_SAFE_OUTPUTS` (2nd occurrence)
- **Created**: 2025-10-25T00:24:00Z

> AI generated by [Smoke Detector - Smoke Test Failure Investigator](https://github.com/githubnext/gh-aw/actions/runs/18795331289)



