[smoke-detector] 🔄 GenAIScript Agent Not Using Safe-Outputs - 3rd Consecutive Failure (Run #59) #2378

# 🔄 Recurring Pattern Alert - 3rd Consecutive Failure

## Summary
The Smoke GenAIScript workflow has failed **AGAIN** with the identical pattern documented in issues #2307 and #2351 (both closed as "not_planned"). The GenAIScript agent completes successfully but does not use safe-outputs MCP tools, causing the detection job to crash with a TypeError. This is the **3rd occurrence** in 12 hours with a **100% failure rate** for scheduled GenAIScript smoke tests.

## Failure Details
- **Run**: [#18799180550](https://github.com/githubnext/gh-aw/actions/runs/18799180550)
- **Run Number**: 59
- **Commit**: ea4df5833c32e00185bd96a7f4102a4054f7aa33
- **Branch**: main
- **Trigger**: schedule (automated smoke test)
- **Duration**: 4.4 minutes
- **Failed Job**: detection (2.3 minutes)
- **Status**: ❌ FAILED

## Recurrence Timeline

| # | Run | Run ID | Timestamp | Hrs Since Prev | Issue | Status |
|---|-----|--------|-----------|----------------|-------|--------|
| 1 | #57 | [18788162015](https://github.com/githubnext/gh-aw/actions/runs/18788162015) | 2025-10-24 18:06 UTC | - | #2307 | Closed as "not_planned" |
| 2 | #58 | [18795287355](https://github.com/githubnext/gh-aw/actions/runs/18795287355) | 2025-10-25 00:19 UTC | ~6.2h | #2351 | Closed as "not_planned" |
| 3 | **#59** | **[18799180550](https://github.com/githubnext/gh-aw/actions/runs/18799180550)** | **2025-10-25 06:06 UTC** | **~5.8h** | **This issue** | **Open** |

**Pattern Established**: Failing every ~6 hours on scheduled runs with 100% consistency.
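The "Hrs Since Prev" column can be re-derived from the timestamps in the table. The following is a minimal sketch for checking that arithmetic; the function name `hours_between_runs` is illustrative and not part of the smoke detector.

```python
from datetime import datetime

# Start times of the three failed runs, copied from the timeline table (UTC).
RUN_TIMES = {
    57: "2025-10-24 18:06",
    58: "2025-10-25 00:19",
    59: "2025-10-25 06:06",
}


def hours_between_runs(run_times):
    """Return {run_number: hours since the previous run}, rounded to 0.1h."""
    parsed = sorted(
        (number, datetime.strptime(stamp, "%Y-%m-%d %H:%M"))
        for number, stamp in run_times.items()
    )
    gaps = {}
    # Pair each run with the one before it and convert the gap to hours.
    for (_, previous), (number, current) in zip(parsed, parsed[1:]):
        gaps[number] = round((current - previous).total_seconds() / 3600, 1)
    return gaps


print(hours_between_runs(RUN_TIMES))  # {58: 6.2, 59: 5.8}
```

Both gaps are close to 6 hours, which is consistent with the failures tracking the scheduled interval rather than any code change.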

## Root Cause Analysis

### The Core Problem (UNCHANGED from #2307 and #2351)

The failure pattern is **identical across all 3 occurrences**:

1. ✅ Agent job completes successfully
2. ✅ Safe-outputs MCP server properly initialized with tools available
3. ✅ Agent receives clear instructions to create an issue
4. ❌ **Agent does NOT invoke `safe_outputs_create_issue` tool**
5. ❌ Agent generates text response instead of tool invocation
6. ❌ No `outputs.jsonl` file created
7. ❌ No `agent_output.json` artifact uploaded
8. ❌ Detection job crashes with TypeError

### Error Details

**Detection Job Stack Trace**:
```
2025-10-25T06:10:37.8128968Z 2025-10-25T06:10:37.812Z genaiscript:error {
  name: 'TypeError',
  message: "Cannot read properties of undefined (reading 'text')",
  stack: "TypeError: Cannot read properties of undefined (reading 'text')\n" +
    '    at githubActionSetOutputs ((redacted))\n' +
    '    at async Command.runScriptWithExitCode ((redacted))'
}
Cannot read properties of undefined (reading 'text')
TypeError: Cannot read properties of undefined (reading 'text')
```

**Location**: `githubaction.js:12:27` in GenAIScript npm package

### Why This Keeps Happening

The agent prompt says:
> "Review the last 5 merged pull requests in this repository and post summary in an issue."

And includes:
> "**IMPORTANT**: To create an issue, use the **safe-outputs** tools."

However, the agent interprets this as **guidance** rather than a **requirement**. It understands what needs to be done but considers the task complete without actually invoking the tool.

**This is a prompt engineering problem**: the current wording is not strong enough to make tool usage mandatory.

## Failed Jobs and Errors

### Job Execution Summary
1. ✅ **activation** - succeeded (2s)
2. ✅ **agent** - succeeded (1.4m) - Agent completed with no errors
3. ❌ **detection** - **FAILED** (2.3m) - GenAIScript crashed with TypeError
4. ✅ **create_issue** - succeeded (5s) - Created this issue
5. ⏭️ **missing_tool** - skipped

## Historical Context & Pattern Analysis

### Pattern Information
- **Pattern ID**: `GENAISCRIPT_NO_SAFE_OUTPUTS`
- **First Detected**: 2025-10-24 18:06:54 UTC
- **Total Occurrences**: 3
- **Failure Rate**: 100% of GenAIScript scheduled smoke tests
- **Frequency**: Every ~6 hours (scheduled interval)
- **Related Pattern**: `OPENCODE_NO_SAFE_OUTPUTS` (**FIXED** in #2164)

### Why This Is Different from OpenCode

OpenCode had the **exact same issue** (#2143), which was **successfully fixed** by #2164. That fix strengthened the prompt to make tool usage mandatory. GenAIScript still fails even though similar instructions are present in its prompt.

### Investigation Data
- **Investigation Record**: `/tmp/gh-aw/cache-memory/investigations/2025-10-25-18799180550.json`
- **Pattern Record**: `/tmp/gh-aw/cache-memory/patterns/genaiscript_no_safe_outputs.json`
- **Previous Issues**: #2307 (closed), #2351 (closed)

## Recommended Actions

### 🔴 HIGH PRIORITY - Fix the Prompt

**Option A: Learn from OpenCode Fix (#2164)**

Review the OpenCode prompt changes that resolved the same issue and apply a similar approach to GenAIScript, for example:

```markdown
MANDATORY REQUIREMENT: You MUST use the 'safe_outputs_create_issue' tool to create 
a GitHub issue. Do not provide the summary in any other form.

SUCCESS CRITERIA: Task is ONLY complete when the create_issue tool has been invoked 
successfully. The workflow will fail if you do not call this tool.
```

**Option B: Add Validation Step**

Add a validation job that checks for safe-outputs before continuing:

```yaml
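validate_outputs:
  needs: agent
  runs-on: ubuntu-latest
  steps:
    # Each job runs on a fresh runner, so fetch the agent job's output first.
    # The artifact name is an assumption; use whatever the agent job uploads.
    - name: Download safe-outputs
      uses: actions/download-artifact@v4
      continue-on-error: true  # a missing artifact is reported by the check below
      with:
        name: safe-outputs
        path: /tmp/gh-aw/safe-outputs
    - name: Check for safe-outputs
      run: |
        # -s: the file must exist and be non-empty
        if [ ! -s "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
          echo "::error::Agent completed but did not use safe-outputs tools"
          exit 1
        fi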
```
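If a stricter check is wanted, the same validation could be done in a small script that also rejects an empty or malformed file. This is a sketch only: the path comes from the snippet above, while the function names and the assumption that each non-blank line is one JSON entry are illustrative. The entry schema is not described in this issue, so the sketch does not inspect it.

```python
import json
from pathlib import Path

# Path taken from the workflow snippet above.
DEFAULT_OUTPUTS = Path("/tmp/gh-aw/safe-outputs/outputs.jsonl")


def count_safe_outputs(path):
    """Return the number of JSON entries in the file, or 0 if it is missing or empty."""
    path = Path(path)
    if not path.is_file():
        return 0
    count = 0
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        json.loads(line)  # raises json.JSONDecodeError on a malformed entry
        count += 1
    return count


def exit_code_for(path=DEFAULT_OUTPUTS):
    """Return 1 (and emit a workflow error annotation) when no entries exist, else 0."""
    if count_safe_outputs(path) == 0:
        print("::error::Agent completed but did not use safe-outputs tools")
        return 1
    return 0
```

A workflow step could end with `raise SystemExit(exit_code_for())` so that the job fails with a readable message instead of the downstream TypeError.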

### 🟡 MEDIUM PRIORITY - Fix GenAIScript Error Handling

File an upstream bug with the GenAIScript project:
- **Repository**: https://github.com/microsoft/genaiscript
- **Issue**: `githubActionSetOutputs` doesn't handle undefined results
- **Location**: `dist/src/githubaction.js:12:27`
- **Fix Needed**: Add null/undefined checks before accessing `.text` property
- **Impact**: Better error messages when agent doesn't produce expected output
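
The requested guard amounts to checking the result before reading its `text` field. The real fix belongs in GenAIScript's JavaScript source, which is not shown in this issue. The sketch below only illustrates the pattern in Python, and `extract_output_text` and the dict-shaped result are hypothetical stand-ins, not GenAIScript's API.

```python
from typing import Optional


def extract_output_text(result: Optional[dict]) -> str:
    """Return result['text'], or raise a descriptive error when it is absent."""
    if result is None:
        # Corresponds to the `undefined` result that triggers the TypeError today.
        raise ValueError(
            "agent produced no result; was a safe-outputs tool called?"
        )
    text = result.get("text")
    if text is None:
        raise ValueError("agent result has no 'text' field")
    return text
```

With a check like this, the detection job would still fail when the agent skips the tool call, but the log would name the cause directly.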

### 🟢 LOW PRIORITY - Alternative Solutions

If GenAIScript workflows are not actively maintained:

**Option 1**: Disable scheduled trigger to stop recurring failures
```yaml
# Comment out the schedule trigger in .github/workflows/smoke-genaiscript.md
```

**Option 2**: Make detection job conditional on outputs.jsonl existence
```yaml
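detection:
  needs: agent
  # hashFiles() is unavailable in a job-level `if` and cannot see another job's /tmp.
  # Gate on a job output instead; `has_safe_outputs` is an assumed name the agent job must set.
  if: needs.agent.outputs.has_safe_outputs == 'true'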
```

**Option 3**: Update smoke detector to skip creating issues for this known pattern

## Impact Assessment

**Severity**: 🟡 **MEDIUM** (raised from LOW)
- GenAIScript smoke tests have 100% failure rate
- Pattern recurring every ~6 hours indefinitely
- Multiple closed issues without resolution = pattern will continue
- Wasting CI minutes and investigation overhead

**Urgency**: 🟡 **MODERATE**
- Not blocking critical functionality
- Produces misleading failure signals about system health
- Simple fix available (prompt improvement)

**Scope**:
- **Affected**: GenAIScript scheduled smoke tests only
- **Frequency**: Every ~6 hours (scheduled runs)
- **Duration**: 12+ hours of continuous failures
- **CI Minutes Wasted**: ~13 minutes (3 failures × 4.3 min average)

## Prevention Strategies

1. **Improve Prompt Clarity** - Use OpenCode's successful approach as template
2. **Add Output Validation** - Check for outputs.jsonl before proceeding
3. **Better Error Handling** - Fix GenAIScript to handle undefined results gracefully
4. **Conditional Jobs** - Make detection conditional on safe-outputs existence
5. **Tool Forcing** - Investigate whether GenAIScript can require specific tool calls
6. **Monitoring** - Track which MCP tools are invoked during execution

## Request for Decision

Since this is the **3rd consecutive occurrence** and previous issues (#2307, #2351) were closed as "not_planned", I request a decision on:

**Option 1**: Fix the issue (recommended)
- Apply OpenCode's successful prompt fix approach
- Add validation to catch agent behavior issues early
- Estimated effort: 30-60 minutes

**Option 2**: Disable GenAIScript scheduled smoke tests
- If the workflow is not actively maintained, disable it to prevent recurring failures
- Estimated effort: 5 minutes

**Option 3**: Accept recurring failures as expected
- Document this as expected behavior
- Update smoke detector to not create issues for this pattern
- Estimated effort: 15 minutes

The **current situation** (failures recurring every ~6 hours, issues opened and then closed without resolution) is not sustainable.

## Related Issues

- #2351 - 2nd occurrence (closed as "not_planned")
- #2307 - 1st occurrence (closed as "not_planned")
- #2164 - OpenCode fix implementation (**SUCCESSFULLY RESOLVED** same issue)
- #2143 - OpenCode same issue (closed, fixed by #2164)

## Reproduction Steps

1. Configure the GenAIScript agent with the safe-outputs MCP server
2. Give the agent a task to "create an issue" using the current prompt wording
3. Run the workflow
4. Observe that the agent completes successfully without invoking the tool
5. Observe that the detection job crashes with the TypeError

---

## Investigation Metadata

- **Investigator**: Smoke Detector (Failure Investigation Agent)
- **Investigation Run**: [#18799233829](https://github.com/githubnext/gh-aw/actions/runs/18799233829)
- **Pattern**: `GENAISCRIPT_NO_SAFE_OUTPUTS` (3rd occurrence)
- **Investigation Record**: `/tmp/gh-aw/cache-memory/investigations/2025-10-25-18799180550.json`
- **Created**: 2025-10-25T06:13:00Z

> 🤖 AI generated by [Smoke Detector - Smoke Test Failure Investigator](https://github.com/githubnext/gh-aw/actions/runs/18799233829)



