[smoke-detector] ⚠️ CRITICAL: GenAIScript Smoke Test - 17 Consecutive Failures Require Decision

# ⚠️ CRITICAL DECISION REQUIRED - Run #76

## Executive Summary

The Smoke GenAIScript workflow has now failed **17 times consecutively** over 1.5 days with a 100% failure rate on scheduled runs. Four previous investigation issues (#2227, #2307, #2351, #2378) were all closed as "not_planned", yet the failures continue every ~6 hours, **wasting ~60 CI minutes** and creating investigation overhead.

**This issue requests an explicit decision** on one of three options to resolve the unsustainable situation.

## Failure Details

- **Run**: [#18806622842]((redacted))
- **Run Number**: 76
- **Commit**: ce47d8231e3f4d6b9be20c337fb7b9ce36df7772
- **Commit Message**: "Fix trailing whitespace in create_issue.cjs (#2454)"
- **Branch**: main
- **Trigger**: schedule (automated smoke test)
- **Duration**: 3.3 minutes
- **Status**: ❌ FAILED (detection job)

## Failure Pattern Timeline

| Occurrence | Run # | Run ID | Date | Issue | Status | Decision |
|------------|-------|--------|------|-------|--------|----------|
| 1 | #57 | [18788162015]((redacted)) | Oct 24 18:06 | #2307 | Closed | not_planned |
| 2 | #58 | [18795287355]((redacted)) | Oct 25 00:19 | #2351 | Closed | not_planned |
| 3 | #59 | [18799180550]((redacted)) | Oct 25 06:06 | #2378 | Closed | not_planned |
| ... | ... | ... | ... | ... | ... | ... |
| **17** | **#76** | **[18806622842]((redacted))** | **Oct 25 18:05** | **This issue** | **Open** | **Requested** |

**Pattern**: Failing every ~6 hours on scheduled runs since Oct 24, 100% failure rate, 17 consecutive failures.

## Root Cause Analysis

### Confirmed Root Cause

The GenAIScript agent completes successfully but **does NOT use safe-outputs MCP tools** despite clear instructions:

1. ✅ Agent job succeeds (1.4m runtime)
2. ✅ Safe-outputs MCP server properly initialized
3. ✅ Tools available: `safe_outputs_create_issue`
4. ❌ **Agent generates text response instead of invoking tool**
5. ❌ No `outputs.jsonl` file created
6. ❌ Detection job crashes: `TypeError: Cannot read properties of undefined (reading 'text')`

### Error Details

```
2025-10-25T18:08:27.4229207Z 2025-10-25T18:08:27.422Z genaiscript:error {
  name: 'TypeError',
  message: "Cannot read properties of undefined (reading 'text')",
  stack: "TypeError: Cannot read properties of undefined (reading 'text')\n" +
    '    at githubActionSetOutputs ((redacted))\n' +
    '    at async Command.runScriptWithExitCode ((redacted))'
}
```

**Location**: `pkg/workflow/js/create_issue.cjs:12:27` - GenAIScript's githubActionSetOutputs function

### Why This Happens

**Prompt Engineering Issue**: The agent prompt says "use the safe-outputs tools" but the agent interprets this as guidance rather than a requirement. The agent considers the task complete after generating text output without actually invoking the MCP tool.

### OpenCode Had Identical Issue - And Fixed It

| Engine | Issue | Fix | Status |
|--------|-------|-----|--------|
| **OpenCode** | #2143 | **#2164** | ✅ **FIXED** - Prompt improved, now passes consistently |
| **GenAIScript** | #2307, #2351, #2378 | None | ❌ **17 consecutive failures** - Fix not applied |

**Proven solution exists** from OpenCode fix (#2164): Make prompt more explicit about mandatory tool usage.

## Failed Jobs Summary

1. ✅ **activation** - succeeded (3s)
2. ✅ **agent** - succeeded (1.4m) - Agent completes without calling required tool
3. ❌ **detection** - **FAILED** (1.1m) - GenAIScript crashes with TypeError
4. ✅ **create_issue** - succeeded (7s) - Created this issue
5. ⏭️ **missing_tool** - skipped

## Impact Assessment

### Resource Impact
- **Consecutive Failures**: 17
- **Failure Duration**: 1.5 days
- **CI Minutes Wasted**: ~60 minutes (17 × 3.5 min average)
- **Investigation Issues Created**: 4 (all closed as "not_planned")
- **Investigation Runs**: 17+ smoke detector investigations

### Urgency
- **Severity**: 🔴 **CRITICAL** (escalated from medium)
- **Urgency**: 🔴 **HIGH** (escalated from moderate)
- **Reason for Escalation**: 17 consecutive failures with no resolution, wasting CI resources indefinitely

### Why This Is Critical

1. **Unsustainable Pattern**: Failing every 6 hours indefinitely
2. **Resource Waste**: ~60 CI minutes wasted, 17+ investigation runs
3. **Decision Paralysis**: 4 issues closed as "not_planned" but workflow continues to run and fail
4. **Proven Fix Available**: OpenCode had same issue and successfully fixed it
5. **No End In Sight**: Without action, will continue failing every 6 hours forever

## THREE OPTIONS FOR RESOLUTION

### Option 1: Fix the Issue (RECOMMENDED)

**Effort**: 30-60 minutes  
**Approach**: Learn from OpenCode fix (#2164)

**Actions**:
- [ ] Review OpenCode PR #2164 to see how they made tool usage mandatory
- [ ] Apply similar prompt improvements to GenAIScript workflow
- [ ] Make tool invocation explicitly required with stronger language:
  ```markdown
  MANDATORY REQUIREMENT: You MUST use the 'safe_outputs_create_issue' tool.
  Do not provide output in any other form. Task is ONLY complete when the 
  create_issue tool has been successfully invoked.
  ```
- [ ] Add validation step to verify `outputs.jsonl` exists before detection job
- [ ] Test on a manual trigger to confirm fix works

**Benefits**:
- GenAIScript smoke tests will pass
- Stops wasting CI resources
- Consistent with OpenCode fix approach
- Validates safe-outputs MCP integration works properly

**Risks**: Low - proven fix from OpenCode

---

### Option 2: Disable Scheduled Runs (PRAGMATIC)

**Effort**: 5 minutes  
**Approach**: Stop running workflow that consistently fails

**Actions**:
- [ ] Comment out or remove schedule trigger from `.github/workflows/smoke-genaiscript.md`
- [ ] Keep manual trigger available for testing when needed
- [ ] Document reason for disabling in workflow comments

**Benefits**:
- Immediate stop to CI waste
- No more investigation overhead
- Can revisit when ready to fix

**Risks**: None - manual trigger still available

---

### Option 3: Accept Recurring Failures (NOT RECOMMENDED)

**Effort**: 15 minutes  
**Approach**: Document as expected behavior

**Actions**:
- [ ] Update smoke detector to skip GenAIScript failures
- [ ] Document that GenAIScript smoke tests are expected to fail
- [ ] Add workflow comment explaining the expected failure

**Benefits**:
- No code changes needed
- Acknowledges current state

**Risks**: 
- Continues wasting CI resources
- Confusing for future maintainers
- Sets poor precedent

## Recommended Actions

**PRIMARY RECOMMENDATION: Option 1 (Fix the Issue)**

Rationale:
1. Proven fix exists from OpenCode (#2164)
2. Only 30-60 minutes effort
3. Stops resource waste permanently
4. Validates MCP safe-outputs integration
5. Consistent with how team fixed OpenCode

**ALTERNATIVE: Option 2 (Disable Scheduled Runs)**

If GenAIScript workflows are not actively maintained or if fix effort is not justified, disabling scheduled runs is the pragmatic choice to stop wasting resources.

**NOT RECOMMENDED: Option 3 (Accept Failures)**

Continuing to run and fail indefinitely while closing investigation issues wastes resources and creates confusion.

## Historical Context

### Pattern Information
- **Pattern ID**: `GENAISCRIPT_API_OR_OUTPUT_ERROR` / `GENAISCRIPT_NO_SAFE_OUTPUTS`
- **First Detected**: Oct 24 00:17 UTC (evolved from earlier configuration issues)
- **First NO_SAFE_OUTPUTS**: Oct 24 18:06 UTC (run #57)
- **Total Occurrences**: 17+
- **Failure Rate**: 100% of scheduled runs
- **Related Pattern**: `OPENCODE_NO_SAFE_OUTPUTS` (✅ **FIXED** in #2164)

### Investigation Data
- **Investigation Record**: `/tmp/gh-aw/cache-memory/investigations/2025-10-25-18806622842.json`
- **Pattern Record**: `/tmp/gh-aw/cache-memory/patterns/genaiscript_api_or_output_error.json`
- **Previous Issues**: #2227, #2307, #2351, #2378 (all closed as "not_planned")

### Comparison with OpenCode Success

| Aspect | OpenCode | GenAIScript |
|--------|----------|-------------|
| **Issue** | #2143 - Agent doesn't use safe-outputs | #2307, #2351, #2378 - Same issue |
| **Fix** | #2164 - Improved prompt | None applied |
| **Result** | ✅ Passing consistently | ❌ 17 consecutive failures |
| **Time to Fix** | ~1 day | Still failing after 1.5 days |

## Related Issues

- #2378 - 3rd occurrence (closed Oct 25 06:31, run #59)
- #2351 - 2nd occurrence (closed Oct 25 00:58, run #58)
- #2307 - 1st occurrence (closed Oct 24 21:15, run #57)
- #2227 - Earlier related issue (closed Oct 23 20:16)
- #2164 - **OpenCode fix** (✅ successfully resolved same pattern)
- #2143 - OpenCode same issue (closed, fixed by #2164)

## Request for Explicit Decision

**The current situation is unsustainable.** The workflow runs every 6 hours, fails every time, creates investigation overhead, wastes CI resources, and has resulted in 4 closed issues with no action taken.

**Please choose one of the three options above** so we can either:
1. Fix the issue (recommended - proven fix exists)
2. Disable scheduled runs (pragmatic - stops waste)
3. Accept failures as expected (not recommended - continues waste)

Closing this issue as "not_planned" without action will result in another identical issue in ~6 hours when the next scheduled run fails.

---

**Investigation Metadata:**
- **Investigator**: Smoke Detector (Failure Investigation Agent)
- **Investigation Run**: [#18806662813]((redacted))
- **Pattern**: `GENAISCRIPT_API_OR_OUTPUT_ERROR` (17th consecutive occurrence)
- **Created**: 2025-10-25T18:11:00Z

> 🤖 AI generated by [Smoke Detector - Smoke Test Failure Investigator]((redacted))




> AI generated by [Smoke Detector - Smoke Test Failure Investigator](https://github.com/githubnext/gh-aw/actions/runs/18806662813)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[smoke-detector] ⚠️ CRITICAL: GenAIScript Smoke Test - 17 Consecutive Failures Require Decision #2459

⚠️ CRITICAL DECISION REQUIRED - Run #76

Executive Summary

Failure Details

Failure Pattern Timeline

Root Cause Analysis

Confirmed Root Cause

Error Details

Why This Happens

OpenCode Had Identical Issue - And Fixed It

Failed Jobs Summary

Impact Assessment

Resource Impact

Urgency

Why This Is Critical

THREE OPTIONS FOR RESOLUTION

Option 1: Fix the Issue (RECOMMENDED)

Option 2: Disable Scheduled Runs (PRAGMATIC)

Option 3: Accept Recurring Failures (NOT RECOMMENDED)

Recommended Actions

Historical Context

Pattern Information

Investigation Data

Comparison with OpenCode Success

Related Issues

Request for Explicit Decision

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Occurrence	Run #	Run ID	Date	Issue	Status	Decision
1	#57	18788162015	Oct 24 18:06	#2307	Closed	not_planned
2	#58	18795287355	Oct 25 00:19	#2351	Closed	not_planned
3	#59	18799180550	Oct 25 06:06	#2378	Closed	not_planned
...	...	...	...	...	...	...
17	#76	18806622842	Oct 25 18:05	This issue	Open	Requested

Engine	Issue	Fix	Status
OpenCode	#2143	#2164	✅ FIXED - Prompt improved, now passes consistently
GenAIScript	#2307, #2351, #2378	None	❌ 17 consecutive failures - Fix not applied

Aspect	OpenCode	GenAIScript
Issue	#2143 - Agent doesn't use safe-outputs	#2307, #2351, #2378 - Same issue
Fix	#2164 - Improved prompt	None applied
Result	✅ Passing consistently	❌ 17 consecutive failures
Time to Fix	~1 day	Still failing after 1.5 days

[smoke-detector] ⚠️ CRITICAL: GenAIScript Smoke Test - 17 Consecutive Failures Require Decision #2459

Description

⚠️ CRITICAL DECISION REQUIRED - Run #76

Executive Summary

Failure Details

Failure Pattern Timeline

Root Cause Analysis

Confirmed Root Cause

Error Details

Why This Happens

OpenCode Had Identical Issue - And Fixed It

Failed Jobs Summary

Impact Assessment

Resource Impact

Urgency

Why This Is Critical

THREE OPTIONS FOR RESOLUTION

Option 1: Fix the Issue (RECOMMENDED)

Option 2: Disable Scheduled Runs (PRAGMATIC)

Option 3: Accept Recurring Failures (NOT RECOMMENDED)

Recommended Actions

Historical Context

Pattern Information

Investigation Data

Comparison with OpenCode Success

Related Issues

Request for Explicit Decision

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions