[smoke-detector] 🔄 GenAIScript Agent Not Using Safe-Outputs - 3rd Consecutive Failure (Run #59) #2378

@github-actions

🔄 Recurring Pattern Alert - 3rd Consecutive Failure

Summary

The Smoke GenAIScript workflow has failed AGAIN with the identical pattern documented in issues #2307 and #2351 (both closed as "not_planned"). The GenAIScript agent completes successfully but does not use safe-outputs MCP tools, causing the detection job to crash with a TypeError. This is the 3rd occurrence in 12 hours with a 100% failure rate for scheduled GenAIScript smoke tests.

Failure Details

  • Run: #18799180550
  • Run Number: 59
  • Commit: ea4df58
  • Branch: main
  • Trigger: schedule (automated smoke test)
  • Duration: 4.4 minutes
  • Failed Job: detection (2.3 minutes)
  • Status: ❌ FAILED

Recurrence Timeline

# | Run | Run ID      | Timestamp            | Hrs Since Prev | Issue      | Status
1 | #57 | 18788162015 | 2025-10-24 18:06 UTC | -              | #2307      | Closed as "not_planned"
2 | #58 | 18795287355 | 2025-10-25 00:19 UTC | ~6.2h          | #2351      | Closed as "not_planned"
3 | #59 | 18799180550 | 2025-10-25 06:06 UTC | ~5.8h          | This issue | Open

Pattern Established: Failing every ~6 hours on scheduled runs with 100% consistency.
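
The "Hrs Since Prev" values can be re-derived from the timestamps in the timeline. A minimal Python sketch, assuming the listed timestamps are accurate to the minute (the function and variable names are illustrative, not part of the workflow):

```python
from datetime import datetime, timezone

# Scheduled-run timestamps copied from the recurrence timeline (UTC).
RUNS = [
    (57, "2025-10-24 18:06"),
    (58, "2025-10-25 00:19"),
    (59, "2025-10-25 06:06"),
]

def hours_between_runs(runs):
    """Return (run_number, hours_since_previous_run) for every run after the first."""
    parsed = [
        (number, datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc))
        for number, stamp in runs
    ]
    gaps = []
    for (_, prev_time), (number, time) in zip(parsed, parsed[1:]):
        hours = (time - prev_time).total_seconds() / 3600
        gaps.append((number, round(hours, 1)))
    return gaps

gaps = hours_between_runs(RUNS)
for number, hours in gaps:
    print(f"run #{number}: {hours}h since previous run")
```

The gaps are consistent with a schedule that fires roughly every 6 hours, with a few minutes of start-time jitter.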

Root Cause Analysis

The Core Problem (UNCHANGED from #2307 and #2351)

The failure pattern is identical across all 3 occurrences:

  1. ✅ Agent job completes successfully
  2. ✅ Safe-outputs MCP server properly initialized with tools available
  3. ✅ Agent receives clear instructions to create an issue
  4. ❌ Agent does NOT invoke safe_outputs_create_issue tool
  5. ❌ Agent generates text response instead of tool invocation
  6. ❌ No outputs.jsonl file created
  7. ❌ No agent_output.json artifact uploaded
  8. ❌ Detection job crashes with TypeError

Error Details

Detection Job Stack Trace:

2025-10-25T06:10:37.8128968Z 2025-10-25T06:10:37.812Z genaiscript:error {
  name: 'TypeError',
  message: "Cannot read properties of undefined (reading 'text')",
  stack: "TypeError: Cannot read properties of undefined (reading 'text')\n" +
    '    at githubActionSetOutputs ((redacted))\n' +
    '    at async Command.runScriptWithExitCode ((redacted))'
}
Cannot read properties of undefined (reading 'text')
TypeError: Cannot read properties of undefined (reading 'text')

Location: githubaction.js:12:27 in GenAIScript npm package

Why This Keeps Happening

The agent prompt says:

"Review the last 5 merged pull requests in this repository and post summary in an issue."

And includes:

"IMPORTANT: To create an issue, use the safe-outputs tools."

However, the agent interprets this as guidance rather than a requirement. It understands what needs to be done but considers the task complete without actually invoking the tool.

This is a prompt engineering problem - the language is not strong enough to force tool usage.

Failed Jobs and Errors

Job Execution Summary

  1. activation - succeeded (2s)
  2. agent - succeeded (1.4m) - Agent completed with no errors
  3. detection - FAILED (2.3m) - GenAIScript crashed with TypeError
  4. create_issue - succeeded (5s) - Created this issue
  5. ⏭️ missing_tool - skipped

Historical Context & Pattern Analysis

Pattern Information

Why This Is Different from OpenCode

OpenCode had the exact same issue (#2143) which was successfully fixed by #2164. The fix improved the prompt to make tool usage mandatory. GenAIScript continues to have this problem despite similar instructions being present.

Investigation Data

Recommended Actions

🔴 HIGH PRIORITY - Fix the Prompt

Option A: Learn from OpenCode Fix (#2164)

Review the OpenCode prompt changes that successfully resolved the same issue and apply a similar approach to GenAIScript:

MANDATORY REQUIREMENT: You MUST use the 'safe_outputs_create_issue' tool to create 
a GitHub issue. Do not provide the summary in any other form.

SUCCESS CRITERIA: Task is ONLY complete when the create_issue tool has been invoked 
successfully. The workflow will fail if you do not call this tool.

Option B: Add Validation Step

Add a validation job that checks for safe-outputs before continuing:

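validate_outputs:
  needs: agent
  runs-on: ubuntu-latest
  steps:
    # Each job runs on a fresh runner, so restore the file from an artifact first.
    # Assumption: the agent job uploads this directory as an artifact named "safe-outputs".
    - name: Download safe-outputs
      uses: actions/download-artifact@v4
      continue-on-error: true  # a missing artifact is reported by the check below
      with:
        name: safe-outputs
        path: /tmp/gh-aw/safe-outputs
    - name: Check for safe-outputs
      run: |
        if [ ! -f "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
          echo "::error::Agent completed but did not use safe-outputs tools"
          exit 1
        fi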
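
A file-existence check does not catch an empty or malformed outputs.jsonl. A stricter validation could look roughly like the Python sketch below. It assumes only that the file holds one JSON object per line; the function name `validate_safe_outputs` and the sample entry in the demo are hypothetical, and the real safe-outputs schema is not shown in this issue.

```python
import json
import tempfile
from pathlib import Path

def validate_safe_outputs(path):
    """Return a list of problems found in a safe-outputs JSONL file (empty list = OK)."""
    file = Path(path)
    if not file.is_file():
        return ["outputs file is missing: agent did not use safe-outputs tools"]
    lines = [line for line in file.read_text(encoding="utf-8").splitlines() if line.strip()]
    if not lines:
        return ["outputs file is empty: no tool invocation was recorded"]
    problems = []
    for index, line in enumerate(lines, start=1):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {index} is not valid JSON: {error.msg}")
            continue
        if not isinstance(entry, dict):
            problems.append(f"line {index} is not a JSON object")
    return problems

if __name__ == "__main__":
    # Demo in a temporary directory; the entry content is a made-up placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        outputs = Path(tmp) / "outputs.jsonl"
        print(validate_safe_outputs(outputs))  # file does not exist yet
        outputs.write_text('{"type": "create_issue"}\n', encoding="utf-8")
        print(validate_safe_outputs(outputs))  # well-formed file
```

In a workflow step, a non-empty result would be turned into an `::error::` annotation and a non-zero exit code, the same way the shell check does.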

🟡 MEDIUM PRIORITY - Fix GenAIScript Error Handling

File an upstream bug with the GenAIScript project:

  • Repository: https://github.com/microsoft/genaiscript
  • Issue: githubActionSetOutputs doesn't handle undefined results
  • Location: dist/src/githubaction.js:12:27
  • Fix Needed: Add null/undefined checks before accessing .text property
  • Impact: Better error messages when agent doesn't produce expected output
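
The real githubActionSetOutputs is JavaScript inside the GenAIScript package, and its source is not shown here. The following Python sketch therefore only illustrates the kind of guard being requested: check for a missing result before reading its text field, and fail with a descriptive message. The names `set_outputs` and `MissingAgentOutputError` and the shape of `result` are hypothetical.

```python
class MissingAgentOutputError(RuntimeError):
    """Raised when the agent run produced no result to publish."""

def set_outputs(result):
    """Return the outputs to publish, failing clearly when the result is absent.

    Assumption: `result` is either None or a dict that may carry a "text" key.
    """
    if result is None or result.get("text") is None:
        raise MissingAgentOutputError(
            "Agent produced no output text; was the safe-outputs tool invoked?"
        )
    return {"text": result["text"]}

if __name__ == "__main__":
    print(set_outputs({"text": "summary of merged pull requests"}))
    try:
        set_outputs(None)
    except MissingAgentOutputError as error:
        print(f"detection failed cleanly: {error}")
```

With a guard like this, the detection job would still fail, but the log would name the actual cause instead of a generic TypeError.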

🟢 LOW PRIORITY - Alternative Solutions

If GenAIScript workflows are not actively maintained:

Option 1: Disable scheduled trigger to stop recurring failures

# Comment out the schedule trigger in .github/workflows/smoke-genaiscript.md

Option 2: Make detection job conditional on outputs.jsonl existence

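detection:
  needs: agent
  # hashFiles() is not available in a job-level `if` and cannot see files on another job's runner.
  # Gate on an output exported by the agent job instead (has_safe_outputs is a hypothetical name).
  if: needs.agent.outputs.has_safe_outputs == 'true'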

Option 3: Update smoke detector to skip creating issues for this known pattern
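
One way Option 3 could work is to match failure logs against known signatures before opening a new issue. The Python sketch below uses the error text from the stack trace in this issue and the pattern ID recorded in the investigation metadata. The table layout and the function names `classify_failure` and `should_create_issue` are hypothetical and do not describe the smoke detector's actual implementation.

```python
import re

# Signatures come from the detection job's stack trace; the mapping itself is illustrative.
KNOWN_PATTERNS = {
    "GENAISCRIPT_NO_SAFE_OUTPUTS": [
        re.compile(r"Cannot read properties of undefined \(reading 'text'\)"),
        re.compile(r"githubActionSetOutputs"),
    ],
}

def classify_failure(log_text):
    """Return the ID of the first known pattern whose signatures all match, else None."""
    for pattern_id, signatures in KNOWN_PATTERNS.items():
        if all(signature.search(log_text) for signature in signatures):
            return pattern_id
    return None

def should_create_issue(log_text, suppressed=frozenset({"GENAISCRIPT_NO_SAFE_OUTPUTS"})):
    """Create an issue unless the failure matches a suppressed known pattern."""
    return classify_failure(log_text) not in suppressed

if __name__ == "__main__":
    sample_log = (
        "TypeError: Cannot read properties of undefined (reading 'text')\n"
        "    at githubActionSetOutputs ((redacted))\n"
    )
    print(classify_failure(sample_log))
    print(should_create_issue(sample_log))
```

Requiring every signature to match keeps an unrelated TypeError from being silently suppressed.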

Impact Assessment

Severity: 🟡 MEDIUM (raised from LOW)

  • GenAIScript smoke tests have 100% failure rate
  • Pattern recurring every ~6 hours indefinitely
  • Multiple closed issues without resolution = pattern will continue
  • Wasting CI minutes and investigation overhead

Urgency: 🟡 MODERATE

  • Not blocking critical functionality
  • Produces a misleading signal about system health (the agent job succeeds, yet the run is reported as failed)
  • Simple fix available (prompt improvement)

Scope:

  • Affected: GenAIScript scheduled smoke tests only
  • Frequency: Every ~6 hours (scheduled runs)
  • Duration: 12+ hours of continuous failures
  • CI Minutes Wasted: ~13 minutes (3 failures × 4.3 min average)

Prevention Strategies

  1. Improve Prompt Clarity - Use OpenCode's successful approach as template
  2. Add Output Validation - Check for outputs.jsonl before proceeding
  3. Better Error Handling - Fix GenAIScript to handle undefined results gracefully
  4. Conditional Jobs - Make detection conditional on safe-outputs existence
  5. Tool Forcing - Investigate if GenAIScript supports required tools
  6. Monitoring - Track which MCP tools are invoked during execution

Request for Decision

Since this is the 3rd consecutive occurrence and previous issues (#2307, #2351) were closed as "not_planned", I request a decision on:

Option 1: Fix the issue (recommended)

  • Apply OpenCode's successful prompt fix approach
  • Add validation to catch agent behavior issues early
  • Estimated effort: 30-60 minutes

Option 2: Disable GenAIScript scheduled smoke tests

  • If the workflow is not actively maintained, disable it to prevent recurring failures
  • Estimated effort: 5 minutes

Option 3: Accept recurring failures as expected

  • Document this as expected behavior
  • Update smoke detector to not create issues for this pattern
  • Estimated effort: 15 minutes

The current situation (failures recurring every ~6 hours, each producing an issue that is closed without resolution) is not sustainable.

Related Issues

Reproduction Steps

  1. Configure GenAIScript agent with safe-outputs MCP
  2. Give agent task to "create an issue" with current prompt wording
  3. Run workflow
  4. Observe agent completes successfully without invoking tool
  5. Detection job crashes with TypeError

Investigation Metadata

  • Investigator: Smoke Detector (Failure Investigation Agent)
  • Investigation Run: #18799233829
  • Pattern: GENAISCRIPT_NO_SAFE_OUTPUTS (3rd occurrence)
  • Investigation Record: /tmp/gh-aw/cache-memory/investigations/2025-10-25-18799180550.json
  • Created: 2025-10-25T06:13:00Z

🤖 AI generated by Smoke Detector - Smoke Test Failure Investigator
