Description
🔄 Recurring Failure Alert - Run #58
Summary
The Smoke GenAIScript workflow failed AGAIN with the same pattern documented in issue #2307 (now closed as "not_planned"). The GenAIScript agent completes successfully but does not use the safe-outputs MCP tools, and the detection job then crashes with a TypeError. This is the 2nd occurrence of this pattern in just over 6 hours.
Failure Details
- Run: #18795287355
- Run Number: 58
- Commit: 60e85eb - "Fix firewall log parser rejecting invalid domains from Squid error messages (#2330)"
- Branch: main
- Trigger: schedule (automated smoke test)
- Duration: 3.2 minutes
- Status: ❌ FAILED
Recurrence Timeline
| Occurrence | Run # | Run ID | Timestamp | Issue | Status |
|---|---|---|---|---|---|
| 1st | 57 | 18788162015 | 2025-10-24 18:06 UTC | #2307 | Closed as "not_planned" |
| 2nd | 58 | 18795287355 | 2025-10-25 00:19 UTC | This issue | Open |
Time Between Occurrences: ~6 hours (scheduled smoke test interval)
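The interval can be checked directly from the two timestamps in the table. The snippet below is a small sketch of that arithmetic; the helper name `formatGap` is illustrative and not part of any tooling mentioned in this issue.

```typescript
// Timestamps taken from the Recurrence Timeline table (minute precision).
const firstFailure = Date.parse("2025-10-24T18:06:00Z");
const secondFailure = Date.parse("2025-10-25T00:19:00Z");

// Returns the gap between two epoch-millisecond timestamps as "Hh Mm".
function formatGap(startMs: number, endMs: number): string {
  const totalMinutes = Math.round((endMs - startMs) / 60000);
  return `${Math.floor(totalMinutes / 60)}h ${totalMinutes % 60}m`;
}

console.log(formatGap(firstFailure, secondFailure)); // 6h 13m
```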
Root Cause Analysis
Identical Pattern to Issue #2307
The failure pattern is exactly the same:
- ✅ Agent job completes successfully (1.6m runtime)
- ✅ Safe-outputs MCP server properly initialized
- ✅ Tools available: `safe_outputs_create_issue`, `safe_outputs_missing_tool`
- ✅ Agent receives prompt: "Review the last 5 merged pull requests and post summary in an issue"
- ❌ Agent generates text summary but does NOT invoke `create_issue` tool
- ❌ No `outputs.jsonl` file created
- ❌ Detection job fails with TypeError
Error Details
Detection Job Failure:
2025-10-25T00:22:11.4422911Z Failed to load MCP configuration: MCP configuration file not found: /tmp/gh-aw/mcp-config/mcp-servers.json
2025-10-25T00:22:11.4888208Z 2025-10-25T00:22:11.488Z genaiscript:error {
2025-10-25T00:22:11.4888781Z name: 'TypeError',
2025-10-25T00:22:11.4889324Z message: "Cannot read properties of undefined (reading 'text')",
2025-10-25T00:22:11.4890106Z stack: "TypeError: Cannot read properties of undefined (reading 'text')\n" +
2025-10-25T00:22:11.4891146Z ' at githubActionSetOutputs ((redacted))\n' +
2025-10-25T00:22:11.4892394Z ' at async Command.runScriptWithExitCode ((redacted))'
2025-10-25T00:22:11.4893217Z }
2025-10-25T00:22:11.4893584Z Cannot read properties of undefined (reading 'text')
The error occurs at /tmp/gh-aw/aw-mcp/logs/run-18795287355/workflow-logs/2_detection.txt:886-896.
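The message in the log is what JavaScript raises when a property is read from `undefined`. The upstream source of `githubActionSetOutputs` is not shown in this issue, so the following is only a sketch of the failure mode and of the kind of guard that would avoid it; `ScriptResult`, `unsafeText`, and `safeText` are hypothetical names, not GenAIScript APIs.

```typescript
// Hypothetical result shape; the real GenAIScript type is not shown in this issue.
interface ScriptResult {
  text?: string;
}

// Unguarded access: throws a TypeError when `result` is undefined,
// which is the same class of error seen in the detection log.
function unsafeText(result: ScriptResult | undefined): string | undefined {
  return (result as ScriptResult).text;
}

// Guarded access: returns a fallback instead of throwing.
function safeText(result: ScriptResult | undefined | null, fallback = ""): string {
  return result?.text ?? fallback;
}
```

With a guard of this kind, a run that produces no result would yield an empty output rather than crashing the job, which would let a later validation step report the missing safe-outputs explicitly.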
What the Agent Did
Agent Output (from outputs.jsonl):
{
"title": "Summary of Recently Merged Pull Requests",
"body": "### Recent Merged Pull Requests Summary:\n\n1. **[WIP] Update logs command to run firewall log parser**\n - Status: Closed without merging\n \n2. **Fix firewall log parser rejecting invalid domains from Squid error messages**\n - Status: Merged successfully\n - Link: [PR #2330](https://github.com/githubnext/gh-aw/pull/2330)\n\n[... 2 more PRs ...]",
"type": "create_issue"
}
The agent generated the correct content but delivered it as a text response instead of invoking the `safe_outputs_create_issue` MCP tool.
Failed Jobs and Errors
Job Sequence
- ✅ activation - succeeded (4s)
- ✅ agent - succeeded (1.6m) - Agent completed successfully
- ❌ detection - FAILED (52s) - GenAIScript crashed with TypeError
- ✅ create_issue - succeeded (5s) - Created this issue
- ⏭️ missing_tool - skipped
Why This Keeps Happening
Core Problem
The GenAIScript agent does not recognize that tool usage is MANDATORY. The prompt says:
"Review the last 5 merged pull requests in this repository and post summary in an issue."
And includes instructions:
IMPORTANT: To create an issue, use the safe-outputs tools. Use the create-issue tool from the safe-outputs MCP.
However, the agent interprets this as guidance rather than a requirement, completing the task by generating text output instead of invoking the MCP tool.
Similar Patterns Across Engines
| Engine | Pattern | Status | Issue |
|---|---|---|---|
| GenAIScript | Agent doesn't use safe-outputs | Recurring | #2307 (closed), this issue |
| OpenCode | Agent doesn't use safe-outputs | Fixed | #2143 (closed), #2164 (implemented fix) |
| Claude | Generally reliable with safe-outputs | - | - |
| Copilot | Different issues (JSON config) | Various | Multiple |
Note: OpenCode had the same issue (#2143) which was fixed by #2164. GenAIScript continues to have this problem.
Recommended Actions
High Priority
- Make prompt more explicit about MANDATORY tool usage

      TASK: Review the last 5 merged pull requests in this repository.

      MANDATORY REQUIREMENT: You MUST use the 'safe_outputs_create_issue' tool to create a GitHub issue with your summary. Do not provide the summary in any other form.

      SUCCESS CRITERIA: Task is ONLY complete when the create_issue tool has been called successfully.

- Fix GenAIScript error handling (Upstream bug)
  - Repository: https://github.com/microsoft/genaiscript
  - Issue: `githubActionSetOutputs` function doesn't handle undefined results
  - Location: `dist/src/githubaction.js:12:27`
  - Fix: Add null/undefined checks before accessing `.text` property
- Add validation that safe-outputs tools are called

      - name: Validate safe-outputs
        run: |
          if [ ! -f "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
            echo "::error::Agent completed but did not use safe-outputs tools"
            exit 1
          fi
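The validation step above only checks that the file exists. A stricter check could also confirm that the file contains an entry of the expected type. The sketch below assumes each line of `outputs.jsonl` is a JSON object with a `type` field, as in the sample entry shown under "What the Agent Did"; the function name `hasSafeOutput` is illustrative, and the file is passed in as a string so the logic stays independent of how it is read.

```typescript
// One parsed line of outputs.jsonl. Only `type` is assumed here, based on
// the sample entry shown in this issue.
interface SafeOutputEntry {
  type?: string;
}

// Returns true when the JSONL text contains at least one entry of the
// expected type. Blank lines and lines that are not valid JSON are ignored.
function hasSafeOutput(jsonl: string, expectedType: string): boolean {
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const entry = JSON.parse(trimmed) as SafeOutputEntry | null;
      if (entry !== null && typeof entry === "object" && entry.type === expectedType) {
        return true;
      }
    } catch {
      // Skip lines that are not valid JSON.
    }
  }
  return false;
}

console.log(hasSafeOutput('{"type":"create_issue","title":"Summary"}\n', "create_issue")); // true
console.log(hasSafeOutput("", "create_issue")); // false
```

Checking the entry type rather than only the file's presence would also catch a run where the agent called `safe_outputs_missing_tool` but never created the issue.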
Medium Priority
- Make detection job conditional on outputs.jsonl existence

      detection:
        needs: agent
        if: hashFiles('/tmp/gh-aw/safe-outputs/outputs.jsonl') != ''

  Note: `hashFiles` only matches files under `GITHUB_WORKSPACE` and is not available in a job-level `if`, and `/tmp` is not shared between jobs on GitHub-hosted runners, so in practice this condition would likely need to be driven by an output of the `agent` job.
- Investigate GenAIScript tool forcing
  - Check if GenAIScript supports marking tools as "required"
  - Similar to function calling with `tool_choice: {"type": "function", "function": {"name": "create_issue"}}`
- Add intermediate validation job

      validate_outputs:
        needs: agent
        runs-on: ubuntu-latest
        steps:
          - name: Check for safe-outputs
            run: |
              if [ ! -f "/tmp/gh-aw/safe-outputs/outputs.jsonl" ]; then
                echo "::warning::Agent did not use safe-outputs tools"
              fi
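For the tool-forcing item in this list, the `tool_choice` value follows the OpenAI-style function-calling convention, where naming a function requires the model to call it. Whether GenAIScript exposes or forwards this field is exactly what needs investigating, so the sketch below only shows the shape of the value; `forceTool` and the `request` object are illustrative and no request is sent.

```typescript
// Shape of an OpenAI-style forced tool choice.
interface ForcedToolChoice {
  type: "function";
  function: { name: string };
}

// Builds a tool_choice value that requires a call to the named tool.
function forceTool(name: string): ForcedToolChoice {
  return { type: "function", function: { name } };
}

// Illustrative request fragment only; how (or whether) GenAIScript would
// pass this through to the model provider is an open question.
const request = {
  messages: [
    {
      role: "user",
      content: "Review the last 5 merged pull requests and post summary in an issue",
    },
  ],
  tool_choice: forceTool("create_issue"),
};

console.log(JSON.stringify(request.tool_choice));
// {"type":"function","function":{"name":"create_issue"}}
```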
Low Priority
- Learn from OpenCode fix (#2164)
  - Review how OpenCode was fixed to use safe-outputs
  - Apply similar approach to GenAIScript if applicable
- Add debug logging
  - Log available MCP tools during agent execution
  - Track which tools are invoked
  - Verify safe-outputs MCP initialization
Impact Assessment
Severity: 🟡 MEDIUM (raised from previous assessment)
- GenAIScript smoke tests failing on every scheduled run
- Pattern is recurring after issue was closed as "not_planned"
- Will continue to fail indefinitely without intervention
Urgency: 🟡 MODERATE
- Not blocking critical functionality
- Smoke test failures provide false negatives
- Wasting CI minutes on recurring failures
Frequency: Every ~6 hours (scheduled smoke test runs)
Historical Context
Pattern Information
- Pattern ID: `GENAISCRIPT_NO_SAFE_OUTPUTS`
- First Detected: 2025-10-24 18:06:54 UTC
- Total Occurrences: 2
- Failure Rate: 100% of GenAIScript smoke tests
- Related Pattern: `OPENCODE_NO_SAFE_OUTPUTS` (fixed in #2164)
Investigation Data
- Investigation Record: `/tmp/gh-aw/cache-memory/investigations/2025-10-25-18795287355.json`
- Pattern Record: `/tmp/gh-aw/cache-memory/patterns/genaiscript_no_safe_outputs.json`
- Previous Issue: #2307 (closed as "not_planned" on 2025-10-24 21:15:42Z)
Related Issues
- [smoke-detector] 🔍 Smoke Test Investigation - Smoke GenAIScript Run #57: Agent Does Not Use Safe-Outputs MCP Tools #2307 - First occurrence (closed as "not_planned")
- [smoke-detector] 🔍 Smoke Test Investigation - Smoke OpenCode Run #18722224746: Agent Does Not Use Safe-Outputs MCP Tools #2143 - OpenCode same issue (closed, fixed)
- [q] Fix OpenCode MCP server integration - Enable safe-outputs tools #2164 - OpenCode fix implementation (completed)
Request for Action
Since issue #2307 was closed as "not_planned" but the failure continues on every scheduled run:
Option 1: Fix the issue (recommended)
- Enhance prompt to make tool usage mandatory
- Add validation to catch agent behavior issues early
Option 2: Disable scheduled GenAIScript smoke tests
- If not actively maintaining, disable scheduled runs
- Prevents recurring failed runs and investigation overhead
Option 3: Accept recurring failures
- Document that this is expected behavior
- Update smoke detector to skip creating issues for this pattern
Please advise on the preferred approach.
Investigation Metadata:
- Investigator: Smoke Detector (Failure Investigation Agent)
- Investigation Run: #18795331289
- Pattern: `GENAISCRIPT_NO_SAFE_OUTPUTS` (2nd occurrence)
- Created: 2025-10-25T00:24:00Z
AI generated by Smoke Detector - Smoke Test Failure Investigator