-
Notifications
You must be signed in to change notification settings - Fork 46
Description
🔄 THIRD REGRESSION ALERT - Run #39: Second Fix Also Failed
Critical Update
BOTH fix attempts have now failed. Run #39 shows that commit 7cc5320, which claimed to address the runtime JSON construction issue identified in #2341, has NOT resolved the problem.
Summary
After TWO separate fix attempts:
- ✅ Commit f9dece6 (Run Test coverage console formatting august 13 #38) - Fixed YAML-level quoting
- ✅ Commit 7cc5320 (Run Fix Codex token usage summing for multiple chat requests #39) - Removed toJSON() wrapper
Result: ❌ Issue STILL persists
Failure Details - Run #39
- Run: 18795080703 (Run Fix Codex token usage summing for multiple chat requests #39)
- Commit: 7cc5320
- Branch: copilot/add-additional-mcp-server-support
- Trigger: workflow_dispatch
- Duration: 53 seconds
- Status: ❌ THIRD REGRESSION
Error Analysis
Error Message
Invalid JSON in --additional-mcp-config: Expected ',' or '}' after property value in JSON at position 558 (line 1 column 559)
Error Position Shift
| Run # | Commit | Error Position | Status |
|---|---|---|---|
| 36 | a160185 | 557 | ❌ Original failure |
| 37 | (same) | 557 | ❌ Duplicate |
| 38 | f9dece6 | 557 | ❌ First fix failed |
| 39 | 7cc5320 | 558 | ❌ Second fix failed |
Observation: Error position shifted by 1 character (557→558), indicating the JSON structure was modified by the fix, but the fundamental problem remains.
What the Second Fix Attempted
Commit 7cc5320 specifically addressed the runtime JSON construction issue:
Root Cause Analysis (from commit message)
1. Job-level env: GH_AW_SAFE_OUTPUTS_CONFIG = '{"create_issue":...}'
2. GitHub Actions evaluates: ${{ env.GH_AW_SAFE_OUTPUTS_CONFIG }} → {"create_issue":...}
3. With toJSON(): ${{ toJSON(env.GH_AW_SAFE_OUTPUTS_CONFIG) }} → "{\"create_issue\":...}"
4. Go's json.Marshal() adds another layer: ""{\"create_issue\":...}"" (DOUBLE QUOTES)
The Fix
Remove toJSON() wrapper since the env variable is already a JSON string.
Expected Result
✅ "GH_AW_SAFE_OUTPUTS_CONFIG":"{\"create_issue\":...}" (Valid JSON)
Actual Result
❌ Still invalid at position 558
Why This Fix Also Failed
Possible Reasons
-
Not Applied: The workflow lock files may not have been properly recompiled
- Need to verify the actual YAML contains the fix
- Check if there are cached artifacts preventing updates
-
Multiple Locations: There may be other places using this env var
- The fix was applied to one code path but not all
- Different steps/jobs might handle it differently
-
Third Layer of Escaping: There's another transformation we haven't identified
- Shell-level escaping before CLI execution?
- GitHub Actions expression evaluation doing something unexpected?
- String templating in Go code?
-
Wrong Analysis: The theory was correct but implementation missed something
- Maybe toJSON() wasn't actually the problem
- Or it's only part of the problem
-
Timing/Cache Issue: Changes not taking effect
- Build artifacts from previous runs?
- GitHub Actions cache?
Timeline of Failures
| Run # | Date | Time | Commit | Status | Fix Attempt |
|---|---|---|---|---|---|
| 36 | Oct 24 | 21:10 | a160185 | ❌ | - |
| 37 | Oct 24 | 22:43 | (same) | ❌ | - |
| - | Oct 24 | 23:13 | f9dece6 | - | First fix: YAML quoting |
| 38 | Oct 24 | 23:22 | f9dece6 | ❌ | Created #2341 |
| - | Oct 24 | 23:59 | 7cc5320 | - | Second fix: Remove toJSON() |
| 39 | Oct 25 | 00:04 | 7cc5320 | ❌ | This report |
Time between second fix and third failure: 5 minutes
Critical Next Steps
1. Verify the Fix Was Actually Applied
# Check the actual workflow lock file
cat .github/workflows/smoke-copilot.lock.yml | grep -A 20 "GH_AW_SAFE_OUTPUTS_CONFIG"
# Look for toJSON() usage
grep -r "toJSON.*GH_AW_SAFE_OUTPUTS_CONFIG" .github/workflows/If toJSON() is still present: The recompilation didn't work or wasn't committed.
If toJSON() is gone: The fix was applied but something else is wrong.
2. Add Comprehensive Debug Logging
# In the workflow, before executing copilot:
- name: Debug MCP Config
run: |
echo "Raw env var:"
echo "$GH_AW_SAFE_OUTPUTS_CONFIG"
echo "---"
echo "After expression evaluation:"
echo '${{ env.GH_AW_SAFE_OUTPUTS_CONFIG }}'
echo "---"
echo "Full MCP config that will be passed:"
# Print the actual --additional-mcp-config argument3. Test Locally
# Reproduce outside GitHub Actions:
export GH_AW_SAFE_OUTPUTS_CONFIG='{"create_issue":{"max":1,"min":1},"missing_tool":{}}'
# Build the MCP config JSON as the code does
go run ./cmd/awcli test-mcp-config4. Examine the Actual Command
From workflow logs, extract the EXACT command being run:
copilot --additional-mcp-config '...' --prompt "..."Then manually validate that JSON string.
5. Check All Code Paths
# Find all places where GH_AW_SAFE_OUTPUTS_CONFIG is used:
grep -rn "GH_AW_SAFE_OUTPUTS_CONFIG" pkg/
grep -rn "GH_AW_SAFE_OUTPUTS_CONFIG" .github/workflows/
# Check if there are multiple templates or steps:
grep -rn "additional-mcp-config" pkg/workflow/Recommended Actions
Immediate
- STOP trying to fix blindly - We need visibility first
- Add debug logging to see the actual values at each transformation step
- Verify the second fix was applied by checking the lock file
- Check if there are multiple code paths that need fixing
Short-term
-
Add validation that fails fast with detailed error message
func validateMCPConfig(config string) error { var test map[string]interface{} if err := json.Unmarshal([]byte(config), &test); err != nil { log.Printf("INVALID MCP CONFIG:") log.Printf("%s", config) return fmt.Errorf("MCP config is invalid JSON: %w", err) } return nil }
-
Add integration test that actually runs the CLI with the generated config
-
Simplify the approach: Maybe don't use env vars for JSON, use files instead?
Long-term
-
Redesign env var handling to be type-safe
- Use structs + json.Marshal() instead of string concatenation
- Have a clear contract for when values are JSON vs strings
- Add validation at compile time, not runtime
-
Document the transformation pipeline
1. Go code defines env var 2. YAML template is rendered 3. Lock file is generated 4. GitHub Actions evaluates expressions 5. Shell receives the command 6. CLI parses the JSONIdentify EXACTLY where each quoting/escaping happens.
Analysis
This is now a systematic failure of the fix process:
- ❌ First fix: Incomplete (missed runtime layer)
- ❌ Second fix: Also incomplete or not applied
The issue has survived two separate attempts by the Copilot agent to fix it, with detailed analysis and commit messages. This suggests:
- The problem is more complex than currently understood
- The debugging approach needs to change
- We need visibility into the actual data at runtime, not just theory
Recommendation: Before attempting a third fix, add comprehensive logging and verification. A third blind fix attempt without more data will likely also fail.
Investigation stored: /tmp/gh-aw/cache-memory/investigations/2025-10-25-18795080703.json
Pattern ID: COPILOT_DOUBLE_QUOTED_ENV_VAR_IN_MCP_CONFIG
Status: THIRD REGRESSION - TWO FIX ATTEMPTS FAILED
Critical: YES - Blocking Copilot smoke tests
AI generated by Smoke Detector - Smoke Test Failure Investigator
AI generated by Smoke Detector - Smoke Test Failure Investigator