[smoke-detector] [URGENT] Codex Smoke Tests STILL Failing After Issue Closure - 6th Occurrence (3+ Days) #2956

@github-actions

🔍 Smoke Test Investigation - Run #77

Summary

The Smoke Codex workflow continues to fail with a TOML parse error at line 30, column 109. This is the SIXTH occurrence since October 31st. Issue #2930 was closed as "not_planned", but the failures continue, so this critical problem needs urgent attention.

Failure Details

Root Cause Analysis

Primary Error

Error: TOML parse error at line 30, column 109
   |
30 | env = { "GH_AW_SAFE_OUTPUTS" = "/tmp/gh-aw/safeoutputs/outputs.jsonl", "GH_AW_SAFE_OUTPUTS_CONFIG" = "\"{\\"create_issue\\":{\\"max\\":1},\\"missing_tool\\":{}}\", ...
   |                                                                                                             ^
missing comma between key-value pairs, expected `,`

Technical Analysis

The Problem: Codex MCP Configuration + JSON Environment Variables = Invalid TOML

The Codex engine generates its MCP configuration using TOML inline table syntax with shell variable substitution. When GH_AW_SAFE_OUTPUTS_CONFIG (which contains JSON with nested quotes) is substituted into the TOML env inline table, the result is syntactically invalid TOML.

Why This Happens:

  1. GH_AW_SAFE_OUTPUTS_CONFIG contains: {"create_issue":{"max":1},"missing_tool":{}}
  2. This gets escaped for shell safety
  3. The escaped value is substituted into TOML: env = { "KEY" = "value" }
  4. Result: Nested quotes create invalid TOML that the parser rejects

Failed Jobs and Errors

Job Sequence

  1. pre_activation - succeeded (7s)
  2. activation - succeeded (4s)
  3. agent - FAILED (21s) - Codex CLI cannot parse TOML config
  4. ⏭️ detection - skipped
  5. ⏭️ missing_tool - skipped
  6. ⏭️ create_issue - skipped

Historical Context

This is a recurring pattern that has now occurred 6 times in about 28 hours (2025-10-31 14:24 to 2025-11-01 18:03 UTC):

| # | Run ID | Date/Time (UTC) | Time Since Last | Status |
|---|--------|-----------------|-----------------|--------|
| 1 | 18975512058 | 2025-10-31 14:24 | Initial | ❌ Failed |
| 2 | 18977321431 | 2025-10-31 15:31 | ~1 hour | ❌ Failed |
| 3 | 18988214642 | 2025-11-01 00:12 | ~9 hours | ❌ Failed |
| 4 | 18992422186 | 2025-11-01 06:03 | ~6 hours | ❌ Failed |
| 5 | 18996415568 | 2025-11-01 12:03 | ~6 hours | ❌ Failed |
| 6 | 19000564081 | 2025-11-01 18:03 | ~6 hours | ❌ Failed |

Critical Note: Issue #2930 documented the 4th occurrence and was created on 2025-11-01 06:12. It was subsequently closed as "not_planned" on 2025-11-01 14:53. However, the 5th occurrence happened at 12:03 (before closure) and now the 6th occurrence has happened at 18:03 (after closure), proving the problem persists.

Pattern: Every scheduled Codex smoke test since October 31st has failed with the exact same error.

Investigation History

Previous investigations have documented this issue extensively:

Recommended Actions

CRITICAL Priority (Do Immediately)

  • Implement file-based TOML configuration for Codex

    Implementation Steps:

    1. Create renderCodexMCPConfigFile() in pkg/workflow/mcp-config.go

      func renderCodexMCPConfigFile(mcpServers map[string]interface{}) (string, error) {
          // Generate TOML config file content
          // Write to /tmp/gh-aw/mcp-config/config.toml
          // Return file path
      }
    2. Update pkg/workflow/codex_engine.go

      • Call renderCodexMCPConfigFile() during engine setup
      • Change CLI invocation from inline config to: codex --config /tmp/gh-aw/mcp-config/config.toml
      • Remove inline TOML generation logic
    3. Pattern to Follow: Copy the approach of the Claude and GenAIScript engines, which already use file-based configs successfully

    Estimated Effort: 2-4 hours

    Benefits:

    • Eliminates quote escaping issues entirely
    • Matches pattern used by other engines
    • More maintainable and testable
    • Prevents future similar issues

HIGH Priority

  • Add integration tests for MCP config generation

    • Test that generated TOML is valid (parse with actual TOML parser)
    • Test with JSON-valued environment variables
    • Test across all engines (Claude, Copilot, Codex, GenAIScript)
  • Make smoke tests blocking for PRs

    • Prevent merges when smoke tests are failing
    • Add smoke test status to required checks
    • Currently smoke tests run on schedule but don't block merges
  • Add pre-merge validation

    • Check for inline TOML generation with env vars
    • Validate generated configs can be parsed
    • Run smoke tests in PR CI (at least subset)

Prevention Strategies

  1. Architectural:

    • Never use inline config substitution with complex values (JSON, nested quotes, etc.)
    • File-based configs eliminate entire class of escaping bugs
    • Standardize approach across all engines
  2. Testing:

    • Integration tests that parse generated configs with real parsers
    • Smoke tests as required checks for PRs
    • Test matrix covering all engines × all config scenarios
  3. CI/CD:

    • Run smoke tests on every PR that touches engine code
    • Block merges when smoke tests fail
    • Alert on repeated failures
  4. Process:

    • Don't close issues as "not_planned" when failures are ongoing
    • Treat consecutive smoke test failures as P0 incidents
    • Require smoke test fixes before merging other changes

Technical Details

Engine Comparison

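| Engine | Config Method | Format | Inline/File | Status |
|--------|---------------|--------|-------------|--------|
| Codex | Inline TOML | TOML | Inline | ❌ Broken |
| Claude Code | File-based JSON | JSON | File | ✅ Works |
| Copilot | Inline JSON | JSON | Inline | ⚠️ Fragile |
| GenAIScript | File-based JSON | JSON | File | ✅ Works |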

Observation: Engines using file-based configs don't have these issues.

Why File-Based Configs Are Better

  1. No Escaping Issues: File content doesn't go through shell interpretation
  2. Better Debugging: Can inspect actual config file on disk
  3. Easier Testing: Can test config generation independently
  4. More Maintainable: Clearer code, fewer edge cases
  5. Proven Pattern: Already working for Claude and GenAIScript

Example Fix (Pseudocode)

// Current (broken):
func (e *CodexEngine) BuildAgentStep() string {
    mcpConfig := buildMCPConfigTOML()  // Inline TOML with $VARS
    // Shell substitution + TOML parsing = 💥
    return fmt.Sprintf(`codex --config-inline "%s"`, mcpConfig)
}

// Fixed:
func (e *CodexEngine) BuildAgentStep(mcpServers map[string]interface{}) (string, error) {
    // Writes /tmp/gh-aw/mcp-config/config.toml and returns its path
    configPath, err := renderCodexMCPConfigFile(mcpServers)
    if err != nil {
        return "", fmt.Errorf("rendering Codex MCP config: %w", err)
    }
    // The config never passes through the shell: no escaping issues! 🎉
    return fmt.Sprintf(`codex --config "%s"`, configPath), nil
}

Impact Assessment

Current Impact

  • All Codex smoke tests failing since Oct 31 (~28 hours as of this investigation)
  • No automated validation of Codex engine changes
  • ⚠️ Risk of shipping bugs without working smoke tests
  • ⚠️ Developer velocity reduced due to manual testing needs
  • 🔴 Issue closure without fix creates confusion and technical debt

Risk if Not Fixed

  • 🔴 High risk of introducing Codex regressions
  • 🔴 Cannot verify Codex engine works in production scenarios
  • 🔴 Eroding confidence in CI/CD pipeline
  • 🔴 Technical debt accumulating with workarounds
  • 🔴 False signal from closing issues without fixing problems

Benefits of Fixing

  • ✅ Restore automated Codex validation
  • ✅ Prevent future config escaping issues
  • ✅ Standardize config approach across engines
  • ✅ Improve testability and maintainability
  • ✅ Increase confidence in Codex deployments
  • ✅ Reduce investigation overhead (6 investigations so far!)

Why This Needs Urgent Attention

  1. Issue [smoke-detector] [CRITICAL] Codex Smoke Tests Failing - TOML Parse Error (4th Occurrence) #2930 was closed as "not_planned" but failures continue
  2. 6 consecutive failures over ~28 hours show this is systemic, not transient
  3. Zero Codex validation for more than a day = high risk of regressions
  4. Recommended fix is clear and well-documented (file-based config)
  5. Fix is estimated at only 2-4 hours but saves ongoing investigation time
  6. Other engines prove the pattern works (Claude, GenAIScript)

Related Information


Investigation Timestamp: 2025-11-01 18:07:00 UTC

  • Investigator: Smoke Detector
  • Investigation Run: #19000577848
  • Pattern ID: CODEX_TOML_JSON_ESCAPING
  • Severity: CRITICAL
  • Occurrence Count: 6 (and counting)
  • First Occurrence: 2025-10-31 14:24 UTC (~28 hours before this investigation)
  • Is Flaky: No (100% reproducible, deterministic failure)

Labels: smoke-test, investigation, codex, critical, configuration, mcp, toml, urgent

AI generated by Smoke Detector - Smoke Test Failure Investigator

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
