
🏥 CI failure investigation: smoke copilot run 21780014599 #562

@github-actions

Summary

The Smoke Copilot workflow failed during its scheduled run on the main branch.

Workflow Description

The Smoke Copilot workflow performs integration testing of:

  1. GitHub MCP - Reviews last 2 merged PRs
  2. Playwright - Navigates to github.com and verifies page title
  3. File Writing - Creates test file in /tmp/gh-aw/agent/
  4. Bash Tools - Verifies file creation with cat command

Investigation Limitations

⚠️ Limited Access: The CI Doctor workflow cannot access GitHub API logs due to authentication constraints. This investigation is based on:

  • Workflow context from the triggering event
  • Similar past failures from cache memory
  • Workflow configuration analysis

Similar Past Failures

1. Pelis Agent Factory Advisor Failure (Run 21773244180)

  • Exit code 1 during "Execute GitHub Copilot CLI" step
  • Warning: No safe outputs file found at /opt/gh-aw/safeoutputs/outputs.jsonl
  • Notice: GitHub MCP lockdown mode enabled for public repository
  • Uses awf v0.13.12 with --enable-chroot flag

2. Examples Test SSL Failure (Run 21728924123)

Common Failure Patterns

Based on repository history, likely causes include:

  1. Docker Network Issues

    • Pool overlaps (awf-net conflicts)
    • Orphaned containers from previous runs
    • Cleanup issues from timeout kills
  2. MCP Server Issues

    • GitHub MCP lockdown mode in public repos
    • Safe outputs file not created
    • MCP gateway startup failures
  3. SSL/TLS Issues

    • HTTPS_PROXY environment variable problems
    • Squid proxy intercept mode failures
    • Certificate validation errors
  4. Copilot CLI Issues

    • Exit code 1 without clear error message
    • Tool execution failures
    • Timeout during long-running operations (5-minute limit)
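
Once logs are available, the four categories above could be used for a quick first-pass triage. The following is a minimal sketch, not part of the CI Doctor workflow: the function name classify_failure and the search strings are illustrative assumptions, chosen from the messages quoted in this report, and would need tuning against real logs.

```shell
# Hypothetical triage helper: reads a log on stdin and prints one line per
# failure category whose (assumed) keywords appear anywhere in the log.
classify_failure() {
  awk '
    /Pool overlaps|awf-net/                     { hit["docker-network"] = 1 }
    /No safe outputs file found|lockdown mode/  { hit["mcp-server"] = 1 }
    /certificate|SSL|HTTPS_PROXY/               { hit["ssl-tls"] = 1 }
    /[Ee]xit code 1|timed out/                  { hit["copilot-cli"] = 1 }
    END {
      # Print matches in a fixed order so the output is stable.
      n = split("docker-network mcp-server ssl-tls copilot-cli", order, " ")
      for (i = 1; i <= n; i++) if (order[i] in hit) print order[i]
    }
  '
}
```

A match only indicates that a keyword appeared. For example, the lockdown-mode notice may be present in healthy runs too, so the output is a hint for where to read first rather than a diagnosis.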

Recommended Actions

Immediate Investigation

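  1. Download workflow logs manually from the GitHub Actions UI, or fetch them with the gh CLI. Note that gh run download retrieves run artifacts, while gh run view --log-failed prints the logs of failed steps:

    gh run view 21780014599 --log-failed
    gh run download 21780014599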
  2. Check for Docker resource leaks on GitHub Actions runners

    • Orphaned awf-net networks
    • Stale containers from previous runs
  3. Review Copilot CLI logs if preserved at /tmp/awf-agent-logs-*
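
The Docker leak check in step 2 could look roughly like the sketch below. It assumes leaked networks have names containing "awf-net" (taken from the failure patterns above); the helper name filter_awf_networks is hypothetical. The docker commands are shown as comments because they only make sense on a runner with Docker available.

```shell
# Hypothetical helper: keep only network names that contain "awf-net".
# Returns success even when nothing matches, so it is safe under `set -e`.
filter_awf_networks() {
  grep 'awf-net' || true
}

# Intended use on a runner (not executed here):
#   docker network ls --format '{{.Name}}' | filter_awf_networks
#   docker ps -a --filter status=exited --format '{{.Names}}\t{{.Status}}'
```

Any network printed by the first command before the smoke test starts is a candidate leftover from an earlier run.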

Debugging Steps

  1. Re-run the workflow with debug logging:

    awf --log-level debug --keep-containers ...
  2. Check Squid proxy logs for blocked domains:

    sudo grep TCP_DENIED /tmp/squid-logs-*/access.log
  3. Verify GitHub MCP server configuration:

    • Check /home/runner/.copilot/mcp-config.json
    • Confirm --disable-builtin-mcps flag usage
    • Verify GITHUB_PERSONAL_ACCESS_TOKEN environment variable
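
Steps 2 and 3 above can be sketched as two small helpers. Both names (summarize_denied, check_mcp_env) are hypothetical. The first assumes Squid's default native access-log layout, where field 4 is the result code and field 7 is the requested URL. If awf configures a custom log format, the field numbers would need adjusting. The second only checks that the config file is present and non-empty and that the token variable is set; it does not validate the file's contents.

```shell
# Count denied requests per URL from Squid access-log lines on stdin.
# Assumption: native log format (field 4 = code/status, field 7 = URL).
summarize_denied() {
  awk '$4 ~ /^TCP_DENIED/ { count[$7]++ }
       END { for (url in count) print count[url], url }' |
    sort -k1,1nr -k2,2
}

# Check that the MCP config file exists and is non-empty, and that the
# token variable is set. The token value itself is never printed.
check_mcp_env() {
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
  [ -n "${GITHUB_PERSONAL_ACCESS_TOKEN:-}" ] || {
    echo "GITHUB_PERSONAL_ACCESS_TOKEN is not set" >&2
    return 1
  }
}

# Intended use on a runner (not executed here):
#   sudo cat /tmp/squid-logs-*/access.log | summarize_denied
#   check_mcp_env /home/runner/.copilot/mcp-config.json
```

Sorting by count puts the most frequently blocked destination first, which is usually the quickest way to spot a missing allowlist entry.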

Prevention

  1. Add pre-test cleanup to smoke test workflows:

    docker network prune -f
    docker container prune -f
  2. Enhance error reporting in Copilot CLI execution step:

    • Capture full stdout/stderr
    • Preserve agent logs as artifacts
    • Add explicit error detection
  3. Monitor scheduled runs for patterns:

    • Check if failures correlate with specific times
    • Identify if runner resource exhaustion is a factor
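
The error-reporting items in point 2 could be approached with a small wrapper around the Copilot CLI execution step. This is a sketch under assumptions: the function name run_with_capture and the log path are hypothetical, and the wrapper is not part of the existing workflow. It saves combined stdout/stderr to a file, echoes it to the step log, and emits a GitHub Actions error annotation when the command fails.

```shell
# Run a command, save combined stdout/stderr to a log file, echo the log,
# and return the command's own exit status.
run_with_capture() {
  rwc_log=$1
  shift
  rwc_status=0
  "$@" >"$rwc_log" 2>&1 || rwc_status=$?
  cat "$rwc_log"
  if [ "$rwc_status" -ne 0 ]; then
    # "::error::" is the GitHub Actions workflow command for an annotation.
    echo "::error::command exited with status $rwc_status (log: $rwc_log)"
  fi
  return "$rwc_status"
}

# Intended use in a workflow step (not executed here):
#   run_with_capture "$RUNNER_TEMP/agent-step.log" <existing Copilot CLI command>
```

Because the log is written to a file before it is echoed, a later workflow step can upload it as an artifact even when the command exits with code 1 and no clear error message.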

Next Steps

  1. Manual log review required - Download logs from run 21780014599
  2. 🔍 Reproduce locally if possible using same commit SHA
  3. 📊 Update cache memory with findings once logs are analyzed
  4. 🛠️ Implement fixes based on root cause analysis

🏥 Automatically investigated by CI Doctor workflow
⚠️ This is a preliminary report due to GitHub API access limitations. Manual follow-up required.

AI generated by CI Doctor

Labels: bug, ci
