Summary
The Smoke Copilot workflow failed during its scheduled run on the main branch.
- Run ID: 21780014599
- Commit: f84db28 ("docs: sync documentation with recent code changes (#561)")
- Event: Scheduled (runs every 12 hours)
- Conclusion: failure
- Created: 2026-02-07T12:21:05Z
Workflow Description
The Smoke Copilot workflow performs integration testing of:
- GitHub MCP - Reviews the last 2 merged PRs
- Playwright - Navigates to github.com and verifies the page title
- File Writing - Creates a test file in `/tmp/gh-aw/agent/`
- Bash Tools - Verifies file creation with the `cat` command
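To repeat the file-writing and `cat` checks by hand (for example, inside the agent container), a minimal sketch follows. The file name `smoke-test-<run id>.txt`, its contents, and the `AGENT_DIR` override are hypothetical. Only the `/tmp/gh-aw/agent/` directory comes from the workflow description.

```shell
# Hypothetical manual version of the File Writing + Bash Tools checks.
AGENT_DIR="${AGENT_DIR:-/tmp/gh-aw/agent}"
TEST_FILE="$AGENT_DIR/smoke-test-${GITHUB_RUN_ID:-local}.txt"
EXPECTED="smoke test ok"

# File Writing: create the test file.
mkdir -p "$AGENT_DIR"
printf '%s\n' "$EXPECTED" > "$TEST_FILE"

# Bash Tools: read it back with cat and compare with what was written.
if [ "$(cat "$TEST_FILE")" = "$EXPECTED" ]; then
  echo "PASS: $TEST_FILE"
else
  echo "FAIL: unexpected contents in $TEST_FILE" >&2
  exit 1
fi
```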
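Investigation Limitations
The logs for this run were not reviewed during the automated investigation. The analysis draws only on:
- Workflow context from the triggering event
- Similar past failures from cache memory
- Workflow configuration analysis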
Similar Past Failures
1. Pelis Agent Factory Advisor Failure (Run 21773244180)
- Exit code 1 during "Execute GitHub Copilot CLI" step
- Warning: No safe outputs file found at `/opt/gh-aw/safeoutputs/outputs.jsonl`
- Notice: GitHub MCP lockdown mode enabled for public repository
- Uses awf v0.13.12 with the `--enable-chroot` flag
2. Examples Test SSL Failure (Run 21728924123)
- Exit code 35 (CURLE_SSL_CONNECT_ERROR)
- Root cause: removal of HTTPS_PROXY in PR #524 ("fix: remove HTTP_PROXY/HTTPS_PROXY env vars from agent container")
- The removal broke SSL connections through the Squid proxy in intercept mode
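Exit code 35 is curl's own status code. When probing connectivity through the proxy by hand, a small lookup like the sketch below (the helper `curl_exit_reason` is hypothetical) separates an SSL handshake failure from DNS, connect, or timeout errors. The codes listed are standard libcurl error codes; only a handful are covered.

```shell
# Hypothetical helper: map a curl exit status to its libcurl error name.
curl_exit_reason() {
  case "$1" in
    0)  echo "OK" ;;
    6)  echo "CURLE_COULDNT_RESOLVE_HOST" ;;
    7)  echo "CURLE_COULDNT_CONNECT" ;;
    28) echo "CURLE_OPERATION_TIMEDOUT" ;;
    35) echo "CURLE_SSL_CONNECT_ERROR" ;;
    56) echo "CURLE_RECV_ERROR" ;;
    60) echo "CURLE_PEER_FAILED_VERIFICATION" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Usage (makes a network request, so left commented out):
# curl -sS -o /dev/null https://api.github.com; curl_exit_reason "$?"
```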
Common Failure Patterns
Based on repository history, likely causes include:
- Docker Network Issues
  - Pool overlaps (`awf-net` conflicts)
  - Orphaned containers from previous runs
  - Cleanup issues after `timeout` kills
- MCP Server Issues
  - GitHub MCP lockdown mode in public repos
  - Safe outputs file not created
  - MCP gateway startup failures
- SSL/TLS Issues
  - HTTPS_PROXY environment variable problems
  - Squid proxy intercept mode failures
  - Certificate validation errors
- Copilot CLI Issues
  - Exit code 1 without a clear error message
  - Tool execution failures
  - Timeouts during long-running operations (5-minute limit)
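The SSL failure in run 21728924123 traced back to removed proxy variables. A quick check of which proxy variables are visible inside the agent container can therefore rule the SSL/TLS category in or out. The helper name `report_proxy_env` in this sketch is hypothetical. It prints only whether each variable is set, because proxy URLs can embed credentials.

```shell
# Hypothetical helper: report set/unset for each proxy variable without
# printing values. A variable set to an empty string is reported as unset.
report_proxy_env() {
  for var in HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy; do
    if [ -n "$(printenv "$var")" ]; then
      echo "$var=set"
    else
      echo "$var=unset"
    fi
  done
}

report_proxy_env
```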
Recommended Actions
Immediate Investigation
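- Download the workflow logs manually from the GitHub Actions UI, or use the GitHub CLI: `gh run view 21780014599 --log-failed` prints the logs of the failed steps, and `gh run download 21780014599` fetches the run's artifacts (not its logs)
- Check the GitHub Actions runners for Docker resource leaks:
  - Orphaned `awf-net` networks
  - Stale containers from previous runs
- Review the Copilot CLI logs, if they were preserved, at `/tmp/awf-agent-logs-*`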
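Below is a read-only sketch for the Docker leak check. It assumes the leftover networks have `awf-net` in their name, as the failure patterns suggest. `list_awf_networks` is a hypothetical helper, split out so the name filter can be checked without Docker. The `docker` commands only list resources and remove nothing.

```shell
# Hypothetical helper: keep only names containing "awf-net" from a
# newline-separated list on stdin.
list_awf_networks() {
  grep -F 'awf-net' || true
}

# Read-only inspection; skipped when the docker CLI is absent.
if command -v docker >/dev/null 2>&1; then
  docker network ls --format '{{.Name}}' | list_awf_networks | while read -r net; do
    # Containers currently attached to the network (0 suggests an orphan).
    attached=$(docker network inspect --format '{{len .Containers}}' "$net")
    echo "$net: $attached attached container(s)"
  done
  # Exited containers left behind by earlier runs.
  docker ps -a --filter status=exited --format '{{.ID}} {{.Names}} {{.Status}}'
fi
```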
Debugging Steps
- Re-run with debug logging:
  `awf --log-level debug --keep-containers ...`
- Check the Squid proxy logs for blocked domains:
  `sudo cat /tmp/squid-logs-*/access.log | grep TCP_DENIED`
- Verify the GitHub MCP server configuration:
  - Check `/home/runner/.copilot/mcp-config.json`
  - Confirm usage of the `--disable-builtin-mcps` flag
  - Verify the `GITHUB_PERSONAL_ACCESS_TOKEN` environment variable
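To see which destinations were denied, you can replace the `grep TCP_DENIED` filter with a per-destination count instead of listing every matching line. This sketch assumes Squid's default native log format, where field 4 is the result code (such as `TCP_DENIED/403`) and field 7 is the URL, or `host:port` for CONNECT requests. If awf configures a custom `logformat`, the field numbers will differ. `denied_destinations` is a hypothetical helper.

```shell
# Hypothetical helper: count denied destinations, most frequent first.
# Reads the files given as arguments, or stdin when none are given.
denied_destinations() {
  awk '$4 ~ /^TCP_DENIED\// { print $7 }' "$@" | sort | uniq -c | sort -rn
}

# Usage on the runner (sudo is only needed to read the logs):
# sudo cat /tmp/squid-logs-*/access.log | denied_destinations
```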
Prevention
- Add a pre-test cleanup step to the smoke test workflows, removing stopped containers before unused networks:
  `docker container prune -f`
  `docker network prune -f`
- Improve error reporting in the Copilot CLI execution step:
  - Capture full stdout/stderr
  - Preserve agent logs as artifacts
  - Add explicit error detection
- Monitor scheduled runs for patterns:
  - Check whether failures correlate with specific times
  - Determine whether runner resource exhaustion is a factor
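For the error-reporting item, one option is a wrapper that saves full stdout and stderr to files, which can then be uploaded as artifacts. It also distinguishes a timeout from an ordinary non-zero exit. This is a sketch, not the workflow's current step. `run_captured` and the default `LOG_DIR` path are hypothetical, and the wrapper relies on GNU coreutils `timeout`, which exits with 124 when the limit is hit. Passing 300 seconds would mirror the 5-minute limit noted under Copilot CLI Issues.

```shell
# Hypothetical wrapper: run a command under a time limit, keep stdout and
# stderr as separate log files, and report how the command exited.
run_captured() {
  local limit="$1"; shift
  local log_dir="${LOG_DIR:-/tmp/agent-step-logs}"
  mkdir -p "$log_dir"

  local status=0
  timeout "$limit" "$@" >"$log_dir/stdout.log" 2>"$log_dir/stderr.log" || status=$?

  if [ "$status" -eq 124 ]; then
    echo "ERROR: timed out after ${limit}s: $*" >&2
  elif [ "$status" -ne 0 ]; then
    echo "ERROR: exit code $status: $*" >&2
    # Surface the end of stderr so the failure is visible in the job log.
    tail -n 20 "$log_dir/stderr.log" >&2
  fi
  return "$status"
}

# Usage: LOG_DIR=/tmp/agent-step-logs run_captured 300 <agent command and args>
```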
Next Steps
- ⏳ Manual log review required - Download logs from run 21780014599
- 🔍 Reproduce locally, if possible, using the same commit SHA (f84db28)
- 📊 Update cache memory with findings once logs are analyzed
- 🛠️ Implement fixes based on root cause analysis
🏥 Automatically investigated by CI Doctor workflow
AI generated by CI Doctor