-
Notifications
You must be signed in to change notification settings - Fork 2
Open
Labels
Description
Summary
The debugging.sh example test in the Examples Test workflow experienced an intermittent Squid container startup failure. The container exited with code 1 during healthcheck, causing the test to fail.
Failure Details
- Workflow Run: 21680840422
- Commit: 769a6f5 (PR feat: filter benign operational logs from Squid access.log #432 merge - "feat: filter benign operational logs from Squid access.log")
- Test: debugging.sh (3rd example test)
- Error:
dependency failed to start: container awf-squid exited (1)
Timeline
2026-02-04T17:07:10.6561324Z Container awf-squid Creating
2026-02-04T17:07:10.6744945Z Container awf-squid Created
2026-02-04T17:07:10.6857834Z Container awf-squid Starting
2026-02-04T17:07:10.8238120Z Container awf-squid Started
2026-02-04T17:07:10.8240265Z Container awf-squid Waiting
2026-02-04T17:07:11.3254835Z Container awf-squid Error
The container failed after ~500ms during healthcheck phase.
Evidence of Intermittent Nature
- Previous tests succeeded: basic-curl.sh and using-domains-file.sh passed with identical Squid config
- Recent history clean: Last 4 workflow runs on main branch succeeded
- Fast failure: Container exited in 500ms, suggesting transient issue rather than config error
Changes in PR #432
Added ACL to filter localhost healthcheck logs:
# Don't log healthcheck probes from localhost
acl healthcheck_localhost src 127.0.0.1 ::1
log_access deny healthcheck_localhost
While this change is present, the fact that earlier tests passed suggests it's not a syntax error.
Potential Root Causes
1. Docker Resource Exhaustion (Most Likely)
- Third test in sequence
- Possible race condition in container cleanup
- Pre-test cleanup may not always complete before next test starts
2. Squid Healthcheck Timing
- Container starts but fails healthcheck probe
- Possible network initialization delay
- Docker Compose healthcheck may be too aggressive
3. Image Pull/Caching Issue
- Using GHCR images (
ghcr.io/github/gh-aw-firewall/squid:latest) - Possible intermittent registry connectivity issue
- Image layer corruption (unlikely but possible)
Recommendations
Short-term
- Rerun failed workflows (likely to succeed)
- Monitor for recurrence pattern
Medium-term
- Add retry logic to container startup
- Increase healthcheck interval/timeout
- Add explicit wait between example tests
Long-term
- Improve container cleanup between tests
- Add Squid startup logging to capture failure details
- Consider test isolation improvements
Monitoring
Watch for:
- Recurrence in debugging.sh specifically
- Pattern of "3rd test fails" in other workflows
- Similar failures after PR merges
Related Issues
- Security: Separate privileged iptables setup from unprivileged command execution using Docker layers #375 - Container architecture improvements
- Past Docker network cleanup issues (pool overlaps)
Note: This appears to be a flaky test rather than a deterministic failure. The same configuration worked in tests immediately before and after in the same run.
AI generated by CI Doctor
Reactions are currently unavailable