Skip to content

🏥 CI Failuresquid container intermittent startup failure in debugging.sh test #505

@github-actions

Description

@github-actions

Summary

The debugging.sh example test in the Examples Test workflow experienced an intermittent Squid container startup failure. The container exited with code 1 during healthcheck, causing the test to fail.

Failure Details

Timeline

2026-02-04T17:07:10.6561324Z  Container awf-squid  Creating
2026-02-04T17:07:10.6744945Z  Container awf-squid  Created
2026-02-04T17:07:10.6857834Z  Container awf-squid  Starting
2026-02-04T17:07:10.8238120Z  Container awf-squid  Started
2026-02-04T17:07:10.8240265Z  Container awf-squid  Waiting
2026-02-04T17:07:11.3254835Z  Container awf-squid  Error

The container failed after ~500ms during healthcheck phase.

Evidence of Intermittent Nature

  1. Previous tests succeeded: basic-curl.sh and using-domains-file.sh passed with identical Squid config
  2. Recent history clean: Last 4 workflow runs on main branch succeeded
  3. Fast failure: Container exited in 500ms, suggesting transient issue rather than config error

Changes in PR #432

Added ACL to filter localhost healthcheck logs:

# Don't log healthcheck probes from localhost
acl healthcheck_localhost src 127.0.0.1 ::1
log_access deny healthcheck_localhost

While this change is present, the fact that earlier tests passed suggests it's not a syntax error.

Potential Root Causes

1. Docker Resource Exhaustion (Most Likely)

  • Third test in sequence
  • Possible race condition in container cleanup
  • Pre-test cleanup may not always complete before next test starts

2. Squid Healthcheck Timing

  • Container starts but fails healthcheck probe
  • Possible network initialization delay
  • Docker Compose healthcheck may be too aggressive

3. Image Pull/Caching Issue

  • Using GHCR images (ghcr.io/github/gh-aw-firewall/squid:latest)
  • Possible intermittent registry connectivity issue
  • Image layer corruption (unlikely but possible)

Recommendations

Short-term

  • Rerun failed workflows (likely to succeed)
  • Monitor for recurrence pattern

Medium-term

  • Add retry logic to container startup
  • Increase healthcheck interval/timeout
  • Add explicit wait between example tests

Long-term

  • Improve container cleanup between tests
  • Add Squid startup logging to capture failure details
  • Consider test isolation improvements

Monitoring

Watch for:

  • Recurrence in debugging.sh specifically
  • Pattern of "3rd test fails" in other workflows
  • Similar failures after PR merges

Related Issues


Note: This appears to be a flaky test rather than a deterministic failure. The same configuration worked in tests immediately before and after in the same run.

AI generated by CI Doctor

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingci

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions