[CI Failure Doctor] 🏥 CI Failure Investigation - Run #22185894213 #16846

@github-actions

🏥 CI Failure Investigation - Run #22185894213

Summary

  • Integration: Workflow Permissions completed its tests but actions/upload-artifact could not finalize test-result-integration-Workflow Permissions and the JSON artifact never landed because Azure returned HTTP 403.
  • canary_go relies on those integration artifacts, so scripts/compare-test-coverage.sh reported five Workflow Permissions tests as unexecuted and failed with exit code 1 once the artifact went missing.

Failure Details

Root Cause Analysis

Integration: Workflow Permissions uploaded test-result-integration-Workflow Permissions but the final actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 step failed with Failed to FinalizeArtifact: Received non-retryable error: Failed request: (403) Forbidden. Because the artifact never became available, canary_go’s coverage comparator saw the five Workflow Permissions tests listed in all-tests.txt but not in executed-tests.txt and aborted even though the tests themselves had already passed.

Reproduction Steps

  1. Push any change so the CI run exercises Integration: Workflow Permissions and the downstream canary_go coverage check.
  2. Observe the upload of test-result-integration-Workflow Permissions (test-result-integration-*.json) succeed but then hit HTTP 403 when finalizing the artifact.
  3. canary_go runs scripts/compare-test-coverage.sh all-tests.txt executed-tests.txt, sees the five Workflow Permissions tests missing, and fails with ❌ FAILURE: Found 5 tests that are NOT being executed in CI plus ##[error]Process completed with exit code 1.
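The comparison in step 3 can be sketched as a set difference. This is only an illustration in Python, not the contents of scripts/compare-test-coverage.sh (the real script is a shell script whose internals are not shown here); the function name `find_unexecuted` and the extra test `TestSomethingElse` are hypothetical.

```python
def find_unexecuted(all_tests, executed_tests):
    """Return the names present in all_tests but absent from executed_tests, sorted."""
    executed = {name.strip() for name in executed_tests if name.strip()}
    known = {name.strip() for name in all_tests if name.strip()}
    return sorted(known - executed)


# The five names below are the ones reported as missing in this run.
all_tests = [
    "TestCollectPackagesFromWorkflow",
    "TestPermissionsImportIntegration",
    "TestPermissionsShortcutInIncludedFiles",
    "TestPermissionsShortcutMixedUsage",
    "TestPermissionsWarningInNonStrictMode",
    "TestSomethingElse",  # hypothetical test reported by another job
]

# With the Workflow Permissions artifact missing, only other jobs' tests appear.
executed_tests = ["TestSomethingElse"]

missing = find_unexecuted(all_tests, executed_tests)
exit_code = 1 if missing else 0
```

Under these assumptions the five Workflow Permissions tests end up in `missing` even though they passed, which is why the job exits non-zero.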

Failed Jobs and Errors

  • Integration: Workflow Permissions - actions/upload-artifact step for test-result-integration-Workflow Permissions ended with Failed to FinalizeArtifact: Received non-retryable error: Failed request: (403) Forbidden after the ZIP upload finished.
  • canary_go - scripts/compare-test-coverage.sh all-tests.txt executed-tests.txt listed TestCollectPackagesFromWorkflow, TestPermissionsImportIntegration, TestPermissionsShortcutInIncludedFiles, TestPermissionsShortcutMixedUsage, and TestPermissionsWarningInNonStrictMode as missing because the integration artifact was never present, and the job exited 1.

Investigation Findings

  1. The integration job itself completed its tests and uploaded a single test-result-integration-*.json file, so the failure only occurs in the actions/upload-artifact finalization step.
  2. The missing artifact is the sole reason canary_go sees tests as unexecuted; the coverage script compares all-tests.txt against the tests recorded in the integration jobs' JSON artifacts and treats any absence as a failure.
  3. actions/upload-artifact 403s have happened before ([CI Failure Doctor] CI Failure Investigation - Run #22103122541 #16377), and coverage noise from missing artifacts is already documented ([CI Failure Doctor] CI Failure Investigation - Run #35768 #15789), so both symptoms are recurring patterns in this workflow.

Recommended Actions

  • Re-run the workflow to see if the Failed to FinalizeArtifact 403 was transient; if it recurs, wrap the upload step in retries or a custom uploader so the integration artifact is guaranteed to be committed before downstream jobs start.
  • Teach scripts/compare-test-coverage.sh (and any other coverage check) to detect when expected artifacts are absent and fail fast with a clear message rather than listing every missing test.
  • Surface infrastructure-level failures for actions/upload-artifact (especially HTTP 403s) so the next coverage run can skip or short-circuit instead of depending on a missing artifact.
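The retry idea from the first recommendation could look roughly like the sketch below. It assumes the upload is exposed as a callable and that the 403 is in fact transient, which this run does not establish (actions/upload-artifact itself labels the error non-retryable). `TransientUploadError`, `upload_with_retries`, and `flaky_upload` are hypothetical names, not part of any real uploader.

```python
import time


class TransientUploadError(Exception):
    """Hypothetical error standing in for a retryable finalization failure."""


def upload_with_retries(upload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call upload() up to `attempts` times with exponential backoff between tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except TransientUploadError as err:
            last_error = err
            if attempt < attempts:
                # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


# Usage: a stand-in upload that fails twice and then succeeds.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientUploadError("(403) Forbidden")
    return "finalized"


delays = []  # record the requested delays instead of actually sleeping
result = upload_with_retries(flaky_upload, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run; a real wrapper would leave the default in place.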

Prevention Strategies

  • Add instrumentation or retries around Azure artifact finalization to recover from intermittent 403 responses and avoid leaving downstream jobs without inputs.
  • Guard coverage comparators against missing integration artifacts so they raise an explicit “artifact not found” error instead of bloating the log with missing-test lists.
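A guard of the kind described above might be sketched as follows, assuming the downloaded artifacts sit in a single directory and match the test-result-integration-*.json pattern mentioned in the reproduction steps. `ArtifactNotFoundError` and `require_artifacts` are hypothetical names, and the directory layout is an assumption.

```python
import glob
import os
import tempfile


class ArtifactNotFoundError(Exception):
    """Raised when no expected integration artifact is present."""


def require_artifacts(directory, pattern="test-result-integration-*.json"):
    """Return the matching artifact paths, or fail fast with one clear message."""
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    if not matches:
        raise ArtifactNotFoundError(
            f"artifact not found: no files matching {pattern!r} in {directory}"
        )
    return matches


# Usage: an empty directory trips the guard; adding one JSON file satisfies it.
with tempfile.TemporaryDirectory() as workdir:
    try:
        require_artifacts(workdir)
        guard_tripped = False
    except ArtifactNotFoundError:
        guard_tripped = True

    example = os.path.join(workdir, "test-result-integration-example.json")
    with open(example, "w") as handle:
        handle.write("{}")
    found_names = [os.path.basename(p) for p in require_artifacts(workdir)]
```

Running a check like this before the comparison yields a single "artifact not found" error instead of a list of every missing test.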

AI Team Self-Improvement

Always treat Failed to FinalizeArtifact 403 responses as infrastructure failures and avoid running dependent coverage comparisons until the artifact is confirmed present.
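One way to apply this rule is to classify the log text before deciding whether to run dependent comparisons. The sketch below matches only the error string seen in this run; the function name `classify_failure` and the two category labels are hypothetical.

```python
import re

# Matches the finalization error quoted in this report.
FINALIZE_403 = re.compile(r"Failed to FinalizeArtifact: .*\(403\) Forbidden")


def classify_failure(log_text):
    """Label a log excerpt as an infrastructure failure or leave it unknown."""
    if FINALIZE_403.search(log_text):
        return "infrastructure"
    return "unknown"


upload_log = (
    "Failed to FinalizeArtifact: Received non-retryable error: "
    "Failed request: (403) Forbidden"
)
coverage_log = "FAILURE: Found 5 tests that are NOT being executed in CI"

upload_kind = classify_failure(upload_log)
coverage_kind = classify_failure(coverage_log)
```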

Historical Context

🩺 Diagnosis provided by CI Failure Doctor

To install this workflow, run gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. View source at https://github.com/githubnext/agentics/tree/ea350161ad5dcc9624cf510f134c6a9e39a6f94d/workflows/ci-doctor.md.

  • expires on Feb 20, 2026, 2:39 PM UTC
