
Fix TOCTOU race in Docker image download #11886

Closed
Copilot wants to merge 1 commit into main from copilot/fix-toc-tou-race-condition

Conversation

Contributor

Copilot AI commented Jan 26, 2026

StartDockerImageDownload checked image availability outside the mutex, allowing concurrent calls to spawn duplicate download goroutines when an image becomes available between the check and lock acquisition.

Changes

  • Moved availability check inside mutex in StartDockerImageDownload (pkg/cli/docker_images.go:75-92)

    • Changed from exported IsDockerImageAvailable() to internal isDockerImageAvailableUnlocked()
    • Ensures atomic check-and-set operation
  • Added concurrency tests (pkg/cli/docker_images_test.go:265-413)

    • Validates single download goroutine spawned under concurrent calls
    • Verifies no downloads start for already-available images
    • Tests race scenarios with external availability changes

Before:

// ❌ Check outside lock - race window
if IsDockerImageAvailable(image) {
    return false
}
pullState.mu.Lock()
// ...

After:

// ✅ Atomic check-and-set
pullState.mu.Lock()
defer pullState.mu.Unlock()

if isDockerImageAvailableUnlocked(image) {
    return false
}
if pullState.downloading[image] {
    return false
}
pullState.downloading[image] = true
// ...
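
The added tests in pkg/cli/docker_images_test.go exercise this path under concurrency. A minimal sketch of that kind of test (not the actual test code) follows; it assumes the test file imports sync, sync/atomic, and testing, that pullState starts clean with the image not yet available, and that the image name and caller count are purely illustrative:

func TestStartDockerImageDownloadConcurrent(t *testing.T) {
    const image = "example/image:latest" // hypothetical image name
    const callers = 16

    var wg sync.WaitGroup
    var started atomic.Int32 // counts callers that won the check-and-set

    for i := 0; i < callers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if StartDockerImageDownload(image) {
                started.Add(1)
            }
        }()
    }
    wg.Wait()

    // With the check-and-set under the lock, at most one caller should
    // have spawned a download goroutine.
    if got := started.Load(); got > 1 {
        t.Fatalf("expected at most one download to start, got %d", got)
    }
}

Running the package with go test -race ./pkg/cli/... also lets the race detector confirm that the availability check and the map write now happen under the same lock.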
Original prompt

This section details the original issue you should resolve.

<issue_title>[Code Quality] Fix TOCTOU Race Condition in Docker Image Download Logic</issue_title>
<issue_description>## Description

The StartDockerImageDownload function in pkg/cli/docker_images.go has a time-of-check-time-of-use (TOCTOU) race condition that can cause duplicate goroutines and resource leaks in concurrent Docker image download operations.

Problem

The availability check (IsDockerImageAvailable) happens outside the mutex lock, creating a race window:

func StartDockerImageDownload(image string) bool {
    // ❌ Check availability WITHOUT lock
    if IsDockerImageAvailable(image) {
        return false
    }

    // ✅ Lock acquired HERE - too late!
    pullState.mu.Lock()
    if pullState.downloading[image] {
        pullState.mu.Unlock()
        return false
    }
    pullState.downloading[image] = true
    pullState.mu.Unlock()
    
    go func() { /* download */ }()
    return true
}

Race Scenario

  1. Thread A checks IsDockerImageAvailable(image) → returns false
  2. Thread B checks IsDockerImageAvailable(image) → returns false
  3. Image becomes available externally (e.g., pulled by another process)
  4. Thread A acquires lock, marks downloading, spawns goroutine
  5. Thread B sees downloading flag, returns false
  6. Result: Unnecessary goroutine wastes resources pulling already-available image

Impact

  • Severity: Critical (concurrency bug)
  • Risk: Duplicate download attempts, resource waste, confusing logs
  • Hot path: Called by CheckAndPrepareDockerImages (line 167)

Suggested Changes

Move the availability check inside the lock for atomic check-and-set:

func StartDockerImageDownload(image string) bool {
    // ✅ Atomic check-and-set under lock
    pullState.mu.Lock()
    defer pullState.mu.Unlock()
    
    // Check availability (now atomic with downloading check)
    if IsDockerImageAvailable(image) {
        dockerImagesLog.Printf("Image %s is already available", image)
        return false
    }
    
    // Check if already downloading
    if pullState.downloading[image] {
        dockerImagesLog.Printf("Image %s is already downloading", image)
        return false
    }
    
    // Mark as downloading and spawn goroutine
    pullState.downloading[image] = true
    
    go func() {
        defer func() {
            pullState.mu.Lock()
            delete(pullState.downloading, image)
            pullState.mu.Unlock()
        }()
        
        // Download logic (may need RLock optimization if expensive)
        /* ... */
    }()
    
    return true
}

Alternative: Double-Checked Locking

If IsDockerImageAvailable is expensive (it calls docker image inspect), use double-checked locking (a sketch follows this list):

  1. Check availability under RLock first (fast path)
  2. Upgrade to exclusive Lock only if download needed
  3. Re-check availability after upgrade (prevents race)
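
A sketch of this pattern, assuming pullState.mu is (or is changed to) a sync.RWMutex and reusing the unlocked helper naming from this PR; the download body is elided and the structure is illustrative rather than the implemented code:

func StartDockerImageDownload(image string) bool {
    // Fast path: read lock only, so callers checking an already-available
    // or already-downloading image do not serialize on the write lock.
    pullState.mu.RLock()
    if isDockerImageAvailableUnlocked(image) || pullState.downloading[image] {
        pullState.mu.RUnlock()
        return false
    }
    pullState.mu.RUnlock()

    // Slow path: take the exclusive lock and re-check, because another
    // caller may have started a download between RUnlock and Lock.
    pullState.mu.Lock()
    defer pullState.mu.Unlock()
    if isDockerImageAvailableUnlocked(image) || pullState.downloading[image] {
        return false
    }
    pullState.downloading[image] = true

    go func() {
        defer func() {
            pullState.mu.Lock()
            delete(pullState.downloading, image)
            pullState.mu.Unlock()
        }()
        // download logic elided
    }()
    return true
}

The re-check under the exclusive lock is what prevents the original race from reappearing on the slow path.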

Files Affected

  • pkg/cli/docker_images.go:68-83 - Main race condition
  • pkg/cli/docker_images_test.go - Add concurrency test

Success Criteria

  • Availability check moved inside mutex lock
  • No race conditions detected: go test -race ./pkg/cli/...
  • Existing tests pass: go test ./pkg/cli/docker_images_test.go
  • New concurrency test added (multiple goroutines calling StartDockerImageDownload)
  • No performance regression (verify lock is not held excessively)
  • Check for deadlocks if IsDockerImageAvailable also takes locks (see the sketch after this list)
  • make agent-finish passes
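
The deadlock criterion above is usually satisfied by splitting the availability check into an exported, locking wrapper and an internal variant that assumes the caller already holds the lock, which is the shape this PR adopts with isDockerImageAvailableUnlocked. A sketch, with pullState.available as a hypothetical in-memory cache standing in for the real check:

// IsDockerImageAvailable is the exported entry point and takes the lock itself.
func IsDockerImageAvailable(image string) bool {
    pullState.mu.Lock()
    defer pullState.mu.Unlock()
    return isDockerImageAvailableUnlocked(image)
}

// isDockerImageAvailableUnlocked assumes the caller already holds pullState.mu,
// so calling it from StartDockerImageDownload's locked section cannot deadlock.
func isDockerImageAvailableUnlocked(image string) bool {
    _, ok := pullState.available[image] // hypothetical cache of known-local images
    return ok
}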

Priority

Critical - Concurrency bug that can cause resource leaks and duplicate operations

Estimated Effort: Medium (4-6 hours including testing and lock analysis)

Source

Extracted from Sergo Concurrency & Error Flow Analysis - Discussion githubnext/gh-aw#10665

Analysis Quote:

"Critical Finding: Discovered a time-of-check-time-of-use (TOCTOU) race condition in pkg/cli/docker_images.go:68-83 that could cause duplicate goroutines and resource leaks in Docker image download operations."

References:

AI generated by Discussion Task Miner - Code Quality Improvement Agent

  • expires on Feb 9, 2026, 2:09 PM UTC

Comments on the Issue (you are @copilot in this section)



@pelikhan closed this Jan 26, 2026
Copilot AI changed the title [WIP] Fix TOCTOU race condition in Docker image download logic Fix TOCTOU race in Docker image download Jan 26, 2026
Copilot AI requested a review from pelikhan January 26, 2026 15:19
Copilot stopped work on behalf of pelikhan due to an error January 26, 2026 15:19
