-
Notifications
You must be signed in to change notification settings - Fork 46
Description
Description
The StartDockerImageDownload function in pkg/cli/docker_images.go has a time-of-check-time-of-use (TOCTOU) race condition that can cause duplicate goroutines and resource leaks in concurrent Docker image download operations.
Problem
The availability check (IsDockerImageAvailable) happens outside the mutex lock, creating a race window:
func StartDockerImageDownload(image string) bool {
// ❌ Check availability WITHOUT lock
if IsDockerImageAvailable(image) {
return false
}
// ✅ Lock acquired HERE - too late!
pullState.mu.Lock()
if pullState.downloading[image] {
pullState.mu.Unlock()
return false
}
pullState.downloading[image] = true
pullState.mu.Unlock()
go func() { /* download */ }()
return true
}Race Scenario
- Thread A checks
IsDockerImageAvailable(image)→ returnsfalse - Thread B checks
IsDockerImageAvailable(image)→ returnsfalse - Image becomes available externally (e.g., pulled by another process)
- Thread A acquires lock, marks downloading, spawns goroutine
- Thread B sees downloading flag, returns false
- Result: Unnecessary goroutine wastes resources pulling already-available image
Impact
- Severity: Critical (concurrency bug)
- Risk: Duplicate download attempts, resource waste, confusing logs
- Hot path: Called by
CheckAndPrepareDockerImages(line 167)
Suggested Changes
Move the availability check inside the lock for atomic check-and-set:
func StartDockerImageDownload(image string) bool {
// ✅ Atomic check-and-set under lock
pullState.mu.Lock()
defer pullState.mu.Unlock()
// Check availability (now atomic with downloading check)
if IsDockerImageAvailable(image) {
dockerImagesLog.Printf("Image %s is already available", image)
return false
}
// Check if already downloading
if pullState.downloading[image] {
dockerImagesLog.Printf("Image %s is already downloading", image)
return false
}
// Mark as downloading and spawn goroutine
pullState.downloading[image] = true
go func() {
defer func() {
pullState.mu.Lock()
delete(pullState.downloading, image)
pullState.mu.Unlock()
}()
// Download logic (may need RLock optimization if expensive)
/* ... */
}()
return true
}Alternative: Double-Checked Locking
If IsDockerImageAvailable is expensive (calls docker image inspect):
- Check availability under RLock first (fast path)
- Upgrade to exclusive Lock only if download needed
- Re-check availability after upgrade (prevents race)
Files Affected
pkg/cli/docker_images.go:68-83- Main race conditionpkg/cli/docker_images_test.go- Add concurrency test
Success Criteria
- Availability check moved inside mutex lock
- No race conditions detected:
go test -race ./pkg/cli/... - Existing tests pass:
go test ./pkg/cli/docker_images_test.go - New concurrency test added (multiple goroutines calling StartDockerImageDownload)
- No performance regression (verify lock is not held excessively)
- Check for deadlocks if
IsDockerImageAvailablealso takes locks -
make agent-finishpasses
Priority
Critical - Concurrency bug that can cause resource leaks and duplicate operations
Estimated Effort: Medium (4-6 hours including testing and lock analysis)
Source
Extracted from Sergo Concurrency & Error Flow Analysis - Discussion #10665
Analysis Quote:
"Critical Finding: Discovered a time-of-check-time-of-use (TOCTOU) race condition in
pkg/cli/docker_images.go:68-83that could cause duplicate goroutines and resource leaks in Docker image download operations."
References:
AI generated by Discussion Task Miner - Code Quality Improvement Agent
- expires on Feb 9, 2026, 2:09 PM UTC