Description
We use limactl to create multiple clusters in parallel, all using the same k8s-based template. If a developer has already used Lima, the Ubuntu server cloud image is already cached. But if this is the first time the image is downloaded (or if ~/Library/Caches/lima was deleted), every limactl process downloads the image in parallel.
Since all limactl processes download the same image to the same temporary file,
~/Library/Caches/lima/download/by-url-sha256/xxxyyy/data.tmp, and then try to rename that temporary file to the same target file, the processes can fail in various ways:
- Directory not empty, since another process already started the download:
  2024-10-10 22:56:59,617 ERROR [hub] failed to download "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img": unlinkat /Users/nir/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06: directory not empty
- Temporary file renamed by another process:
  2024-10-10 22:58:28,695 ERROR [dr1] failed to download "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img": rename /Users/nir/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06/data.tmp /Users/nir/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06/data: no such file or directory
- Corrupted download (not sure this is possible with the current code)
An easy way to avoid the conflicts is to download to a per-process temporary file:
data.tmp.{pid}
When the download is finished, renaming it to data is safe even with multiple processes, since POSIX rename is atomic: if data already exists, it is replaced in a single step. If N processes rename N identical downloads:
data.tmp.123 -> data
data.tmp.345 -> data
data.tmp.678 -> data
All renames succeed, in an unspecified order, but since the content is identical, the order does not matter.
A better but more complex approach is to use a lock file: the first process to take the lock does the download, while the other processes wait on the lock file. When the first process finishes, it releases the lock, and each waiting process in turn takes it, finds the downloaded file, and continues.
Since this race can happen at most once per image, I would go with the simpler solution.