Downloading images concurrently breaks in many ways

### Description

We use limactl to create multiple clusters in parallel, all using the same k8s based template. If a developer has already used lima, the ubuntu server cloud image is already downloaded. But if this is the first time the image is needed, or if you deleted ~/Library/Caches/lima, every limactl process will download the image in parallel.

Since all limactl processes download the same image to the same temporary file,
~/Library/Caches/lima/download/by-url-sha256/xxxyyy/data.tmp, and then try to rename that temporary file to the same target file, the download can fail in various ways:

- Directory not empty, since another process has already started the download
  ```
  2024-10-10 22:56:59,617 ERROR   [hub] failed to download "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img": unlinkat /Users/nir/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06: directory not empty
  ```
- Temporary file renamed by another process
  ```
  2024-10-10 22:58:28,695 ERROR   [dr1] failed to download "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img": rename /Users/nir/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06/data.tmp /Users/nir/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06/data: no such file or directory
  ```
- Corrupted download (not sure this is possible with the current code)

An easy way to avoid the conflicts is to download to a per-process temporary file:

    data.tmp.{pid}

When the download is finished, renaming the temporary file to data is safe even with multiple processes, since POSIX rename is atomic. If we have N processes renaming N identical downloads:

    data.tmp.123 -> data
    data.tmp.345 -> data
    data.tmp.678 -> data

All renames will succeed, in an unknown order, but since the content is the same it does not matter which one lands last.

A better but more complex way is to use a lockfile, so the first process does the download and the other processes wait on the lockfile. When the first process finishes, it releases the lock; the next processes can then grab it, find the downloaded file, and continue.

Since this may happen at most once for every image, I would go with the simpler solution.

Downloading images concurrently breaks in many ways #2722
