Importing a containerdisk onto a block volume loses sparseness #3614

stefanha · 2025-01-22T20:52:47Z

What happened:
Importing a containerdisk onto a block volume loses sparseness. When I imported the centos-stream:9 containerdisk, which contains only 2 GB of non-zero data, onto an empty 10 GB block volume, CDI wrote all 10 GB. Preallocation was not enabled.

What you expected to happen:
Only the non-zero data should be written to the block volume. This saves space on the underlying storage.

How to reproduce it (as minimally and precisely as possible):
Create a DataVolume from the YAML below and observe the amount of storage allocated. I used KubeSAN as the CSI driver, so the LVM lvs command can be used to see the thin-provisioned storage usage. If you don't have thin-provisioned storage, you could use I/O stats or tracing to determine how much data is being written.

Additional context:
I discussed this with @aglitke and we looked at the qemu-img command that is invoked:

Running qemu-img with args: [convert -t writeback -p -O raw /scratch/disk/disk.img /dev/cdi-block-volume]

Adding the --target-is-zero option should avoid writing every block in the target block volume.

If there are concerns that some new block volumes come uninitialized (blocks not zeroed), then it should be possible to run blkdiscard --zeroout /path/to/block/device before invoking qemu-img with --target-is-zero. I have not tested this, but blkdiscard should zero the device efficiently and fall back to writing zero buffers on old hardware. On modern devices this would still be faster than writing all the zeroes and would preserve sparseness. On old devices it would be slower, depending on how many non-zero blocks the input disk image has.
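
A minimal sketch of the proposed command sequence, written in Python purely for illustration (CDI itself is written in Go). The helper name build_import_commands and its zero_first parameter are hypothetical, and the helper only builds argument lists without running anything. One detail to verify against the qemu-img documentation: to my understanding, --target-is-zero is only accepted together with -n (skip target creation), so the sketch passes both.

```python
from typing import List


def build_import_commands(src: str, dest: str, zero_first: bool = False) -> List[List[str]]:
    """Return the argv lists to run, in order, for a sparse import onto a block device.

    Hypothetical helper: it only builds the commands, it does not execute them.
    """
    commands: List[List[str]] = []
    if zero_first:
        # Zero the device up front when a new volume is not guaranteed to read as zeroes.
        commands.append(["blkdiscard", "--zeroout", dest])
    commands.append([
        "qemu-img", "convert",
        "-t", "writeback",
        "-p",
        "-n",                # do not create the target (assumed to be required by --target-is-zero)
        "--target-is-zero",  # treat the target as already zeroed so zero blocks are not written
        "-O", "raw",
        src, dest,
    ])
    return commands


if __name__ == "__main__":
    for argv in build_import_commands(
        "/scratch/disk/disk.img", "/dev/cdi-block-volume", zero_first=True
    ):
        print(" ".join(argv))
```

Whether the blkdiscard step is needed depends on what the storage backend guarantees about newly provisioned volumes, which is why it is optional here.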

Environment:

CDI version (use kubectl get deployments cdi-deployment -o yaml): 4.17.3
Kubernetes version (use kubectl version): v1.30.5
DV specification:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
    cdi.kubevirt.io/storage.import.lastUseTime: "2025-01-22T20:19:34.821435785Z"
    cdi.kubevirt.io/storage.usePopulator: "true"
  creationTimestamp: "2025-01-22T20:19:34Z"
  generation: 1
  labels:
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.17.3
    cdi.kubevirt.io/dataImportCron: centos-stream-9-test-import-cron-3vs0wc
    instancetype.kubevirt.io/default-instancetype: u1.medium
    instancetype.kubevirt.io/default-preference: centos.stream9
  name: centos-stream-9-test-9515270a7f3d
  namespace: default
  resourceVersion: "217802868"
  uid: d8d56487-5686-47d0-93d0-79de02a8c2c3
spec:
  source:
    registry:
      url: docker://quay.io/containerdisks/centos-stream@sha256:9515270a7f3d3fd053732c15232071cb544d847e56aa2005f27002014b5becaa
  storage:
    resources:
      requests:
        storage: 10Gi
    storageClassName: kubesan

Cloud provider or hardware configuration: N/A
OS (e.g. from /etc/os-release): N/A
Kernel (e.g. uname -a): N/A
Install tools: N/A
Others: N/A


awels · 2025-01-27T13:48:19Z

@stefanha Is there a particular version of qemu-img that has this support, or has it been there a long time? We should make sure the version of qemu-img used in CDI supports this flag.

stefanha · 2025-01-27T14:13:20Z

--target-is-zero was introduced in QEMU 5.0.0 in April 2020. You can get an idea of which Linux distro releases include that QEMU version here:
https://repology.org/project/qemu/versions

It is available starting from RHEL 8, Ubuntu 22.04, OpenSUSE Leap 15.4, Debian 10 (backports) or 11.
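
A rough sketch of a version gate, in Python for illustration only. It assumes the first line of the qemu-img --version output has the form "qemu-img version X.Y.Z ..."; the function names are hypothetical. A distro build could in principle backport or drop the option, so probing the flag directly may be more reliable than comparing versions.

```python
import re
from typing import Tuple

# QEMU release that introduced --target-is-zero, per the comment above.
MIN_VERSION = (5, 0, 0)


def parse_qemu_img_version(version_output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `qemu-img --version` output."""
    match = re.search(r"qemu-img version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized qemu-img version output")
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def supports_target_is_zero(version_output: str) -> bool:
    """True if the reported version is at least MIN_VERSION."""
    return parse_qemu_img_version(version_output) >= MIN_VERSION
```

The input string would typically come from subprocess.run(["qemu-img", "--version"], capture_output=True, text=True).stdout.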

akalenyu · 2025-01-28T11:45:59Z

Super interesting, thanks for opening the issue! I am wondering how a certain test we have isn't catching this

containerized-data-importer/tests/import_test.go

Line 1643 in 41b96ed

    
           Expect(f.VerifySparse(f.Namespace, pvc, utils.DefaultPvcMountPath)).To(BeTrue())

I know our sparseness verification util is broken ATM but even on a custom branch it passes:
#3213

stefanha · 2025-01-28T12:44:39Z

> I am wondering how a certain test we have isn't catching this

This du(1) command-line probably isn't working as expected on a block device:

https://github.com/kubevirt/containerized-data-importer/blob/main/tests/framework/pvc.go#L552

akalenyu · 2025-01-28T13:16:15Z

> > I am wondering how a certain test we have isn't catching this
>
> This du(1) command-line probably isn't working as expected on a block device:
>
> https://github.com/kubevirt/containerized-data-importer/blob/main/tests/framework/pvc.go#L552

The PR mentioned above (#3213) gets rid of the du check and relies on the actual size reported by qemu-img info.

akalenyu · 2025-01-28T13:18:48Z

This explains it:

INFO: VerifySparse comparison: OriginalVirtual: 18874368 vs SizeOnDisk: 0

SizeOnDisk being qemuImgInfo.ActualSize
(Output from that custom PR branch)

stefanha · 2025-01-28T14:00:09Z

There is no generic way in Linux to query a block device to find out how many blocks are allocated, so SizeOnDisk will not have a useful value.

stefanha · 2025-01-28T14:04:24Z

Maybe this trick will work: create a sparse file using truncate(1) and then create a corresponding loopback block device using losetup(8). The test would be able to look at the blocks allocated in the underlying sparse file to get an approximation of the number of blocks touched on the loopback block device, modulo file system effects like its block size.
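
The measurement half of that idea can be sketched as follows, in Python for illustration only. The helper names are hypothetical, and the loop device step is left as a comment because it needs root. The sketch assumes a Linux file system that supports sparse files and reports allocation through st_blocks.

```python
import os
import tempfile


def make_sparse_file(path: str, size: int) -> None:
    """Equivalent of truncate(1): set the length without allocating data blocks."""
    with open(path, "wb") as f:
        f.truncate(size)


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated to the file; st_blocks is in 512-byte units on Linux."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    with tempfile.TemporaryDirectory() as tmp:
        backing = os.path.join(tmp, "backing.img")
        make_sparse_file(backing, size)
        print("apparent size:", os.path.getsize(backing))
        print("allocated before import:", allocated_bytes(backing))
        # A real test would now attach the file with
        # `losetup --find --show backing.img` (requires root), run the import
        # against the resulting loop device, detach it, and call
        # allocated_bytes(backing) again to see how much was written.
```

The number obtained this way is only an approximation, since the backing file system allocates in units of its own block size.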

akalenyu · 2025-01-28T14:28:58Z

> Maybe this trick will work: create a sparse file using truncate(1) and then create a corresponding loopback block device using losetup(8). The test would be able to look at the blocks allocated in the underlying sparse file to get an approximation of the number of blocks touched on the loopback block device, modulo file system effects like its block size.

Can't we just use dd and copy the content into a scratch filesystem volume?

stefanha · 2025-01-28T14:36:27Z

> Can't we just use dd and copy the content into a scratch filesystem volume?

If I understand correctly, the test is attempting to verify that the block device was written sparsely (zero blocks were skipped). Simply using dd to copy the block device to a file won't show whether the block device was written sparsely, so I don't think that approach works.

You could first populate the block device with a pattern, import the containerdisk, and then check to see whether the pattern is still visible in blocks where the containerdisk is zero. That last step could be a single checksum comparison of the contents of the whole disk.
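
A small in-memory sketch of that pattern check, in Python for illustration only. Byte strings stand in for the block device and the disk image, and all names are hypothetical. It assumes zero detection at a 4096-byte granularity (which I believe matches the qemu-img default for -S, but treat that as an assumption) and an image no larger than the device.

```python
import hashlib

BLOCK = 4096  # assumed zero-detection granularity


def expected_after_sparse_import(pattern: bytes, image: bytes) -> bytes:
    """Device content expected if every all-zero block of the image was skipped."""
    out = bytearray(pattern)
    zero = bytes(BLOCK)
    for off in range(0, len(image), BLOCK):
        chunk = image[off:off + BLOCK]
        if chunk != zero[:len(chunk)]:
            # Non-zero data must have been written over the pattern.
            out[off:off + len(chunk)] = chunk
    return bytes(out)


def was_written_sparsely(device_content: bytes, pattern: bytes, image: bytes) -> bool:
    """Single checksum comparison of the whole device against the expected content."""
    expected = expected_after_sparse_import(pattern, image)
    return hashlib.sha256(device_content).digest() == hashlib.sha256(expected).digest()


if __name__ == "__main__":
    pattern = b"\xa5" * (4 * BLOCK)             # what the device held before the import
    image = b"\x01" * BLOCK + bytes(3 * BLOCK)  # one data block, three zero blocks
    sparse_result = b"\x01" * BLOCK + b"\xa5" * (3 * BLOCK)
    print(was_written_sparsely(sparse_result, pattern, image))  # pattern survived
    print(was_written_sparsely(image, pattern, image))          # zeroes were written
```

Note that this check only makes sense if the device is not zeroed before the import, since a zeroing step would erase the pattern as well.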

akalenyu · 2025-01-28T19:30:00Z

Looks like this issue is somehow hidden with Ceph RBD (using the same 10Gi image):

# rook-ceph-toolbox pod
$ ceph df -f json | jq '.pools[0].stats.bytes_used'
2036678656

stefanha · 2025-01-29T14:45:08Z

Ceph might be doing zero detection or deduplication? Even if this is the case, you should be able to see the issue by running the same qemu-img command-line as CDI under strace(1) and looking at the pattern of write syscalls.

stefanha added the kind/bug label Jan 22, 2025

aglitke added the good first issue label Jan 22, 2025
