
Reset backup failed after restore individual disk on the VM #6741

Closed · 3 tasks

Franco-Sparrow opened this issue Sep 27, 2024 · 4 comments

Comments

Franco-Sparrow commented Sep 27, 2024

It is really nice to have restore in place now, without the need to detach disks or create new VMs in order to use these backups. Thanks to the OpenNebula team for this new feature. Anyway, please check this bug report:

Description

If you execute a restore in place (directly on the VM) for a given disk (suppose you restore the 2nd disk of the VM, disk.2), you are forced to apply a reset backup before the next backups of the VM. When you execute the reset backup, it fails, because the disk that was not restored, the original disk of the VM (disk.0), still has dirty bitmaps that the newly restored disk does not have.

To Reproduce

  • Create a new VM with 2 non-volatile disks.
  • Configure the VM for single-VM backups: BACKUP_VOLATILE: YES, INCREMENT_MODE: CBT, FS_FREEZE: AGENT, KEEP_LAST: 3, MODE: INCREMENT.
  • Partition, format and mount the 2nd disk on the VM.
  • Create a new backup (inc0).
  • Create a 2nd backup (inc1).
  • Restore in place (directly to the VM) only the 2nd disk (disk.2) from inc1.
  • Execute a reset backup with the VM in RUNNING state and see how it fails.
    (screenshot: WhatsApp Image 2024-09-27 at 3 35 36 PM)
  • The error above assumes that vdb (disk.2) has the same dirty bitmaps and checkpoints as vda (disk.0), which is not the case; the sketch after this list shows how to compare them.
  • If you try the reset backup with the VM in POWEROFF state, it works, probably because the dirty bitmaps become plain qcow2 bitmaps and are successfully removed.
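
To see the mismatch, you can compare the bitmaps recorded in both disks from the KVM host. A minimal sketch, with hypothetical IDs (VM 10 on system datastore 0):

# -U / --force-share lets qemu-img read images that are still attached to the running VM.
qemu-img info -U /var/lib/one/datastores/0/10/disk.0
qemu-img info -U /var/lib/one/datastores/0/10/disk.2
# Here disk.0 still lists one-* dirty bitmaps, while the restored disk.2 does not.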

Expected behavior

The reset backup should work for a VM that has individual disks restored and is in RUNNING state.

Details

  • Affected Component: [Storage]
  • Hypervisor: [KVM]
  • Version: [6.10.0]


Progress Status

  • Code committed
  • Testing - QA
  • Documentation (Release notes - resolved issues, compatibility, known issues)

Franco-Sparrow commented Sep 28, 2024

We need to look at the whole picture here. If we are using OpenNebula as a public cloud, there will be many clients restoring individual disks directly to their VMs. You cannot tell these clients that they need to power off their VMs in order to execute a workaround that re-establishes the backing chain of those VMs. That is not a feasible solution. Anyway, this is a workaround I have documented for this issue:

INCREMENTAL BACKUP ISSUE: Bitmap not found in backing chain

If the problem does not get fixed with a reset backup, apply the following procedure.

On the orchestrator leader

Define the following vars:

VM_ID=10
DISK_ID=0
BACKUP_DS_ID=101
SYSTEM_DS_ID=102
BITMAP_ID=0

Get qcow2 disk information:

qemu-img info /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID}

The output looks as follows:

image: /var/lib/one/datastores/0/10/disk.0
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 2.05 GiB
cluster_size: 65536
backing file: /var/lib/one//datastores/1/e4671f022ec4a6ac1abe89ee163a3ab0
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    bitmaps:
        [0]:
            flags:
                [0]: auto
            name: one-10-0
            granularity: 65536
    refcount bits: 16
    corrupt: false
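
If you are not sure which bitmap names are present, they can be extracted from the same output (a small convenience, relying on the one-<VM_ID>-<N> naming shown above):

# List the one-* bitmap names recorded in the disk image.
qemu-img info /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID} | awk '/name: one-/ {print $2}'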

Shut down the VM:

onevm poweroff ${VM_ID}

Check which host the given VM is running on:

HOST=$(onevm show ${VM_ID} | grep HOST | grep kvm | awk '{print $NF}')
echo $HOST

The output was as follows:

ON-LA-N1-kvm

Force the removal of residual files from previous incremental backups:

rm -rf /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/backup/ && \
rm -rf /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/tmp/ && \
rm -rf /var/lib/one/datastores/${BACKUP_DS_ID}/${VM_ID}/backup/ && \
rm -rf /var/lib/one/datastores/${BACKUP_DS_ID}/${VM_ID}/tmp/ && \
ssh $HOST "rm -rf /var/lib/libvirt/qemu/domain-*-one-${VM_ID}/" && \
ssh $HOST "rm -rf /var/lib/libvirt/qemu/checkpoint/one-${VM_ID}"

Remove the bitmap and possible checkpoints:

ssh $HOST  "qemu-img bitmap --remove /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID} one-${VM_ID}-${BITMAP_ID}" && \
ssh $HOST  "rm -rf /var/lib/libvirt/qemu/domain-*-one-${VM_ID}/"
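
Optionally, confirm the bitmap is actually gone before resuming the VM (a quick sanity check):

ssh $HOST "qemu-img info /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID}" | grep -A6 'bitmaps:' || echo "no bitmaps left on disk.${DISK_ID}"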

Resume the VM:

onevm resume ${VM_ID}
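
Once the VM is back in RUNNING state, the backup chain still has to be restarted so the next backup begins a fresh full one. A hedged example, assuming your onevm version supports the --reset flag (check onevm backup --help first) and using the backup datastore defined above:

# Start a new full backup chain on the backup datastore (flags assumed, verify on your version).
onevm backup ${VM_ID} --reset -d ${BACKUP_DS_ID}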


kCyborg commented Sep 28, 2024

I will keep an eye on this. Pretty good stuff!!!


nachowork90 commented Sep 28, 2024

Thanks, @Franco-Sparrow. Thinking out loud, one feasible way could be removing all bitmaps of the remaining disk images during the restore-in-place process; in this way, when the VM resumes, all disks have the same information related to bitmaps.

I will take a look at the new code!
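
A rough sketch of that idea (illustrative only, not the actual driver code; it assumes the VM is not running at that point and reuses the one-* bitmap naming and datastore paths from the workaround above):

# For every qcow2 disk of the VM, drop all one-* bitmaps so the disks end up consistent.
for DISK in /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.*; do
  qemu-img info "${DISK}" 2>/dev/null | grep -q '^file format: qcow2' || continue
  for BITMAP in $(qemu-img info "${DISK}" | awk '/name: one-/ {print $2}'); do
    qemu-img bitmap --remove "${DISK}" "${BITMAP}"
  done
done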

rsmontero (Member) commented:

Hi,

We found this issue as part of the development of #6411 (Ceph incremental backup). We implemented a reset of the chain after an "in-place" restore, so if the VM is configured to do incremental backups and a restore is performed, the next backup will start a new incremental chain. This is the same behavior as when a disk is attached to the VM: the backup chain is reset.

Thanks for the feedback!

(PS: This will be in 6.10.1. I'll close the issue together with #6411 when it is merged.)
