
Reset backup failed after restore individual disk on the VM #6741

Closed · 3 tasks

Franco-Sparrow opened this issue Sep 27, 2024 · 4 comments

Comments

Franco-Sparrow commented Sep 27, 2024

It is really nice to have restore in place now, without the need to detach disks or create new VMs in order to use these backups. Thanks to the OpenNebula team for this new feature. Anyway, please check this bug report:

Description

If you execute a restore in place (directly on the VM) for a given disk (suppose you restore the 2nd disk of the VM, disk.2), you are forced to apply a reset backup before the next backups of the VM. When you execute the reset backup, it fails, because the disk that was not restored, the original disk of the VM (disk.0), still has dirty bitmaps that the newly restored disk does not have.

To Reproduce

  • Create a new VM with 2 non-volatile disks.
  • Configure the VM for single-VM backups: BACKUP_VOLATILE: YES, INCREMENT_MODE: CBT, FS_FREEZE: AGENT, KEEP_LAST: 3, MODE: INCREMENT.
  • Partition, format and mount the 2nd disk on the VM.
  • Create a new backup (inc0).
  • Create a 2nd backup (inc1).
  • Restore in place (directly to the VM) only the 2nd disk (disk.2) from inc1.
  • Execute a reset backup with the VM in RUNNING state and see how it fails.
    (screenshot: WhatsApp Image 2024-09-27 at 3 35 36 PM)
  • The error above assumes that vdb (disk.2) has the same dirty bitmaps and checkpoints as vda (disk.0), which is not the case; the sketch after this list shows how to compare them.
  • If you try the reset backup with the VM in POWEROFF state, it works, probably because the dirty bitmaps become plain qcow2 bitmaps and are successfully removed.
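
To see the mismatch, you can compare the bitmaps recorded in both disks from the KVM host. A minimal sketch, with hypothetical IDs (VM 10 on system datastore 0):

# -U / --force-share lets qemu-img read images that are still attached to the running VM.
qemu-img info -U /var/lib/one/datastores/0/10/disk.0
qemu-img info -U /var/lib/one/datastores/0/10/disk.2
# Here disk.0 still lists one-* dirty bitmaps, while the restored disk.2 does not.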

Expected behavior

The reset backup should work for a VM that has individual disks restored and is in RUNNING state.

Details

  • Affected Component: [Storage]
  • Hypervisor: [KVM]
  • Version: [6.10.0]


Progress Status

  • Code committed
  • Testing - QA
  • Documentation (Release notes - resolved issues, compatibility, known issues)

Franco-Sparrow commented Sep 28, 2024

We need to look at the whole picture here. If we are using OpenNebula as a public cloud, there will be many clients restoring individual disks directly to their VMs. You cannot tell these clients that they need to power off their VMs in order to execute a workaround that re-establishes the backing chain of those VMs. That is not a feasible solution. Anyway, this is a workaround I have documented for this issue:

INCREMENTAL BACKUP ISSUE: Bitmap not found in backing chain

If the problem does not get fixed with a reset backup, apply the following procedure.

On the orchestrator leader

Define the following vars:

VM_ID=10
DISK_ID=0
BACKUP_DS_ID=101
SYSTEM_DS_ID=102
BITMAP_ID=0

Get qcow2 disk information:

qemu-img info /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID}

The output looks as follows:

image: /var/lib/one/datastores/0/10/disk.0
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 2.05 GiB
cluster_size: 65536
backing file: /var/lib/one//datastores/1/e4671f022ec4a6ac1abe89ee163a3ab0
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    bitmaps:
        [0]:
            flags:
                [0]: auto
            name: one-10-0
            granularity: 65536
    refcount bits: 16
    corrupt: false
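
If you are not sure which bitmap names are present, they can be extracted from the same output (a small convenience, relying on the one-<VM_ID>-<N> naming shown above):

# List the one-* bitmap names recorded in the disk image.
qemu-img info /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID} | awk '/name: one-/ {print $2}'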

Shut down the VM:

onevm poweroff ${VM_ID}

Check which host the given VM is running on:

HOST=$(onevm show ${VM_ID} | grep HOST | grep kvm | awk '{print $NF}')
echo $HOST

The output was as follows:

ON-LA-N1-kvm

Force the removal of residual files from previous incremental backups:

rm -rf /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/backup/ && \
rm -rf /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/tmp/ && \
rm -rf /var/lib/one/datastores/${BACKUP_DS_ID}/${VM_ID}/backup/ && \
rm -rf /var/lib/one/datastores/${BACKUP_DS_ID}/${VM_ID}/tmp/ && \
ssh $HOST "rm -rf /var/lib/libvirt/qemu/domain-*-one-${VM_ID}/" && \
ssh $HOST "rm -rf /var/lib/libvirt/qemu/checkpoint/one-${VM_ID}"

Remove the bitmap and possible checkpoints:

ssh $HOST  "qemu-img bitmap --remove /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID} one-${VM_ID}-${BITMAP_ID}" && \
ssh $HOST  "rm -rf /var/lib/libvirt/qemu/domain-*-one-${VM_ID}/"
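
Optionally, confirm the bitmap is actually gone before resuming the VM (a quick sanity check):

ssh $HOST "qemu-img info /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.${DISK_ID}" | grep -A6 'bitmaps:' || echo "no bitmaps left on disk.${DISK_ID}"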

Resume the VM:

onevm resume ${VM_ID}
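
Once the VM is back in RUNNING state, the backup chain still has to be restarted so the next backup begins a fresh full one. A hedged example, assuming your onevm version supports the --reset flag (check onevm backup --help first) and using the backup datastore defined above:

# Start a new full backup chain on the backup datastore (flags assumed, verify on your version).
onevm backup ${VM_ID} --reset -d ${BACKUP_DS_ID}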


kCyborg commented Sep 28, 2024

I will keep an eye on this. Pretty good stuff!!!


nachowork90 commented Sep 28, 2024

Thanks, @Franco-Sparrow. Thinking out loud, one feasible way could be removing all bitmaps of the remaining disk images during the restore-in-place process; in this way, when the VM resumes, all disks have the same information related to bitmaps.

I will take a look at the new code!
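
A rough sketch of that idea (illustrative only, not the actual driver code; it assumes the VM is not running at that point and reuses the one-* bitmap naming and datastore paths from the workaround above):

# For every qcow2 disk of the VM, drop all one-* bitmaps so the disks end up consistent.
for DISK in /var/lib/one/datastores/${SYSTEM_DS_ID}/${VM_ID}/disk.*; do
  qemu-img info "${DISK}" 2>/dev/null | grep -q '^file format: qcow2' || continue
  for BITMAP in $(qemu-img info "${DISK}" | awk '/name: one-/ {print $2}'); do
    qemu-img bitmap --remove "${DISK}" "${BITMAP}"
  done
done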

rsmontero (Member) commented:

Hi,

We found this issue as part of the development of #6411 (Ceph incremental backup). We implemented a reset of the chain after an "in-place" restore, so if the VM is configured to do incremental backups and a restore is performed, the next backup will start a new incremental chain. This is the same behavior as when a disk is attached to the VM: the backup chain is reset.

Thanks for the feedback!

(PS: This will be in 6.10.1. I'll close the issue together with #6411 when it is merged.)
