This repository has been archived by the owner on May 5, 2021. It is now read-only.

Performance #1

Open
pdirksen opened this issue Mar 19, 2018 · 41 comments

Comments

@pdirksen

pdirksen commented Mar 19, 2018

Can we do something about the performance?
Currently we use a CIFS/SMB volume via a 1Gbit/s interface.
A full backup for a medium-sized VM takes about 2 minutes, whereas it takes about 30 minutes to finish when using xdelta3.

INFO: status: 70% (15040643072/21474836480), sparse 25% (5543481344), duration 1478, read/write 6/5 MB/s

As xdelta3 can only use one thread, combined with medium compression, this is probably the bottleneck.

@jadsolucions

Agree

@sienar1

sienar1 commented Jul 23, 2018

Is xdelta3 not threaded at all? That's where I see the bottleneck as well. I have VMs that are 200+GB and can do full backups very quickly, but a differential takes nearly all day because it's stuck in xdelta3 maxing out a single CPU core and not spreading the work out.

@Kalimeiro

Kalimeiro commented Jul 25, 2018

It's not a multithreading problem but a "network problem"; the real culprit is vzdump. When your backup starts, it reads the old backup over CIFS/SMB and does a double write at the same time: the VM snapshot goes into a dat/tmp file to compare against the old backup, plus the delta vcdiff file...

@sienar1

sienar1 commented Jul 25, 2018

I have to disagree. I can run the same differential backups of large VMs on local storage (storage that can handle over 1GB/s of throughput) and the differential backup runs at about 4MB/s; checking CPU usage, you can see xdelta3 running on only 1 core, for hours (18 hours for a 200GB VM specifically). It appears that xdelta3 is very poorly threaded. I've also found multiple other support threads where xdelta3 has been used in commercial products, and they suffer the same issue. If you have a server with many slower cores (such as my Xeon E5-2650L based server), these differential backups are near useless for any large source VMs, which is exactly where you need them to perform well.

@Kalimeiro

I have to disagree. I can run the same differential backups of large VMs on local storage.

You say on local storage, but the problem is when using CIFS/SMB storage: it reads the old backup from CIFS/SMB, writes the current snapshot into a dat/tmp file on the CIFS/SMB share, and then compares the old backup with the current snapshot to write the vcdiff file... (double write to CIFS/SMB + a read from CIFS/SMB)... it's a design problem that comes from vzdump in combination with the ayufan patch (which is not responsive at all).

With local storage I agree, no problem...
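
As a rough illustration of the data flow being described, assuming the differential step boils down to an xdelta3 encode call along these lines (the paths are made up for the example; the real names come from vzdump):

# All three files sit on the CIFS/SMB mount, so one encode pass means one
# network read (the old full backup) plus two network writes (the temporary
# snapshot dump and the vcdiff output):
xdelta3 -e -s /mnt/pve/backup/dump/old-full.vma.lzo \
    /mnt/pve/backup/dump/current-snapshot.dat \
    /mnt/pve/backup/dump/old-full.vma.lzo--differential-<date>.vcdiff

On local storage the same three streams hit a fast local disk, which would explain why the slowdown mainly shows up on network shares.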

@gilbertoferreira

Hi,
I have issues with this too...
Before applying this patch, the backup took 7 hours... After the patch, it takes more than 18 hours!
And the servers all have 16 cores... or more...
I am using NFS as storage...
Something is definitely very wrong.

@jhusarek

Hi,
I have issues with this too...
Before applying this patch, the backup took 7 hours... After the patch, it takes more than 18 hours!
And the servers all have 16 cores... or more...
I am using NFS as storage...
Something is definitely very wrong.

Hi,
I have the same issue.

@gilbertoferreira

Hi there! I have the same issue here too! The file is indeed smaller, but it slows down the whole vzdump process... What can we do to improve this?

@alebeta90

Hi all,

I'm having the same problem here when a differential backup is running.

I'm using the patch for 5.4-5.

Normal backups run at normal speed, while differential backup speed drops drastically.

All backups go to the same NFS storage.

@marcin-github

I did a few tests on 5GB .vma files. If you add the -3 -B 2147483648 options to xdelta3, you should get noticeably smaller diff copies. The disadvantage is higher memory consumption.
About speed: xdelta3 is single-threaded, and it uses gzip, which is also single-threaded. If you find a way to replace gzip with pigz, your backups should finish a little faster.
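
A minimal sketch of both suggestions, with illustrative file names (where exactly the xdelta3 call lives depends on the patch version, and the vzdump.conf pigz option is the stock Proxmox mechanism as I understand it; whether the differential code path honours it is untested here):

# level-3 compression plus a 2 GiB source window (-B takes bytes) should give
# noticeably smaller vcdiffs, at the cost of roughly 2 GiB of extra RAM:
xdelta3 -e -3 -B 2147483648 -s old-full.vma current-snapshot.dat out.vcdiff

# pigz is a multi-threaded drop-in replacement for gzip:
apt install pigz
echo "pigz: 1" >> /etc/vzdump.conf   # 1 = half of the cores, N>1 = N threads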

@gilbertoferreira

gilbertoferreira commented Aug 30, 2019 via email

@KlugFR

KlugFR commented Dec 1, 2019

xdelta3 3.0.11 exists but is not downloaded by the patch installer.
See: #34

@ScIT-Raphael

ScIT-Raphael commented Dec 20, 2019

I have the same performance issue here; even a really small machine takes way too long for a differential backup. I'm already running xdelta3 3.0.11, and during the backup routine it only uses one CPU core. The backup target is a CIFS storage service (a storage box from hetzner.com); full backups take just 1 or 2 minutes and work properly.

I like the differential solution, but with performance this bad I can't let it run on my other Proxmox servers with larger VMs. Is there any advice on how to optimize it?

Backup log:

INFO: starting new backup job: vzdump 101 --all 0 --compress lzo --mailnotification failure --maxfiles 30 --mode snapshot --quiet 1 --mailto support@xxx.xx --fullbackup 30 --node pm103 --storage backup
INFO: doing differential backup against '/mnt/pve/backup/dump/vzdump-qemu-101-2019_12_20-08_45_02.vma.lzo'
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2019-12-20 08:47:17
INFO: status = running
INFO: update VM 101: -lock backup
INFO: VM Name: fw03.xx.xx.xx
INFO: include disk 'scsi0' 'data:vm-101-disk-0' 60G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup/dump/vzdump-qemu-101-2019_12_20-08_45_02.vma.lzo--differential-2019_12_20-08_47_17.vcdiff'
INFO: started backup task '3033d581-542d-4f46-9ce6-d939874c7524'
INFO: status: 10% (6830620672/64424509440), sparse 10% (6820663296), duration 4, read/write 1707/2 MB/s
INFO: status: 11% (7492009984/64424509440), sparse 11% (7237709824), duration 38, read/write 19/7 MB/s
INFO: status: 14% (9460973568/64424509440), sparse 14% (9090572288), duration 57, read/write 103/6 MB/s
INFO: status: 20% (13451329536/64424509440), sparse 20% (13062885376), duration 61, read/write 997/4 MB/s
INFO: status: 23% (15379267584/64424509440), sparse 23% (14967386112), duration 65, read/write 481/5 MB/s
INFO: status: 24% (15470493696/64424509440), sparse 23% (14968123392), duration 77, read/write 7/7 MB/s
INFO: status: 25% (16109076480/64424509440), sparse 23% (15011840000), duration 168, read/write 7/6 MB/s
INFO: status: 26% (16755261440/64424509440), sparse 23% (15303970816), duration 208, read/write 16/8 MB/s
INFO: status: 27% (17412849664/64424509440), sparse 24% (15700697088), duration 236, read/write 23/9 MB/s
INFO: status: 28% (18059034624/64424509440), sparse 24% (16054276096), duration 264, read/write 23/10 MB/s
INFO: status: 29% (18686214144/64424509440), sparse 25% (16454565888), duration 282, read/write 34/12 MB/s
INFO: status: 30% (19336200192/64424509440), sparse 26% (16785715200), duration 307, read/write 25/12 MB/s
INFO: status: 31% (19982385152/64424509440), sparse 26% (17112711168), duration 385, read/write 8/4 MB/s
INFO: status: 32% (20620967936/64424509440), sparse 26% (17158856704), duration 566, read/write 3/3 MB/s
INFO: status: 33% (21270953984/64424509440), sparse 27% (17622482944), duration 612, read/write 14/4 MB/s
INFO: status: 37% (23984472064/64424509440), sparse 31% (20185636864), duration 650, read/write 71/3 MB/s
INFO: status: 51% (33088536576/64424509440), sparse 45% (29283241984), duration 653, read/write 3034/2 MB/s
INFO: status: 53% (34527707136/64424509440), sparse 47% (30698168320), duration 656, read/write 479/8 MB/s
INFO: status: 66% (43034148864/64424509440), sparse 60% (39193022464), duration 659, read/write 2835/3 MB/s
INFO: status: 75% (48851648512/64424509440), sparse 69% (44998873088), duration 662, read/write 1939/3 MB/s
INFO: status: 84% (54398222336/64424509440), sparse 78% (50537148416), duration 665, read/write 1848/2 MB/s
INFO: status: 94% (61080600576/64424509440), sparse 88% (57219432448), duration 668, read/write 2227/0 MB/s
INFO: status: 100% (64424509440/64424509440), sparse 94% (60563333120), duration 670, read/write 1671/0 MB/s
INFO: transferred 64424 MB in 670 seconds (96 MB/s)
INFO: archive file size: 1.64GB
INFO: Finished Backup of VM 101 (00:11:11)
INFO: Backup finished at 2019-12-20 08:58:28
INFO: Backup job finished successfully
TASK OK

@KlugFR

KlugFR commented Dec 20, 2019

My guess is that if you want multi-core, you need to switch to LZMA and use a multi-threaded build of it.
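
For reference, threaded LZMA is readily available through xz; this is just an illustration of the tool with a made-up file name, not something the patch currently wires in:

# -T0 starts one compression thread per available core:
xz -T0 -6 vzdump-qemu-101.vma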

@ScIT-Raphael

Thanks for the answer @KlugFR, are there any docs on how this can be done?

@vdeville

vdeville commented Jun 2, 2020

Hello,
Confirmed on a VM of 100GB: 10 minutes for a full backup, a few hours for the differential.
Thanks

@ScIT-Raphael

Anything new on how to resolve the issue?

@ayufan
Owner

ayufan commented Jun 2, 2020

@MyTheValentinus @ScIT-Raphael Maybe try it with the recently released zstd?
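
For anyone who wants to try that, a sketch assuming PVE 6.2 or later, where zstd is available as a vzdump compressor (whether the differential patch then handles the zstd archives any faster is exactly what this issue is about):

# per job, on the command line:
vzdump 101 --compress zstd --mode snapshot --storage backup

# or globally in /etc/vzdump.conf:
compress: zstd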

@vdeville

vdeville commented Jun 2, 2020 via email

@vdeville

vdeville commented Jun 2, 2020

Standard backup, zstd, snapshot mode: 110-130 MB/s
Differential backup to the same target: 6-10 MB/s

@JoeApo108

JoeApo108 commented Jun 4, 2020

Completely the same situation. In its current state, it's unusable.
I'm using the latest patch with the latest pve-xdelta3 3.0.11

@ScIT-Raphael

Same here, latest version, still slow as hell :(.

@ayufan
Owner

ayufan commented Jun 4, 2020

Hmm. Is xdelta3 supposed to be single-threaded? Can you show the CPU usage of the individual processes?
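
One way to capture that while a differential job is running (pidstat comes from the sysstat package; plain top works as well):

apt install sysstat
# one-second samples of the xdelta3 process:
pidstat -u -p "$(pgrep -o xdelta3)" 1
# or watch everything interactively, sorted by CPU:
top -c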

@vdeville

vdeville commented Jun 4, 2020

When I look at the CPU usage, no core is at 100%; I'm not sure it's linked to xdelta being single-threaded. Before, the old version worked fine on a single core.

@JoeApo108

Testing with an edited /etc/vzdump.conf.
Added zstd: 0 (which utilizes half of all available cores).

After this, I can see 28 cores in use out of 56. Before, it was just 1. The backup is still in progress at the moment. Will keep you informed.
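
For completeness, the configuration being described would look roughly like this; the compress line is an assumption (it only matters if the job isn't already set to zstd), and "0 = half of the available cores" matches the observation above as well as the vzdump.conf documentation:

# /etc/vzdump.conf
compress: zstd   # assumed; only needed if the job doesn't already use zstd
zstd: 0          # zstd worker threads; 0 = use half of the available cores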

@ScIT-Raphael

@JoeApo108 Do you already have some feedback? Did the backup go through properly and fast?

@JoeApo108

JoeApo108 commented Jun 4, 2020

@JoeApo108 Do you already have some feedback? Did the backup go through properly and fast?

No, it didn't help at all... still slow.
The full backup of 320GB took 1h40m. The diff has been running for 6h and is still processing.

OK, it might have something to do with the source window size (the -B switch). When I tested it with a tiny LXC, the full backup took (1.4GiB, 29MiB/s, archive size 410MB) and the diff backup (1.4GiB, 23MiB/s, archive size 700kB).

The huge LXC full backup took (320GiB, 52MiB/s, archive size 160GB) and the diff (???? still in progress).

@JoeApo108

Hmm. Is xdelta3 supposed to be single-threaded? Can you show the CPU usage of the individual processes?

https://imgur.com/a/bgB88ix

@vdeville

Hello,
Any news?
Thanks

@umm0n

umm0n commented Jun 17, 2020

We're having the same problem, using Proxmox v6.2-4 and the latest xdelta3.

A ZSTD full backup is quick; a differential against it takes hours and hours until it basically stalls. Any solutions?

@ScIT-Raphael

Would also love to have a solution here; for now we just had to remove it again and switch back to normal backups. It's not useful when a diff backup takes much longer than a full one...

@ayufan
Owner

ayufan commented Jun 18, 2020 via email

@ayufan
Owner

ayufan commented Jun 21, 2020

I ran some tests with xdelta3, and, well, the performance is kind of miserable. It really depends on the amount of changes. I'm trying different settings to check the impact on size and performance, to maybe find a balance.

@vdeville

@ayufan OK, good to hear that you're seeing the same issue. We're waiting for the fix, thanks.

@marcin-github

https://pbs.proxmox.com/wiki/index.php/Main_Page does dedup and compression. I think this solution from ayufan (thank you, Kamil) will become deprecated.

@Genzo4

Genzo4 commented Jul 26, 2020

https://pbs.proxmox.com/wiki/index.php/Main_Page does dedup and compression. I think this solution from ayufan (thank you, Kamil) will become deprecated.

1.3 Main Features
...
Incremental backups: Changes between backups are typically low. Reading and sending only the delta reduces the storage and network impact of backups.

@ogghi

ogghi commented Aug 23, 2020

Would it make ayufan's solution deprecated, though? Would I need another server running just for backups?
I like the integrated solution.
(I found this thread because I was looking into the performance issues.)

@ayufan
Owner

ayufan commented Aug 23, 2020 via email

@sienar1

sienar1 commented Aug 23, 2020

Would it make ayufan's solution deprecated, though? Would I need another server running just for backups?
I like the integrated solution.
(I found this thread because I was looking into the performance issues.)

Ayufan's solution is not integrated either; it's a modification that has a serious performance problem. PBS can be run in a container (I have it running in a Debian container), OR in a VM, OR on bare metal. Any of those can run anywhere you want. PBS is more flexible and more performant.

@ayufan
Owner

ayufan commented Aug 23, 2020 via email

@ogghi

ogghi commented Aug 24, 2020

All right, will try PBS then on Openmediavault 👍
