files cache: index by physical extents to support reflinks & snapshots #2743

Open
alphazo opened this issue Jun 24, 2017 · 15 comments

@alphazo

alphazo commented Jun 24, 2017

XFS has implemented some exciting new (experimental) features such as reflink, which allows instant CoW snapshots similar to those found on btrfs. I don't think there is a plan to support send/receive commands like on ZFS, so the dedup function is pretty much limited to the local filesystem.
I plan to use the following scheme for my external USB drive, which contains my photos and which I usually back up to a second drive or network storage using borg. This applies to both btrfs and the new reflink-enabled XFS.

  • While on the move and without network connectivity or access to my borg drive, I can, before working on the pictures, perform instant pseudo snapshots (not like the snapshots found on btrfs) without eating precious hard drive space.
    # cp -r --reflink=always /mnt/usbdrive/photos /mnt/usbdrive/snapshots/snap01

  • When I'm back home I can connect my photo drive and my backup drive to my PC and initiate a borg backup.
    # borg create /mnt/backupdrive/borg::borg-snap1 /mnt/usbdrive

So the borg-snap1 snapshot will contain all the different snapshots I made while away from home, plus the working directory. But since borg doesn't know about the reflink feature, it will rescan every file in each snapshot on my photo drive, thinking they are new files. It will ultimately find the corresponding known dedup blocks, so it will effectively not copy over each of the btrfs/xfs pseudo snapshots. I tried it, and the size of my photo directory plus many snapshots of the same pictures came to pretty much the size of the photo directory in the borg snapshot, which is great.
I was wondering if there would be a way to add a feature to borg that detects such reflink-enabled filesystems (btrfs/xfs), so that it would immediately know that a file found in a btrfs/xfs directory is a duplicate of an existing known one and therefore reuse the same dedup blocks.
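
The behaviour described above (reflinked copies get re-read, but add almost nothing to the repository) can be illustrated with a toy content-addressed chunk store. This is only a sketch under simplifying assumptions: it uses fixed-size chunks and plain SHA-256, whereas borg uses content-defined chunking and a keyed ID hash. The names `backup` and `store` are hypothetical and not part of borg.

```python
import hashlib


def backup(store, data, chunk_size=4):
    """Store data as fixed-size chunks; return (chunk_ids, newly_stored_bytes)."""
    chunk_ids, new_bytes = [], 0
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        cid = hashlib.sha256(chunk).hexdigest()
        if cid not in store:
            # Unknown chunk: this is the only case that costs repository space.
            store[cid] = chunk
            new_bytes += len(chunk)
        chunk_ids.append(cid)
    return chunk_ids, new_bytes


store = {}
photo = b"raw photo bytes!"               # stands in for a file in photos/
ids_first, first = backup(store, photo)   # working directory
ids_second, second = backup(store, photo) # reflinked copy in snapshots/snap01
print(first, second)
```

The second call still has to read and hash every byte, which is the cost the feature request wants to avoid: knowing up front that both paths share the same physical extents would let the reading and hashing be skipped as well.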

@enkore
Contributor

enkore commented Jun 25, 2017

According to [1] there is no way to find physical extents (the backing element of reflinks) without either risking data corruption (when btrfs compression is used) or writing code that parses btrfs data structures. Apart from that it could be incorporated into the files cache (key := id-hash(NUL || "physical-extents" || extent-descriptor...)).

[1] https://www.spinics.net/lists/linux-btrfs/msg60845.html
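
The key construction sketched above, `id-hash(NUL || "physical-extents" || extent-descriptor...)`, might look roughly like the following. This is a hedged illustration, not borg code: HMAC-SHA256 stands in for the ID hash, and the extent descriptor layout (physical offset and length as two little-endian 64-bit integers) and the function name `extent_cache_key` are assumptions made for the example.

```python
import hashlib
import hmac
import struct


def extent_cache_key(id_key, extents):
    """Derive a files-cache key from a file's physical extents.

    id_key  -- secret key for the keyed hash (assumption: HMAC-SHA256)
    extents -- iterable of (physical_offset, length) tuples (assumed layout)
    """
    # The leading NUL keeps these keys apart from keys derived from ordinary paths.
    msg = b"\0" + b"physical-extents"
    for physical, length in extents:
        msg += struct.pack("<QQ", physical, length)
    return hmac.new(id_key, msg, hashlib.sha256).digest()


key = b"k" * 32
original = [(4096, 8192), (1048576, 4096)]
reflinked_copy = [(4096, 8192), (1048576, 4096)]  # same physical extents
modified_copy = [(4096, 8192), (2097152, 4096)]   # one extent rewritten (CoW)

same = extent_cache_key(key, original) == extent_cache_key(key, reflinked_copy)
differs = extent_cache_key(key, original) != extent_cache_key(key, modified_copy)
```

Two files that share every physical extent map to the same cache entry, so the chunk list recorded for the first could be reused for the second without reading it. This only holds if the extent information itself is reliable, which is the concern raised in [1].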

@alphazo
Author

alphazo commented Jun 25, 2017

Would the new reverse mapping (rmapbt) support on XFS be of any help for identifying CoW files?

https://lwn.net/Articles/695290/
https://lwn.net/Articles/659677/

@enkore
Contributor

enkore commented Jun 26, 2017

I don't see a problem on XFS, apart from this being a rather fickle business overall.

It's the btrfs issue I linked to above that seems problematic to me (there needs to be a reliable way to detect compressed files/extents to work around it). coreutils hints at problems with ext4, though the comments are old. Possibly fixed by now.

I'm going to be straight here and say that this won't be implementable casually. I'd estimate that implementing this will take 1-2 developer weeks, i.e. quite an effort.
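
As a rough illustration of what "detecting compressed extents" could involve, here is a minimal sketch that assumes the FIEMAP ioctl is the interface used to query extents (the thread does not say which interface would be chosen). The constants and struct layouts follow `linux/fiemap.h`; the helper names `parse_fiemap`, `query_extents` and `extents_trustworthy` are hypothetical. The assumption that encoded extents (which is how compressed extents are expected to be reported) should simply be rejected is also mine.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B       # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001        # flush the file before mapping
FIEMAP_EXTENT_LAST = 0x0001      # last extent of the file
FIEMAP_EXTENT_ENCODED = 0x0008   # data is encoded on disk, e.g. compressed
FIEMAP_EXTENT_SHARED = 0x2000    # extent is shared with other files

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved
HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3]
EXTENT = struct.Struct("=QQQQQIIII")


def parse_fiemap(buf):
    """Decode a buffer filled in by FS_IOC_FIEMAP into a list of dicts."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        logical, physical, length, flags = fields[0], fields[1], fields[2], fields[5]
        extents.append({
            "logical": logical,
            "physical": physical,
            "length": length,
            "shared": bool(flags & FIEMAP_EXTENT_SHARED),
            "encoded": bool(flags & FIEMAP_EXTENT_ENCODED),
            "last": bool(flags & FIEMAP_EXTENT_LAST),
        })
    return extents


def query_extents(path, max_extents=256):
    """Ask the kernel for up to max_extents extents of path (Linux only)."""
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    HEADER.pack_into(buf, 0, 0, 2**64 - 1, FIEMAP_FLAG_SYNC, 0, max_extents, 0)
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)  # buffer is filled in place
    return parse_fiemap(buf)


def extents_trustworthy(extents):
    """Assumed policy: refuse to key on extents whose data is encoded."""
    return all(not e["encoded"] for e in extents)
```

A caller would fall back to the normal path-based files cache whenever `extents_trustworthy` returns false or the ioctl fails. Files with more than `max_extents` extents would need repeated calls, which this sketch leaves out.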

@alphazo
Author

alphazo commented Jun 26, 2017

Understood. Thanks for taking the time to answer. Those new XFS features (reflink + rmapbt) are still marked as experimental anyway. I'm going to give them a try in a controlled environment and see how they perform. By the time XFS reflink goes primetime, more people might express a need for such a feature in borg. I find it a good balance between btrfs, with its flaky RAID support, and ZFS, which is not available directly in the kernel. Cheap snapshots + borg for real dedup backup is probably going to be my next workhorse.

@enkore changed the title from "Could borg support reflink enabled filesystems?" to "files cache: index by physical extents to support reflinks & snapshots" on Jun 26, 2017

@alphazo
Author

alphazo commented Jul 1, 2017

Some more pointers from the xfs folks:

> borgbackup will probably need to call the GETFSMAP ioctl, which won't land until 4.12. On xfs, rmapbt is needed to supply data block ownership info, which is what borgbackup (and bees, and...) say they want to be smarter about dedup.

https://www.spinics.net/lists/linux-xfs/msg08128.html

@jcharaoui

With the release of Linux 4.16, the XFS reflink feature is no longer tagged EXPERIMENTAL.

@alphazo
Author

alphazo commented May 23, 2018

@jcharaoui Thanks for pointing this out. I have used those features for nearly a year on my photo hard drive and haven't seen any problems. I believe the rmapbt feature is also no longer experimental (I'm running Linux 4.16.9), since I no longer see those red warnings when mounting my external drive, which has both reflink and rmapbt enabled.

@srd424

srd424 commented Aug 27, 2019

Interested in this while watching the first borg backup of a btrfs-based container pool take forever :)
Note that duperemove claims to check shared extents when working out whether to hash files ..

@charles-dyfis-net

https://gist.github.com/charles-dyfis-net/bfb0e30862f04957d020afe0ff8b093b may be of interest to those here -- invoking xfs_io to reflink together identical chunks of content Borg has identified.

Not maintained, not recently tested, not documented at time of development and use, very much YMMV.

@srd424

srd424 commented Jun 17, 2020

For my use case I'm now investigating https://github.com/systemd/casync, which has btrfs reflink support (don't know if it works on xfs.) I hit a few bugs, but worked out fixes for a couple (systemd/casync#235, systemd/casync#237 - now merged) and found work-arounds for the other two (systemd/casync#239, systemd/casync#240.)

@charles-dyfis-net

charles-dyfis-net commented Jun 17, 2020

> For my use case I'm now investigating https://github.com/systemd/casync, which has btrfs reflink support (don't know if it works on xfs.) I hit a few bugs, but worked out fixes for a couple (systemd/casync#235, systemd/casync#237 - now merged) and found work-arounds for the other two (systemd/casync#239, systemd/casync#240.)

At the risk of plugging a project I'm a contributor to, I strongly suggest also looking into https://github.com/folbricht/desync. casync may have gotten better over time, but back when desync was started, casync's error handling was atrocious; and desync very much does presently support reflinks when content already exists in another, local .caibx.

@srd424

srd424 commented Jun 17, 2020

I'd looked at desync and thought it didn't support reflinking, but it seems I might be mistaken .. will revisit! casync .. does have some issues.

@charles-dyfis-net

> I'd looked at desync and thought it didn't support reflinking, but it seems I might be mistaken .. will revisit! casync .. does have some issues.

Depending on when you looked it may not have; but it most definitely does today. See https://github.com/folbricht/desync/blob/4a8700c059471d5f005dd7c9a957072bb1fa5c8a/fileseed.go#L123-L126

@srd424

srd424 commented Jun 17, 2020

Yup, just found that - I only looked the other day but I think I'd mis-parsed the beginning of the README. Going off topic here, but quickly - desync doesn't seem to support ACLs in the catar stream, but does now implement xattrs - does that mean it can restore ACLs from catar streams that it generates itself?

@charles-dyfis-net

> Yup, just found that - I only looked the other day but I think I'd mis-parsed the beginning of the README. Going off topic here, but quickly - desync doesn't seem to support ACLs in the catar stream, but does now implement xattrs - does that mean it can restore ACLs from catar streams that it generates itself?

Couldn't say. I'm a regular user and occasional contributor, but I don't use that particular functionality. Frank does have a gitter chat room, though -- I'd suggest asking in there.
