Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OverlayFS support (d_revalidate out and support renameat2 flags) #9414

Closed
wants to merge 7 commits into from

Conversation

snajpa
Copy link
Contributor

@snajpa snajpa commented Oct 4, 2019

Description

This PR is comprised of several commits, the original motivation is to add the
functionality needed to enable OverlayFS on top of ZFS, which namely means
supporting renameat2() Linux syscall with its flags
(most interesting of which is the atomic swap of files with RENAME_EXCHANGE)
and getting rid of d_revalidate function, which makes ZFS appear as a remote
filesystem.

Bulk of the original work for supporting the renameat2() flags was done by
@cyphar; he proposes new IL record types, which I'd like as a best solution
as well - but I'm of the opinion that until more platforms adopt renameat2()
semantics in some way, it's not worth the trouble and I'm trying the compatible,
yet more ugly approach of emulating the otherwise atomic operations with
multiple non-atomic ones.

See the commit messages for more information on the relevant parts, I've kept
the code for new txtypes in the original @cyphar's commits.

Fixes #2256
Fixes #8648
Fixes #8774

How Has This Been Tested?

Tested with vanilla packaged Docker from Docker's repo, however, current Docker
checks for ZFS's magic and then refuses to even consider the overlay2 driver.

So I've been testing with a little cheat patch - vpsfreecz@6c7873e

The patches have been deployed to @vpsfreecz staging nodes, which revealed some bugs, that have been addressed in the later force-pushes.

Also tested with the @cyphar's renameat2 util and some parallel script-fu.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@snajpa
Copy link
Contributor Author

snajpa commented Oct 5, 2019

Btw, my branch with changes needed to get Docker working (and xattrs working in 1st level userns) can be found here:

https://github.com/vpsfreecz/zfs/commits/ovl-wip

@snajpa snajpa marked this pull request as ready for review October 5, 2019 00:43
@cyphar
Copy link
Contributor

cyphar commented Oct 5, 2019

Have you tried mounting this with overlayfs directly? Given we no longer have a d_revalidate handler, overlayfs should work without additional hacks. (In fact, I would add that as a separate test.)

@snajpa
Copy link
Contributor Author

snajpa commented Oct 5, 2019

Hmm should have let the tests finish before publishing the PR as open for comments :) Now I have to figure out why is zfs_link_create playing the -ENOSYS note.

@snajpa snajpa force-pushed the ovl branch 3 times, most recently from c2e3c19 to aba3a50 Compare October 6, 2019 23:44
@snajpa
Copy link
Contributor Author

snajpa commented Oct 7, 2019

Can I help myself and restart the tests somehow, please? Or is it up to the project leadership to come by and restart them?

Also, I'm trying running tests locally and even without my patches still, like > 150 tests are failing - is this normal? I see that some of them are documented to fail, but I still get a really long list of those without any annotation (like, acl tests fail? hm...).

@gmelikov
Copy link
Member

gmelikov commented Oct 7, 2019

@snajpa

  • to restart tests - you can just git commit --amend and git push origin HEAD -f, it will trigger bots
  • you may see some tests skipping due to non-installed deps, but you shouldn't see much more errors. One more recommendation - run tests inside VM to be safe from destroying your pools.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 7, 2019
@snajpa snajpa force-pushed the ovl branch 10 times, most recently from 1e054d6 to d09d14b Compare October 19, 2019 22:48
@snajpa
Copy link
Contributor Author

snajpa commented Oct 20, 2019

So I think this will be a last force-push if the tests finish up fine; I had a creds memory leak in there and an ASSERT-wrapped code before, which caused some space leak and associated entertaining cleanup afterwards on the staging nodes... :)

That is if I'm right in assuming all that's needed to fix the i386 build test is to restart it... is it so?

@snajpa snajpa force-pushed the ovl branch 6 times, most recently from 9ef43df to ccd1137 Compare October 20, 2019 20:45
@behlendorf behlendorf self-requested a review October 21, 2019 17:56
@tuxillo
Copy link

tuxillo commented Apr 30, 2021

Are there plans to move this forward?

for (int i = 0; i < 16; i++) {
int r = 0xFF;
random_get_pseudo_bytes((void *)&r, 1);
pos += snprintf(tmpname+pos, MAXPATHLEN, "%02x", r);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this should be MAXPATHLEN-pos otherwise you could overrun the buffer.

@cyphar
Copy link
Contributor

cyphar commented Jun 8, 2021

I'll try to pick this one up again next week (the authorship of this PR is getting kinda funky now...).

@cyphar
Copy link
Contributor

cyphar commented Jun 9, 2021

@snajpa I'm going to squash some of these patches together and add a Co-authored-by -- hopefully that doesn't bother you. (It's a bit strange to have in the git history a patch which adds new transaction types and then deletes them.)

@cyphar
Copy link
Contributor

cyphar commented Jun 9, 2021

Ah I forgot that #9600 hasn't been fully resolved. I guess I'll need to tackle that first (or we can just make the PR a "renameat2 flags" PR and we can deal with revalidation separately).

@sempervictus
Copy link
Contributor

Curious if there's been any progress on this or the required #9600 offline. Would be great to have drop-in support for replacing legacy docker setups with ZFS underpinnings but without rebuilding the storage graph.

@cyphar
Copy link
Contributor

cyphar commented Oct 9, 2021

@sempervictus I've carried the renameat2 changes to #12209 but there is an issue with a necessary change to the integration tests which is being reviewed in #12588. However, it seems I will need to switch back to new TX types because the current mechanism of emulating the replay of the new features through other TX types is actually not crash-resistant during replay. As for #9600 I haven't had time to look and see what is necessary to implement the new test harness which was discussed in the relevant review PR (the d_revalidate change was merged but bugs were found so it was reverted).

@sempervictus
Copy link
Contributor

@cyphar - thank you very much for explaining that, a bit more complex than i'd thought. We're surviving on AUFS for the time being, but would be cool if at some point we could just use OS defaults for Docker.

@ahrens ahrens added the Status: Work in Progress Not yet ready for general review label Nov 9, 2021
@ahrens ahrens marked this pull request as draft November 9, 2021 19:55
@m-ueberall
Copy link

We're surviving on AUFS for the time being

@sempervictus: Slightly OT, but may I ask whether you still stick to AUFS and/or compared it/switched to fuse-overlayfs in the meantime given that the former is unfortunately only supported by newer LTS kernels 5.10, 5.15 according to http://aufs.sourceforge.net/ which forced me to look for an alternative for the OEM kernel 5.14 on Ubuntu 20.04 ("Focal") with ZFS on root?

@BtbN
Copy link
Contributor

BtbN commented Nov 13, 2022

This can probably be closed, now that 86db35c and dbf6108 are merged?

@sempervictus
Copy link
Contributor

We're surviving on AUFS for the time being

@sempervictus: Slightly OT, but may I ask whether you still stick to AUFS and/or compared it/switched to fuse-overlayfs in the meantime given that the former is unfortunately only supported by newer LTS kernels 5.10, 5.15 according to http://aufs.sourceforge.net/ which forced me to look for an alternative for the OEM kernel 5.14 on Ubuntu 20.04 ("Focal") with ZFS on root?

Keeps workloads in the kernel, and have used AUFS since the SLAX days of old... works fine on 5.15.

@behlendorf
Copy link
Contributor

Closing, support for overlayfs was merged.

@behlendorf behlendorf closed this Nov 14, 2022
PhilippWendler added a commit to sosy-lab/benchexec that referenced this pull request Mar 17, 2023
It seems that ZFS has gained for overlayfs in
openzfs/zfs#9414

Furthermore, such a change should not be mixed in with the cgroups
changes.
@ikus060
Copy link

ikus060 commented Aug 27, 2023

Sorry to bring this one up again.

Was this change complete ? If yes, which version of ZFS I need ?

On Proxmox VE running on ZFS version zfs-2.1.12-pve1. Trying a docker pull of some image is failing with:

failed to register layer: unlinkat /usr/lib/.build-id/29: invalid argument

And kernel message get display:

[141358.973417] overlayfs: upper fs does not support RENAME_WHITEOUT.
[141358.973452] overlayfs: fs on '/var/lib/docker/overlay2/l/JD6T632VFX6H4PZ6U62BB5YMRT' does not support file handles, falling back to xino=off.

Si either I'm not running the right version of it's still not working ?

@BtbN
Copy link
Contributor

BtbN commented Aug 27, 2023

Yes, overlayfs works fine on zfs 2.2 and above.

@ikus060
Copy link

ikus060 commented Aug 27, 2023

@BtbN Thanks ! I will wait for zfs v2.2 to be available and retest. Thanks

@problame
Copy link
Contributor

From my personal notes on this subject:

Upgraded to the Proxmox 6.5 kernel that includes OpenZFS 2.2 (instructions)

The

[141358.973417] overlayfs: upper fs does not support RENAME_WHITEOUT.

warning is gone when creating an overlayfs on a ZFS dataset / inside a Proxmox LXC, using

cd test/
mkdir upper lower work result
mount -t overlay -o lowerdir=./lower,upperdir=./upper,workdir=./work overlayfs result/

The warning about xino=off is still there:

[  400.551652] overlayfs: fs on './lower' does not support file handles, falling back to xino=off.

To my understanding, this warning is annoying but can be ignored for the typical use case of running overlayfs (namely, Docker) on top of a single ZFS pool:


I'd appreciate commentary on my analysis about the xino=off warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing Status: Work in Progress Not yet ready for general review
Projects
None yet