unprivileged btrfs subvol list #893
Open: maharmstone wants to merge 29 commits into kdave:devel from maharmstone:osandov-subvol-list
Add new option --recursive to 'btrfs subvol delete', causing it to pass the BTRFS_UTIL_DELETE_SUBVOLUME_RECURSIVE flag through to libbtrfsutil. This can work in two modes, depending on the user: - regular user - this will skip subvolumes that are not accessible - root (CAP_SYS_ADMIN) - no limitations Pull-request: kdave#861 Signed-off-by: Mark Harmstone <maharmstone@meta.com> Co-authored-by: Omar Sandoval <osandov@osandov.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ Add details to man page, fix indent in the doc. ] Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Add a new option --subvol, which tells mkfs.btrfs to create the specified directories as subvolumes when used with --rootdir. Given a populated directory dir, the command $ mkfs.btrfs --rootdir dir --subvol usr --subvol home --subvol home/username img will create subvolumes 'usr' and 'home' within the toplevel subvolume, and subvolume 'username' within the 'home' subvolume. It will fail if any of the directories do not yet exist. Pull-request: kdave#868 Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
Add limit parameter so workflows are not skipped if they don't fit the default limit 10. Add more workflows to clean up after recent updates. Signed-off-by: David Sterba <dsterba@suse.com>
Remove the last newline in the output of 'btrfs filesystem show', but keep the blank line between two filesystems so the devices are visually grouped together. Pull-request: kdave#866 Author: Matt Langford <github@matt.boats> Signed-off-by: David Sterba <dsterba@suse.com>
Added 0x prefix to HEX numbers and transform some tables to new format. Pull-request: kdave#881 Signed-off-by: Yuwei Han <hrx@bupt.moe> [ Fix RST grammar errors ] Signed-off-by: Qu Wenruo <wqu@suse.com>
Change --subvol so that it can accept flags, and add a "default" flag that allows you to mark a subvolume as the default. Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Adds a flag to mkfs.btrfs --subvol to allow subvolumes to be created readonly. Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Call btrfs_util_subvolume_create in create_one_subvolume rather than calling the ioctl directly. Pull-request: kdave#878 Signed-off-by: Mark Harmstone <maharmstone@fb.com> Co-authored-by: Omar Sandoval <osandov@fb.com>
Call btrfs_util_subvolume_snapshot in cmd_subvolume_snapshot rather than calling the ioctl directly. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Co-authored-by: Omar Sandoval <osandov@fb.com>
Remove functions that after the previous two patches are no longer referenced. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Co-authored-by: Omar Sandoval <osandov@fb.com>
Currently the transaction log is more or less ignored by btrfs check, meaning that it's possible for a FS with a corrupt log to pass btrfs check, but be immediately corrupted by the kernel when it's mounted. Adds a check that if there's an inode in the log, any pending non-inlined csumed writes also have corresponding csum entries. Pull-request: kdave#879 Signed-off-by: Mark Harmstone <maharmstone@fb.com> [ Small commit message update. ] Signed-off-by: Qu Wenruo <wqu@suse.com>
The new hard link detection and creation support is done by maintaining an rb tree with the following members:

- st_ino, st_dev
  These record the stat() result from the host fs. With these two, we can detect if it's really a hard link (st_dev determines one filesystem/subvolume, and st_ino determines the inode number inside the fs).

- root
  This is the btrfs root pointer. This is a special requirement for the recently introduced "--subvol" option, as we can have the following corner case:

  rootdir/
  |- foobar_hardlink1
  |- foobar_hardlink2
  |- subv/              <- To be a subvolume inside btrfs
     |- foobar_hardlink3

  In the above case, on the host fs, the `subv/` directory is just a regular directory, but in the new btrfs it will be a subvolume. In that case, `foobar_hardlink3` cannot be created as a hard link, only as a new inode.

- st_nlink and found_nlink
  These record the originally reported number of links and the number of links we have created inside btrfs. They are recorded so that once we have created all hard links we can remove the entry early.

- btrfs_ino
  This is the inode number inside btrfs.

And since we can handle hard links safely, remove all the related warnings, and add a new note for the `--subvol` option, warning about the case where we need to split hard links due to a subvolume boundary.

Pull-request: kdave#873
Signed-off-by: Qu Wenruo <wqu@suse.com>
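The bookkeeping described in this commit message can be sketched roughly as follows. This is a simplified Python illustration, not the btrfs-progs code: it swaps the rb tree for a dict, treats `root` as an opaque label, and uses a single hypothetical inode counter (`_next_ino`) for all subvolumes. The names `HardLinkTracker` and `lookup_or_create` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class HardLinkEntry:
    st_nlink: int         # link count reported by stat() on the host fs
    btrfs_ino: int        # inode number allocated inside the new filesystem
    found_nlink: int = 1  # links created so far inside btrfs


class HardLinkTracker:
    """Illustrative stand-in for the rb tree keyed by (st_dev, st_ino, root)."""

    def __init__(self):
        self._entries = {}
        self._next_ino = 257  # hypothetical shared counter, a simplification

    def lookup_or_create(self, st_dev, st_ino, st_nlink, root):
        """Return (btrfs_ino, is_new_inode) for one directory entry."""
        key = (st_dev, st_ino, root)
        entry = self._entries.get(key)
        if entry is None:
            ino = self._next_ino
            self._next_ino += 1
            # Only files with more than one link can be seen again later.
            if st_nlink > 1:
                self._entries[key] = HardLinkEntry(st_nlink, ino)
            return ino, True
        entry.found_nlink += 1
        # All links created: the entry can be dropped early.
        if entry.found_nlink == entry.st_nlink:
            del self._entries[key]
        return entry.btrfs_ino, False
```

Because `root` is part of the key, a file reached through a directory that becomes a subvolume gets its own inode there, while links within the same subvolume keep sharing one inode.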
This introduces two new cases: - 3 hardlinks without any subvolume This should result in 3 hard links inside the btrfs. - 3 hardlinks, but a subvolume will split 2 of them Then the 2 inside the same subvolume should still report 2 nlinks, but the lone one inside the new subvolume can only report 1 nlink. Signed-off-by: Qu Wenruo <wqu@suse.com>
[BUG]
Sometimes test case btrfs/012 fails randomly, with a failure to read a symlink:

  QA output created by 012
  Checking converted btrfs against the original one:
  -OK
  +readlink: Structure needs cleaning
  Checking saved ext2 image against the original one:
  OK

Furthermore, this will trigger a kernel error message:

  BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081

[CAUSE]
For that specific inode 133081, the tree dump looks like this:

  item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
      generation 1 transid 1 size 4095 nbytes 4096
      block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
      sequence 0 flags 0x0(none)
  item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
      index 2 namelen 2 name: l3
  item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
      generation 4 type 1 (regular)
      extent data disk byte 2147483648 nr 38080512
      extent data offset 37974016 nr 4096 ram 38080512
      extent compression 0 (none)

Note that the symlink inode size is 4095, the maximum size (PATH_MAX minus the terminating NUL). But nbytes is 4096, exactly matching the sector size of the btrfs.

This results in the creation of a regular extent, but btrfs does not accept a symlink with a regular/preallocated extent, so the kernel rejects the read and the readlink call fails.

The root cause is in the convert code, where for symlinks we always create a data extent with its size + 1, causing the above problem.

I guess the original code was meant to handle the terminating NUL, but in btrfs we never need to store the terminating NUL for inline extents or file names.

Thus this pitfall in btrfs-convert leads to the above invalid data extent and fails the test case.

[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
  To remove the terminating NUL.

- Add extra checks for the size of a symbolic link
  Btrfs has extra limits on the size of a symbolic link, as btrfs must store symbolic link targets as inlined extents.
  This means that for a btrfs with 4K nodesize, the size limit is smaller than the usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).
  So for certain nodesizes, some filesystems cannot be converted to btrfs. (This should be rare, because the default nodesize is already 16K.)

- Split the symbolic link and inline data extent size checks
  For symbolic links the real limit is PATH_MAX - 1 (removing the terminating NUL), but for inline data extents the limit is sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector size).

Pull-request: kdave#884
Signed-off-by: Qu Wenruo <wqu@suse.com>
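The two separate size checks described in the fix can be illustrated with a small Python sketch. It is not the btrfs-convert code: the function names are invented, and `max_inline_size` is a hypothetical parameter standing in for the nodesize-dependent inline limit, whose exact value the commit message only gives as "around 4000 bytes" for a 4K nodesize.

```python
PATH_MAX = 4096  # Linux PATH_MAX, which includes the terminating NUL


def symlink_target_fits(target_len, max_inline_size):
    """Check a symlink target length (excluding the terminating NUL).

    max_inline_size is a hypothetical stand-in for the largest inline
    extent that fits in a leaf of the chosen nodesize.
    """
    return target_len <= PATH_MAX - 1 and target_len <= max_inline_size


def inline_data_fits(data_len, sectorsize, max_inline_size):
    """Check regular file data that is a candidate for an inline extent."""
    return data_len <= sectorsize - 1 and data_len <= max_inline_size
```

With a large enough inline limit, a 4095-byte target passes the symlink check, while the same target fails when the limit is around 4000 bytes, which corresponds to the "cannot be converted" case for small nodesizes.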
symbolic links [BUG] There is a recent bug where btrfs/012 fails and the kernel refuses to read a symbolic link which is backed by a regular extent. Furthermore, in that case "btrfs check" doesn't detect the problem at all. [CAUSE] For symbolic links, we only allow inline file extents, and this means we should only have a symbolic link target which is smaller than 4K. But btrfs check doesn't handle symbolic link inodes any differently, thus it doesn't check whether the file extents are inlined, nor does it report this problem as an error. [FIX] When processing data extents, if we find the owning inode is a symbolic link, and the file extent is regular/preallocated, mark the inode with the I_ERR_FILE_EXTENT_TOO_LARGE error. Signed-off-by: Qu Wenruo <wqu@suse.com>
…inks [BUG] There is a recent bug where btrfs/012 fails and the kernel refuses to read a symbolic link which is backed by a regular extent. Furthermore, in that case "btrfs check --mode=lowmem" doesn't detect the problem at all. [CAUSE] For symbolic links, we only allow inline extents, and this means we should only have a symbolic link target which is smaller than 4K. But lowmem mode btrfs check doesn't handle symbolic link inodes any differently, thus it doesn't check whether the file extents are inlined, nor does it report this problem as an error. [FIX] When processing data extents, if we find the owning inode is a symbolic link, and the file extent is regular/preallocated, report an error for the bad file extent item. Signed-off-by: Qu Wenruo <wqu@suse.com>
…n convert The new test case will: - Create a symbolic link with a 4095-byte target on ext4 - Convert the ext4 to btrfs - Make sure we can still read the symbolic link For unpatched btrfs-convert, the resulting symbolic link will be rejected by the kernel and the test will fail. Signed-off-by: Qu Wenruo <wqu@suse.com>
btrfs-ioctl.rst was laid out like it should be a man page, including having a section number, but it wasn't getting installed because there was not enough content. Pull-request: kdave#892 Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
There is an internal report that, during btrfs-convert to block-group tree, some systemd events accidentally triggered a mount of the target fs. This leads to a double mount (one by the kernel and one by btrfs-progs), which seems to cause quite a few problems. To avoid such accidents, open all devices exclusively if btrfs-progs is doing write operations. Pull-request: kdave#888 Reported-by: pandada8 <pandada8@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com>
[BUG]
There is one report about `btrfs rescue clear-ino-cache` failing with a tree block level mismatch:

  # btrfs rescue clear-ino-cache /dev/mapper/rootext
  Successfully cleaned up ino cache for root id: 5
  Successfully cleaned up ino cache for root id: 257
  Successfully cleaned up ino cache for root id: 258
  corrupt node: root=7 block=647369064448 slot=0, invalid level for leaf, have 1 expect 0
  node 647369064448 level 1 items 252 free space 241 generation 6065173 owner CSUM_TREE
  node 647369064448 flags 0x1(WRITTEN) backref revision 1
  fs uuid e6614f01-6f56-4776-8b0a-c260089c35e7
  chunk uuid f665f535-4cfd-49e0-8be9-7f94bf59b75d
      key (EXTENT_CSUM EXTENT_CSUM 3714473984) block 677126111232 gen 6065002
      [...]
      key (EXTENT_CSUM EXTENT_CSUM 6192357376) block 646396493824 gen 6065032
  ERROR: failed to clear ino cache: Input/output error

[CAUSE]
During `btrfs rescue clear-ino-cache`, btrfs-progs iterates through all the subvolumes and clears the inode cache inode from each subvolume.

The problem is in how we iterate the subvolumes. We hold a path into the tree root, modify the fs for each subvolume found, then call btrfs_next_item().

This is not safe, because the path into the tree root is no longer reliable once we have modified the fs.

So the btrfs_next_item() call will fail because the fs was modified halfway, resulting in the above problem.

[FIX]
Instead of holding a path to a subvolume root item while modifying the fs, introduce a helper, find_next_root(), to locate the root item whose objectid >= our target rootid, and return the found item key.

The path into the root tree is only held and then released inside find_next_root().

This way, we won't hold any unrelated path while modifying the filesystem.

And while we're here, also add back the missing newline printed when all ino caches are cleared.

Pull-request: kdave#890
Reported-by: Archange <archange@archlinux.org>
Link: https://lore.kernel.org/linux-btrfs/4803f696-2dc5-4987-a353-fce1272e93e7@archlinux.org/
Signed-off-by: Qu Wenruo <wqu@suse.com>
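The iteration pattern in the fix, searching afresh for the next key instead of keeping a cursor alive across modifications, can be sketched in Python. This only illustrates the idea: a sorted list stands in for the root tree, and apart from the name `find_next_root()` taken from the commit message, the function signatures and the `clear_all` driver are assumptions made for the example.

```python
import bisect


def find_next_root(root_ids, target):
    """Return the smallest root id >= target, or None if there is none.

    The lookup starts from scratch every time, so nothing is held
    between calls (the analogue of releasing the path before returning).
    """
    i = bisect.bisect_left(root_ids, target)
    return root_ids[i] if i < len(root_ids) else None


def clear_all(root_ids, clear_one):
    """Visit every root in order, even if clear_one() mutates root_ids."""
    cleared = []
    target = 0
    while True:
        rootid = find_next_root(root_ids, target)
        if rootid is None:
            break
        clear_one(rootid)      # may modify the underlying structure
        cleared.append(rootid)
        target = rootid + 1    # resume strictly after the last root handled
    return cleared
```

The trade-off is one extra search per subvolume, in exchange for never depending on a position that a modification may have invalidated.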
[BUG]
There are reports about deprecated inode caches causing newer kernels to reject the filesystem.

The inode cache was rarely used and has been fully deprecated since v5.11; since v6.11, newer kernels reject data extents belonging to the inode cache.

But original mode btrfs check neither detects them nor reports them as errors.

Meanwhile lowmem mode can properly detect and report them:

  ERROR: root 5 INODE[18446744073709551604] nlink(1) not equal to inode_refs(0)
  ERROR: invalid imode mode bits: 00
  ERROR: invalid inode generation 18446744073709551604 or transid 1 for ino 18446744073709551605, expect [0, 72)
  ERROR: root 5 INODE[18446744073709551605] is orphan item

Since the inode cache code paid no attention to properly maintaining all these numbers, such inodes are easy targets for the more recent lowmem mode.

[CAUSE]
Original mode has extra hardcoded hacks to avoid nlink checks for the inode cache inode.

Furthermore, original mode doesn't check the mode bits or the generation.

[FIX]
For original mode, remove the hack for the inode cache so that the deprecated inode cache can be reported as an error.

For both modes, add an extra global message to direct affected users to 'btrfs rescue clear-ino-cache' to clear the deprecated cache.

Pull-request: kdave#891
Signed-off-by: Qu Wenruo <wqu@suse.com>
The inode_cache and the involved on-disk formats are deprecated and have had no effect since the v5.11 kernel. And in the v6.11 kernel, the new tree-checker will even reject data extents belonging to the deprecated inode cache. Lowmem check can detect such deprecated inode cache from the beginning. These images were generated by 5.10 LTS kernels with inode cache. Signed-off-by: Qu Wenruo <wqu@suse.com>
btrfs_util_subvolume_info() explicitly checks whether geteuid() == 0 to decide whether to use the unprivileged BTRFS_IOC_GET_SUBVOL_INFO ioctl or the privileged BTRFS_IOC_TREE_SEARCH ioctl. This breaks in user namespaces: $ unshare -r python3 -c 'import btrfsutil; print(btrfsutil.subvolume_info("/"))' Traceback (most recent call last): File "<string>", line 1, in <module> btrfsutil.BtrfsUtilError: [BtrfsUtilError 12 Errno 1] Could not search B-tree: Operation not permitted: '/' The unprivileged ioctl has been supported since Linux 4.18. Let's try the unprivileged ioctl first, then fall back to the privileged version only if it isn't supported. Signed-off-by: Omar Sandoval <osandov@fb.com>
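The "unprivileged first" ordering described here can be sketched as below. This is a Python illustration of the control flow only, not libbtrfsutil's implementation: the two callables are hypothetical stand-ins for the ioctl wrappers, and treating ENOTTY as the "ioctl not supported" signal is an assumption based on the usual Linux convention for unknown ioctls.

```python
import errno


def subvolume_info(unprivileged_lookup, privileged_lookup):
    """Try the unprivileged path; fall back only if it is unsupported.

    Both arguments are hypothetical zero-argument callables standing in
    for BTRFS_IOC_GET_SUBVOL_INFO and BTRFS_IOC_TREE_SEARCH wrappers.
    """
    try:
        return unprivileged_lookup()
    except OSError as e:
        # Assumption: an old kernel without the ioctl reports ENOTTY.
        # Any other error (EPERM, ENOENT, ...) is a real failure.
        if e.errno != errno.ENOTTY:
            raise
    return privileged_lookup()
```

Note that no `geteuid()` check appears anywhere: the kernel's answer decides which path is taken, which is what lets this work inside a user namespace.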
The subvolume iterator API explicitly checks whether geteuid() == 0 to decide whether to use the unprivileged BTRFS_IOC_GET_SUBVOL_ROOTREF and BTRFS_IOC_INO_LOOKUP_USER ioctls or the privileged BTRFS_IOC_TREE_SEARCH ioctl. This breaks in user namespaces: $ unshare -r python3 -c 'import btrfsutil; print(list(btrfsutil.SubvolumeIterator("/home")))' Traceback (most recent call last): File "<string>", line 1, in <module> btrfsutil.BtrfsUtilError: [BtrfsUtilError 12 Errno 1] Could not search B-tree: Operation not permitted Instead of the explicit check, let's try the privileged mode first, and if it fails with a permission error, fall back to the unprivileged mode (which has been supported since Linux 4.18). Note that we have to try the privileged mode first, since even for privileged users, the unprivileged mode may omit some subvolumes that are hidden by filesystem mounts. Signed-off-by: Omar Sandoval <osandov@fb.com>
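For the iterator the order is reversed, as the commit message explains. A minimal Python sketch of that control flow follows; the callables are hypothetical stand-ins for the two walking strategies, and mapping "permission error" to Python's `PermissionError` (EPERM or EACCES) is an assumption, since the message itself only shows EPERM.

```python
def list_subvolumes(privileged_walk, unprivileged_walk):
    """Try the privileged walk first, fall back on a permission error.

    Both arguments are hypothetical zero-argument callables that return
    an iterable of subvolume paths.
    """
    try:
        # Materialize inside the try block so that lazily raised
        # errors from a generator are caught as well.
        return list(privileged_walk())
    except PermissionError:
        # Assumption: EPERM/EACCES mean "not privileged here".
        pass
    return list(unprivileged_walk())
```

Trying the privileged walk first matters because, as noted above, the unprivileged mode may omit subvolumes hidden by filesystem mounts even for privileged users.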
It hasn't been used since commit 9005b60 ("btrfs-progs: use libbtrfsutil for subvol show"). Signed-off-by: Omar Sandoval <osandov@fb.com>
The way btrfs subvol list prints paths and what the -o and -a flags do are all nonsense. Apparently, very early versions of Btrfs had a concept of a "top level" of subvolumes rather than the root filesystem tree that we have today; see commit 4ff9e2a ("Add btrfs-list for listing subvolumes"). The original subvol list code tracked the ID of that top level subvolume. Eventually, 5 became the only possibility for the top level, and -o, -a, and path printing were based on that. Commit 4f5ebb3 ("Btrfs-progs: fix to make list specified directory's subvolumes work") broke this and changed the top level to be the same as the parent subvolume ID, which gave us the illogical behavior we have today. It has been this way for a decade, so we're probably stuck with it. But let's at least document precisely what these all do in preparation for adding sensible options. Let's also add tests in preparation for the upcoming changes. Signed-off-by: Omar Sandoval <osandov@fb.com>
btrfs subvol list has its own subvolume walking implementation that we can replace with a libbtrfsutil subvolume iterator. Most of the changed lines are removing the old implementation and mechanically updating the comparators, filters, and printers to use libbtrfsutil's subvolume info. The interesting parts are: 1. We can replace the red-black tree of subvolumes with an array that we qsort. 2. Listing deleted subvolumes needs a different codepath, but we don't need a filter for it anymore. 3. We need some hacks to maintain the weird path behavior documented in the previous commit. In addition to removing a bunch of redundant code, this also prepares us for allowing subvol list by unprivileged users in some cases. Signed-off-by: Omar Sandoval <osandov@fb.com>
Now that we've documented the current nonsensical behavior, add a couple of options that actually make sense: -O lists all subvolumes below a path (which is what people think -o does), and -A lists all subvolumes with no path munging (which is what people think the default or -a do). -O can even be used by unprivileged users. -O and -A also rename the "top level" in the default output to what it actually is now: the "parent". Signed-off-by: Omar Sandoval <osandov@fb.com>
maharmstone force-pushed the osandov-subvol-list branch from f351ff9 to 73abcba (September 13, 2024 11:42)
Well, not just in 2022, I've been working on that all the time (irregularly). I understand you need it, but it's an interface change, and currently there's the mkfs set of changes and the subvol create/delete ports to libbtrfsutil with as yet unresolved issues, so subvol list will wait.
Resubmission of Omar's patches from June, which allow btrfs subvol list to work without CAP_SYS_ADMIN: https://lore.kernel.org/linux-btrfs/cover.1718995160.git.osandov@fb.com/

@kdave, I appreciate that you were working on a replacement for btrfs subvol list in 2022, but we need this for unprivileged containers. Plus in any event we'll be keeping the old interface for a good while yet, in the interests of stability.