Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btrfs-progs: convert: fix inline extent size for symbol link #884

Closed
wants to merge 1 commit into from

Conversation

adam900710
Copy link
Collaborator

[BUG]
For btrfs converted from ext* or reiserfs, the inlined data extent is always one byte larger than the inode size:

item 10 key (267 INODE_ITEM 0) itemoff 15543 itemsize 160
	generation 1 transid 1 size 4 nbytes 5
	block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
	sequence 0 flags 0x0(none)
item 11 key (267 INODE_REF 256) itemoff 15529 itemsize 14
	index 3 namelen 4 name: path
item 12 key (267 EXTENT_DATA 0) itemoff 15503 itemsize 26
	generation 4 type 0 (inline)
	inline extent data size 5 ram_bytes 5 compression 0 (none)

[CAUSE]
Inside the symbol link creation path for each fs, they all create the inline data extent with a tailing NUL ('\0').

This is different from what btrfs kernel module does:

item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160
	generation 9 transid 9 size 4 nbytes 4
	block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
	sequence 0 flags 0x0(none)
item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14
	index 2 namelen 4 name: path
item 6 key (257 EXTENT_DATA 0) itemoff 15844 itemsize 25
	generation 9 type 0 (inline)
	inline extent data size 4 ram_bytes 4 compression 0 (none)

[FIX]
Thankfully this is not a big deal, kernel properly reads the content and use inode size to determine the proper link target.

Just align the btrfs-progs convert behavior to the kernel one.

[BUG]
For btrfs converted from ext* or reiserfs, the inlined data extent is
always one byte larger than the inode size:

	item 10 key (267 INODE_ITEM 0) itemoff 15543 itemsize 160
		generation 1 transid 1 size 4 nbytes 5
		block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
		sequence 0 flags 0x0(none)
	item 11 key (267 INODE_REF 256) itemoff 15529 itemsize 14
		index 3 namelen 4 name: path
	item 12 key (267 EXTENT_DATA 0) itemoff 15503 itemsize 26
		generation 4 type 0 (inline)
		inline extent data size 5 ram_bytes 5 compression 0 (none)

[CAUSE]
Inside the symbol link creation path for each fs, they all create the
inline data extent with a tailing NUL ('\0').

This is different from what btrfs kernel module does:

	item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160
		generation 9 transid 9 size 4 nbytes 4
		block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
		sequence 0 flags 0x0(none)
	item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14
		index 2 namelen 4 name: path
	item 6 key (257 EXTENT_DATA 0) itemoff 15844 itemsize 25
		generation 9 type 0 (inline)
		inline extent data size 4 ram_bytes 4 compression 0 (none)

[FIX]
Thankfully this is not a big deal, kernel properly reads the content and
use inode size to determine the proper link target.

Just align the btrfs-progs convert behavior to the kernel one.

Signed-off-by: Qu Wenruo <wqu@suse.com>
@adam900710
Copy link
Collaborator Author

Closing because the fix is not a simple no-harm fix, but it's already causing bugs in btrfs/012 test case.

This will need to be a multi-patch series to not only fix the +1 error, but also new sanity checks for it, along with new test case.

@adam900710 adam900710 closed this Sep 3, 2024
kdave pushed a commit that referenced this pull request Sep 12, 2024
[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
symlink:

     QA output created by 012
     Checking converted btrfs against the original one:
    -OK
    +readlink: Structure needs cleaning
     Checking saved ext2 image against the original one:
     OK

Furthermore, this will trigger a kernel error message:

 BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081

[CAUSE]
For that specific inode 133081, the tree dump looks like this:

        item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
                generation 1 transid 1 size 4095 nbytes 4096
                block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x0(none)
        item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
                index 2 namelen 2 name: l3
        item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
                generation 4 type 1 (regular)
                extent data disk byte 2147483648 nr 38080512
                extent data offset 37974016 nr 4096 ram 38080512
                extent compression 0 (none)

Note that, the symlink inode size is 4095 at the max size (PATH_MAX,
removing the terminating NUL).
But the nbytes is 4096, exactly matching the sector size of the btrfs.

Thus it results the creation of a regular extent, but for btrfs we do
not accept a symlink with a regular/preallocated extent, thus kernel
rejects such read and failed the readlink call.

The root cause is in the convert code, where for symlinks we always
create a data extent with its size + 1, causing the above problem.

I guess the original code is to handle the terminating NUL, but in btrfs
we never need to store the terminating NUL for inline extents nor
file names.

Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fail the test case.

[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
  To remove the terminating NUL.

- Add extra checks for the size of a symbolic link
  Btrfs has extra limits on the size of a symbolic link, as btrfs must
  store symbolic link targets as inlined extents.

  This means for 4K node sized btrfs, the size limit is smaller than the
  usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).

  So for certain nodesize, some filesystems can not be converted to
  btrfs.
  (this should be rare, because the default nodesize is 16K already)

- Split the symbolic link and inline data extent size checks
  For symbolic links the real limit is PATH_MAX - 1 (removing the
  terminating NUL), but for inline data extents the limit is
  sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
  size).

Pull-request: #884
Signed-off-by: Qu Wenruo <wqu@suse.com>
kdave pushed a commit that referenced this pull request Sep 12, 2024
[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
symlink:

     QA output created by 012
     Checking converted btrfs against the original one:
    -OK
    +readlink: Structure needs cleaning
     Checking saved ext2 image against the original one:
     OK

Furthermore, this will trigger a kernel error message:

 BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081

[CAUSE]
For that specific inode 133081, the tree dump looks like this:

        item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
                generation 1 transid 1 size 4095 nbytes 4096
                block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x0(none)
        item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
                index 2 namelen 2 name: l3
        item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
                generation 4 type 1 (regular)
                extent data disk byte 2147483648 nr 38080512
                extent data offset 37974016 nr 4096 ram 38080512
                extent compression 0 (none)

Note that, the symlink inode size is 4095 at the max size (PATH_MAX,
removing the terminating NUL).
But the nbytes is 4096, exactly matching the sector size of the btrfs.

Thus it results the creation of a regular extent, but for btrfs we do
not accept a symlink with a regular/preallocated extent, thus kernel
rejects such read and failed the readlink call.

The root cause is in the convert code, where for symlinks we always
create a data extent with its size + 1, causing the above problem.

I guess the original code is to handle the terminating NUL, but in btrfs
we never need to store the terminating NUL for inline extents nor
file names.

Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fail the test case.

[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
  To remove the terminating NUL.

- Add extra checks for the size of a symbolic link
  Btrfs has extra limits on the size of a symbolic link, as btrfs must
  store symbolic link targets as inlined extents.

  This means for 4K node sized btrfs, the size limit is smaller than the
  usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).

  So for certain nodesize, some filesystems can not be converted to
  btrfs.
  (this should be rare, because the default nodesize is 16K already)

- Split the symbolic link and inline data extent size checks
  For symbolic links the real limit is PATH_MAX - 1 (removing the
  terminating NUL), but for inline data extents the limit is
  sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
  size).

Pull-request: #884
Signed-off-by: Qu Wenruo <wqu@suse.com>
kdave pushed a commit that referenced this pull request Sep 17, 2024
[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
symlink:

     QA output created by 012
     Checking converted btrfs against the original one:
    -OK
    +readlink: Structure needs cleaning
     Checking saved ext2 image against the original one:
     OK

Furthermore, this will trigger a kernel error message:

 BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081

[CAUSE]
For that specific inode 133081, the tree dump looks like this:

        item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
                generation 1 transid 1 size 4095 nbytes 4096
                block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x0(none)
        item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
                index 2 namelen 2 name: l3
        item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
                generation 4 type 1 (regular)
                extent data disk byte 2147483648 nr 38080512
                extent data offset 37974016 nr 4096 ram 38080512
                extent compression 0 (none)

Note that, the symlink inode size is 4095 at the max size (PATH_MAX,
removing the terminating NUL).
But the nbytes is 4096, exactly matching the sector size of the btrfs.

Thus it results the creation of a regular extent, but for btrfs we do
not accept a symlink with a regular/preallocated extent, thus kernel
rejects such read and failed the readlink call.

The root cause is in the convert code, where for symlinks we always
create a data extent with its size + 1, causing the above problem.

I guess the original code is to handle the terminating NUL, but in btrfs
we never need to store the terminating NUL for inline extents nor
file names.

Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fail the test case.

[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
  To remove the terminating NUL.

- Add extra checks for the size of a symbolic link
  Btrfs has extra limits on the size of a symbolic link, as btrfs must
  store symbolic link targets as inlined extents.

  This means for 4K node sized btrfs, the size limit is smaller than the
  usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).

  So for certain nodesize, some filesystems can not be converted to
  btrfs.
  (this should be rare, because the default nodesize is 16K already)

- Split the symbolic link and inline data extent size checks
  For symbolic links the real limit is PATH_MAX - 1 (removing the
  terminating NUL), but for inline data extents the limit is
  sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
  size).

Pull-request: #884
Signed-off-by: Qu Wenruo <wqu@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant