Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix statfs(2) for 32-bit user space #7937

Merged
merged 1 commit into from
Sep 25, 2018
Merged

Conversation

behlendorf
Copy link
Contributor

@behlendorf behlendorf commented Sep 20, 2018

Motivation and Context

Issues #7122 and #7927.

Description

When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled when
needing to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are not changed to the
internal ZFS limits.

How Has This Been Tested?

Starting with a clean Ubuntu 18.04 x86_64 install I added the
32-bit user space compatibility libraries.

$ sudo dpkg --add-architecture i386
$ sudo apt update
$ sudo apt install libc6:i386

Next I manually extracted the 32-bit df(1) binary from the
coreutils_8.26-3ubuntu4_i386.deb package as a test application.
With an unmodified version of ZFS I was able to immediately
reproduce the issue.

$ strace -e trace=statfs64 ./df -Ti /mnt/zfs
strace: [ Process PID=15191 runs in 32 bit mode. ]
statfs64("/mnt/zfs", 84, 0xffe9e358)    = -1 EOVERFLOW (Value too large for defined data type)
./df: /mnt/zfs: Value too large for defined data type
+++ exited with 1 +++

After applying the proposed fix which scales the block and file
counts the EOVERFLOW was prevented as expected.

$ strace -e trace=statfs64 ./df -Ti /mnt/zfs /mnt/ext4
strace: [ Process PID=16839 runs in 32 bit mode. ]
statfs64("/mnt/zfs", 84, {f_type=ZFS_SUPER_MAGIC, f_bsize=131072, f_blocks=24252411, f_bfree=24252410, f_bavail=24252410, f_files=4294967295, f_ffree=4294966286, f_fsid={val=[560721052, 10632861]}, f_namelen=255, f_frsize=131072, f_flags=ST_VALID}) = 0
statfs64("/mnt/ext4", 84, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=792403690, f_bfree=792381155, f_bavail=752111741, f_files=201326592, f_ffree=201326581, f_fsid={val=[1255411960, 3511091618]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
Filesystem     Type     Inodes IUsed      IFree IUse% Mounted on
tank           zfs  4294967295  1009 4294966286    1% /mnt/zfs
/dev/loop20    ext4  201326592    11  201326581    1% /mnt/ext4
+++ exited with 0 +++

Note that using the 64-bit system df(1) the reported file counts are
not capped.

$ df -Ti /mnt/zfs /mnt/ext4
Filesystem     Type     Inodes IUsed      IFree IUse% Mounted on
tank           zfs  6208618183  1009 6208617174    1% /mnt/zfs
/dev/loop20    ext4  201326592    11  201326581    1% /mnt/ext4

Finally, I manually verify the block size and count scaling by changing
the filesystems record size to 512b forcing an overflow. The reported
block size was dynamically scaled to 1k as required to prevent the overflow.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@behlendorf behlendorf added Type: Architecture Indicates an issue is specific to a single processor architecture Status: Code Review Needed Ready for review and testing labels Sep 20, 2018
@behlendorf behlendorf requested a review from ryao September 20, 2018 23:12
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behlendorf This patch will not fix #7927. Reports on Valve’s bug tracker and the one that I received privately show that it has issues with >2TB of space independently of the recordsize, which is reported to userspace through statfs. This behavior is consistent across both ZFS and ext4, which report different block/fragment sizes by default. Others have also observed that it is using the libc function statvfs64, so in principle, it could handle 64-bit numbers. This leads me to believe that it is scaling the free block count by the reported block size while forcing it to be a 32-bit integer. That means that we cannot address the issue by scaling things in the other direction.

If this is done for the other issue, I suggest implementing a module parameter to turn it on or off. You could call it zpl_quirks and make it a bit mask.

@ryao
Copy link
Contributor

ryao commented Sep 24, 2018

@behlendorf @adilger if the software is built with -D_FILE_OFFSET_BITS=64 and gets an EOVERFLOW anyway, then the issue is likely the mainline bug fixed by this upstream patch:

https://patchwork.kernel.org/patch/9987759/

I had noticed that the reports in ValveSoftware/steam-for-linux#4982 had EOVERFLOW being emitted in situations where it should have been impossible when looking at the kernel code while other users reported no EOVERFLOW under seemingly the same conditions. That initially made no sense to me. That mainline patch explains it. The users with EOVERFLOW have kernels that lack it while the users without EOVERFLOW have kernels with it. The reports of EOVERFLOW made no sense to me because I had been looking at a version of the kernel code that had that patch.

I am certain that this will not fix #7927 (which is tracking a different issue where userspace goofs in a way that this would not fix). Although I have not looked, I expect both coreutils and nfsutils to have properly implemsnted 64-bit file offset support on 32-bit systems a longtime ago and I suspect that #7122 is that mainline bug. One of us would need to look to be certain, but that task can be tracked in #7122

I am closing this because we lack of evidence that it can fix an issue that is not addressed by a kernel update. We can revisit it if we find an example of software breaking that this addresses in a way that has not been addressed by an update to another component.

@ryao ryao closed this Sep 24, 2018
@behlendorf behlendorf reopened this Sep 24, 2018
@behlendorf
Copy link
Contributor Author

@ryao it sounds like there may be multiple issues at play here, the least we can do is address this one. The patch is specifically intended to address the -D_FILE_OFFSET_BITS=32 case, as described above. This case is not handled by the referenced kernel patch, but even if it were we'd want to pick it up for older kernels.

static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstatfs *kbuf)
{
        struct compat_statfs64 buf;
        if (sizeof(ubuf->f_bsize) == 4) {
                if ((kbuf->f_type | kbuf->f_bsize | kbuf->f_namelen |
                     kbuf->f_frsize | kbuf->f_flags) & 0xffffffff00000000ULL)
                        return -EOVERFLOW;
                /* f_files and f_ffree may be -1; it's okay
                 * to stuff that into 32 bits */
>>>             if (kbuf->f_files != 0xffffffffffffffffULL
>>>              && (kbuf->f_files & 0xffffffff00000000ULL))
>>>                     return -EOVERFLOW;
>>>             if (kbuf->f_ffree != 0xffffffffffffffffULL
>>>              && (kbuf->f_ffree & 0xffffffff00000000ULL))
>>>                     return -EOVERFLOW;
        }

Specifically, the most common way for this issue to manifest itself is an overflow of ->f_files. This overflow happens relatively easily with ZFS because the number of total files is calculated by dividing the available free space by the minimum dnode size. It only takes an empty 2TB pool to reach the limit. I agree that overflowing the block count is far more unlikely, but there's no reason not to handle it.

I've reopened the PR because this resolves a real issue, which is easy to reproduce.

@ryao
Copy link
Contributor

ryao commented Sep 24, 2018

@behlendorf Alright, although I am not convinced that we know of any userland software that benefits from this in practice. It would be cleaner to have a dataset level option for controlling how things are reported to userspace. That would have a more predictable behavior.

If we go this route, I suggest putting it behind a module parameter. The information from statfs is used by userspace to try to optimize IO transfers. Changing it under userspace programs programmatically feels like it has the potential to cause headaches in unexpected ways.

@behlendorf
Copy link
Contributor Author

Any Linux system with a 32-bit user space will benefit from this. I think that Ubuntu 18.04 providing all the 32-bit libraries from their repositories for a multi-lib system is pretty good evidence there's still a need for this. Although, I admit that it is more of an endangered species these days.

I suggest putting it behind a module parameter.

I'd rather not, we have more than enough of those, and this is functionality which needs to just work. This change will only impact 32-bit compat callers, and it is already wrapped with unlikely() so there's virtual no cost.

@ryao
Copy link
Contributor

ryao commented Sep 24, 2018

@behlendorf It is true that Ubuntu is shipping 32-bit libraries, but those libraries should be immune to this. They are designed to be built for both 32-bit and 64-bit architectures. When built for a 32-bit architecture, the build system should always pass -D_FILE_OFFSET_BITS=64 to ensure parity with the 64-bit versions. If there is a single library that Ubuntu is shipping that is not built with -D_FILE_OFFSET_BITS=64, then that is likely a bug.

The only software that would benefit from this is software built before a 64-bit statvfs was a thing. To my knowledge, such software is not being shipped by any distribution. Also, we do not have any examples of such software being used in production. We can address this, but it is an extreme edge case. I feel that such an edge case belongs behind a kernel parameter.

I do not like the idea of implementing a workaround without confirming that someone out there benefits from it. Once it is in the codebase, we could find ourselves in a position of supporting it forever should someone come to depend on it. At the very least, could we find a piece of software in any production distribution that benefits from this first?

@behlendorf
Copy link
Contributor Author

@ryao I tested this fix while developing it on Ubuntu 18:04 by adding the 32-bit architecture, installing the libraries, and then using the df(1) from coreutils_8.26-3ubuntu4_i386.deb. I happened to pick df(1), but any 32-bit binary which calls statfs64 will not work properly. And it should according to the multi-arch packaging guidelines set by Debian and used by Ubuntu. RHEL/CentOS handle things slightly differently (of course) but the kernel side fix is the same.

https://wiki.debian.org/Multiarch/HOWTO

$ sudo dpkg --add-architecture i386
$ sudo apt update
$ sudo apt install libc6:i386

I'm not sure I fully understand your objections here either. In the vast majority of cases the only thing we'll be reporting differently is the total number of available objects. Since this is already a calculated value (freespace / DNODE_SIZE) there never was a right answer to report. As for the reported block size and count, assuming the default block size no downshifting will occur until the you reach a 512TiB filesystem. Hopefully, very few people are using 32-bit applications on filesystems this large but even if they choose to this will continue to work.

@ryao
Copy link
Contributor

ryao commented Sep 24, 2018

@behlendorf Is this patch part of your kernel?

https://patchwork.kernel.org/patch/9987759/

If it is, then I am surprised. I was under the impression that the 32-bit df would be built with -D_FILE_OFFSET_BITS=64 so that the machine word size would not determine the size of supported filesystems. If that is case, I would need to dig into this to learn why.

Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@behlendorf Disregard my remarks. I see what I had misunderstood. Let’s go with this.

@behlendorf
Copy link
Contributor Author

It was the Ubuntu LTS kernel, 4.15.0-34-generic. Great, I'm glad we could hash this out. Thanks for making the time to fully review this.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 25, 2018
@behlendorf behlendorf merged commit e897a23 into openzfs:master Sep 25, 2018
@behlendorf behlendorf added Status: Completed and removed Status: Accepted Ready to integrate (reviewed, tested) labels Sep 25, 2018
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Completed labels Sep 25, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Oct 31, 2018
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7927
Closes openzfs#7122
Closes openzfs#7937
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 5, 2018
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7927
Closes openzfs#7122
Closes openzfs#7937
tonyhutter pushed a commit that referenced this pull request Nov 13, 2018
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7927
Closes #7122
Closes #7937
GregorKopka pushed a commit to GregorKopka/zfs that referenced this pull request Jan 7, 2019
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7927 
Closes openzfs#7122 
Closes openzfs#7937
@behlendorf behlendorf deleted the issue-7122 branch April 19, 2021 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Architecture Indicates an issue is specific to a single processor architecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants