-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix statfs(2) for 32-bit user space #7937
Conversation
When handling a 32-bit statfs() system call the returned fields, although 64-bit in the kernel, must be limited to 32-bits or an EOVERFLOW error will be returned. This is less of an issue for block counts since the default reported block size in 128KiB. But since it is possible to set a smaller block size, these values will be scaled as needed to fit in a 32-bit unsigned long. Unlike most other filesystems the total possible file counts are more likely to overflow because they are calculated based on the available free space in the pool. In order to prevent this the reported value must be capped at 2^32-1. This is only for statfs(2) reporting, there are no changes to the internal ZFS limits. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
c3adb47
to
79689a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf This patch will not fix #7927. Reports on Valve’s bug tracker and the one that I received privately show that it has issues with >2TB of space independently of the recordsize, which is reported to userspace through statfs. This behavior is consistent across both ZFS and ext4, which report different block/fragment sizes by default. Others have also observed that it is using the libc function statvfs64, so in principle, it could handle 64-bit numbers. This leads me to believe that it is scaling the free block count by the reported block size while forcing it to be a 32-bit integer. That means that we cannot address the issue by scaling things in the other direction.
If this is done for the other issue, I suggest implementing a module parameter to turn it on or off. You could call it zpl_quirks and make it a bit mask.
@behlendorf @adilger if the software is built with https://patchwork.kernel.org/patch/9987759/ I had noticed that the reports in ValveSoftware/steam-for-linux#4982 had I am certain that this will not fix #7927 (which is tracking a different issue where userspace goofs in a way that this would not fix). Although I have not looked, I expect both coreutils and nfsutils to have properly implemsnted 64-bit file offset support on 32-bit systems a longtime ago and I suspect that #7122 is that mainline bug. One of us would need to look to be certain, but that task can be tracked in #7122 I am closing this because we lack of evidence that it can fix an issue that is not addressed by a kernel update. We can revisit it if we find an example of software breaking that this addresses in a way that has not been addressed by an update to another component. |
@ryao it sounds like there may be multiple issues at play here, the least we can do is address this one. The patch is specifically intended to address the static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstatfs *kbuf)
{
struct compat_statfs64 buf;
if (sizeof(ubuf->f_bsize) == 4) {
if ((kbuf->f_type | kbuf->f_bsize | kbuf->f_namelen |
kbuf->f_frsize | kbuf->f_flags) & 0xffffffff00000000ULL)
return -EOVERFLOW;
/* f_files and f_ffree may be -1; it's okay
* to stuff that into 32 bits */
>>> if (kbuf->f_files != 0xffffffffffffffffULL
>>> && (kbuf->f_files & 0xffffffff00000000ULL))
>>> return -EOVERFLOW;
>>> if (kbuf->f_ffree != 0xffffffffffffffffULL
>>> && (kbuf->f_ffree & 0xffffffff00000000ULL))
>>> return -EOVERFLOW;
} Specifically, the most common way for this issue to manifest itself is an overflow of I've reopened the PR because this resolves a real issue, which is easy to reproduce. |
@behlendorf Alright, although I am not convinced that we know of any userland software that benefits from this in practice. It would be cleaner to have a dataset level option for controlling how things are reported to userspace. That would have a more predictable behavior. If we go this route, I suggest putting it behind a module parameter. The information from statfs is used by userspace to try to optimize IO transfers. Changing it under userspace programs programmatically feels like it has the potential to cause headaches in unexpected ways. |
Any Linux system with a 32-bit user space will benefit from this. I think that Ubuntu 18.04 providing all the 32-bit libraries from their repositories for a multi-lib system is pretty good evidence there's still a need for this. Although, I admit that it is more of an endangered species these days.
I'd rather not, we have more than enough of those, and this is functionality which needs to just work. This change will only impact 32-bit compat callers, and it is already wrapped with |
@behlendorf It is true that Ubuntu is shipping 32-bit libraries, but those libraries should be immune to this. They are designed to be built for both 32-bit and 64-bit architectures. When built for a 32-bit architecture, the build system should always pass The only software that would benefit from this is software built before a 64-bit statvfs was a thing. To my knowledge, such software is not being shipped by any distribution. Also, we do not have any examples of such software being used in production. We can address this, but it is an extreme edge case. I feel that such an edge case belongs behind a kernel parameter. I do not like the idea of implementing a workaround without confirming that someone out there benefits from it. Once it is in the codebase, we could find ourselves in a position of supporting it forever should someone come to depend on it. At the very least, could we find a piece of software in any production distribution that benefits from this first? |
@ryao I tested this fix while developing it on Ubuntu 18:04 by adding the 32-bit architecture, installing the libraries, and then using the https://wiki.debian.org/Multiarch/HOWTO
I'm not sure I fully understand your objections here either. In the vast majority of cases the only thing we'll be reporting differently is the total number of available objects. Since this is already a calculated value (freespace / DNODE_SIZE) there never was a right answer to report. As for the reported block size and count, assuming the default block size no downshifting will occur until the you reach a 512TiB filesystem. Hopefully, very few people are using 32-bit applications on filesystems this large but even if they choose to this will continue to work. |
@behlendorf Is this patch part of your kernel? https://patchwork.kernel.org/patch/9987759/ If it is, then I am surprised. I was under the impression that the 32-bit df would be built with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behlendorf Disregard my remarks. I see what I had misunderstood. Let’s go with this.
It was the Ubuntu LTS kernel, 4.15.0-34-generic. Great, I'm glad we could hash this out. Thanks for making the time to fully review this. |
When handling a 32-bit statfs() system call the returned fields, although 64-bit in the kernel, must be limited to 32-bits or an EOVERFLOW error will be returned. This is less of an issue for block counts since the default reported block size in 128KiB. But since it is possible to set a smaller block size, these values will be scaled as needed to fit in a 32-bit unsigned long. Unlike most other filesystems the total possible file counts are more likely to overflow because they are calculated based on the available free space in the pool. In order to prevent this the reported value must be capped at 2^32-1. This is only for statfs(2) reporting, there are no changes to the internal ZFS limits. Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#7927 Closes openzfs#7122 Closes openzfs#7937
When handling a 32-bit statfs() system call the returned fields, although 64-bit in the kernel, must be limited to 32-bits or an EOVERFLOW error will be returned. This is less of an issue for block counts since the default reported block size in 128KiB. But since it is possible to set a smaller block size, these values will be scaled as needed to fit in a 32-bit unsigned long. Unlike most other filesystems the total possible file counts are more likely to overflow because they are calculated based on the available free space in the pool. In order to prevent this the reported value must be capped at 2^32-1. This is only for statfs(2) reporting, there are no changes to the internal ZFS limits. Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#7927 Closes openzfs#7122 Closes openzfs#7937
When handling a 32-bit statfs() system call the returned fields, although 64-bit in the kernel, must be limited to 32-bits or an EOVERFLOW error will be returned. This is less of an issue for block counts since the default reported block size in 128KiB. But since it is possible to set a smaller block size, these values will be scaled as needed to fit in a 32-bit unsigned long. Unlike most other filesystems the total possible file counts are more likely to overflow because they are calculated based on the available free space in the pool. In order to prevent this the reported value must be capped at 2^32-1. This is only for statfs(2) reporting, there are no changes to the internal ZFS limits. Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7927 Closes #7122 Closes #7937
When handling a 32-bit statfs() system call the returned fields, although 64-bit in the kernel, must be limited to 32-bits or an EOVERFLOW error will be returned. This is less of an issue for block counts since the default reported block size in 128KiB. But since it is possible to set a smaller block size, these values will be scaled as needed to fit in a 32-bit unsigned long. Unlike most other filesystems the total possible file counts are more likely to overflow because they are calculated based on the available free space in the pool. In order to prevent this the reported value must be capped at 2^32-1. This is only for statfs(2) reporting, there are no changes to the internal ZFS limits. Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#7927 Closes openzfs#7122 Closes openzfs#7937
Motivation and Context
Issues #7122 and #7927.
Description
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW
error will be returned.This is less of an issue for block counts since the default
reported block size in 128KiB. But since it is possible to
set a smaller block size, these values will be scaled when
needing to fit in a 32-bit unsigned long.
Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are not changed to the
internal ZFS limits.
How Has This Been Tested?
Starting with a clean Ubuntu 18.04 x86_64 install I added the
32-bit user space compatibility libraries.
Next I manually extracted the 32-bit
df(1)
binary from thecoreutils_8.26-3ubuntu4_i386.deb
package as a test application.With an unmodified version of ZFS I was able to immediately
reproduce the issue.
After applying the proposed fix which scales the block and file
counts the
EOVERFLOW
was prevented as expected.Note that using the 64-bit system
df(1)
the reported file counts arenot capped.
Finally, I manually verify the block size and count scaling by changing
the filesystems record size to 512b forcing an overflow. The reported
block size was dynamically scaled to 1k as required to prevent the overflow.
Types of changes
Checklist:
Signed-off-by
.