Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libstd: File::open uses O_CLOEXEC on platforms where it is unsupported #69862

Closed
Tyg13 opened this issue Mar 9, 2020 · 4 comments
Closed

libstd: File::open uses O_CLOEXEC on platforms where it is unsupported #69862

Tyg13 opened this issue Mar 9, 2020 · 4 comments
Labels
C-bug Category: This is a bug. O-linux Operating system: Linux T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Tyg13
Copy link
Contributor

Tyg13 commented Mar 9, 2020

I was originally trying to install the latest Rust stable on Linux 2.6.18-8.el5 i686 (CC rustup#2254), and failing when RUSTUP_HOME or CARGO_HOME were set to my NFS mount (which uses 64 bit inodes).

This led me to (erroneously) file a bug with libc (CC libc#1679), thinking that it was an issue with 64 bit inode support wherein libc did not call the 64-bit version of stat and friends. This was not the case: running strace -f rustup update led me to find this curious sequence of syscalls in the trace output:

strace.log
...
open("/usr/share/terminfo/x/xterm", O_RDONLY|O_LARGEFILE|0x80000) = 3
fcntl64(3, F_GETFD)                     = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
read(3, "\32\0010\0&\0\17\0\235\0014\5xterm|xterm terminal"..., 8192) = 2472
close(3)                                = 0
getcwd("/home/LOCAL/tla/build/libc", 512) = 27
stat64("/home/LOCAL/tla/.local/.rustup", {st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
stat64("/home/LOCAL/tla/.local/.rustup/settings.toml", {st_mode=S_IFREG|0644, st_size=48, ...}) = 0
open("/home/LOCAL/tla/.local/.rustup/settings.toml", O_RDONLY|O_LARGEFILE|0x80000) = -530
...

As you can see: we are passing 0x80000 = O_CLOEXEC to the open call, which leads to an error code when the file in question is on the NFS mount (or perhaps it's a symlink issue -- /home/LOCAL/tla/.local/ is actually a symlink to a directory on the NFS mount). Reproducing the above sequence of syscalls with a small test program written in C shows that this open call only fails when the erroneous O_CLOEXEC argument is passed.

It is my suspicion that this line is the culprit. Comments there suggest that passing this flag on systems where it is unsupported (such as mine) will have no effect. I don't know enough about how the open syscall works to confirm, but it appears that is not the case with my configuration.

@Tyg13 Tyg13 added the C-bug Category: This is a bug. label Mar 9, 2020
@jonas-schievink jonas-schievink added O-linux Operating system: Linux T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 9, 2020
@Tyg13
Copy link
Contributor Author

Tyg13 commented Mar 9, 2020

If I hook open64 with LD_PRELOAD and unset the offending O_CLOEXEC flag, then the open64 calls succeed. So it definitely looks like O_CLOEXEC is the culprit here.

@Tyg13
Copy link
Contributor Author

Tyg13 commented Mar 9, 2020

I want to submit a patch for this, but I'm unsure about how to go about solving this issue.

The first option I can think of is to explicitly check the kernel version by calling libc::uname, and only pass the O_CLOEXEC flag when the equivalent of uname -r reports kernel version >=2.6.23. We could subsume this into the existing ensure_cloexec() function, to ensure we only pass the flag when it's supported, and work around it when it's not. libc::uname should be supported on all platforms where we could run into this issue, but in case this is not a valid solution, we could do the second option.

The second option is something like: "check if open failed, if it did, try again without O_CLOEXEC". This is definitely a slowdown for cases where O_CLOEXEC is supported, but the open call fails for some other reason, but avoids the call to uname. Additionally, this doesn't allow us to cache whether O_CLOEXEC is supported (since the first call could fail and the second succeed for reasons entirely orthogonal to the flags passed).

A third (probably undesirable) option is to just remove O_CLOEXEC from the flags entirely, and always use the mechanism in ensure_cloexec() on platforms where it might not be supported.

Any thoughts, suggestions, comments?

@cuviper
Copy link
Member

cuviper commented Mar 9, 2020

Linux 2.6.18-8.el5 i686

That kernel would be as RHEL 5.0 originally shipped, without any updates at all!

See rhbz313681, "O_ATOMICLOOKUP vs O_CLOEXEC(mainstream kernels) incompatibility". Your strace log even shows the error return -530, corresponding to EWOULDBLOCKIO for that misinterpreted flag. This was fixed in the kernel for RHEL 5.2.


You should also know that there's pressure to raise the minimum kernel support -- see #62516. As a Red Hat engineer, I've pushed back on that so far, but only because I still need RHEL5 kernel support to build for RHEL6 myself. I will likely yield when RHEL6 is retired, November 30, 2020, which is also when RHEL5 leaves extended life-cycle support.

https://access.redhat.com/support/policy/updates/errata#Life_Cycle_Dates

@Tyg13
Copy link
Contributor Author

Tyg13 commented Mar 9, 2020

We are indeed in very unsupported territory. Currently my organization is using RHEL5 with an expired support contract. I will have to look into upgrading our kernel. I hope we will be able to do that without a support contract, though I am not personally familiar with the details. (ETA: a brief perusal of the relevant Red Hat literature suggests that this is not allowed by the RHEL5 licensing agreement. I will have to contact our sysadmin to see if this is still the case, or if the situation can be rectified.)

That being said, it appears that this is not a Rust issue. As such, I think I can safely close all of the above tickets.

@Tyg13 Tyg13 closed this as completed Mar 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-linux Operating system: Linux T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants