libstd: Drop RHEL 5 support, Increase Minimum kernel version to 2.6.27 #62516

Closed
josephlr opened this issue Jul 9, 2019 · 21 comments · Fixed by #74163
Labels
C-cleanup (Category: PRs that clean code up or issues documenting cleanup) · O-linux (Operating system: Linux) · T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue)

Comments

@josephlr
Contributor

josephlr commented Jul 9, 2019

The docs currently claim that the minimum Linux kernel version for libstd is 2.6.18 (released September 20th, 2006). This is because RHEL 5 used that kernel version. However, RHEL 5 entered ELS on March 31, 2017.

Should we continue to support RHEL 5 for libstd, or should we increase the minimum Linux Kernel version to 2.6.27 (2nd LTS) or 2.6.32 (RHEL 6, 3rd LTS)? List of the relevant kernel versions.

Even bumping the min-version to 2.6.27 would allow us to remove most of the Linux-specific hacks in libstd. Example: the O_CLOEXEC code.

@josephlr
Contributor Author

Given that RHEL is the only reason we keep comically old kernel versions around, I would propose that Rust only support RHEL until Maintenance Support ends. This is (essentially) what we already did for RHEL 4. Rust never supported RHEL 4, and set its minimum supported Linux Kernel version back when RHEL 5 still had maintenance support.

It would be nice to get someone from Red Hat or a RHEL customer to comment. This policy would allow us to increment the minimum kernel from 2.6.18 to 2.6.32 (and remove a lot of Linux-specific hacks).

Note that RHEL has a weird (and very long) support system for RHEL 4, 5, and 6 (sources: dates and service details). It has 5 major steps:

  • Full Support
    • Normal support of the OS
  • Maintenance Support 1
    • Bug/Security fixes
    • Limited hardware refresh
  • Maintenance Support 2
    • Bug/Security fixes
    • The end of this phase is considered "Product retirement"
  • Extended Life Cycle Support (ELS)
    • Additional paid product from Red Hat
    • Gives updates for a longer period of time
    • No additional releases/images
  • Extended Life Phase (ELP)
    • No more updates
    • Limited Technical Support
    • End date not given by Red Hat

Current status of RHEL versions:

  • RHEL 4
    • Not supported by Rust
    • Currently in ELP (no end date specified)
    • ELS ended March 31, 2017
  • RHEL 5
    • Supported by Rust
    • Currently in ELS
    • Maintenance Support ended March 31, 2017
    • ELS ends November 30, 2020
  • RHEL 6
    • Supported by Rust
    • Currently in Maintenance Support 2
    • Maintenance Support ends November 30, 2020

@tesuji
Contributor

tesuji commented Jul 29, 2019

cc @cuviper, since I think this may be relevant to them.

@newpavlov
Contributor

newpavlov commented Jul 29, 2019

It may also be worth dropping support for Windows XP and Vista (especially considering that panics have been broken on XP since June 2016, see: #34538). This was previously discussed here.

cc @steveklabnik

@briansmith
Contributor

Even bumping the min-version to 2.6.27 would allow us to remove most of the Linux-specific hacks in libstd. Example: the O_CLOEXEC code.

Besides O_CLOEXEC, which other things would we be able to clean up?

@josephlr
Contributor Author

josephlr commented Jul 29, 2019

Besides O_CLOEXEC, which other things would we be able to clean up?

Good question! There are 6 workarounds in libstd for older Linux versions (that I could find). Increasing the minimum version to 2.6.32 (aka 3rd Kernel LTS, aka RHEL 6) would fix 5 of them. Code links are inline:

As you can see, the workarounds fixed by this proposal all have a similar flavor.

jonas-schievink added the C-cleanup, O-linux, and T-libs-api labels on Jul 29, 2019
@cuviper
Member

cuviper commented Jul 29, 2019

I am Red Hat's maintainer for Rust on RHEL -- thanks @lzutao for the cc. I try to keep an eye on new issues, but this one slipped past me.

Red Hat only ships the Rust toolchain to customers for RHEL 7 and RHEL 8. If our customers would like to use Rust on older RHEL, they can do so via rustup, and we'll support them in the same way we would for any other third-party software.

Internally we do also build and use rustc on RHEL 6, mostly because it's needed to ship Firefox updates. This is where it gets a little hairy, because each RHEL N is bootstrapped on RHEL N-1 and often kept that way -- meaning a lot of our RHEL 6 builders are still running on a RHEL 5 kernel. I would have to apply local workarounds if upstream Rust loses the ability to run on 2.6.18 kernels. We prefer to keep fixes upstream as much as possible, both for open sharing and to avoid bitrot.

Is there much of a benefit to removing the current workarounds? I agree all of that CLOEXEC code is annoying and cumbersome, but it's already written, and as far as I know it hasn't been a maintenance burden. Do you see otherwise?

If there are any known issues with such Linux compatibility code, I am definitely willing to take assignment for fixing them.

@alex
Member

alex commented Jul 29, 2019

(Not a Rust maintainer, but I'm a Rust user and maintain a lot of other OSS software so I have well-developed feelings around supporting Very Old Software :-))

Do you have any sense of on what timeline Rust would be able to drop support for 2.6.18 kernels without causing you pain?

In general I don't think people mind supporting configurations that have users and are painful to work around, but needing to support them for forever is a bitter pill to swallow! Particularly as they get harder and harder to test over time (already I have no idea how to test on really old kernels besides building it myself). So if there was an estimate "we'd like to be able to support this until Q2 2020", even if it's not set in stone, I think that would be very helpful!

@nikic
Contributor

nikic commented Jul 29, 2019

@cuviper The other benefit would be that Rust wouldn't have to use CentOS 5 for CI, which means we don't have to patch LLVM (and Emscripten LLVM) to compile on those systems. Of course that's also fairly limited in scope.

@cuviper
Member

cuviper commented Jul 29, 2019

@alex

Do you have any sense of on what timeline Rust would be able to drop support for 2.6.18 kernels without causing you pain?

As soon as we stop shipping Firefox and Thunderbird updates on RHEL 6, I won't need Rust there anymore. AFAIK this does correspond to the end of Maintenance Support, approximately November 30, 2020. Then I'll be on to RHEL 7 builders as my minimum, probably still with some RHEL 6 2.6.32 kernels involved.

@nikic

The other benefit would be that Rust wouldn't have to use CentOS 5 for CI

It should be fine for me if we update CI to CentOS 6. This is mostly concerned with how the release binaries are linked to GLIBC symbol versions, which is all in userspace. It's a broader community question whether any other Rust users still care about running on RHEL or CentOS 5.

(Small caveat - glibc support for a symbol can still return ENOSYS, as noted for pipe2.)
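
For illustration, the ENOSYS caveat above is usually handled with a try-then-fall-back pattern. The sketch below is not libstd's code: `with_enosys_fallback` is a hypothetical helper, and the `ENOSYS` value of 38 is an assumption that holds on x86 and ARM Linux but not on every architecture.

```rust
use std::io;

/// Assumed errno value for ENOSYS (true on x86 and ARM Linux; some
/// architectures use a different number).
const ENOSYS: i32 = 38;

/// Hypothetical helper: run `newer`, and only if the kernel reports that the
/// syscall does not exist (ENOSYS), run `older` instead. Any other error from
/// `newer` is returned unchanged.
fn with_enosys_fallback<T>(
    newer: impl FnOnce() -> io::Result<T>,
    older: impl FnOnce() -> io::Result<T>,
) -> io::Result<T> {
    match newer() {
        Err(ref e) if e.raw_os_error() == Some(ENOSYS) => older(),
        other => other,
    }
}
```

A caller would pass a closure wrapping `pipe2` with `O_CLOEXEC` as `newer`, and a closure doing `pipe` followed by `fcntl(F_SETFD, FD_CLOEXEC)` as `older`. Note that the fallback is not atomic, which is the reason the newer syscall exists in the first place.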

Centril added a commit to Centril/rust that referenced this issue Jul 29, 2019
vxworks: Remove Linux-specific comments.

It looks like the VxWorks fork inadvertently left in some Linux-specific workaround comments in `libstd`, these can be removed. Came up when looking into rust-lang#62516

CC:  @BaoshanPang
@josephlr
Contributor Author

@cuviper I noticed that Red Hat also has Extended Life Cycle Support (ELS) for RHEL 6 until June 30, 2024. Will you need RHEL 5 to work during this time? I don't know how ELS works with Firefox updates. Also, is there any reason RHEL 4 support wasn't an issue prior to March 31, 2017 (while RHEL 5 was still normally supported)?

This issue came up for me when dealing with opening files in rust-random/getrandom#58; see #62082 (comment) for more info. No single RHEL 5 issue is that bad; it's mainly the sum of a bunch of tiny issues.

@cuviper
Member

cuviper commented Jul 29, 2019

I don't know how ELS works with Firefox updates.

I'm not on that team, but AFAIK we don't ship Firefox updates for ELS. The last build I see for RHEL 5 was 45.8.0 on 2017-03-08. Maybe there could be an exception for a severe security issue, but I really doubt we'd rebase to newer Firefox for that, which means Rust requirements won't change.

New Firefox ESR versions do require a newer Rust toolchain too, which is generally why we have to keep up. Otherwise we could just freeze some older compat rustc while upstream moves on.

Also, is there any reason RHEL 4 support wasn't an issue prior to March 31, 2017 (while RHEL 5 was still normally supported)?

Rust wasn't required until Firefox 52.

@josephlr
Contributor Author

josephlr commented Jul 29, 2019

@cuviper that makes perfect sense to me, thanks for clarifying.

So the proposed policy would be: Support RHEL N-1 until RHEL N is retired (i.e. ends normal support). This would mean:

  • Supporting RHEL 5 (v2.6.18) until November 30, 2020
  • Supporting RHEL 6 (v2.6.32) until June 30, 2024
  • Supporting RHEL 7 (v3.10) until May, 2029

This is a much longer support tail than that of any other Linux distro (that I know of), so it would also be the effective minimum kernel version of libstd, rustc, and their dependencies.

How does that sound to people?

EDIT: The alternative to this would be Support RHEL N until it is retired, taking the table above and incrementing the RHEL versions by 1. It would also mean being able to drop RHEL 5 support now.
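
To make the two candidate policies concrete, here is a small sketch that computes the oldest supported release (and therefore the minimum kernel) on a given date. The type and function names are hypothetical, the retirement dates are the Maintenance Support end dates quoted in this thread, and RHEL 8 is left out of the table because only a month was given for it.

```rust
/// Calendar date as (year, month, day); tuples compare lexicographically.
type Date = (u32, u32, u32);

#[derive(Clone, Copy)]
enum Policy {
    /// "Support RHEL N-1 until RHEL N is retired."
    UntilSuccessorRetires,
    /// "Support RHEL N until it is retired."
    UntilRetired,
}

struct Release {
    name: &'static str,
    kernel: (u32, u32, u32),
    /// End of Maintenance Support, as quoted in this thread.
    retired: Date,
}

static RELEASES: [Release; 3] = [
    Release { name: "RHEL 5", kernel: (2, 6, 18), retired: (2017, 3, 31) },
    Release { name: "RHEL 6", kernel: (2, 6, 32), retired: (2020, 11, 30) },
    Release { name: "RHEL 7", kernel: (3, 10, 0), retired: (2024, 6, 30) },
];

/// Returns the oldest release still supported on `today` under `policy`.
fn oldest_supported(policy: Policy, today: Date) -> Option<&'static Release> {
    RELEASES.iter().enumerate().find_map(|(i, release)| {
        // Index of the release whose retirement date decides support.
        let deciding = match policy {
            Policy::UntilRetired => i,
            Policy::UntilSuccessorRetires => i + 1,
        };
        let supported = match RELEASES.get(deciding) {
            Some(r) => today <= r.retired,
            // The successor is not in the table, so its retirement date is
            // unknown here; this sketch treats the release as still supported.
            None => true,
        };
        if supported { Some(release) } else { None }
    })
}
```

On the date of this comment, (2019, 7, 29), the first policy yields RHEL 5 and kernel 2.6.18, while the alternative yields RHEL 6 and kernel 2.6.32.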

@cuviper
Member

cuviper commented Jul 29, 2019

Support RHEL N-1 until RHEL N is retired
How does that sound to people?

Sure, that's ideal for me. 😄

As mentioned, N-1 is just needed for kernel support, as far as I'm concerned. Userspace concerns like GLIBC symbol versions can stick to currently supported N for my needs.

@briansmith
Contributor

Red Hat should backport the features we need (CLOEXEC and getrandom, in particular) to whatever kernels that it wants Rust to support. I don't know any good reason why they don't do so, other than it's cheaper to convince the whole world to keep supporting older kernel versions than it is to do the backports. We should change that dynamic.

@newpavlov
Contributor

newpavlov commented Jul 30, 2019

The alternative to this would be Support RHEL N until it is retired

I think we should not officially support retired OS versions (well, maybe with some grace period), so I prefer this option. Supporting 14 year old kernels seems a bit too much to me.

@cuviper

each RHEL N is bootstrapped on RHEL N-1 and often kept that way -- meaning a lot of our RHEL 6 builders are still running on a RHEL 5 kernel

Is it possible to apply kernel patches to those builders to add support for functionality like CLOEXEC? Or build Firefox separately on RHEL 6?

@cuviper
Member

cuviper commented Jul 30, 2019

@briansmith

Red Hat should backport the features we need [...]

I think it won't be fruitful for us to debate the merits of stable enterprise kernels, but no, this is not a possibility. An alternative take is "Red Hat should be responsible for the work in Rust to maintain the support they want" -- here I am, ready and willing.

@newpavlov

Is it possible to apply kernel patches to those builders to add support for functionality like CLOEXEC? Or build Firefox separately on RHEL 6?

I definitely can't change those kernels. The ones that are stuck on N-1 are precisely to avoid rocking the boat, and backporting features is a big change. Some of our arches do eventually update the builders to their matching N kernel, but some don't, and I don't know all the reasons.

If the CLOEXEC workarounds are removed, then I will have to reapply them in our own builds. This assumes I keep using only our own binaries -- if I ever have to re-bootstrap from upstream binaries for stage0, I would also have to use some interception (like LD_PRELOAD) to hack in the workarounds. This is all possible, but much worse than the status quo of maintaining support here.

@josephlr
Contributor Author

josephlr commented Aug 1, 2019

@cuviper I think this sounds reasonable. We aren't going to be able to update these kernels; it's just a question of when we would drop support for these OSes. Especially given that we won't need RHEL 5 CI support, I think that leaving things as they are is fine.

I opened this issue to verify two things:

  1. That we weren't intending on supporting RHEL 5 "forever", and had a clear date when to drop support.
  2. That someone from Red Hat actively cared about supporting these older kernels.

It looks like both these things are true. We should remove RHEL 5 workarounds after November 30, 2020. This issue can be postponed until then.

EDIT: For our use case in getrandom, we were able to add compatibility for RHEL 5 using two lines of code.

@RalfJung
Member

RalfJung commented Aug 27, 2019

Is there much of a benefit to removing the current workarounds? I agree all of that CLOEXEC code is annoying and cumbersome, but it's already written, and as far as I know it hasn't been a maintenance burden. Do you see otherwise?

See rust-random/getrandom#58 for a recent case where supporting such ancient kernels required extra manual work (fixed link to the 2 lines mentioned by @josephlr above).
I am not sure how often that comes up, but probably the hardest part is to even notice that it happens -- and even then those codepaths will likely linger untested.

@cuviper
Member

cuviper commented Aug 27, 2019

I am not sure how often that comes up, but probably the hardest part is to even notice that it happens -- and even then those codepaths will likely linger untested.

Yes, but this will still be true in general. If the CI kernel is newer than whatever kernel baseline we choose, it will be possible for newer syscalls to pass undetected. I don't know if we have any control over that -- can we choose a particular VM image?

So that leaves us with a stated policy of support. The more that developers are aware of such issues, the better, but I'm the one most likely to notice when my build actually fails. If I come back with a fix, I at least need the project to be receptive.

I haven't been in the habit of building nightly on our RHEL6 builders, but maybe I should! Catching this early is better for everyone involved...
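
As a rough sketch of the CI concern above, a builder could at least report whether its running kernel is newer than the chosen baseline, so that a green run is not mistaken for coverage of the baseline. The function names are hypothetical, and reading `/proc/sys/kernel/osrelease` assumes a Linux host with procfs mounted.

```rust
use std::fs;
use std::io;

/// Parses the leading "major.minor[.patch]" of a kernel release string such
/// as "2.6.18-398.el5". A missing patch component is treated as 0.
fn parse_kernel_release(release: &str) -> Option<(u32, u32, u32)> {
    // Keep only the leading run of digits and dots.
    let numeric = release
        .trim()
        .split(|c: char| !(c.is_ascii_digit() || c == '.'))
        .next()?;
    let mut parts = numeric
        .split('.')
        .filter(|p| !p.is_empty())
        .map(|p| p.parse::<u32>());
    let major = parts.next()?.ok()?;
    let minor = parts.next()?.ok()?;
    let patch = match parts.next() {
        Some(p) => p.ok()?,
        None => 0,
    };
    Some((major, minor, patch))
}

/// Returns whether `release` is at least `baseline`, or None if unparsable.
fn meets_baseline(release: &str, baseline: (u32, u32, u32)) -> Option<bool> {
    parse_kernel_release(release).map(|version| version >= baseline)
}

/// Reads the running kernel's release string (Linux with procfs only).
fn running_kernel_release() -> io::Result<String> {
    fs::read_to_string("/proc/sys/kernel/osrelease")
}
```

This only tells you which kernel the tests ran on. It does not detect use of newer syscalls, which still needs a builder that actually runs the baseline kernel.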

@josephlr
Contributor Author

I haven't been in the habit of building nightly on our RHEL6 builders, but maybe I should! Catching this early is better for everyone involved...

That seems reasonable. How quick is the Rust related bootstrapping? Could it be run automatically (or more frequently)?

See rust-random/getrandom#58 for a recent case where supporting such ancient kernels required extra manual work..

This is an interesting point. If I hadn't made that change, getrandom would have kept working on RHEL 5, it just would have leaked an FD. I'm not sure how we would even test for this sort of thing. This isn't a RHEL 5 specific concern, I just don't know how Rust tests these types of resource leak issues in general.
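
One possible way to test for this kind of leak, offered as a sketch rather than an existing Rust test facility: on Linux, `/proc/self/fdinfo/<fd>` has a `flags:` line in octal, and reasonably recent kernels include the close-on-exec bit there. The helper names are hypothetical, and the `O_CLOEXEC` value is an assumption that matches x86 and ARM but not every architecture.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::io::{AsRawFd, RawFd};

/// Assumed value of O_CLOEXEC on x86, ARM, and most other Linux
/// architectures; a few architectures use a different value.
const O_CLOEXEC: u32 = 0o2000000;

/// Given the text of an fdinfo file, reports whether the close-on-exec bit
/// is set, or None if there is no parsable `flags:` line.
fn fdinfo_has_cloexec(fdinfo: &str) -> Option<bool> {
    let flags = fdinfo
        .lines()
        .find_map(|line| line.strip_prefix("flags:"))?;
    let bits = u32::from_str_radix(flags.trim(), 8).ok()?;
    Some(bits & O_CLOEXEC != 0)
}

/// Checks a descriptor of the current process (Linux with procfs only).
fn fd_is_cloexec(fd: RawFd) -> io::Result<bool> {
    let text = fs::read_to_string(format!("/proc/self/fdinfo/{}", fd))?;
    fdinfo_has_cloexec(&text).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "no flags line in fdinfo")
    })
}

/// Example check: open a file through std and inspect the resulting fd.
fn opened_file_is_cloexec(path: &str) -> io::Result<bool> {
    let file = File::open(path)?;
    fd_is_cloexec(file.as_raw_fd())
}
```

A check like this catches code that forgets to request close-on-exec on the kernel the test runs on. It cannot emulate an old kernel that silently ignores the flag, so it complements building on the old builders rather than replacing it.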

@cuviper
Member

cuviper commented Aug 27, 2019

That seems reasonable. How quick is the Rust related bootstrapping? Could it be run automatically (or more frequently)?

About 2 hours. Automation is harder, since I believe our build system needs my credentials, but I'll look into what is possible here.

If I hadn't made that change, getrandom would have kept working on RHEL 5, it just would have leaked an FD. I'm not sure how we would even test for this sort of thing.

Yeah, that's a lot more subtle than an ENOSYS!
