Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Android NDK r25b changes will break developers using r22b or older #103673

Closed
alex-pinkus opened this issue Oct 28, 2022 · 52 comments
Closed

Android NDK r25b changes will break developers using r22b or older #103673

alex-pinkus opened this issue Oct 28, 2022 · 52 comments
Assignees
Labels
regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Milestone

Comments

@alex-pinkus
Copy link

alex-pinkus commented Oct 28, 2022

Discussion continued from #102332.

Background

Changes that were merged in #102332, slated for 1.66.0, will update the CI build scripts so that all Android targets1 use the r25b release2 of the Android NDK to build Rust binaries, moving from r15c. The r25b release is the newest version of the NDK at time of writing (Oct 2022), and is a Long Term Support release (LTS)3. r15c predates LTS designation, is over five years old, and is past the end of its support life. Rust's Android platform support document4, merged on 24 September 2022, indicates support for the most recent LTS release of the NDK, so the changes in #102332 are in line with that document.

Currently, Rust developers who wish to use an android-ndk toolchain newer than r22b must use build-time hacks to do so. This is because the Rust standard library uses libgcc for its unwinder implementation on Android, but libgcc is not included in new versions of the NDK. When using r23b or newer in a Rust library, a developer will see ld: error: unable to find library -lgcc. #85806 added logic to select between libgcc or libunwind, but this logic runs when building Rust itself, not when compiling the downstream application, so it is mainly useful with the nightly flag -Z build-std. Since older versions of the NDK do not have libunwind, and newer versions do not have libgcc, support for r23 (and newer) is mutually exclusive from support for r22 (and older), unless a developer applies a manual workaround5. This workaround, for building r25b, with a binary release that uses libgcc6, involves a fake libgcc.a containing INPUT(-lunwind). One common form of this workaround7 is used in the ring crate and referenced from setup scripts for flutter_rust_bridge; a slightly modified form can be found in cargo-apk.

Problem

When the updated CI script lands in a stable release, the Rust binary releases for Android targets will compute has_unwind=true, and link against libunwind instead of libgcc, causing every project that builds using NDK version r22b or older to fail with the message error: cannot find -lunwind. Because building with r23b or newer requires either nightly Rust or a build-script workaround, it is likely that most Android projects using Rust today are using r22b or older and will therefore be broken by this change. One comment notes that Firefox will be broken8; a quick GitHub search also uncovers libraries like openssl-src among others that build using older NDK versions on CI.

Depending on the version that a project must be upgraded from, going to r23b can be a non-trivial effort. The process for obtaining a toolchain changed in r199, so a developer must rewrite their CI compilation scripts. One commenter noted seeing compilation issues in a codebase containing C++ headers10, although they were able to resolve the issues. These are not reasons to never upgrade, but they indicate that a surprise NDK upgrade may be unwelcome work requiring more than just a version number bump.

Options

Currently, an undetermined (but probably large) subset of the Android Rust ecosystem will find itself broken on December 15th, when Rust version 1.66.0 comes out. What should be done about it? Some ideas, in no particular order:

1. Roll back to version r15c by reverting the change

Reverting the change is the simplest option, and buys an indefinite amount of time to define a more graceful deprecation strategy. Doing so, however, prolongs the usage of a build tool that's well past end of life, and means that any features introduced in the last 5 years aren't available to Rust developers.

2. Downgrade to a newer toolchain that is older than r23

With a net-new change to use r21e (newest compatible previous LTS version, released January 2021) or r22b (released March 2021)11, the binaries produced by CI could take advantage of a significantly newer toolchain and retain compatibility with older toolchains downstream. A Rust binary compiled using r22b can be linked into an application using a toolchain as old as r10e (and likely older, but I didn't have an older NDK at hand).

However, both r21e and r22b are still considered obsolete and unsupported. Upgrading to one of these versions means that breakage will still occur later, when we do decide to upgrade.

3. Warn users about breakage over a longer time period

On the r25b pull request, @jonhoo mentioned that a blog post would help alert users to the change. @jonhoo compared this to the change in minimum linux-gnu versions that happened in August12 and the accompanying announcement13. That change was pre-warned over several releases; in order to take a similar action here, the r25b PR would need to be reverted. While this impacts just the Android targets, rather than tier-1 linux targets as the glibc bump did, the impact on those developers is much more severe. By raising awareness of the issue and defining a timeline for the change, the blog post would help to reduce impact and limit unpleasant surprises.

4. Make no change, but still warn users about the looming breakage

Even if no action is taken to delay this change, a blog post will be a prudent way to alert developers that the change is happening in six weeks. If it is safe to recommend the INPUT(-lunwind) workaround for broad use, the blog post could recommend immediate migration to r23 or newer before the change hits stable.


Conclusion

As time marches on, developer tools evolve, and long-term support versions fade into unsupported obsolescence. It's natural and expected for Rust to drop support for older releases of the Android NDK, as with any other toolchain or environment that has passed end of life.

However, the current implementation will require a big-bang migration to r23+ for any project wishing to adopt 1.66.0. Some developers may have adopted r23, if they are attentive to Rust community workarounds, or brave enough to run their CI only on nightly. It's likely that many have not. In the interest of preserving existing project compatibility, it may be prudent to attempt a graceful migration to libunwind wherever possible, which can be done after still upgrading to a version that's newer than the current one. By giving developers ample warning that allows them to adopt today's workaround, the Rust project can keep the ecosystem current without causing sudden surprises.

Footnotes

  1. Android targets: arm-linux-androideabi, armv7-linux-androideabi, thumbv7neon-linux-androideabi-ndk, i686-linux-android-ndk, aarch64-linux-android, x86_64-linux-android

  2. Android NDK r25 revision history.

  3. NDK release process, detailing the implications of LTS releases.

  4. Platform support for *-linux-android and *-linux-androideabi and PR: Add a platform support document for Android #101780

  5. libunwind availability in the Android NDK: Screen Shot 2022-10-27 at 9 18 15 PM

  6. Apparent first attestation of the INPUT(-lunwind) workaround

  7. echo "INPUT(-lunwind)" search results on GitHub

  8. Comment from @glandium indicating Firefox breakage

  9. Standalone toolchain (deprecated) documentation

  10. Comment reporting compilation errors caused by the upgrade

  11. In order to use r22b, the logic in library/unwind/build.rs will also need to prefer libgcc over libunwind, since r22b includes libunwind.a within the C++ STL in arm-linux-androideabi only, for reasons that are not clear.

  12. https://github.com/rust-lang/rust/pull/95026

  13. Increasing the glibc and Linux kernel requirements

@Mark-Simulacrum Mark-Simulacrum added I-compiler-nominated Nominated for discussion during a compiler team meeting. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Oct 28, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 28, 2022
@Mark-Simulacrum Mark-Simulacrum modified the milestones: 1.65.0, 1.66.0 Oct 28, 2022
@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Oct 28, 2022

Nominating for compiler discussion.

I would probably be somewhat in favor of fully rolling back the bump on soon to be beta 1.66 (we branch today), and landing a bump (edit: on master) only to r21/r22 so we retain compatibility for slightly longer.

We should in parallel issue a blog post with our strategy for this bump, and ideally a strategy for future bumps. It looks like Android's LTS releases are only supported until the next one is issued (once a year roughly) which means there's not really an overlap period in which we can stick with a supported copy until user's migrate. This situation definitely reminds me of our similar challenges on BSDs.

@tmandry
Copy link
Member

tmandry commented Oct 28, 2022

I see a couple issues here:

Difficulty in moving between ndk versions

I think we should strive to match the experience of the ndk itself here. In particular if there is no ndk release with both libgcc and libunwind, the Rust project shouldn't go out of its way to ease the upgrade experience beyond giving people enough warning. (If someone wants to contribute such a soft transition path and it's not a lot of maintenance overhead, that's fine.)

Still, matching the ndk experience requires being in line with the expected upgrade cadence of ndk releases. I think this is ultimately up to the Android target maintainers.

Tying rustup releases to ndk and target API levels

It doesn't seem ideal to have your version of Rust coupled to your target API level, especially on a frequently updating platform such as Android. (Unless you can mix code targeting different API levels in the same binary, in which case it might be fine?) I expect to see similar issues with Fuchsia, where the goal is to update the platform even more frequently.

Thinking out loud a bit, this makes me wonder if we should just point people toward -Z build-std and stabilize the functionality in some capacity. This gives people maximum flexibility to mix and match ndk versions, set target API levels, and manage upgrade paths on their own. On my machine it's not that much overhead (+~17s to initial build time), but I would like to be conscious of the impact to people who are building on slower laptops.

@alex-pinkus
Copy link
Author

alex-pinkus commented Nov 1, 2022

On point 1: Totally agree that the Rust project isn't the right place to ease the android-ndk upgrade experience in general. Is there clear direction somewhere on what "enough warning" means? With the glibc version bump, the blog post was published almost two months in advance.

On point 2: In the general case, at least, rustup does fairly well with staying uncoupled from ndk versions. As I mentioned, a distribution of rust that uses ndk r22b will build fine against an application using as old as r10e, and possibly older. This weird situation is more a problem of the android-ndk than it is a problem of anything Rust has done. (As a side note, with target API levels, I understand the story to be a bit more complex, with the current min of android-14 changing to android-19 with this version bump, which restricts the possible targets the application can declare, but users can declare a lower min or target if their ndk supports it, even if that causes issues for the stdlib).

Rust is fairly unique here because, if you combine points 1 and 2, every project that uses Rust has a dependency by definition that will break them when it upgrades. C++ doesn't have this problem because their stdlib gets distributed as part of the ndk. Theoretically, a random C++ library might also have this problem if they link against libgcc, but no one library is going to have the kind of reach across C++ that the Rust stdlib has across Rust. It's also not uncommon in my experience for Android C++ projects to download their dependencies from source anyway (fresco is a good example) and build them directly.

Pointing people toward a stabilized version of -Z build-std definitely seems like a preferred solution here. Initial builds are going to have other far greater bottlenecks, and as an added bonus, Android developers are often also binary-size-conscious which build-std would help with. It's not clear to me whether you're suggesting that as a prerequisite for the ndk-bump or just as a useful tool in the future. Is build-std far enough along that some form of it could be stabilized in the right timeframe?

@apiraino
Copy link
Contributor

apiraino commented Nov 2, 2022

I think there is enough meat here to fire up an MCP FCP. Similar situation were handled that way. An MCP was suggested in the first place by @jyn514 in this comment.

Anyway, this will be discussed tomorrow. Between the previous discussion on Zulip and this issue comments, I think there is already great context provided.

@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 2, 2022
@Mark-Simulacrum Mark-Simulacrum added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Nov 2, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 2, 2022
@Mark-Simulacrum Mark-Simulacrum removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 2, 2022
@alex-pinkus
Copy link
Author

@apiraino, just wondering, did the aforementioned discussion take place (sounds like it would have been on Nov 3)? Any outcomes to share?

@apiraino
Copy link
Contributor

apiraino commented Nov 8, 2022

@alex-pinkus correct - the team wanted to discuss this issue but unfortunately time ran out. This week (on Thu, 10th) the topic will be again on the table and hopefully it will be discussed (and a comment posted here for visibility).

@chriswailes
Copy link
Contributor

An announcement sounds good to me, even if that means delaying the update for a bit. However, I feel that updating to a different unsupported NDK would be a step sideways at best. This is mainly because it would buy us very little (as far as I'm aware) and it would cause churn for developers already using custom NDK setups.

Building against an older NDK leaves us with two (related) issues though:

  1. Because the NDK is no longer supported the Rust community needs to paper over any issues
  2. We're unable to test against NDK betas, file patches against the NDK, or request point releases for breaking issues

I'm also concerned about burdening new developers who'd like to use modern NDKs for their projects but are required to take extra steps to do so... but I can't really quantify that in any meaningful way so I guess i'll just mention it.

Going forward I think it'd be ideal to integrate the NDK betas into the nightly/beta rustc builds. This would allow developers time to prepare for the changes, especially if a blog post is timed with the release containing the updated NDK.

As for supporting multiple APIs in the same binary, I've just finished a rough draft of an RFC I hope to PR next week for API version tests in cfg. This should allow for the mixing of code targeting different API levels to live in the same codebase.

@apiraino
Copy link
Contributor

visited during T-compiler meeting on Zulip. Two the main points:

@rustbot label -I-compiler-nominated

@rustbot rustbot removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Nov 10, 2022
@chriswailes
Copy link
Contributor

To clarify, how hard is it going to be for developers to update their Rust projects to the new NDK if they are using something newer than r15 but older than r25b? For Android's Rust toolchain I've updated the NDK twice (23->24->25) and haven't had an issue with any of the transitions (no changes needed to build system or source).

@chriswailes
Copy link
Contributor

Sorry to keep popping in here, but do we know why Firefox is stuck on an older version of the NDK? Is it for a reason beyond Rust's dependency on an older version? If Firefox is using an out-of-support NDK for other reasons I think that may be their bigger issue and I'm not sure that's a good reason to holding back on updating the official Rust builds. If, however, Firefox is having to use an older NDK to support using Rust this seems like this would fix the issue for them.

@alex-pinkus
Copy link
Author

With the NDK updates you're referring to for Android's Rust toolchain, you were building in-tree, right? As in, you don't use a prebuilt rust stdlib for the target, you produce one as part of the AOSP build?

I don't have visibility into the firefox project and their reasons, but with the rustup-vended stdlib linking against libgcc, I suspect that a lot of projects are "stuck". While we'll fix that when we complete this upgrade, we'll break all of those projects all at once to do so. The upgrade won't be hard (at worst, it's removing any make_standalone_toolchain invocations, rewriting paths, and fixing the kinds of rare compilation issues that have been reported), but that won't stop it from being disruptive.

@chriswailes
Copy link
Contributor

Yes, we build the stdlib as part of our Rust toolchain distribution. This doesn't occur during the AOSP build itself, we just do a "release" following upstream releases. I can see how the situation you're describing is different.

I 100% agree that we should notify developers about this issue, even if that means delaying the update for a bit. I just wanted to understand the concerns about developer experience a bit more concretely. Thanks for the information.

@glandium
Copy link
Contributor

I don't think it does. The main difference for Firefox is that libunwind is now necessary to build (it's added to the linker command line) while it wasn't before, and libunwind is not in NDKs before... I don't know what version, but it's not in the one we're using.

@smeenai
Copy link

smeenai commented Dec 13, 2022

r23 added and switched to using LLVM libunwind (along with compiler-rt) for all architectures. Prior to that, it was only used for armv7 (and the NDK only provided it for that architecture); other architectures used libgcc for unwinding. https://reviews.llvm.org/D96403 was the relevant Clang change.

@chriswailes
Copy link
Contributor

@glandium I'm not entirely sure what you're asking for here.

As someone currently stuck on r21d, your message doesn't tell me my builds will break with that new version of rust.

I changed the message to indicate that builds will break. Are you also asking for a detailed explanation of why they will break to be included? Again, if you could provide the language you'd like to see in the announcement that would be very helpful.

@dvc94ch
Copy link
Contributor

dvc94ch commented Dec 14, 2022

Why will builds break? Most builds won't break, there's only a few builds that will have an issue

@chriswailes
Copy link
Contributor

It might break due to

workarounds you might have regarding libunwind.

But I can expand on that a bit more. I'm currently creating a pull request to the blog to help move things along.

@chriswailes
Copy link
Contributor

PR for announcement created here.

@alex-pinkus
Copy link
Author

alex-pinkus commented Dec 14, 2022

Thanks for opening up the PR! I'll add my comments there.

I think the breakage @glandium is referring to is the same one I'm concerned about: anyone using a currently-supported NDK version (i.e. r22 or below) will be unable to build until they switch their NDK version. In other words, anyone not currently applying a workaround will be broken.

I'll share a suggestion on the PR for how we might message that without seeming too gloomy. Edit: @chriswailes was too fast for me and already addressed this on the PR -- thank you!

@pnkfelix
Copy link
Member

@rustbot assign @pnkfelix

(the T-compiler had previously talked about needing an FCP here, but I failed to follow up on it at that time. Self assigning now as a forcing function.)

@apiraino apiraino removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Dec 15, 2022
@dvc94ch
Copy link
Contributor

dvc94ch commented Dec 15, 2022

the problem with making simple changes to a large open source project is that they're simple enough that everyone can comment on them, have an opinion and bike shed it to death. if this was some complex compiler issue, I'm sure it would have been fixed and released by now and there would have been at most 1-2 people commenting on the PR.

gendx added a commit to gendx/android-rust-library that referenced this issue Jan 5, 2023
This is necessary since upstream reverted compatibility to NDK <= 22: rust-lang/rust#104628.

See the discussion at rust-lang/rust#103673.
@pcc
Copy link
Contributor

pcc commented Mar 2, 2023

This was resolved in the end by accepting the breakage and notifying users via a blog post; see #105716 and rust-lang/blog.rust-lang.org#1055. So I think we can mark this as closed now.

@jyn514 jyn514 closed this as completed Mar 2, 2023
pcc added a commit to pcc/rust that referenced this issue Nov 2, 2023
Linking libgcc is no longer supported (see rust-lang#103673), so remove the
related link attributes and the check in unwind's build.rs. The check
was the last remaining significant piece of logic in build.rs, so
remove build.rs as well.
pcc added a commit to pcc/rust that referenced this issue Nov 3, 2023
Linking libgcc is no longer supported (see rust-lang#103673), so remove the
related link attributes and the check in unwind's build.rs. The check
was the last remaining significant piece of logic in build.rs, so
remove build.rs as well.
pcc added a commit to pcc/rust that referenced this issue Nov 3, 2023
Linking libgcc is no longer supported (see rust-lang#103673), so remove the
related link attributes and the check in unwind's build.rs. The check
was the last remaining significant piece of logic in build.rs, so
remove build.rs as well.
pcc added a commit to pcc/rust that referenced this issue Nov 3, 2023
Linking libgcc is no longer supported (see rust-lang#103673), so remove the
related link attributes and the check in unwind's build.rs. The check
was the last remaining significant piece of logic in build.rs, so
remove build.rs as well.
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 5, 2023
…imulacrum

Remove obsolete support for linking unwinder on Android

Linking libgcc is no longer supported (see rust-lang#103673), so remove the related link attributes and the check in unwind's build.rs. The check was the last remaining significant piece of logic in build.rs, so remove build.rs as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Projects
None yet
Development

No branches or pull requests