Skip to content
This repository has been archived by the owner on May 21, 2019. It is now read-only.

powi only: don't override arm calling convention #26

Merged
merged 2 commits into from
Nov 10, 2016

Conversation

TimNN
Copy link
Collaborator

@TimNN TimNN commented Nov 8, 2016

This should be a short term fix for rust-lang/rust#37630, while not regressing rust-lang/rust#37559.

cc @alexcrichton, @japaric, #25

@alexcrichton
Copy link
Member

Thanks! With this strategy, though, I feel like we still may have not settled this ABI issue. Clearly the compiler and compiler-rt are disagreeing about the ABI of these intrinsics, but we should figure out how to get them to agree on all of them because in theory clang does something here that's correct which we should be mirroring.

So previously the compiler and compiler-rt disagreed about the ABI of powi, so we assumed they disagreed about all of them. It looks like some assembly at least is the aapcs ABI which the compiler now would not be expecting. In summary this PR is assuming that powi is the only intrinsic that we disagreed on ABI with, right?

Do we have any information leading us to that conclusion? Or are we basically just in "fix regressions as we see them" mode? (just trying to get a handle on the current state of affairs)

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 8, 2016

This is the >>"fix regressions as we see them" mode<< right now. As I see the situation, this affects beta and is about to hit stable.

A complete fix would probably be more difficult and involve patching llvm, which would take a much longer time.

Although it would of course be interesting to know, what the exceptional situation is: "Everything is HF except [...]" or "everything is SF except [...]".

An alternative fix would probably be to revert the previous patch and just don't link agains compiler-rt on arm (hf?) targets, restoring the old (and apparently working) behaviour of linking agains gcc_s, which should give us the required time to investigate the llvm situation.

@alexcrichton
Copy link
Member

Hm so the timeliness here may not be too much of a factor. Stable builds are already branched and underway, so we're not going to fix that with the 1.13.0 release (I believe). In that sense I wouldn't mind taking some more time to investigate what's going on here and dig into the disagreement.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 8, 2016

Sure, that's fine with me, although the new regression (failure to parse floats properly) seems much worse to me than the old regression (failure of powi).

@alexcrichton
Copy link
Member

cc @brson, thoughts about the regression status? summary:

  • We backported the ABI change to compiler-rt to the old beta branch, which just completed the 1.13.0 build
  • This fixed the powi regression but introduced a new regression (failure to parse floats). Note that the regressions here are only on hard-float ARM I believe
  • This PR fixes the two known regressions, but very likely still regressed behavior from old stable to the stable we're about to release. It's unknown what those regressions would look like.

We could either backport this PR and make a new stable build, or punt and fix this problem in beta. I'm tempted to fix this problem in beta (a non-tier-1 platform is not worth the amount of trouble to delay everything, just a very good thing to fix)

@brson
Copy link

brson commented Nov 8, 2016

I think we have to accept rust-lang/rust#37630 as a known regression for 1.13. There's a lot of risk making any changes now, and as you say, it's a tier 2 platform.

@cuviper
Copy link
Member

cuviper commented Nov 8, 2016

I suspect all of the rust-declared float intrinsics will have this problem, not just powi. But I don't have easy access to try others -- @TimNN can you test a few and see?

Even though it's Tier 2, I'll be blocked from shipping this to Fedora until there's a fix. I've been trying to enable all archs for the eventual Firefox oxidation, so Tier 1 x86-only is too restrictive. But, if we do get a proper fix for this in the next beta, I can definitely backport that for our 1.13 build.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 8, 2016

On a nightly before my original fix (nightly-2016-10-28):

I tested (forf32 and f64) sqrt, powi, sin, cos, pow: only powi was affected.

Note that powi was only affected when called through a wrapper function: A direct call to the intrinsic (use --cfg direct) resulted in the correct results. Edit: the direct call was optimized out.

This was using cross-compiling + qemu on Ubuntu.

test code.

@cuviper
Copy link
Member

cuviper commented Nov 9, 2016

Interesting! Maybe the ABI gets lost/corrupted when inlining that wrapper? And how about the same test on a nightly that does have your fix -- then are the direct and wrapper versions both ok?

I'd still feel more comfortable if we knew why the other calls are ok -- are they perhaps inlining direct instructions instead of making a real function call?

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 9, 2016

There is not yet a nightly with my fix, however I did test on the latest beta:

On the latest beta things worked fine for the above mentioned functions -- both with and without a wrapper function.

I also added tests for __floatundidf. The only way to trigger that seemed to be to use a cast (1u64 as f64).

That failed for u64 -> f64 but worked for u32 -> f32 (this is on beta). The nightlies without my fix always produced the correct result.

@alexcrichton
Copy link
Member

@TimNN the functions you mentioned here all come from libc, right? not compiler-rt? (except for powi). In that sense I wouldn't expect a calling convention mismatch as we're calling those functions ourselves and LLVM isn't lowering intrinsics.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 9, 2016

@alexcrichton: I used the intrinsics from libcore/src/intrinsics.rs (and expected that they would be directed to compiler-rt, although that assumption seems to be false then).

@alexcrichton
Copy link
Member

Wow good point. I don't really know what LLVM is doing with intrinsics. Surely there's someone from LLVM who either knows what's up here or could help us...

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 9, 2016

One thing that I've been thinking about is that the now failing function (__floatundidf) is triggered by a simple cast from u64 to f64.

This generates ir like this: %1 = uitofp i64 %0 to double, which generates a call to __floatundidf.

The powi ir looks like this instead: %2 = call double @llvm.powi.f64(double %0, i32 %1) and generates a call to __powidf2.

This was tested on beta, test code, ir and asm.

@cuviper
Copy link
Member

cuviper commented Nov 9, 2016

@TimNN I'm looking disassembly of your test's --emit=obj. Unoptimized, I see direct vsqrt.f32 and vsqrt.f64, calls to compiler-rt __powisf2 and __powidf2, and calls to libm sinf, sin, cosf, cos, powf, pow. That's wrapped; with --cfg direct I still see the vsqrt and libm calls, but the powi is gone! (probably inlined+const-folded, even though it's unoptimized)

With optimized code, they all disappear, whether wrapped or direct, so I guess LLVM is const-folding those intrinsics entirely away.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 9, 2016

@cuviper: Yeah, I just noticed the powi behaviour as well (for the smaller test from my previous comment). This could probably be fixed by using some blackboxes in the old test case as well.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 9, 2016

I added blackbox calls to the old test as well and now powi fails in --direct mode as well.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 9, 2016

Note that the u32 -> f32|f64 cast works because there's an instruction for that (thus those don't hit __floatundidf).

@alexcrichton
Copy link
Member

@TimNN so I think the next step here is to answer the question of what LLVM expects each ABI to be and why. LLVM seems to be clearly expecting not aapcs for powi, but it does aapcs for casts. It seems to not be clear why, and could perhaps indicate some deeper problems?

Would you be up for investigating that? (I also can do so)

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 10, 2016

@alexcrichton: I already looked a bit at llvm, but I find the llvm source to be... not easily approachable. I'll try to look some more, but don't know how much time to do so I will have in the near future.


An interesting place to look at seems to be ARMISelLowering.cpp, especially the ARMTargetLowering::ARMTargetLowering method. The referenced document seems to be http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf.

Edit: more specifically, these lines.

@alexcrichton
Copy link
Member

Ok, I've done a bit of investigation, and here's what I've come up with. Following @TimNN's lead the relevant set of declarations seems fairly small and targeted. Furthermore, it basically seems to follow the pattern that anything with the __aeabi_ prefix uses the AAPCS calling convention where other intrinsics (like powisf2) use the native calling convention (they're not explicitly listed there).

Next up, it turns out that a lot of intrinsics in compiler-rt have aliases defined for them through the ARM_EABI_FNALIAS macro. That means that intrinsics like __floatundidf mentioned above aren't literally called themselves, but rather through the __aeabi_ alias.

Finally, if we take a look at all intrinsics that are interested in floats, ignore all those with aliases, and ignore all those related to 128-bit floats, we're only left with two we're interested in: powidf2 and powisf2. To compound on that, I couldn't get clang at least to call powisf2, only powidf2.

So to sum that up:

  • Compiling most intrinsics with AAPCS seems good because they have __aeabi_ aliases which LLVM seems to prefer calling. Additionally LLVM seems to expect __aeabi_ functions to have the AAPCS calling convention.
  • In terms of intrinsics without aliases we're only interested in powidf2 and powisf2, which are miscompiled on compiler-rt's master branch with the AAPCS calling convention (or so it seems).

tl;dr; @TimNN this patch I now believe to be correct. Odd though it may be!

@alexcrichton alexcrichton merged commit 3bc0272 into rust-lang:rust-llvm-2016-07-18 Nov 10, 2016
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 10, 2016
This update of compiler-rt includes rust-lang/compiler-rt#26 which provides a
targeted fix to the powisf2 intrinsics to keep rust-lang#37559 fixed but also address
the new issue of rust-lang#37630. I've also [written up my thoughts][1] on why it appears
that this is the correct fix for now (hoepfully at least).

Closes rust-lang#37630

[1]: rust-lang/compiler-rt#26 (comment)
@TimNN TimNN deleted the arm-cc branch November 10, 2016 17:25
bors added a commit to rust-lang/rust that referenced this pull request Nov 11, 2016
std: Update compiler-rt for more ABI fixes

This update of compiler-rt includes rust-lang/compiler-rt#26 which provides a
targeted fix to the powisf2 intrinsics to keep #37559 fixed but also address
the new issue of #37630. I've also [written up my thoughts][1] on why it appears
that this is the correct fix for now (hoepfully at least).

Closes #37630

[1]: rust-lang/compiler-rt#26 (comment)
@parched
Copy link

parched commented Nov 11, 2016

@alexcrichton Yes this looks correct for now, but note that upstream LLVM now uses AAPCS for all arm intrinsics llvm-mirror/llvm@c4c2318

Although that seems to conflict with what GCC does https://gcc.gnu.org/ml/gcc/2016-09/msg00162.html where they use AAPCS for all __aeabi_* functions but use the default CC for others.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 11, 2016

Note the linked bug report: https://llvm.org/bugs/show_bug.cgi?id=30543.

@alexcrichton
Copy link
Member

Ok, sounds like we should perhaps go back to AAPCS everywhere when updating to LLVM 4.0? Seems like something good to watch out for!

@cuviper
Copy link
Member

cuviper commented Nov 11, 2016

That will have to be conditional, so long as we support pre-4.0 external LLVM.

@cuviper
Copy link
Member

cuviper commented Nov 11, 2016

Or maybe powi should be a priority for the project to rewrite builtins in Rust. It's a small pure-C implementation -- should be easy to translate.

@TimNN
Copy link
Collaborator Author

TimNN commented Nov 11, 2016

@cuviper: The powi intrinsics have already been ported, see https://github.com/rust-lang-nursery/compiler-builtins/blob/master/src/float/pow.rs (although the whole rust-lang-nursery/compiler-builtins has not yet been integrated into rust-lang/rust).

@cuviper
Copy link
Member

cuviper commented Nov 11, 2016

Ah, great! Hopefully that integration can beat LLVM 4.0's release, and we won't have to worry about it.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 16, 2016
This update of compiler-rt includes rust-lang/compiler-rt#26 which provides a
targeted fix to the powisf2 intrinsics to keep rust-lang#37559 fixed but also address
the new issue of rust-lang#37630. I've also [written up my thoughts][1] on why it appears
that this is the correct fix for now (hoepfully at least).

Closes rust-lang#37630

[1]: rust-lang/compiler-rt#26 (comment)
TimNN pushed a commit that referenced this pull request Apr 23, 2017
powi only: don't override arm calling convention
TimNN added a commit that referenced this pull request Jul 20, 2017
powi only: don't override arm calling convention
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants