-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ARM] Emit an error when the hard-float ABI is enabled but can't be used. #111334
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-backend-arm Author: Chris Copeland (chrisnc) ChangesCurrently, compiling for eabihf with a CPU lacking floating-point Fixes #110383. Full diff: https://github.com/llvm/llvm-project/pull/111334.diff 1 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
index 7553778c574033..82927a7430b5d0 100644
--- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
@@ -309,11 +309,15 @@ ARMBaseTargetMachine::getSubtargetImpl(const Function &F) const {
// function that reside in TargetOptions.
resetTargetOptions(F);
I = std::make_unique<ARMSubtarget>(TargetTriple, CPU, FS, *this, isLittle,
- F.hasMinSize());
+ F.hasMinSize());
if (!I->isThumb() && !I->hasARMOps())
F.getContext().emitError("Function '" + F.getName() + "' uses ARM "
"instructions, but the target does not support ARM mode execution.");
+
+ if (I->isTargetHardFloat() && !I->hasFPRegs())
+ F.getContext().emitError("The hardfloat ABI is enabled, but the target "
+ "lacks floating-point registers.");
}
return I.get();
|
b0e9c78
to
b73e1b3
Compare
clang tests are passing now, but some llvm tests are not, e.g., invocations of |
It looks like there is already a warning for this in clang, it only triggers from -mfloat-abi=hard though: https://godbolt.org/z/dcaz8had4. Could it be made to work with any hard-float env? And maybe be made an error down-gradable to a warning? Generally clang-level warnings/errors are more user friendly then the llvm-level errors (but both may be useful for other frontends). |
That seems like a decent improvement to the clang warning, but the backend behavior of still emitting the VFP ABI tag in this case seems like a bug, regardless of the frontend, so IMO codegen should be aborted in this situation. |
Whatever we choose to do I think clang and llvm should be consistent. There's no point in clang giving a warning if llvm is going to give an error. I can see two positions most clearly:
Personally I don't have a problem with clang giving an error in this case as that is what GCC does. Appendix, some empirical data:
Clang -mfloat-abi
Actual behaviour is soft-float but with incorrect build attributes. Clang --target=arm-none-eabihf
Actual behaviour is soft-float but with incorrect build attributes. |
My preference would be to remove the clang warning and add the LLVM error so that other frontends can benefit from this checking, as well as the LLVM tools themselves. The LLVM test failures reveal a few places where this issue seems to be ignored. |
What would be really nice is if the LLVM backend could tell the frontend about the error rather than reporting it itself, that way frontends can use their nice error reporting code but don't have to re-implement the error detection logic. :) Anyway, that's a different issue... |
b73e1b3
to
5e916c3
Compare
I've updated this PR to remove the aforementioned |
5e916c3
to
4bd5ec3
Compare
…sed. Currently, compiling for an eabihf target with a CPU lacking floating-point registers will silently use the soft-float ABI instead, even though the Arm attributes section still has Tag_ABI_VFP_args: VFP registers, which leads to silent ABI mismatches at link time. Update all ARM tests that were using an affected combination to enable the necessary FPU features or use a soft-float ABI. [clang] Remove the warning from clang that detected this case only if -mfloat-abi=hard or -mhard-float were specified explicitly. Fixes llvm#110383.
4bd5ec3
to
9903206
Compare
All unit tests are now updated and passing and this change is ready for review. Thanks! |
Friendly bump :) @davemgreen @smithp35 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not yet convinced that llvm is the right place for this error message. There's quite a lot of test changes that I presume needed to make and at least the LTO use case looks like we don't want to require extra information. I'd also prefer this to be a front-end error/warning as this is a front-end/source-level mistake, with better diagnostic control options available.
I think if it is implemented in llvm then it will need to use the per-function subtarget to not risk fallout with LTO builds.
Apologies I'm probably not particularly enthusiastic about the approach. I've not worked in this area for some time so I'm happy for other reviewer's opinions to take precedence.
|
||
if (!I->isThumb() && !I->hasARMOps()) | ||
F.getContext().emitError("Function '" + F.getName() + "' uses ARM " | ||
"instructions, but the target does not support ARM mode execution."); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC [*] TargetMachine::getSubtargetImpl is the "global" subtarget. It has been a long time since I've looked at the area so I could be wrong.
Putting the warning here makes me nervous as individual functions have their own subtargets. It is common for LTO to rely entirely on the per function subtarget. I can remember an Android problem that was caused by using the global subtarget for a per-function property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getSubtargetImpl
takes a const Function &
as a parameter, and looks at that function's attributes, and can give per-function diagnostics for the thumb check, so I believe it is not the "global" subtarget.
@@ -459,9 +459,6 @@ def warn_drv_assuming_mfloat_abi_is : Warning< | |||
def warn_drv_unsupported_float_abi_by_lib : Warning< | |||
"float ABI '%0' is not supported by current library">, | |||
InGroup<UnsupportedABI>; | |||
def warn_drv_no_floating_point_registers: Warning< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a build break for anyone that is explicitly referring to this warning.
I'm not sure what the protocol for removing warnings is in clang. For example are they release noted/deprecated or just left in and ignored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-Wunsupported-abi
still exists (it is a group containing several warnings). The only thing that is removed here is the particular manifestation of this warning for the ARM target. It should only be a build breakage if -Wunsupported-abi
were removed completely.
There's a lot of complicated logic required to determine, for each architecture, which target features affect the ABI in which ways. Are you suggesting that logic should be duplicated across all frontends? That seems like a huge waste of effort to me. Most frontends are inevitably going to get it subtly wrong, so in the end we have a ton more bugs than we did if there was a central location where such know-how could be properly encoded, and where there are enough people that can confidently answer ABI questions like this. |
It is a tradeoff. Yes it means redoing effort in front-ends, but it can mean a better experience for the end-user, and to me that is often more important than the work that we do as toolchain developers. I'm not saying I've definitely got the balance right in this case which is why I invite more opinions :-) An error in the backend does not have a lot of context. In some cases it may end up triggering on a large LTO build without any source file information given in the error message. |
Yeah ideally LLVM would provide APIs where frontends can explicitly query that kind of information, and then provide nice errors with context. What this PR does is only the second-best thing, in my eyes. But very few frontends are able to have an expert for each and every architecture LLVM supports, so "do it in the frontend" is often simply not an option. |
LLVM not checking for this leads to it producing incorrect outputs given certain inputs and arguments when using |
FWIW, virtually all of the test changes were due to cases where LLVM would silently use soft-float despite the hard-float ABI being explicitly requested, which is exactly the incorrect behavior that this PR is meant to fix. The total number of affected test cases is fairly small.
The direct invocation of llvm-lto case without target information is one I will look into; however, like the others, this change was made because LLVM was generating code in a manner where the hard-float ABI was requested but couldn't be fulfilled. Anywhere that might be happening can lead to unsound ABI-mixing. To me the question is whether the equivalent of that command is invoked internally without the correct target information in typical usage of LTO driven by clang/lld. I'm inclined to say no, based on there being zero other test failures related to this change, but I will do some digging to understand the situation. |
@ostannard or @davemgreen any other feedback on this change? |
I agree with the previous comments from @davemgreen and @smithp35, user-facing diagnostics should always be emitted by clang, so that cases like LTO work correctly. If you want to reduce the duplication of the code which works out what ABIs are compatible (which I think would be a good idea), you could probably put that code in |
I agree but was having trouble putting it into words. I don't have a reference I can put my hands on, but we have generally considered the llvm error messages to be a poor substitute to those produced by clang and have often gone the other way, converting llvm errors to clang errors. I would think the best interface here is a error/warning that is downgradable to a warning/silent and that is difficult to do in llvm at the moment with the way the errors are setup. They are mostly reserved for things that have gone very wrong or cannot be supported. It would be good to share this kind of logic where we can though. Either by building a proper way for the backends to report errors to the frontends, or for the logic to be shared in TargetParser. |
I assume you didn't also add those errors to all the other frontends, so every time you do this (assuming you actually remove the LLVM error at that point) you risk introducing bugs into other frontends (if they were relying on LLVM to reject some nonsensical situations). IMO this should be done with more consideration of non-clang frontends. |
As a Rust compiler dev, this part of LLVM feels extremely fragile. If we do anything wrong, LLVM will just use some ABI, and things will keep working for some situations, but really we have subtle ABI bugs that will eventually bite our users. The rules we have to follow are not documented anywhere (to my knowledge), and require target-specific expertise -- even if we get the x86 part right, we'll need a PowerPC expert to also get soft-float handling right for that target. It's not a pleasant experience at all. Having LLVM be more robust, and throw errors rather than being like "yeah let's just continue, I'm sure it's going to be fine", would go a long way towards making LLVM easier to use and more reliable as a backend. It is concerning if attempts at improving the robustness of LLVM in this space are being blocked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay in responding.
I don't have much to add over davemgreen and ostannard.
Leaving aside whether this should be an error in the front or back-end. As mentioned before I haven't worked in this area for a long time, and there are enough test changes here that make me nervous enough that this is going to break some existing use cases. Some do look like sloppy triple selection, although a lot of the tests don't use any floating point so it could be seen as harsh to fault them. If I understood the area better that wouldn't be a problem, but as I'm not enthusiastic about moving a diagnostic to the backend, this isn't a change that I am best placed to approve. I won't object if there's a consensus in the other direction.
One use case I'd want to keep working is where we have two functions in the same object that are hardfp with hardware registers, and soft (with or without hardware registers). Something like:
__attribute__((target("arch=cortex-m33"))) __attribute__((pcs("aapcs-vfp"))) float f1(float a, float b) {
return a + b;
}
__attribute__((target("arch=cortex-m0"))) __attribute__((pcs("aapcs"))) float f2(float a, float b) {
return a + b;
}
There are some source files that do run-time selection based on hardware detection.
@@ -16,7 +16,7 @@ | |||
|
|||
// RUN: %clang --target=arm-none-eabi -mcpu=cortex-m33 -mfloat-abi=hard -O1 %s -flto=thin -c -o %t.call_thin.bc -DCALL_LIB | |||
// RUN: %clang --target=arm-none-eabi -mcpu=cortex-m33 -mfloat-abi=hard -O1 %s -flto=thin -c -o %t.define_thin.bc -DDEFINE_LIB | |||
// RUN: llvm-lto2 run -o %t.lto_thin -save-temps %t.call_thin.bc %t.define_thin.bc \ | |||
// RUN: llvm-lto2 run --mcpu=cortex-m33 --float-abi=hard -o %t.lto_thin -save-temps %t.call_thin.bc %t.define_thin.bc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was this particular test update that makes me most nervous about the change.
IIUC https://reviews.llvm.org/D89573 after a fix for https://bugs.llvm.org/show_bug.cgi?id=45524
From disassembling the bitcode the module has triple thumbv8m.main-unknown-none-eabihf and have target features that include the necessary hardware floating point registers.
The llvm-dis output from lto also looks like it preserves these. I would not expect any additional flags to be necessary here.
The following example is very common and we would need to avoid requiring additional mcpu and float-abi flags in the LTO step.
clang --target=arm-none-eabi -mcpu=cortex-m33 -mfloat-abi=hard -c -flto file1.c
clang --target=arm-none-eabi -mcpu=cortex-m33 -mfloat-abi=hard -c -flto file2.c
clang file1.o file2.o -o file.exe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug into this further as promised, and the error goes away if you just remove the call to printf
from the test, which is otherwise expected to be replaced with a call to putchar
. This means that the llvm-lto2
invocation is eventually creating a new subtarget in which the target features and ABI are not inherited from the settings of either input file, and because the ones it lands on are incompatible, an error is raised. Pre-existing functions that were generated with a specific ABI and features are preserved as you would expect.
Is this possibly because, when generating code for builtins, the lto pass does actually need the caller to specify the target and features those builtins should be generated with respect to, and if it needs to insert a call to a builtin that it is providing (with default / no target features, because none were given) into a context where the ABI is set by a caller that came from the input which requires some target features, this leads to the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test this theory, I had the error output the function name just like the nearby check for thumb does, and this is what I get when running this test unmodified:
The hard-float ABI is enabled for function 'llvm.lifetime.start.p0', but the target lacks floating-point registers.
This means that this function was being generated during LTO itself with an invalid (feature set, ABI) combination. Depending on what this function's parameters are, that could lead to incorrect behavior, so in general, it may not be safe to LTO without enough features enabled to support the ABI that the compiled inputs use, and it is good for LLVM to complain about this rather than potentially emit erroneous code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the investigation. I think that this use case of LTO needs to work without an error as while we can provide the target for this specific test case, it may cause other uses of LTO without a target on the linker driver to fail. This may mean that we have to fix this in LTO.
From git annotate the test case was added in https://reviews.llvm.org/D89573 ostensibly to make sure that if LTO calls a C function it uses the correct ABI.
Looking at llvm.lifetime.start this is a lifetime marker intrinsic https://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic . IIUC lifetime markers are for communicating lifetimes of objects to optimisers, so while they influence code-generation they don't get code generated for them.
It may be an area where the subtarget for the function isn't being passed through correctly, although in the case of lifetime markers it is benign. This is an area that needs a lot more in depth expertise of LLVM and LTO than I posess. I can ask around internally to see if anyone is willing to take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making the error only fire if any of the function's argument types or return types are floating-point? That would eliminate the need for this test change and probably several others that weren't actually using floating-point, and be more defensible in that if you encounter such a function with the ABI/feature combination being checked for, it will generate incorrect code for it if not for this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be a good way forward. That would help reduce the chances of this breaking some existing projects.
That would then leave us with the debate about front-end and back-end error messages. I've been thinking about how to get past the differences of opinion here. My thoughts so far:
- Make the clang warning an error (to match GCC) and make sure it triggers for the target and the -mfloat-abi=hard case. Then clang users will see a front-end error and shouldn't see the backend error unless they've messed about behind the compilers back with target attributes. Other front-ends can choose their own policy.
- Make the back-end error opt-in, with clang opting out and retaining the existing clang warning. That retains the existing behaviour for clang. I personally would prefer clang/gcc to behave the same way and give an error message.
While I think that either of those would take care of user facing concerns, there's a possible objection of code complexity, but I'd have to leave that in the hands of the code-owners.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted this approach, but there's an issue due to how the getSubtargetImpl
function works. It essentially maintains a map of subtargets it's seen before, keyed by a string of cpu + feature list, and only does the error checking when it sees a new combination, otherwise returns the subtarget that's already been created for the given cpu + feature list key. What this means is that if a non-fp-using function is encountered first in a translation unit, then the check passes because its use of inconsistent ABI+feature is harmless, but subsequent functions that should be flagged will be ignored because the map already has a subtarget for them.
Any suggestions on how to deal with this would be appreciated, or if there is a maintainer for this specific subsystem who could provide guidance here that would also be helpful.
I can confirm that this does not emit an error. |
I am not exactly sure how LLVM developers are imagining that frontend developers determine that they should diagnose a situation and provide a friendlier error? The primary motivation we have is that we notice that LLVM vomits up a nasty stacktrace in some situation, and then we figure it out and cover up the sewer hole that people could otherwise fallthrough to. I suppose this also works if LLVM developers use their mystical powers to instantly transmit an awareness of all the situations every frontend developer should be diagnosing and erroring on. Was there a psychic mind meld I missed? Because I had to spend an entire night and day going over all the edge cases for the interrupt ABIs that we "support" in Rust, instead of merely possessing mystical knowledge about which cases the frontend should catch and error on. |
if (I->isTargetHardFloat() && !I->hasFPRegs()) | ||
F.getContext().emitError("The hard-float ABI is enabled, but the target " | ||
"lacks floating-point registers."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to check the target triple and do automatic hard float detection (in ARMTargetMachine.isTargetHardFloat
). Shouldn't this instead check Options.FloatABIType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right. I initially avoided that because one of the values is Default
which is meant to defer to the triple, but in the constructor of ARMBaseTargetMachine
it sets the internal Options.FloatABIType
to one or the other based on the passed in option and the triple, so it should be correct to just use that. I was testing with things like --target=arm-none-eabi -mcpu=cortex-m0 -mfloat-abi=hard
and seeing the error trigger, so something else in the frontend may be generating a different triple that makes this "work".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I remember now where I got stuck on this before. I need to obtain this property on the ARMSubtarget
rather than the ARMBaseTargetMachine
, but neither of them expose Options
publicly, and ARMSubtarget
just invokes isTargetHardFloat
on its TM
member, without accounting for its own Options
, even though they are copied from the ARMBaseTargetMachine
when the subtarget is created, which should be set to either hard or soft, based on the value that was set in ARMBaseTargetMachine
's constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ARMSubtarget.isTargetHardFloat
seems like a footgun, I hope the rest of the backend properly has Options.FloatABIType
override the target default.
RISC-V has similar checks here: llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp Lines 88 to 100 in ed572f2
So maybe the ARM checks could be added in a similar place? RISC-V handles ABI variants in a very clean way, and it does seem to work in practice too, so it'd be a good model for other architectures to follow. |
I could try this approach here and see how it goes. |
Currently, compiling for eabihf with a CPU lacking floating-point
registers will silently use the soft-float ABI instead, even though the
Arm attributes section still has Tag_ABI_VFP_args: VFP registers, which
leads to silent ABI mismatches at link time.
Fixes #110383.