-
Notifications
You must be signed in to change notification settings - Fork 15.7k
[ARM] enable FENV_ACCESS pragma support for hard-float targets #137101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-clang Author: Erik Enikeev (Varnike) ChangesAdded support for FENV_ACCESS pragma on hard-float ARM platforms. Also changes were made to clang/test/Parser/pragma-fp-warn.c so that for thumbv7a only the soft-float-abi target case is checked. Full diff: https://github.com/llvm/llvm-project/pull/137101.diff 2 Files Affected:
diff --git a/clang/lib/Basic/Targets/ARM.cpp b/clang/lib/Basic/Targets/ARM.cpp
index ca2c1ffbb0eb7..e3ab6e9abf78a 100644
--- a/clang/lib/Basic/Targets/ARM.cpp
+++ b/clang/lib/Basic/Targets/ARM.cpp
@@ -363,6 +363,8 @@ ARMTargetInfo::ARMTargetInfo(const llvm::Triple &Triple,
: "\01mcount";
SoftFloatABI = llvm::is_contained(Opts.FeaturesAsWritten, "+soft-float-abi");
+ if (!SoftFloatABI)
+ HasStrictFP = true;
}
StringRef ARMTargetInfo::getABI() const { return ABI; }
diff --git a/clang/test/Parser/pragma-fp-warn.c b/clang/test/Parser/pragma-fp-warn.c
index c52bd4e4805ab..f743cb87997dc 100644
--- a/clang/test/Parser/pragma-fp-warn.c
+++ b/clang/test/Parser/pragma-fp-warn.c
@@ -1,6 +1,6 @@
// RUN: %clang_cc1 -triple wasm32 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
-// RUN: %clang_cc1 -triple thumbv7 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
+// RUN: %clang_cc1 -triple thumbv7 -fsyntax-only -target-feature +soft-float-abi -Wno-unknown-pragmas -Wignored-pragmas -verify %s
// RUN: %clang_cc1 -DEXPOK -triple aarch64 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
// RUN: %clang_cc1 -DEXPOK -triple x86_64 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
// RUN: %clang_cc1 -DEXPOK -triple systemz -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
|
|
I believe the backend would still need work to make sure this is supported, which has not been done yet. I was expecting it to fail more noisily, but it appears the strict nodes are lowered to generic nodes. That doesn't mean that strict-fp is supported by the Arm backend, as it might not handle the instructions correctly at the moment (moving instructions past one another, for example). |
|
@john-brawn-arm did the work for AArch64, I'm not sure if he would have an idea how much work it would involve for the Arm backend. |
|
It's been several years since I looked at this, but from my notes what's needed before setting HasStrictFP=true is:
|
|
Simple example where setting HasStrictFP=true without doing the above gives wrong results: With --target=aarch64-none-elf we get a divide instruction and so return FE_DIVBYZERO, with --target=arm-non-eabi we get no divide instruction and return 0. |
f57085e to
1b9b201
Compare
0231231 to
6010639
Compare
|
Hello! Thank you for the comments and sorry for the delayed response. I've now returned to this task and plan to work towards full support for strict fp on hard-float ARM targets. Based on @john-brawn-arm's example and suggestions, I was able to add support for STRICT_FDIV. With such changes the example is compiled correctly (with divide instruction). Could you please review these changes and let me know if you spot any issues? Are there any additional cases that require special handling? Also, is there a test suite that would be sufficient to cover most strict fp usage scenarios? If these changes are acceptable, I plan to proceed with support for the remaining strict fp ops. |
|
Hi - I went looking at out internal tracker for what happened the last time we enabled strictfp for AArch64. This was the list of patches mentioned, there might have been some more either before these or some minor fixups. We would probably need something similar for Arm - it is worth not underestimating the amount of work we would need to do, but if you are happy to try and push it through we would need to handle all the strict variants of the instructions and add mayRaiseFPException/FPCR to the relevant places. |
|
Now I have an approximate volume of work and plan to try to do it. I have provided these edits rather as an example of correction for a particular case (not taking tests into account), by analogy with which other cases can be considered. Therefore, I would like to know how correct and complete they are. |
|
@john-brawn-arm any ideas? It looks OK to me. (It would need to handle all instructions before HasStrictFP was added though). |
|
Hi, I added the processing of remaining strict fp ops for VFP and adjusted the tests. Could you please review these changes and tell what else needs to be done? I also have a question about whether it's necessary to change the instruction selection patterns for MVE and NEON (as was done for VFP), since, as far as I understand, they do not provide full ieee compliance on all ARM platforms. Regarding HasStrictFP setting, I have left this commit for now; however, it can only be considered once the conditions you specified are met. |
|
@john-brawn-arm, @davemgreen can you please take a look at the patch? |
davemgreen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it looks sensible. We will need strict-fp tests for the new operations now supported, and there is quite a bit going on in this patch. Neon and MVE sometimes don't implement ieee-fp exactly, I would need to remind myself of the details.
I would not expect the tests to change (like llvm/test/CodeGen/Thumb2/mve-fmas.ll) - strictfp should not make the normal case worse. Do you know what might be causing those to be different?
Will it be enough to add strict fp tests similar to those made for AArch64? Regarding the changes in some tests (e.g. Thumb2/mve-fmas.ll): they arise after updating patterns in ARMInstrVFP.td, which in turn reorders some instructions. For VFP tests, as far as I understand, such strict fp behavior can be expected, but I’m unsure to what extent the MVE tests should be affected. |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Yeah I think so for the most part. There is the added complication of soft-fp but it should mostly be a case of testing all the strict-fp operations under a couple of configs to make sure everything works OK.
I think we need to teach it that the fpscr bits of nofpexcept instructions and |
|
@davemgreen, could this be submitted? |
|
@john-brawn-arm do you know what else is needed to support strict-fp? I think we probably need to do something about vector operations - both MVE and Neon. (Which might be annoying if we need to add new intrinsics for all of them as they do not match the expected behaviour of the fp instructions). |
Looking at what happens if I do I think this is is the correct thing to do, as both NEON and MVE instructions ignore the FPSCR exception and rounding control bits and treat them as if they were zero, so to get the correct strict behaviour we need to use scalar instructions. I think we do need to have a test of this i.e. have a version of the fp-intrinsics-vector.ll test in llvm/test/CodeGen/ARM. |
Yeah I believe we might need to scalarize strict-fp vector operations. The Vector operations for Armv7 behave in certain ways that make them difficult outside of fast-math. The intrinsics produced from C need to still produce vector operations though https://godbolt.org/z/99vKe7esc. |
For now in strict fp mode such C intrinsics will be scalarized.
As far as I understand, in the backend it is not possible to determine that a given strict fp intrinsic was originally produced from С and keep it in vector form (e.g. |
|
ping |
|
I put up #169156 to correctly handle the vadd/vsub/vmul intrinsics from C, with an explanation in the summary of how the instructions behave. If people are happy with that approach I can work on adding the others for MVE too. |
|
I see, great thanks! |
|
Did you have a patch where we set |
|
No, should I add it? |
|
We might need it to stop the automatic mutation of strict-fp nodes to non-strict equivalents. All the targets that support strict-fp have it set to true, so I assume we should be doing the same. It looks like some of the existing tests fail with it on. |
|
Set If necessary, this can be moved into a separate PR. |
davemgreen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If necessary, this can be moved into a separate PR.
Yeah that might be useful - so we can add this part before turning the whole feature on. There are some strict-fp vector operations that could theoretically be supported - I'm not sure yet if they are very worth adding or not.
| @@ -0,0 +1,1515 @@ | |||
| ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6 | |||
| ; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a55 %s -disable-strictnode-mutation -o - | FileCheck %s --check-prefixes=CHECK | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove -mcpu=cortex-a55 and add -mattr=+neon+etc. And remove -disable-strictnode-mutation now?
Can remove --check-prefixes=CHECK too.
|
|
||
| attributes #0 = { strictfp } | ||
|
|
||
| declare <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float>, <4 x float>, metadata, metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove all of these nowadays.
| setOperationAction(ISD::STRICT_FMINNUM, MVT::f16, Legal); | ||
| setOperationAction(ISD::STRICT_FMAXNUM, MVT::f16, Legal); | ||
|
|
||
| if (Subtarget->hasVFPv3()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FullFP16 should imply VFPv3. The fpext/fptrunc for f16 was added in +fp16? fpext/fptrunc usually go through custom, but for legal operations should this be next to the if (!Subtarget->hasFP16()) check below? Sometimes it needs to go through Custom because there isn't a good way to represent two types.
|
Made fixes based on comments. Also corrected lowering of
For review convenience, I’ve left the changes here for now. If everything looks good, I’ll create a separate PR and revert this one to its original state. |
| setOperationAction(ISD::STRICT_FP_ROUND, MVT::f32, Legal); | ||
| setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f64, Legal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these always work with fptrunc f128 -> f32 and fpext f16 -> f64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Corrected for
fpext f16->f64case
according to checks in LowerFP_EXTEND:
assert((!Subtarget->hasFP64() || !Subtarget->hasFPARMv8Base()) &&
"With both FP DP and 16, any FP conversion is legal!");
if both features are enabled we can mark it Legal and mark Custom otherwise.
- For
f128->f32casefptruncwill be softened anyway, since f128 is not legal.
| @@ -0,0 +1,1571 @@ | |||
| ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6 | |||
| ; RUN: llc -mtriple=armv7a-- -mattr=+neon,+vfp4 %s -o - | FileCheck %s | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this armv7a-none-eablhf can help make the tests a little cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed, but it didn't affect anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - that was a typo and should have been armv7a-none-eabihf.
|
I put together these for the various MVE intrinsics There is one last one for fprtosi and sitofp that I still need to make (it was being difficult as it involves a type parameter). The same is probably needed for Neon too, to prevent the intrinsics from scalarizing. If you can remove the clang change from this commit (or create a new one) I think it looks OK. |
|
Moved changes to #170136. This PR has been reverted to its original state. |
…g for several strict ops (#170136) Changes in this PR were discussed and reviewed in llvm/llvm-project#137101.
…al strict ops (llvm#170136) Changes in this PR were discussed and reviewed in llvm#137101.
…al strict ops (llvm#170136) Changes in this PR were discussed and reviewed in llvm#137101.
|
The MVE side is now all committed, making sure that we don't scalarize the intrinsics. There was a question about whether we should be using target specific intrinsics for it, but I believe we need to as we can't really convert the constrained intrinsics into mve instructions. I think we need something similar for Neon, at least to stop the intrinsics from scalarizing. I've not looked into how many different types of nodes that would be. A few other things I have noticed:
|
Addressed this in #173666.
Could you please provide more details about the issue? |
Added support for FENV_ACCESS pragma on hard-float ARM platforms. Also changes were made to clang/test/Parser/pragma-fp-warn.c so that for thumbv7a only the soft-float-abi target case is checked.