@Varnike
Contributor

@Varnike Varnike commented Apr 24, 2025

Adds support for the FENV_ACCESS pragma on hard-float ARM targets. clang/test/Parser/pragma-fp-warn.c is also updated so that, for thumbv7, only the soft-float-abi case is checked.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:ARM clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 24, 2025
@llvmbot
Member

llvmbot commented Apr 24, 2025

@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-clang

Author: Erik Enikeev (Varnike)

Full diff: https://github.com/llvm/llvm-project/pull/137101.diff

2 Files Affected:

  • (modified) clang/lib/Basic/Targets/ARM.cpp (+2)
  • (modified) clang/test/Parser/pragma-fp-warn.c (+1-1)
diff --git a/clang/lib/Basic/Targets/ARM.cpp b/clang/lib/Basic/Targets/ARM.cpp
index ca2c1ffbb0eb7..e3ab6e9abf78a 100644
--- a/clang/lib/Basic/Targets/ARM.cpp
+++ b/clang/lib/Basic/Targets/ARM.cpp
@@ -363,6 +363,8 @@ ARMTargetInfo::ARMTargetInfo(const llvm::Triple &Triple,
                            : "\01mcount";
 
   SoftFloatABI = llvm::is_contained(Opts.FeaturesAsWritten, "+soft-float-abi");
+  if (!SoftFloatABI)
+    HasStrictFP = true;
 }
 
 StringRef ARMTargetInfo::getABI() const { return ABI; }
diff --git a/clang/test/Parser/pragma-fp-warn.c b/clang/test/Parser/pragma-fp-warn.c
index c52bd4e4805ab..f743cb87997dc 100644
--- a/clang/test/Parser/pragma-fp-warn.c
+++ b/clang/test/Parser/pragma-fp-warn.c
@@ -1,6 +1,6 @@
 
 // RUN: %clang_cc1 -triple wasm32 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
-// RUN: %clang_cc1 -triple thumbv7 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
+// RUN: %clang_cc1 -triple thumbv7 -fsyntax-only -target-feature +soft-float-abi  -Wno-unknown-pragmas -Wignored-pragmas -verify %s
 // RUN: %clang_cc1 -DEXPOK -triple aarch64 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
 // RUN: %clang_cc1 -DEXPOK -triple x86_64 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
 // RUN: %clang_cc1 -DEXPOK -triple systemz -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s

@davemgreen
Collaborator

I believe the backend would still need work to make sure this is supported, which has not been done yet. I was expecting it to fail more noisily, but it appears the strict nodes are lowered to generic nodes. That doesn't mean that strict-fp is supported by the Arm backend, as it might not handle the instructions correctly at the moment (moving instructions past one another, for example).

@davemgreen
Collaborator

@john-brawn-arm did the work for AArch64, I'm not sure if he would have an idea how much work it would involve for the Arm backend.

@john-brawn-arm
Collaborator

It's been several years since I looked at this, but from my notes what's needed before setting HasStrictFP=true is:

  • Instruction selection patterns for strict fp ops. This is probably just changing e.g. "fadd" to "any_fadd", as I expect most or all of the instruction selection patterns work fine with strict fp.
  • Instructions that can raise fp exceptions need to have mayRaiseFPException=1. Without this, transformations can move around or just remove instructions that raise fp exceptions.
  • We need proper modelling of reads/writes of the FPSCR rounding control bits for controllable rounding modes to be handled correctly.
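The last point can be illustrated from the source-language side: the same division produces different results depending on the dynamic rounding mode, so the rounding control bits are effectively an extra input to the instruction. The following is a minimal sketch, assuming a host whose `<cfenv>` provides `FE_UPWARD` and `FE_DOWNWARD`. The helper name `divide_with_rounding` is made up for illustration and does not appear in the PR, and `volatile` is used only so that the division happens at run time even on compilers that ignore the pragma.

```cpp
#include <cfenv>

// Hypothetical helper (not from the PR): divides a by b under the given
// dynamic rounding mode, then restores the previously active mode.
float divide_with_rounding(float a, float b, int mode) {
#pragma STDC FENV_ACCESS ON
  const int saved = std::fegetround();
  std::fesetround(mode);
  // volatile keeps the operands and the result opaque to the optimizer, so
  // the division cannot be constant-folded or moved across fesetround().
  volatile float lhs = a;
  volatile float rhs = b;
  volatile float result = lhs / rhs;
  std::fesetround(saved);
  return result;
}
```

Calling `divide_with_rounding(1.0f, 3.0f, FE_UPWARD)` and `divide_with_rounding(1.0f, 3.0f, FE_DOWNWARD)` should give results that differ in the last bit. If a backend were free to reorder the division with the write to the rounding control bits, both calls could return the same value.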

@john-brawn-arm
Collaborator

Simple example where setting HasStrictFP=true without doing the above gives wrong results:

int fetestexcept( int excepts );
#define FE_DIVBYZERO 0x04

int fn(float x) {
#pragma STDC FENV_ACCESS ON
  x / 0;
  return fetestexcept(FE_DIVBYZERO);
}

With --target=aarch64-none-elf we get a divide instruction and so return FE_DIVBYZERO; with --target=arm-none-eabi we get no divide instruction and return 0.
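For readers who want to try the behaviour on a host machine, here is a self-contained variant of the example using the standard `<cfenv>` declarations instead of a hand-written prototype and macro. It is a sketch under assumptions: the helper name `division_raises_divbyzero` is hypothetical, and the `volatile` temporaries are added so that the division survives even on compilers that ignore the pragma, which the original example deliberately does not do.

```cpp
#include <cfenv>

// Hypothetical helper (not from the PR): returns true if computing a / b
// raised the divide-by-zero floating-point exception.
bool division_raises_divbyzero(float a, float b) {
#pragma STDC FENV_ACCESS ON
  std::feclearexcept(FE_ALL_EXCEPT);
  // The quotient is never used, mirroring the discarded `x / 0;` statement.
  // volatile forces the division to be performed at run time.
  volatile float lhs = a;
  volatile float rhs = b;
  volatile float quotient = lhs / rhs;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

On a target with working strict fp support, `division_raises_divbyzero(1.0f, 0.0f)` is expected to be true and `division_raises_divbyzero(1.0f, 2.0f)` false.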

@Varnike Varnike force-pushed the main branch 2 times, most recently from f57085e to 1b9b201 Compare June 9, 2025 16:08
@Varnike Varnike force-pushed the main branch 2 times, most recently from 0231231 to 6010639 Compare July 16, 2025 11:18
@Varnike
Contributor Author

Varnike commented Jul 16, 2025

Hello!

Thank you for the comments and sorry for the delayed response.

I've now returned to this task and plan to work towards full support for strict fp on hard-float ARM targets.

Based on @john-brawn-arm's example and suggestions, I was able to add support for STRICT_FDIV. With these changes the example compiles correctly (a divide instruction is emitted). Could you please review these changes and let me know if you spot any issues? Are there any additional cases that require special handling? Also, is there a test suite that would be sufficient to cover most strict fp usage scenarios?

If these changes are acceptable, I plan to proceed with support for the remaining strict fp ops.

@davemgreen
Collaborator

Hi - I went looking at our internal tracker for what happened the last time we enabled strictfp for AArch64. This was the list of patches mentioned; there might have been some more either before these or some minor fixups.

94843ea7d7e5 [AArch64] Make machine combiner patterns preserve MIFlags
6f53960d6416 [AArch64] Adjust machine-combiner-reassociate.mir test
bca998ed3c9a [AArch64] Generate fcmps when appropriate for neon intrinsics
0d8092dd485a [AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs
d4342efb6959 [AArch64] Add instruction selection for strict FP
9d68ed08178d [AArch64] Allow strict opcodes in fp->int->fp patterns
b670da798d35 [AArch64] Allow strict opcodes in indexed fmul and fma patterns
d916856bee11 [AArch64] Allow strict opcodes in faddp patterns
8e17c9613f36 [AArch64] Add some missing strict FP vector lowering
bbd7eac800e6 [AArch64] Remove an unused variable in my previous patch
12c1022679d4 [AArch64] Lowering and legalization of strict FP16
1b1466c34669 [AArch64] Adjust aarch64 constrained intrinsics tests and un-XFAIL
27a8735a444f [AArch64] Add mayRaiseFPException to appropriate instructions
88ac25b357aa [MachineCSE] Allow PRE of instructions that read physical registers
2d8c1597e51c [MIRVRegNamer] Avoid opcode hash collision
49510c50200c [AArch64] Mark all instructions that read/write FPCR as doing so
9e3264ab2021 [FPEnv] Enable strict fp for AArch64 in clang

We would probably need something similar for Arm - it is worth not underestimating the amount of work we would need to do, but if you are happy to try and push it through we would need to handle all the strict variants of the instructions and add mayRaiseFPException/FPCR to the relevant places.

@Varnike
Contributor Author

Varnike commented Jul 18, 2025

I now have a rough idea of the amount of work involved and plan to take it on. The current changes are meant as an example fix for one particular case (tests not yet taken into account), which the remaining cases can follow by analogy. I would therefore like to know how correct and complete they are.

@davemgreen
Collaborator

@john-brawn-arm any ideas? It looks OK to me. (It would need to handle all instructions before HasStrictFP was added though).

@Varnike
Contributor Author

Varnike commented Aug 1, 2025

Hi,

I added handling of the remaining strict fp ops for VFP and adjusted the tests. Could you please review these changes and tell me what else needs to be done? I also have a question about whether it's necessary to change the instruction selection patterns for MVE and NEON (as was done for VFP), since, as far as I understand, they do not provide full IEEE compliance on all ARM platforms.

Regarding HasStrictFP setting, I have left this commit for now; however, it can only be considered once the conditions you specified are met.

@Varnike
Contributor Author

Varnike commented Aug 6, 2025

@john-brawn-arm, @davemgreen can you please take a look at the patch?

Collaborator

@davemgreen davemgreen left a comment

I believe it looks sensible. We will need strict-fp tests for the new operations now supported, and there is quite a bit going on in this patch. Neon and MVE sometimes don't implement ieee-fp exactly, I would need to remind myself of the details.

I would not expect the tests to change (like llvm/test/CodeGen/Thumb2/mve-fmas.ll) - strictfp should not make the normal case worse. Do you know what might be causing those to be different?

@Varnike
Contributor Author

Varnike commented Aug 13, 2025

> We will need strict-fp tests for the new operations now supported, and there is quite a bit going on in this patch.

Will it be enough to add strict fp tests similar to those made for AArch64?

Regarding the changes in some tests (e.g. Thumb2/mve-fmas.ll): they arise after updating patterns in ARMInstrVFP.td, which in turn reorders some instructions. For VFP tests, as far as I understand, such strict fp behavior can be expected, but I’m unsure to what extent the MVE tests should be affected.

@github-actions

github-actions bot commented Aug 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@davemgreen
Collaborator

> Will it be enough to add strict fp tests similar to those made for AArch64?

Yeah I think so for the most part. There is the added complication of soft-fp but it should mostly be a case of testing all the strict-fp operations under a couple of configs to make sure everything works OK.
Note that there is talk of changing how the constrained-fp intrinsics are represented in the IR https://discourse.llvm.org/t/rfc-change-of-strict-fp-operation-representation-in-ir/85021, but my understanding is that they will work the same for codegen.

> Regarding the changes in some tests (e.g. Thumb2/mve-fmas.ll): they arise after updating patterns in ARMInstrVFP.td, which in turn reorders some instructions. For VFP tests, as far as I understand, such strict fp behavior can be expected, but I’m unsure to what extent the MVE tests should be affected.

I think we need to teach it that the fpscr bits of nofpexcept instructions and fpscr_nzcv do not interact and can be treated separately for scheduling. In general we need to be careful not to make perf worse for all the people who do not use strict-fp. Maybe in this case it is caused by the vmrs APSR_nzcv, fpscr instructions?

@Varnike
Contributor Author

Varnike commented Aug 27, 2025

  • Removed the dependency between the fpscr bits of nofpexcept instructions and fpscr_nzcv, so tests like Thumb2/mve-fmas.ll no longer change.
  • Added strict fp tests.
  • Function calls are now marked as possibly changing FPSCR (by analogy with https://reviews.llvm.org/D143001).

@Varnike
Contributor Author

Varnike commented Nov 11, 2025

Now that the necessary strict fp support has been added to the Arm backend, HasStrictFP=true can be enabled.

@davemgreen, could this be submitted?

@Varnike Varnike requested a review from davemgreen November 12, 2025 10:04
@davemgreen
Collaborator

@john-brawn-arm do you know what else is needed to support strict-fp?

I think we probably need to do something about vector operations - both MVE and Neon. (Which might be annoying if we need to add new intrinsics for all of them as they do not match the expected behaviour of the fp instructions).

@john-brawn-arm
Collaborator

> @john-brawn-arm do you know what else is needed to support strict-fp?
>
> I think we probably need to do something about vector operations - both MVE and Neon. (Which might be annoying if we need to add new intrinsics for all of them as they do not match the expected behaviour of the fp instructions).

Looking at what happens if I do llc llvm/test/CodeGen/AArch64/fp-intrinsics-vector.ll -mtriple=arm-none-eabi -mcpu=cortex-a55, the strict vector operations are legalized into a sequence of strict scalar operations, which are then selected as normal. This is happening because we don't have any specific handling of the strict vector operations in the ARM target, and this is the default behaviour.

I think this is the correct thing to do, as both NEON and MVE instructions ignore the FPSCR exception and rounding control bits and treat them as if they were zero, so to get the correct strict behaviour we need to use scalar instructions.

I think we do need to have a test of this i.e. have a version of the fp-intrinsics-vector.ll test in llvm/test/CodeGen/ARM.
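As a source-level analogy for the scalarization described above (not a description of the actual legalizer), a 4-lane strict add can be pictured as four scalar adds, each of which observes and updates the floating-point environment. The sketch below is written under that assumption. `LaneAddResult` and `add_v4_scalarized` are hypothetical names, and `volatile` is used only to keep each lane's addition at run time on compilers that ignore the pragma.

```cpp
#include <array>
#include <cfenv>
#include <cstddef>

// Hypothetical result type (not LLVM code): the lane-wise sums plus the
// floating-point exception flags raised while computing them.
struct LaneAddResult {
  std::array<float, 4> lanes;
  int flags;
};

// Adds two 4-lane vectors one lane at a time, the way a scalarized strict
// vector add would, and reports which exception flags were raised.
LaneAddResult add_v4_scalarized(const std::array<float, 4> &a,
                                const std::array<float, 4> &b) {
#pragma STDC FENV_ACCESS ON
  LaneAddResult result{};
  std::feclearexcept(FE_ALL_EXCEPT);
  for (std::size_t lane = 0; lane < 4; ++lane) {
    // One scalar addition per lane; volatile prevents folding or vectorizing.
    volatile float lhs = a[lane];
    volatile float rhs = b[lane];
    volatile float sum = lhs + rhs;
    result.lanes[lane] = sum;
  }
  result.flags = std::fetestexcept(FE_ALL_EXCEPT);
  return result;
}
```

With exactly representable inputs the returned flags are expected to be zero, while a lane such as `3.0e38f + 3.0e38f` should set `FE_OVERFLOW`. A vector instruction that ignores the exception control bits could not be relied on to report this the same way, which is the motivation given above for using scalar instructions.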

@davemgreen
Collaborator

> Looking at what happens if I do llc llvm/test/CodeGen/AArch64/fp-intrinsics-vector.ll -mtriple=arm-none-eabi -mcpu=cortex-a55, the strict vector operations are legalized into a sequence of strict scalar operations, which are then selected as normal. This is happening because we don't have any specific handling of the strict vector operations in the ARM target, and this is the default behaviour.
>
> I think this is the correct thing to do, as both NEON and MVE instructions ignore the FPSCR exception and rounding control bits and treat them as if they were zero, so to get the correct strict behaviour we need to use scalar instructions.
>
> I think we do need to have a test of this i.e. have a version of the fp-intrinsics-vector.ll test in llvm/test/CodeGen/ARM.

Yeah I believe we might need to scalarize strict-fp vector operations. The vector operations for Armv7 behave in certain ways that make them difficult outside of fast-math.

The intrinsics produced from C need to still produce vector operations though https://godbolt.org/z/99vKe7esc.

@Varnike
Contributor Author

Varnike commented Nov 14, 2025

> The intrinsics produced from C need to still produce vector operations though https://godbolt.org/z/99vKe7esc.

For now in strict fp mode such C intrinsics will be scalarized.

> I think this is the correct thing to do, as both NEON and MVE instructions ignore the FPSCR exception and rounding control bits and treat them as if they were zero, so to get the correct strict behaviour we need to use scalar instructions.

As far as I understand, the backend cannot determine that a given strict fp intrinsic was originally produced from C and keep it in vector form (e.g. strict_fadd which will be generated in the given example). As a result, such intrinsics would also be scalarized. What is the correct way to handle this case?

@Varnike
Contributor Author

Varnike commented Nov 20, 2025

ping

@davemgreen
Collaborator

I put up #169156 to correctly handle the vadd/vsub/vmul intrinsics from C, with an explanation in the summary of how the instructions behave. If people are happy with that approach I can work on adding the others for MVE too.

@Varnike
Contributor Author

Varnike commented Nov 24, 2025

I see, great thanks!

@davemgreen
Collaborator

Did you have a patch where we set IsStrictFPEnabled=true in ARMTargetLowering::ARMTargetLowering?

@Varnike
Contributor Author

Varnike commented Nov 25, 2025

No, should I add it?

@davemgreen
Collaborator

We might need it to stop the automatic mutation of strict-fp nodes to non-strict equivalents. All the targets that support strict-fp have it set to true, so I assume we should be doing the same. It looks like some of the existing tests fail with it on.

@Varnike
Contributor Author

Varnike commented Nov 25, 2025

Set IsStrictFPEnabled=true for ARM and fix lowering for several strict fp ops. Changes in fp-intrinsics-vector.ll are related to the corresponding scalar operations.

If necessary, this can be moved into a separate PR.

Collaborator

@davemgreen davemgreen left a comment

> If necessary, this can be moved into a separate PR.

Yeah that might be useful - so we can add this part before turning the whole feature on. There are some strict-fp vector operations that could theoretically be supported - I'm not sure yet if they are very worth adding or not.

@@ -0,0 +1,1515 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a55 %s -disable-strictnode-mutation -o - | FileCheck %s --check-prefixes=CHECK
Collaborator

Remove -mcpu=cortex-a55 and add -mattr=+neon+etc. And remove -disable-strictnode-mutation now?
Can remove --check-prefixes=CHECK too.


attributes #0 = { strictfp }

declare <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float>, <4 x float>, metadata, metadata)
Collaborator

Can remove all of these nowadays.

setOperationAction(ISD::STRICT_FMINNUM, MVT::f16, Legal);
setOperationAction(ISD::STRICT_FMAXNUM, MVT::f16, Legal);

if (Subtarget->hasVFPv3()) {
Collaborator

FullFP16 should imply VFPv3. The fpext/fptrunc for f16 was added in +fp16? fpext/fptrunc usually go through custom, but for legal operations should this be next to the if (!Subtarget->hasFP16()) check below? Sometimes it needs to go through Custom because there isn't a good way to represent two types.

@Varnike
Contributor Author

Varnike commented Nov 25, 2025

Made fixes based on comments. Also corrected lowering of STRICT_FP16_TO_FP and STRICT_FP_TO_FP16.

> Yeah that might be useful - so we can add this part before turning the whole feature on. There are some strict-fp vector operations that could theoretically be supported - I'm not sure yet if they are very worth adding or not.

For review convenience, I’ve left the changes here for now. If everything looks good, I’ll create a separate PR and revert this one to its original state.

Comment on lines 618 to 619
setOperationAction(ISD::STRICT_FP_ROUND, MVT::f32, Legal);
setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f64, Legal);
Collaborator

Do these always work with fptrunc f128 -> f32 and fpext f16 -> f64?

Contributor Author

@Varnike Varnike Nov 28, 2025

  1. Corrected the fpext f16->f64 case, according to the check in LowerFP_EXTEND:

  assert((!Subtarget->hasFP64() || !Subtarget->hasFPARMv8Base()) &&
         "With both FP DP and 16, any FP conversion is legal!");

If both features are enabled we can mark it Legal, and mark it Custom otherwise.

  2. For the f128->f32 case, fptrunc will be softened anyway, since f128 is not legal.
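The rule for the f16->f64 extension can be written down as a small decision function. This is a toy model only, under the assumption that the two subtarget queries are the sole inputs. The `Action` enum and the function name are hypothetical stand-ins and are not the LLVM `TargetLowering` API.

```cpp
// Toy stand-in for a legalization action (hypothetical, not LLVM's enum).
enum class Action { Legal, Custom };

// Mirrors the rule stated above: the strict f16->f64 extend is Legal only
// when both double-precision FP and the Armv8 FP base are available, and
// goes through custom lowering in every other configuration.
Action strictFpExtendF16ToF64Action(bool hasFP64, bool hasFPARMv8Base) {
  return (hasFP64 && hasFPARMv8Base) ? Action::Legal : Action::Custom;
}
```

In this model only the combination of both features yields `Action::Legal`, which matches the condition in the quoted assert: the custom lowering path is never expected to see a subtarget with both.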

@@ -0,0 +1,1571 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
; RUN: llc -mtriple=armv7a-- -mattr=+neon,+vfp4 %s -o - | FileCheck %s
Collaborator

Making this armv7a-none-eablhf can help make the tests a little cleaner.

Contributor Author

changed, but it didn't affect anything

Collaborator

Sorry - that was a typo and should have been armv7a-none-eabihf.

@davemgreen
Collaborator

I put together these for the various MVE intrinsics
#169156
#169771
#169798
#169797
#169795

There is one last one for fptosi and sitofp that I still need to make (it was being difficult as it involves a type parameter). The same is probably needed for Neon too, to prevent the intrinsics from scalarizing. If you can remove the clang change from this commit (or create a new one) I think it looks OK.

@Varnike
Contributor Author

Varnike commented Dec 1, 2025

Moved changes to #170136. This PR has been reverted to its original state.

davemgreen pushed a commit that referenced this pull request Dec 1, 2025
…al strict ops (#170136)

Changes in this PR were discussed and reviewed in
#137101.
@davemgreen
Collaborator

The MVE side is now all committed, making sure that we don't scalarize the intrinsics. There was a question about whether we should be using target specific intrinsics for it, but I believe we need to as we can't really convert the constrained intrinsics into mve instructions. I think we need something similar for Neon, at least to stop the intrinsics from scalarizing. I've not looked into how many different types of nodes that would be.

A few other things I have noticed:

  • ldexp does not seem to work for fp16 vectors https://godbolt.org/z/MW3jqx797. It should be able to scalarize.
  • There are a number of problems with the various min/max's, some that are present for both standard nodes and strictfp.
  • On systems without fullfp16, fp16 nodes do not lower at the moment https://godbolt.org/z/3qh3fbWjG. (For systems without fp64 we do manage to produce libcalls).

@Varnike
Contributor Author

Varnike commented Dec 26, 2025

> On systems without fullfp16, fp16 nodes do not lower at the moment https://godbolt.org/z/3qh3fbWjG. (For systems without fp64 we do manage to produce libcalls).

Addressed this in #173666.

> There are a number of problems with the various min/max's, some that are present for both standard nodes and strictfp.

Could you please provide more details about the issue?

Labels

backend:ARM clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:globalisel

4 participants