support --fix-v4bx #50764

nickdesaulniers · 2021-08-09T20:29:34Z


Bugzilla Link	51422
Version	unspecified
OS	All
Blocks	#4440
CC	@arndb,@efriedma-quic,@MaskRay,@nathanchance,@smithp35

Extended Description

In order to link armv4 Linux kernel images with LLD, it's required that LLD support --fix-v4bx, otherwise we get the warning:

ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected

IIUC, there's even a relocation type for this: R_ARM_V4BX.

ClangBuiltLinux/linux#964

efriedma-quic · 2021-08-09T21:49:46Z

There's two separate issues here:

Supporting armv4t.
Supporting armv4 without thumb.

Supporting armv4t requires some modifications to the relocation handling for calls; that's what the warning is about.

Supporting armv4 without thumb means we need to use a different sequence for function returns. Historically, this is a compiler thing, not a linker thing, but R_ARM_V4BX is a trick to allow writing code that works for both armv4t and armv4-without-thumb targets. --fix-v4bx essentially converts arm-mode armv4t code to armv4.

If you're not using --fix-v4bx, the linker doesn't really care about the difference between armv4t and armv4; it doesn't generate function returns, anyway.

MaskRay · 2021-08-09T23:26:57Z

(The documentation says the support is for >=v6.)

Isn't armv4 so obscure that it it quite irrelevant now?
Even if it isn't so obscure, are these users happy with their current toolchain and don't consider a switch?

efriedma-quic · 2021-08-10T03:10:29Z

armv5t has both the register and immediate forms of blx, I think.

I think the issue here is there are some legacy hardware enthusiasts who build Linux for very old hardware, like https://en.wikipedia.org/wiki/Risc_PC , so legacy configs targeting armv4 persist in the kernel source tree, and randconfig stumbles over them.

arndb · 2021-08-10T07:38:53Z

There are three classes of ARMv4 (not T) users that I can see from a kernel perspective:

RiscPC/sa110 has only one actual user (rmk) I'm aware of, and it has the added problem that the hardware does not support 32-bit memory accesses, so kernels are actually built for ARMv3 despite this being an ARMv4 processor. gcc-9 removed ARMv3 support a few years ago, and I see no reason to add this to clang. As soon as gcc-9 or higher is required for building kernels, this platform will be removed.
There are a handful of hobbyist users of legacy StrongARM machines that run ARMv4 kernels on SA1100 handhelds or Netwinder. We will probably discuss removing those as well when time is up for RiscPC, but they could remain for a few years longer.
There are two remaining embedded platforms based on Faraday FA526 cores that are deployed commercially in settings that require kernel updates: Cortina Gemini and MOXA Art. The hardware is old, but the kernel ports are well written and maintained, they will probably outlive a number of ARMv4T and ARMv5E platforms we currently support. It would be nice to be able to build kernels for these using clang.

For reference, we are still adding support for ARMv5E (ARM926E) SoC platforms, including SoCs designed as recently as 2020 (Microchip SAM9X60).
For ARMv4T, we currently support AT91RM9200, S3C24xx, i.MX1, EP72xx, EP93xx, OMAP1, and ARM Integrator based on ARM920T and ARM710T. Debian Buster (released in 2019) dropped support for these, but there is still some work going into the kernel for at least ep93xx, omap1 and s3c24xx, similar to the ARMv4/FA526 platforms.

smithp35 · 2021-08-10T08:40:25Z

On the LLD side I originally implemented v7-A and above then later added v5 as there was at least one person had some need for it, I think on the BSD side. It probably wouldn't be a huge amount of extra code and tests to support v4t but up until now there hasn't been anyone needing it in LLD. The assumption was that older hardware would already have toolchains they could use, and we could avoid extra complexity in the code.

For v4t support in lld, there are a few things we'd need to do:

We'd need a new v4t thunks as ldr pc, [address] cannot change state in v4t. The sequence becomes:
ldr ip, [address]
bx ip
As there is no BLX there is no Thunk sharing between Arm and Thumb state. The v4t Thunks isCompatibleWith would need to include only relocations from the same state.
ARM::needsThunk() would need a case for v4t that accounts for the lack of a BLX. I think it would be something like (pseudo code):
return isV4t || !inBranchRange(type, branchAddr, dst + a);

If the Thunks code has done its job properly then the relocation handling code should never be presented with an opportunity to use a BLX.

On the linker side R_ARM_V4BX is trivial to implement (just write Arm state mov pc, lr encoding to the location), but as Eli points out the compiler has to generate the relocation when targeting Arm v4.

If there is a need for support then it may be worth approaching Linaro TCWG. I could do it, but I've not got a lot of spare time so it might take a few weeks rather than a few days.

efriedma-quic · 2021-08-10T20:19:13Z

As there is no BLX there is no Thunk sharing between Arm and Thumb state.
The v4t Thunks isCompatibleWith would need to include only relocations from
the same state.

The first thing any v4t Thumb thunk will do is switch to ARM mode. The remaining code is the same, so it could be shared. Not sure that's worth doing, though.

efriedma-quic · 2021-08-10T20:22:38Z

Also, the kernel doesn't actually have any thumb code, so it doesn't actually need v4t thunks, I think. It would be enough to just suppress the warning if there aren't any Thumb symbols.

smithp35 · 2021-08-11T08:54:05Z

The first thing any v4t Thumb thunk will do is switch to ARM mode. The
remaining code is the same, so it could be shared. Not sure that's worth
doing, though.

It is possible, although it would mean two entry points for thunk that we'd need to select between per relocation type. My instinct would be to keep it simple unless there was a real need to minimise the code-size.

Also, the kernel doesn't actually have any thumb code, so it doesn't
actually need v4t thunks, I think. It would be enough to just suppress the
warning if there aren't any Thumb symbols.

That would be possible, although at the moment that would require LLD to scan all the local symbols of every input object looking for the absence of $t. Although only programs that could trigger the warning would need to run the check. We can't use the BuildAttributes as objects built with -marm still have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used)

Would a suppression of the warning be acceptable to the kernel folk? Or are there v4t kernels that use Thumb code that need supporting?

arndb · 2021-08-11T12:59:32Z

That would be possible, although at the moment that would require LLD to
scan all the local symbols of every input object looking for the absence of
$t. Although only programs that could trigger the warning would need to run
the check. We can't use the BuildAttributes as objects built with -marm
still have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used)

Would a suppression of the warning be acceptable to the kernel folk? Or are
there v4t kernels that use Thumb code that need supporting?

I'm sure there is no thumb code in pre-v7 kernels that we need to link against. Suppressing the warning would seem to do the trick for armv4t kernels, but if I build for armv4, I still see 'bx' instructions in the thunks.

I checked the compiler output from a recent clang and I see that building for armv4 does not generate any 'bx' instructions, so I don't think we actually need the R_ARM_V4BX/--fix-v4bx trick any more to work around compiler-generated instructions, but we still need to work around the linker adding them.

Could we just have a linker flag to ask for armv4 style thunks, and use them for both v4 and v4t kernels?

smithp35 · 2021-08-11T14:17:57Z

It does look like (ARMInstrInfo.td MOVPCLR https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ARMInstrInfo.td#L2443) the return sequence is predicated so clang/llvm will use mov pc, lr for -march=armv4 so in theory no compiler work required.

We can tell if all objects are ArmV4 from the build attributes. Wouldn't need a new command line option.

Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel would be long range Arm to Arm. If --pic-veneer is set then these will have bx lr in them. However if --pic-veneer isn't set I'd expect to see all the thunks to be of the form:
ldr pc, [pc, #-4] ; l1
l1: .word

In theory these should be OK for v4. May be worth trying a build without --pic-veneer (The kernel wouldn't be position independent though).

I think a new PI Thunk for v4 would look something like:
ldr ip, [pc, #-8] ; l1
add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb
l1: .word destination -4 - l1 ; offset to destination

Although if we're adding thunks for armv4, it may just be worth adding v4t thunks at the same time.

arndb · 2021-08-11T15:21:39Z

It does look like (ARMInstrInfo.td MOVPCLR
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/
ARMInstrInfo.td#L2443) the return sequence is predicated so clang/llvm will
use mov pc, lr for -march=armv4 so in theory no compiler work required.

Maybe it was the 'blx mcount' when building with '-pg' that caused the problems? I remember debugging into that in the past, but don't remember the exactly problem and this seems fine now, it always uses 'bl mcount' here with the clang version I see.

We can tell if all objects are ArmV4 from the build attributes. Wouldn't
need a new command line option.

Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel
would be long range Arm to Arm.

Correct

If --pic-veneer is set then these will have
bx lr in them. However if --pic-veneer isn't set I'd expect to see all the
thunks to be of the form:
ldr pc, [pc, #-4] ; l1
l1: .word

In theory these should be OK for v4. May be worth trying a build without
--pic-veneer (The kernel wouldn't be position independent though).

Yes, that works. I see the --pic-veneer flag was added by Ard in
https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8323/1

to work around veneers that are inserted into the early-boot code that is run before the MMU is enabled.

We can probably get away with this on actual ARMv4/v4t hardware since their memory is too constrained to actually run a kernel that is large enough to require veneers, but it seems a little wrong. In theory one can run a large (allyesconfig style) kernel with ARMv4 enabled on an ARMv5 machine with a lot of memory.

I think a new PI Thunk for v4 would look something like:
ldr ip, [pc, #-8] ; l1
add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb
l1: .word destination -4 - l1 ; offset to destination

Although if we're adding thunks for armv4, it may just be worth adding v4t
thunks at the same time.

That would be nice for user space, but in the kernel we don't really care about the distinction. I suggested a command line flag since we could then just unconditionally pick the v4 veneers for linking a v4/v4t kernel and avoid both the bx instructions and the warning.

lld cannot build for ARMv4/v4T targets because it inserts 'blx' instructions that are unsupported there: ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected Add a Kconfig time dependency to prevent those targets from being selected in randconfig builds. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: llvm/llvm-project#50764 Link: ClangBuiltLinux#964 Link: https://lore.kernel.org/r/20221215162635.3750763-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>

nickdesaulniers · 2022-12-28T08:34:17Z

https://reviews.llvm.org/rG62c605771a30 looks perhaps related? cc @stuij

stuij · 2022-12-29T00:40:26Z

Ah, this is bad timing! I was unaware of this ticket.

Yes, so my patch stops lld inserting BLX instructions for v4(t), and instead now creates veneers that are sensible for these architectures. Currently I only added support for absolute relocations, but PIC relocations are high on the list and should follow shortly. I think it's just two cases that we should still account for, and adding them is trivial. Which includes v4 PIC thunks not generating a BX which right now they still do.

And as we don't insert BLX anymore, the LLD BLX warning is not emitted anymore.

Let me know if I'm missing more things that need fixing. ATM I'm quite motivated to get v4(t) working for LLVM, so happy to help out where I can on this.

stuij · 2023-01-09T12:00:35Z

I just put up a review for LLD that will make it support position independent thunks for v4/v4T:
https://reviews.llvm.org/D141272

I think that would fix this issue, and no more work would be needed on the LLVM side. To summarise:

We now (after patch lands) support both v4 and v4t relocations in linker, for both absolute and position independent configs.
We won't need support for --fix-v4bx or R_ARM_V4BX, as no BX is generated on v4, either by compiler or linker.
No special linker flag is needed for v4 as target architecture is inferred from build attributes.
We don't generate a BLX warning anymore, so no need to guard against that.

Am I missing anything?

This would mean the intel-lab-lkp/linux@6a7ee50 patch can be reverted. Would we need to wait until say LLVM 16 is released (assuming this patch lands before that)?

By the way, relatedly, I just pushed a fix (747fc27) for v4/v5/v6 where a b.w would be generated (not supported for these arches) if a range extension is needed and the distance between thunk section and target is smaller than b.w. So this would affect jumps spanning more than 32MB. Seeing the lack of a bug report, I guess this doesn't happen very often.

- Position independent thunks now work for both Armv4 and Armv4T - Armv4 arm->arm thunks don't emit a BX anymore, which doesn't exist for the arch. This fixes llvm/llvm-project#50764. - Armv4 and Armv4T both have the same arm->arm behaviour. Which also is desirable for the above ticket. Reviewed By: MaskRay, peter.smith Differential Revision: https://reviews.llvm.org/D141272

linusw · 2024-02-08T22:11:45Z

I just tested this with clang 17.0.6 on the Gemini FA526 target, close to ARMv4T.
It works like a charm:

sl-boot>go 0x400000
Booting Linux on physical CPU 0x0
Linux version 6.8.0-rc1+ (linus@lino) (clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708), LLD 17.0.6) #49 PREEMPT Thu Feb 8 23:02:26 CET 2024
CPU: FA526 [66015261] revision 1 (ARMv4), cr=0000397f
CPU: VIVT data cache, VIVT instruction cache
OF: fdt: Machine model: D-Link DIR-685 Xtreme N Storage Router
(...)

It boots all the way to userspace, gets network, routes packets and has framebuffer.

linusw · 2024-02-15T13:47:37Z

Also tested with clang 17.0.6 on the SA110 Netwinder, which is a pure ARMv4 targer.
It boots fine and works as expected:

Press '*' TWICE to enter debug......Booting kernel...
Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0x0
Linux version 6.8.0-rc4-00034-g8d3dea210042 (linus@lino) (clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708), LLD 17.0.6) #8 Th4
CPU: StrongARM-110 [4401a103] revision 3 (ARMv4), cr=0000517f
CPU: VIVT data cache, VIVT instruction cache
Machine: Rebel-NetWinder
(...)

All the way to userspace, network and graphics.

stuij · 2024-02-19T16:51:22Z

Yay! Tackar for posting this, and I bet I speak for all armv4(t) fans all over the world.

I Googled the Gemini SoC, and landed on a page you wrote. I'm tempted to buy a router, or perhaps one of those strongarm servers.

- Position independent thunks now work for both Armv4 and Armv4T - Armv4 arm->arm thunks don't emit a BX anymore, which doesn't exist for the arch. This fixes llvm/llvm-project#50764. - Armv4 and Armv4T both have the same arm->arm behaviour. Which also is desirable for the above ticket. Reviewed By: MaskRay, peter.smith Differential Revision: https://reviews.llvm.org/D141272

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

stuij self-assigned this Jan 7, 2023

stuij closed this as completed in 6f9ff1b Jan 13, 2023

This was referenced Jan 13, 2023

retest armv4 with lld ClangBuiltLinux/linux#1789

Closed

lld uses blx instruction, no object with architecture supporting feature detected ClangBuiltLinux/linux#964

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support --fix-v4bx #50764

support --fix-v4bx #50764

nickdesaulniers commented Aug 9, 2021

efriedma-quic commented Aug 9, 2021

MaskRay commented Aug 9, 2021

efriedma-quic commented Aug 10, 2021

arndb mannequin commented Aug 10, 2021

smithp35 commented Aug 10, 2021

efriedma-quic commented Aug 10, 2021

efriedma-quic commented Aug 10, 2021

smithp35 commented Aug 11, 2021

arndb mannequin commented Aug 11, 2021

smithp35 commented Aug 11, 2021

arndb mannequin commented Aug 11, 2021

nickdesaulniers commented Dec 28, 2022

stuij commented Dec 29, 2022

stuij commented Jan 9, 2023

linusw commented Feb 8, 2024

linusw commented Feb 15, 2024 •

edited

Loading

stuij commented Feb 19, 2024

support --fix-v4bx #50764

support --fix-v4bx #50764

Comments

nickdesaulniers commented Aug 9, 2021

Extended Description

efriedma-quic commented Aug 9, 2021

MaskRay commented Aug 9, 2021

efriedma-quic commented Aug 10, 2021

arndb mannequin commented Aug 10, 2021

smithp35 commented Aug 10, 2021

efriedma-quic commented Aug 10, 2021

efriedma-quic commented Aug 10, 2021

smithp35 commented Aug 11, 2021

arndb mannequin commented Aug 11, 2021

smithp35 commented Aug 11, 2021

arndb mannequin commented Aug 11, 2021

nickdesaulniers commented Dec 28, 2022

stuij commented Dec 29, 2022

stuij commented Jan 9, 2023

linusw commented Feb 8, 2024

linusw commented Feb 15, 2024 • edited Loading

stuij commented Feb 19, 2024

linusw commented Feb 15, 2024 •

edited

Loading