-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A64: Add support ARMv9.0 instructions #100
base: master
Are you sure you want to change the base?
Conversation
Armv8.3 Load-Acquire RCpc instructions Instructions added: - LDAPR - LDAPRH - LDAPRB
Armv8.4 Load-Acquire RCpc instructions version 2 Instructions added: - STLUR - STLURB - STLURH - LDAPURB - LDAPURSB - LDAPURSH - LDAPUR - LDAPURSW
Armv8.0 Speculation Barrier Instructions added: - SB
Additional SVE Features added: - FEAT_SVE_AES - FEAT_SVE_PMULL128 - FEAT_SVE_BitPerm - FEAT_SVE_SHA3 - FEAT_SVE_SM4 - FEAT_SVE_B16B16 - FEAT_SVE2p1 Link to spreadsheet with added instructions: https://docs.google.com/spreadsheets/d/1YYuc-CM2gdZJpnx3xm_31EOWQ5T0cZVLmgNM6vW0I2M/edit?usp=sharing
Renamed 'pop_neon' and 'push_neon' to 'pop_simd' and 'push_simd' respectively, with an IFDEF selecting the corresponding codepath. However, there are a few issues yet to be ironed out. Firstly, on a system with a Vector Length (VL) of more than 128-bits it doesn't work. This has been tested with QEMU and natively on an AWS Graviton machine(c7g.4xlarge). The dispatcher for some reason takes the wrong path on a 'CBZ' instruction at around basic block 152. This leads to early program termination with a negative return status. Secondly, it only works with a VL of 128-bits on a dynamically compiled (target) binary. Statically compiled ones just SEGFAULT. Thirdly, having the instructions for pushing and popping the predicate registers makes it so that nothing works at all. Regardless of VL and how the (target) binary was compiled (dynamically or statically).
Hi @ijan1, Thank you for your contribution. The code changes look good to me, but before we merge, let's discuss points you've raised.
At the start-up, MAMBO should check the vector length supported by the system, and quit if it's not 128-bits. Not having a general support it's okay, but should be signaled. But before fixing that, take a look at my comments regarding the predicates.
Again, we should have a check for that. If SVE is detected, the loader should check if loaded binary is static and quit with a message.
This is the biggest issue at the moment, as we get into it may work ground or may not ground. And it seems more and more to me, the failure is not SVE specific, but rather relates to |
Me again, I had another look at the aarch64 dispatcher, and found something interesting: .global syscall_wrapper
.global syscall_wrapper_svc
syscall_wrapper:
STP X30, X29, [SP, #-16]!
BL push_x4_x21
STP X0, X1, [SP, #-32]!
STP X2, X3, [SP, #16]
BL push_neon
MRS X19, NZCV
MRS X20, FPCR
MRS X21, FPSR
MOV X0, X8
ADD X1, SP, #512
MOV X2, X29
LDR X3, disp_thread_data
LDR X4, syscall_handler_pre_addr
BLR X4
CBZ X0, s_w_r
ADD X9, SP, #512
LDP X0, X1, [X9, #0]
LDP X2, X3, [X9, #16]
LDP X4, X5, [X9, #32]
LDP X6, X7, [X9, #48]
LDR X8, [X9, #64]
// Balance the stack on rt_sigreturn, which doesn't return here
CMP X8, #0x8b
BNE svc
ADD SP, SP, #(64 + 144 + 512)
svc: SVC 0
syscall_wrapper_svc:
ADD X1, SP, #512
STR X0, [X1, #0]
MOV X0, X8
MOV X2, X29
LDR X3, disp_thread_data
LDR X4, syscall_handler_post_addr
BLR X4
s_w_r:
BL pop_neon
MSR NZCV, X19
MSR FPCR, X20
MSR FPSR, X21
LDP X2, X3, [SP, #16]
LDP X0, X1, [SP], #32
BL pop_x4_x21
LDP X29, X30, [SP, #16]
STP X0, X1, [SP, #16]
LDP X0, X1, [SP], #16
B checked_cc_return See how 512 repeats multiple times in the context of the stack, and it also calls |
`syscall_handler_pre` expects its second argument to be a pointer to the arguments of the system call, and it used a fixed offset to account for the NEON register (512), however with the addition of SVE and SVE2 this magic number no longer works. This commit changes it so that the address is calculated dynamically when using SVE's VL and PL registers and falls back to the NEON value in the absence of SVE.
Support for instructions up to ARMv9.0, most notably SVE and SVE2.
However, there are a few issues yet to be fixed.
Firstly, on a system with a Vector Length (VL) of more than 128-bits it doesn't work. This has been tested with QEMU and natively on an AWS Graviton machine(c7g.4xlarge). The dispatcher for some reason takes the wrong path on a 'CBZ' instruction at around basic block 152. This leads to early program termination with a negative return status.
Secondly, it only works with a VL of 128-bits on a dynamically compiled (target) binary. Statically compiled ones just SEGFAULT.
Thirdly, having the instructions for pushing and popping the predicate registers makes it so that nothing works at all. Regardless of VL and how the (target) binary was compiled (dynamically or statically). I was thinking of disabling them temporarily because compiling Mambo with support for SVE and SVE2 instructions doesn't insert any such instructions, but that's not a guarantee with changes in compiler and/or compiler version.
With a VL of 128-bits (using QEMU), Highway[1] and ARMRAL[2] pass all SVE and SVE2 test.
I have attached a spreadsheet[3] with the SVE/SVE2 instructions added for easier visualisation and some notes on which ones I think might need a bit more work.
[1]https://github.com/google/highway
[2] https://developer.arm.com/documentation/102249/2310/?lang=en
[3] https://docs.google.com/spreadsheets/d/1jABn1_99A2nKaIqYgZTg_JrHdmxqhq-r4T9hTIFiKo0/edit?usp=sharing