Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of -march=armv8.4-a+sve #115

Closed
fxcoudert opened this issue Sep 5, 2023 · 8 comments
Closed

Handling of -march=armv8.4-a+sve #115

fxcoudert opened this issue Sep 5, 2023 · 8 comments

Comments

@fxcoudert
Copy link
Contributor

From OpenBLAS OpenMathLib/OpenBLAS#4212

We want to compile SVE-enabled code with gcc -march=armv8.4-a+sve -mtune=neoverse-v1. The full compilation line is:

gcc-13 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=56 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.24\" -march=armv8.4-a+sve -mtune=neoverse-v1 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=_dgemm_kernel_NEOVERSEV1 -DASMFNAME=_dgemm_kernel_NEOVERSEV1_ -DNAME=dgemm_kernel_NEOVERSEV1_ -DCNAME=dgemm_kernel_NEOVERSEV1 -DCHAR_NAME=\"dgemm_kernel_NEOVERSEV1_\" -DCHAR_CNAME=\"dgemm_kernel_NEOVERSEV1\" -DNO_AFFINITY -DTS=_NEOVERSEV1 -I.. -DBUILD_KERNEL -DTABLE_NAME=gotoblas_NEOVERSEV1 -DDOUBLE -UCOMPLEX -c -DDOUBLE -UCOMPLEX ../kernel/arm64/dgemm_kernel_sve_v2x8.S -o dgemm_kernel_NEOVERSEV1.o

which in turns calls the assembler as:

clang -cc1as -triple arm64-apple-macosx13.0.0 -filetype obj -main-file-name ccMnB12p.s -target-cpu apple-m1 -target-feature +v8.5a -target-feature +crc -target-feature +lse -target-feature +rdm -target-feature +crypto -target-feature +dotprod -target-feature +fp-armv8 -target-feature +neon -target-feature +fp16fml -target-feature +ras -target-feature +rcpc -target-feature +zcm -target-feature +zcz -target-feature +fullfp16 -target-feature +sm4 -target-feature +sha3 -target-feature +sha2 -target-feature +aes -I .. -fdebug-compilation-dir=/private/tmp/openblas-20230905-12149-1dh80k/OpenBLAS-0.3.24/kernel -dwarf-debug-producer "Apple clang version 14.0.3 (clang-1403.0.22.14.1)" -I .. -dwarf-version=4 -mrelocation-model pic --mrelax-relocations -mllvm -disable-aligned-alloc-awareness=1 -o dgemm_kernel_NEOVERSEV1.o /private/tmp/ccMnB12p.s

which errors out with:

dgemm_kernel_NEOVERSEV1.s:21:2: error: instruction requires: sve or sme
 dup z7.d, x18
 ^
dgemm_kernel_NEOVERSEV1.s:22:5: error: instruction requires: sve or sme
    cntd x19
    ^
dgemm_kernel_NEOVERSEV1.s:26:5: error: instruction requires: sve or sme
    ptrue p0.d
    ^
dgemm_kernel_NEOVERSEV1.s:40:2: error: instruction requires: sve or sme
 dup z16.d, #0
 ^

See that SVE was lost. Apple's clang, however, accepts SVE, and actually adding -target-feature +sve makes the assembler succeed.

The source file in this case is (reduced):

$ cat dgemm_kernel_NEOVERSEV1.s
 .text
 .p2align 2
 .global _dgemm_kernel_NEOVERSEV1

_dgemm_kernel_NEOVERSEV1:
 .align 5
 add sp, sp, #-(11 * 16)
 stp d8, d9, [sp, #(0 * 16)]
 stp d10, d11, [sp, #(1 * 16)]
 stp d12, d13, [sp, #(2 * 16)]
 stp d14, d15, [sp, #(3 * 16)]
 stp d16, d17, [sp, #(4 * 16)]
 stp x18, x19, [sp, #(5 * 16)]
 stp x20, x21, [sp, #(6 * 16)]
 stp x22, x23, [sp, #(7 * 16)]
 stp x24, x25, [sp, #(8 * 16)]
 stp x26, x27, [sp, #(9 * 16)]
 str x28, [sp, #(10 * 16)]

 fmov x18, d0
 dup z7.d, x18
    cntd x19
    lsl x20, x19, #1

 lsl x6, x6, #3
    ptrue p0.d

 mov x11, x4
 mov x10, x1
 asr x10, x10, #3

 .align 5
 mov x12, x5
    add x5, x5, x6, lsl #3
 mov x16, x3

 .align 5

 mov x11, x4
 dup z16.d, #0

 ret
@fxcoudert
Copy link
Contributor Author

Now, I think there is no Apple hardware that supports SVE at this time. Not sure if that will change, and probably not high priority at all, but probably at some point it would be nice to allow compilation if the assembler allows it.

@iains
Copy link
Owner

iains commented Sep 5, 2023

I made no attempt to support SVE/SVE2 at this time - because of lack of h/w - I also have no idea if future versions of Apple h/w will use it. I guess supporting the asm output is nice from the PoV of test suites, but it is likely to be quite misleading if a user thinks it works (and then fails at runtime).

Probably the right solution is to make sure that there is a fallback in the package that uses capabilities of the actual h/w? (but I confess to not looking at any details so far).

GCC does support sve/sve2 so I presume we could enable it (if there is some compelling reason).

@iains
Copy link
Owner

iains commented Sep 5, 2023

since the assembler that GCC uses is, effectively, clang -cc1as, we could also (probably) cook up a spec that handles the +sve and passes it through to the assembler line. I guess the question is whether the +sve part of the march is visible at that stage.

@iains
Copy link
Owner

iains commented Sep 5, 2023

This patch takes some significant liberties (i.e. no configure checks etc.) on the basis that so far we have only used clang -cc1as as our assembler.

  • there is no validity checking
  • I have only smoke-tested it on arm64-darwin21 (we'd need to make sure it does not break Big Sur)

the other comments apply - i.e. we should really make sure that we emit code that the hardware can execute :)

edit: however, if this works in the circumstances you need and solves a problem (without breaking anything else) - we can push this into the branch.

diff --git a/gcc/config/aarch64/darwin.h b/gcc/config/aarch64/darwin.h
index aac8228dfea..4db34970c56 100644
--- a/gcc/config/aarch64/darwin.h
+++ b/gcc/config/aarch64/darwin.h
@@ -90,7 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 "%{!mkernel:%{!static:-fPIC}} " DARWIN_CC1_SPEC
 
 #undef ASM_SPEC
-#define ASM_SPEC "-arch %(darwin_arch) "\
+#define ASM_SPEC "-arch %(darwin_arch) %{march*} %{mtune*} "\
   ASM_OPTIONS " %{static} " ASM_MMACOSX_VERSION_MIN_SPEC
 
 #undef ENDFILE_SPEC

@fxcoudert
Copy link
Contributor Author

@iains I have no real urgent need, we have a good workaround anyway, just wanted to flag the issue for consideration at some point

@iains
Copy link
Owner

iains commented Sep 5, 2023

@iains I have no real urgent need, we have a good workaround anyway, just wanted to flag the issue for consideration at some point

I'm looking for a good mechanism for deciding what to do in this case;
we can (of course) check that the change produces no testsuite regressions - but in terms for whether it does anything actually useful, I will defer to you guys and any testing you are able to do.

@iains
Copy link
Owner

iains commented Sep 28, 2023

are you able to check whether the patch above works reliably for your cases ?
(it seems a reasonable approach while we are using clang -cc1as as the assembler, if we changed to a different wrapper we'd need to revise the spec).

Actually splitting the march argument up would seem to require code; and so would need to be handled in the driver somewhere (but at the moment I do not see any pressing need to do so if this works for you).

iains added a commit that referenced this issue Oct 20, 2023
All current aarch64 Darwin versions use 'clang -cc1as' as their assembler
this means that we can safely pass -march to the assembler fixing Issue #115.

gcc/ChangeLog:

	* config/aarch64/darwin.h (ASM_SPEC): Pass -march values to the
	assembler.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
iains added a commit that referenced this issue Oct 20, 2023
All current aarch64 Darwin versions use 'clang -cc1as' as their assembler
this means that we can safely pass -march to the assembler fixing Issue #115.

gcc/ChangeLog:

	* config/aarch64/darwin.h (ASM_SPEC): Pass -march values to the
	assembler.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
fxcoudert pushed a commit to fxcoudert/gcc-darwin-arm64 that referenced this issue Oct 27, 2023
All current aarch64 Darwin versions use 'clang -cc1as' as their assembler
this means that we can safely pass -march to the assembler fixing Issue iains#115.

gcc/ChangeLog:

	* config/aarch64/darwin.h (ASM_SPEC): Pass -march values to the
	assembler.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
fxcoudert pushed a commit to fxcoudert/gcc-darwin-arm64 that referenced this issue Oct 29, 2023
All current aarch64 Darwin versions use 'clang -cc1as' as their assembler
this means that we can safely pass -march to the assembler fixing Issue iains#115.

gcc/ChangeLog:

	* config/aarch64/darwin.h (ASM_SPEC): Pass -march values to the
	assembler.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
@iains
Copy link
Owner

iains commented Feb 11, 2024

I think this is fixed for some time now; @fxcoudert if you agree please close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants