-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
module: small fixes for FreeBSD/aarch64 #14715
Conversation
919844e
to
cd24237
Compare
@@ -507,6 +507,16 @@ CFLAGS.zstd_lazy.c+= ${__ZFS_ZSTD_AARCH64_FLAGS} | |||
CFLAGS.zstd_ldm.c+= ${__ZFS_ZSTD_AARCH64_FLAGS} | |||
CFLAGS.zstd_opt.c+= ${__ZFS_ZSTD_AARCH64_FLAGS} | |||
|
|||
sha256-armv8.o: sha256-armv8.S |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this hackish thing ;-)
I would change sha(256|512)-armv8.S to fit without this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, but I just copied the technique from the next couple of lines after it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not really sure what you mean. We have to pass -mgeneral-regs-only normally to avoid unexpected fpu use, but we really need the neon instructions here. We don't have a better way to strip it from just these four files AFAICT, and removing the offending instructions completely defeats the point of the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a runtime test to check for fpu+armv8 crypto ... when this test is okay, the code gets executed.
When this can not be done this way, we may need some more compiler options... but I would favorite the current test + execute thing.... of cause - the register x18 thing ... I didn't know. I will fix this ... but currently I don't have the time for it. Maybe next week.
The current work arround may be some #ifdef __FreeBSD__
within the assembly + glue code ... but when this has one or two weeks, then I have a pull request for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-mgeneral-regs-only forbids use of floating-point instructions in assembly. These files, by design, use floating-point. Therefore the flag must be stripped specifically for these files, otherwise they will not build. What is your exact objection? It's unclear what your two paragraphs are actually saying specifically on a technical level. This isn't OS-specific; it's just a difference between LLVM, which applies -mgeneral-regs-only to the assembly, and GCC, which doesn't forward it on to GNU as (and there may not even be a flag to do it?). Compiling the Linux module with LLVM would hit the exact same issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bit odd to me that CFLAGS applies to assembly, but that is a complaint to file with LLVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFLAGS is just the name for it in the build system. And nothing says -mfoo is purely for the compiler and not the assembler; there's precedent with things like -march governing which instructions you get, with -mgeneral-regs-only being implemented in LLVM as disabling the fp-armv8, crypto and neon features, i.e. the same as what you get with -march=armv8-a+nofp+nosimd+nocrypto. It's just a difference of opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-mgeneral-regs-only does more than just that. It permanently prevents you from turning on these instructions via pragmas, which has caused me plenty of grief late last year when looking into aarch64 optimizations on key fletcher4 functions. :/
In any case, I have far too much to do to have time to complain to LLVM about this and even if I did and they agreed, we would still need to support the older versions anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
#define kfpu_allowed() 1 | ||
#define kfpu_initialize(tsk) do {} while (0) | ||
#define kfpu_begin() do {} while (0) | ||
#define kfpu_end() do {} while (0) | ||
#define kfpu_begin() do { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been tested? Maybe we can wait a bit, until I fixed blake3 and sha2 assembly.
I think I dig deeper into this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as noted in the x18 issue, I tested this with hacked-around blake3 and can actually boot again now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to enter the fpu state because the checksum benchmark is called from the boot thread. In this thread floating point instructions are disabled so we need to enter a fpu section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been tested? Maybe we can wait a bit, until I fixed blake3 and sha2 assembly. I think I dig deeper into this tomorrow.
Master being broken on FreeBSD is a big problem. Rather than leaving master broken while we figure out the ideal solution, we should fix it now and revise it to be more elegant later to allow development that depends on aarch64 working on FreeBSD to continue independently of this.
Btw. you can make a FreeBSD differential revision as well and we can have it in base "earlier" than it gets committed to OpenZFS so that we make arm64 working again. |
Regarding x18, Linux doesn't use it in the same way FreeBSD does, but it does use it if you enable the SHADOW_CALL_STACK config option. So you'll surely find that using ZFS with CONFIG_SHADOW_CALL_STACK on Linux also blows up. So, again, this is not a FreeBSD-specific problem. |
On a related note, why does OpenZFS even have 2000+ line assembly files checked in that are the result of modifying the output of a random version of a random compiler run on an unknown source file? There are no comments, no hope of verifying the assembly does what it's supposed to do beyond black-box testing, and no ability to make changes as needed. Why is this not just done like any other C file and built from the true source, whatever C produced these gargantuan opaque blobs? |
At least with the SHA2 files they're taken from OpenSSL's output and "modified [...] to fit into OpenZFS", though who knows what those modifications are, one would have to diff them against whatever the OpenSSL perl scripts spit out. |
The reference BLAKE3 implementation also has a blake3_neon.c, which makes me wonder why we're using converted-with-simde versions of the SSE2 and SSE4.1 (I don't even want to ask why we want both of those on non-x86...) source rather than the native NEON version? |
64e2cbc
to
bd049f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse bits; removal of -mgeneral-regs-only from flags. This fixes errors about requiring NEON, which is a difference in clang vs. gcc treatment of -mgeneral-regs-only being specified on asm files. Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU usage, otherwise we panic promptly at boot as soon as ZFS attempts to do checksum benchmarking. Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
@kevans91 would you mind doing a force-push to re-run buildbot? I had to restart buildbot to install some updates and I think it caused some of your builds to fail. |
bd049f3
to
9f62b28
Compare
I'm not sure I understand how the FreeBSD amd64 buildbot failed... nothing in the configure output looks fatal. stdio for shell_1 indicates it ran gmake, but there's no output from that that I can find anywhere obvious... (and this PR shouldn't have affected anything there, anyways) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kfpu_begin() / kfpu_end()
works on my system also.
I benchmarked and the neon impl. isn't as fast as the converted sse2/sse41 impl. The first versions of the BLAKE3 patchset used the neon c file... the huge differences are seen on e.g. Apple M1. |
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU usage, otherwise we panic promptly at boot as soon as ZFS attempts to do checksum benchmarking. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes #14715
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse bits; removal of -mgeneral-regs-only from flags. This fixes errors about requiring NEON, which is a difference in clang vs. gcc treatment of -mgeneral-regs-only being specified on asm files. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes openzfs#14715
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU usage, otherwise we panic promptly at boot as soon as ZFS attempts to do checksum benchmarking. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes openzfs#14715
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse bits; removal of -mgeneral-regs-only from flags. This fixes errors about requiring NEON, which is a difference in clang vs. gcc treatment of -mgeneral-regs-only being specified on asm files. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes openzfs#14715
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU usage, otherwise we panic promptly at boot as soon as ZFS attempts to do checksum benchmarking. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes openzfs#14715
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse bits; removal of -mgeneral-regs-only from flags. This fixes errors about requiring NEON, which is a difference in clang vs. gcc treatment of -mgeneral-regs-only being specified on asm files. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes openzfs#14715
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU usage, otherwise we panic promptly at boot as soon as ZFS attempts to do checksum benchmarking. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes openzfs#14715
Motivation and Context
Following PR #13741, FreeBSD/aarch64 experiences a panic at boot as we try to use FPU instructions while doing the startup benchmarking.
Description
The primary fix here is to add a proper implementation of kfpu_begin()/kfpu_end() for aarch64, which is implemented much like the x86 version -- fpu_kern(9) to wrap sections that want to do FPU things. I discovered two other issues in the openzfs build while I was forward porting these to the openzfs repo, I fixed one (need to remove -mgeneral-regs-only for sha256/sha512 .S files) and punted on the other one (non-trivial conflict between sys/elf_common.h and spl/sys/vnode.h. More details on the non-trivial conflict in the "resync" commit. (Edit to note: the one I punted on has since been fixed by PR #14674; verbiage about the conflict has been removed from commit messages)
How Has This Been Tested?
Booted FreeBSD/aarch64 with changes applied. Another change was needed to stop clobbering x18 in the blake2 implementation; @mcmilk said he was working on a patch for this (I tested with a naive patch that "fixed" it but may not have done so properly).
Types of changes
Checklist:
Signed-off-by
.