Replies: 17 comments 19 replies
-
My personal thinking is, that all fletcher based code should move from This would also resolve the issues with compilers, which generate sometimes bad code: |
Beta Was this translation helpful? Give feedback.
-
The XSAVE instruction requires memory to save the FPU state to, it's called the XSAVE area. On Linux we allocate a global buffer per CPU for it. zfs/include/os/linux/kernel/linux/simd_x86.h Lines 243 to 252 in 620a977 All calls to Splitting the [1] |
Beta Was this translation helpful? Give feedback.
-
Looking at zfs_fletcher_avx512.c in more detail the calls to |
Beta Was this translation helpful? Give feedback.
-
How important is the per-core thing? I can probably do like Linux on Windows, but on macOS, the |
Beta Was this translation helpful? Give feedback.
-
I'd say we do this to avoid moving around cachelines. Please note that we're doing the whole kfpu dance only to account for the fact that the native Linux implementation is GPL only. So I wonder if there's no OS provided way to ensure integrity of the FPU state on Windows and macOS you could use there. |
Beta Was this translation helpful? Give feedback.
-
@lundman Windows should have a save state area called https://github.com/reactos/reactos/search?q=PXSTATE_SAVE&type=code You should be able to use that. |
Beta Was this translation helpful? Give feedback.
-
@lundman It looks like XNU already supports this. Is there anything stopping you from using https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/osfmk/mach/i386/fp_reg.h#L130-L134 |
Beta Was this translation helpful? Give feedback.
-
That is annoying. On the bright side, we probably could share code for saving registers between Linux and XNU, provided that the region in the |
Beta Was this translation helpful? Give feedback.
-
|
Beta Was this translation helpful? Give feedback.
-
In that case, does XNU support kernel thread local storage? |
Beta Was this translation helpful? Give feedback.
-
Nope. We did a spl-tsd.c which just matches based on thread ptr value. Luckily, TSDs arent used much in ZFS. |
Beta Was this translation helpful? Give feedback.
-
OK so there's currently just 4 places, and each one of them take a So I simply added some local vars to it:
and invented new It will do for now, we can still think of other things of course, just means it isn't a roadblock here. |
Beta Was this translation helpful? Give feedback.
-
Would something like this work on XNU?
I assume that they have not banned the use of per cpu variables on XNU. |
Beta Was this translation helpful? Give feedback.
-
So, PERCPU won't work, no header to start with, but it boils down to However, I can possibly do something with:
seems awfully deep, but it is what it is. I'll try it next time I can reboot my M1. |
Beta Was this translation helpful? Give feedback.
-
What would you think of this small refactoring I did (untested)? AttilaFueloep/zfs@master...zfs-refactor-fletcher You could then add your state to |
Beta Was this translation helpful? Give feedback.
-
In the currently code, after it does the sse2 benchmark, which is a loop of 32, the calls to
Should byteswap also call begin? or perhaps not, if _init does. Or should it call end before returning? Which style is expected? I just threw in an atomic inc/dec in begin/end to see why it kept writing over my SaveState. |
Beta Was this translation helpful? Give feedback.
-
All right, #14600. |
Beta Was this translation helpful? Give feedback.
-
Trying out this discussion thing, hoping to get input from @AttilaFueloep @mcmilk @behlendorf
At the moment we have things like;
https://github.com/openzfs/zfs/blob/master/module/zcommon/zfs_fletcher_avx512.c#L42
https://github.com/openzfs/zfs/blob/master/module/zcommon/zfs_fletcher_avx512.c#L76
So the
kfpu_being()
andkfpu_end()
are in separate functions. Windows sort of want to allocate some space to save it:So it gets difficult when they are split across functions. They used to be nested even, but it seems maybe the nested calls
have been removed.
Can we make them simpler? Other solutions considered?
Beta Was this translation helpful? Give feedback.
All reactions