zcommon: Refactor FPU state handling in fletcher4 #14600

AttilaFueloep · 2023-03-09T19:15:20Z

Motivation and Context

Currently calls to kfpu_begin() and kfpu_end() are split between the init() and fini() functions of the particular SIMD implementation. This was done in #14247 as an optimization measure for the ABD adapter. Unfortunately the split complicates FPU handling on platforms that use a local FPU state buffer, like Windows and macOS.

Please see also discussion #14574.

Description

To ease porting, we introduce a boolean struct member in fletcher_4_ops_t, indicating use of the FPU, and move the FPU state handling from the SIMD implementations to the call sites.

How Has This Been Tested?

Parallel runs of 2 x fio and a mprime -t for 30 minutes. mprime -t usually fails almost immediately in case of FPU save and restore problems.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

AttilaFueloep · 2023-03-09T20:18:02Z

Not sure I did the right thing with the abi file, the diff seems pretty huge. But at least checkstyle is happy.

ryao

It looks good to me, but we have a minor optimization opportunity that we probably should exploit as part of this. See my comment on the struct.

ryao · 2023-03-09T20:57:48Z

include/zfs_fletcher.h

@@ -126,6 +126,7 @@ typedef struct fletcher_4_func {
 	fletcher_4_fini_f fini_byteswap;
 	fletcher_4_compute_f compute_byteswap;
 	boolean_t (*valid)(void);
+	boolean_t uses_fpu;


It occurs to me that we could mark this structure as requiring 64-byte alignment so that subsequent accesses are L1 cache accesses.

As a more general change, we could also rename ->init_native and ->fini_native to ->init and ->fini respectively and drop the separate init_byteswap()/fini_byteswap() functions, since we only need to have different compute functions to support different endianness. That should be a separate commit, but it is entirely optional.

Yeah, I thought about that, it would squeeze the size of the struct below 64 bytes again, currently it is 72 bytes. Let me have a more thorough look tomorrow.

Fitting the entire structure in a cacheline does not matter that much here because the name field is not accessed during normal operation, so just fitting everything but that is fine. I just thought it would be nice to simplify this code because the extra init and fini functions make it slightly harder to understand.

Aligning the structure to a cacheline on the other hand should avoid another memory access whenever the fields needed during normal operation would have spanned two cachelines.

I can remove init_byteswap()/fini_byteswap() in a follow up PR. Just wondering why they were implemented in the first place.

ryao · 2023-03-09T23:08:32Z

module/zcommon/zfs_fletcher.c

 	ops->compute_byteswap(&ctx, buf, size);
+	if (ops->uses_fpu == B_TRUE) {
+		kfpu_end();
+	}


kfpu_begin() should be called before init_byteswap() and kfpu_end() should be called after fini_byteswap(). That way the init and fini functions are allowed to use vector instructions. A future patch to compile GNU C Vector code likely will make use of this.

The same applies to the native versions.

The intention was to minimize the time we're running with interrupts and preemption disabled. So this is a trade-off between ease of future changes and an actual advantage. I'm fine either way, let's see what other reviewers think.

59493b6 already made a decent reduction in the total time that we spend with interrupts disabled not that long ago by not doing context save/restore multiple times per ABD. At the same time, it did kfpu_begin() before the rest of the init function and kfpu_end() after the rest of the fini function, which passed review.

An extra microsecond or two with interrupts disabled will not matter in the slightest since the systems we support do not give real time guarantees. If we were running on a system that tried to give real time guarantees and it were so pedantic to give guarantees that were accurate to a level where a microsecond or two difference would matter, we probably would not be allowed to run with interrupts disabled at all, since disabling interrupts to use SIMD could cause the real time guarantee to be violated. I am not sure if even real-time Linux is that strict.

Anyway, what is important here is not minimizing time with interrupts disabled, but minimizing the total time spent doing processing. If the interrupts are disabled slightly longer than absolutely necessary to reduce the total CPU time spent, then that is preferable. In this case, doing it this way is mostly to prepare for a future small reduction in CPU time. Since we know that there is a possibility of that here, I would prefer us to do it that way so that we do not need another patch. We really are not gaining anything from an extra microsecond or two with interrupts enabled per checksum operation, so there is no need to be frugal on that level.

That's sound reasoning, will change.

lundman

LGTM

behlendorf · 2023-03-10T00:46:10Z

Not sure I did the right thing with the abi file, the diff seems pretty huge. But at least checkstyle is happy.

You can grab an updated ABI file here. It should only include updates for the interfaces which changed.

https://github.com/openzfs/zfs/actions/runs/4377972107

AttilaFueloep · 2023-03-10T01:21:18Z

@behlendorf Yeah, I did, still

$ git diff HEAD~1 HEAD lib/libzfs/libzfs.abi| wc -l
9512

behlendorf · 2023-03-10T02:01:39Z

I wonder if perhaps the last time the ABI was generated it was under ubuntu 20.04. We've moved to ubuntu 22.04 for the checkstyle builder now and we've certainly seen updating the environment and version of libabigail vastly changed the exact output.

ryao · 2023-03-10T02:25:19Z

I wonder if perhaps the last time the ABI was generated it was under ubuntu 20.04. We've moved to ubuntu 22.04 for the checkstyle builder now and we've certainly seen updating the environment and version of libabigail vastly changed the exact output.

Perhaps the commit updating the ABI should be a separate commit prior to this one, so that this commit stays small.

gmelikov · 2023-03-10T08:20:02Z

@behlendorf we use ghcr.io/openzfs/libabigail image, so libabigail should be the same, but I'm not sure what can affect it in this case. Ah, maybe the fact that we're compiling in different release now with different deps.

AttilaFueloep · 2023-03-10T15:11:31Z

Ah, maybe the fact that we're compiling in different release now with different deps.

That seems to be the case, at least it would explain e.g 'libcrypto.so.1.1 -> libcrypto.so.3. It would also explain that the tar file contained all .abi files.

So, what's the way to proceed?

gmelikov · 2023-03-10T15:53:13Z

Give me 1 hour, I'll generate ABI files from pure master and make a PR to get meaningful diffs later in this patch.

Done: #14605

AttilaFueloep · 2023-03-10T16:30:32Z

Thanks, no reason to hurry, take your time.

EDIT: Actually you were faster in fixing than me in replying.

behlendorf · 2023-03-11T00:23:49Z

Merged #14605

AttilaFueloep · 2023-03-11T18:40:09Z

Addressed review feedback, added ABI file and updated PR description to indicate an ABI change.

ryao · 2023-03-12T18:07:20Z

One nit about the title. Instead of writing module/zcommon: ..., it would be more consistent to write zcommon: ....

Currently calls to kfpu_begin() and kfpu_end() are split between the init() and fini() functions of the particular SIMD implementation. This was done in openzfs#14247 as an optimization measure for the ABD adapter. Unfortunately the split complicates FPU handling on platforms that use a local FPU state buffer, like Windows and macOS. To ease porting, we introduce a boolean struct member in fletcher_4_ops_t, indicating use of the FPU, and move the FPU state handling from the SIMD implementations to the call sites. Signed-off-by: Attila Fülöp <attila@fueloep.org>

AttilaFueloep · 2023-03-12T21:57:56Z

Agreed, changed the commit message and the PR title.

lundman · 2023-03-16T04:50:15Z

Ahhh hah, so close :)

	if (ops->uses_fpu == B_TRUE) {
		kfpu_begin();
	}
	ops->init_native(&ctx);
	ops->compute_native(&ctx, buf, size);
	ops->fini_native(&ctx, zcp);
	if (ops->uses_fpu == B_TRUE) {
		kfpu_end();
	}

Since kfpu_begin() declares the variables, local to the body of the if conditional, naturally kfpu_end() won't see them. Obviously a much smaller issue to address, just got to think of a good solution.

Currently calls to kfpu_begin() and kfpu_end() are split between the init() and fini() functions of the particular SIMD implementation. This was done in openzfs#14247 as an optimization measure for the ABD adapter. Unfortunately the split complicates FPU handling on platforms that use a local FPU state buffer, like Windows and macOS. To ease porting, we introduce a boolean struct member in fletcher_4_ops_t, indicating use of the FPU, and move the FPU state handling from the SIMD implementations to the call sites. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes openzfs#14600

AttilaFueloep mentioned this pull request Mar 9, 2023

Add Windows support to OpenZFS #14034

Closed

13 tasks

ryao suggested changes Mar 9, 2023

View reviewed changes

ryao reviewed Mar 9, 2023

View reviewed changes

lundman approved these changes Mar 9, 2023

View reviewed changes

behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 11, 2023

AttilaFueloep force-pushed the zfs-refactor-fletcher branch from 34d4a59 to bcab66d Compare March 11, 2023 03:08

ryao approved these changes Mar 11, 2023

View reviewed changes

AttilaFueloep force-pushed the zfs-refactor-fletcher branch from bcab66d to fff528d Compare March 11, 2023 18:40

AttilaFueloep changed the title ~~module/zcommon: Refactor FPU state handling in fletcher4~~ zcommon: Refactor FPU state handling in fletcher4 Mar 12, 2023

AttilaFueloep force-pushed the zfs-refactor-fletcher branch from fff528d to 5d45089 Compare March 12, 2023 21:57

mcmilk approved these changes Mar 13, 2023

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 14, 2023

behlendorf merged commit 78289b8 into openzfs:master Mar 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zcommon: Refactor FPU state handling in fletcher4 #14600

zcommon: Refactor FPU state handling in fletcher4 #14600

AttilaFueloep commented Mar 9, 2023 •

edited

Loading

AttilaFueloep commented Mar 9, 2023 •

edited

Loading

ryao left a comment

ryao Mar 9, 2023

AttilaFueloep Mar 10, 2023

ryao Mar 10, 2023 •

edited

Loading

AttilaFueloep Mar 11, 2023 •

edited

Loading

ryao Mar 9, 2023

AttilaFueloep Mar 10, 2023

ryao Mar 10, 2023

AttilaFueloep Mar 10, 2023

AttilaFueloep Mar 11, 2023

lundman left a comment

behlendorf commented Mar 10, 2023

AttilaFueloep commented Mar 10, 2023

behlendorf commented Mar 10, 2023

ryao commented Mar 10, 2023

gmelikov commented Mar 10, 2023

AttilaFueloep commented Mar 10, 2023

gmelikov commented Mar 10, 2023 •

edited

Loading

AttilaFueloep commented Mar 10, 2023 •

edited

Loading

behlendorf commented Mar 11, 2023

AttilaFueloep commented Mar 11, 2023

ryao commented Mar 12, 2023

AttilaFueloep commented Mar 12, 2023

lundman commented Mar 16, 2023

zcommon: Refactor FPU state handling in fletcher4 #14600

zcommon: Refactor FPU state handling in fletcher4 #14600

Conversation

AttilaFueloep commented Mar 9, 2023 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

AttilaFueloep commented Mar 9, 2023 • edited Loading

ryao left a comment

Choose a reason for hiding this comment

ryao Mar 9, 2023

Choose a reason for hiding this comment

AttilaFueloep Mar 10, 2023

Choose a reason for hiding this comment

ryao Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

AttilaFueloep Mar 11, 2023 • edited Loading

Choose a reason for hiding this comment

ryao Mar 9, 2023

Choose a reason for hiding this comment

AttilaFueloep Mar 10, 2023

Choose a reason for hiding this comment

ryao Mar 10, 2023

Choose a reason for hiding this comment

AttilaFueloep Mar 10, 2023

Choose a reason for hiding this comment

AttilaFueloep Mar 11, 2023

Choose a reason for hiding this comment

lundman left a comment

Choose a reason for hiding this comment

behlendorf commented Mar 10, 2023

AttilaFueloep commented Mar 10, 2023

behlendorf commented Mar 10, 2023

ryao commented Mar 10, 2023

gmelikov commented Mar 10, 2023

AttilaFueloep commented Mar 10, 2023

gmelikov commented Mar 10, 2023 • edited Loading

AttilaFueloep commented Mar 10, 2023 • edited Loading

behlendorf commented Mar 11, 2023

AttilaFueloep commented Mar 11, 2023

ryao commented Mar 12, 2023

AttilaFueloep commented Mar 12, 2023

lundman commented Mar 16, 2023

AttilaFueloep commented Mar 9, 2023 •

edited

Loading

AttilaFueloep commented Mar 9, 2023 •

edited

Loading

ryao Mar 10, 2023 •

edited

Loading

AttilaFueloep Mar 11, 2023 •

edited

Loading

gmelikov commented Mar 10, 2023 •

edited

Loading

AttilaFueloep commented Mar 10, 2023 •

edited

Loading