Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libspl: implement atomics in terms of atomics #11904

Merged
merged 3 commits into from
Apr 19, 2021

Conversation

nabijaczleweli
Copy link
Contributor

@nabijaczleweli nabijaczleweli commented Apr 15, 2021

Motivation and Context

dunno, but "pthread mutex based atomics" was a hilarious concept

Description

This replaces the generic libspl atomic.c atomics implementation with one based on atomics, thereby changing the generated code from "truly atrocious" to "you'd need to assemble this manually to improve it much", mostly. _nv variants are even branchless! Someone who's less of a coward than me could probably apply some of what this code-gen got right to the amd64 assembly impl.

membar_*()s have code in them now, enter and exit are very likely right (if only because the codegen is correct), but I'm not sure about producer and consumer.

It might be prudent to move this implementation to a header included from atomic.h on non-handrolled platforms, considering that it doesn't use a global object anymore, to allow the optimiser to "go ham", as they say, even harder.

All compiled on default config (-O2 -g) with clang -v saying:

Debian clang version 13.0.0-++20210405110107+e2a0f512eaca-1~exp1~20210405210820.1423
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /bin
Found candidate GCC installation: /bin/../lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/8
Candidate multilib: .;@m64
Selected multilib: .;@m64

The assemblies are too long to inline here, but here's code generated by:

How Has This Been Tested?

The codegen looks right, but if I got something wrong then the testsuite will get it.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the OpenZFS code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes. – none apply
  • I have run the ZFS Test Suite with this change applied. – CI take my hand
  • All commit messages are properly formatted and contain Signed-off-by.

@nabijaczleweli nabijaczleweli force-pushed the tomb branch 2 times, most recently from 96faffb to 2f602f6 Compare April 15, 2021 13:57
@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Apr 15, 2021

Hm, it looks like the buildds won't catch any possible errors because the only ones that run tests are amd64, I'll add a bypass commit for that to the top, for removal on merge.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Apr 15, 2021
@behlendorf
Copy link
Contributor

The generic pthread code was added merely to provide a minimum level of functionality. At the time this was added gcc didn't provide any builtin atomics and the idea of rolling all our own wasn't particularly appealing (it looks like gcc 4.4 introduced them as an experimental feature for some architectures back in 2010). Also because this user space code is only used by ztest and zdb on non-x86 architectures it was never a particularly big concern.

Happily times have changed and both gcc and clang appear to have solid builtin support for atomics now. Which is great. Assuming the atomics are available for all of the platforms OpenZFS currently builds on I'd be inclined to simply drop both of the handrolled x86 and x86_64 implementations as well and rely entirely on the builtins. While we don't have CI bots for all of these architectures we have made an effort to not break the build for these architectures: x86, x86_64, ppc, ppc64, arm, aarch64, sparc64, mips, s390x.

@nabijaczleweli
Copy link
Contributor Author

As long as we require a fresh enough cc to support the intrinsics, I'm relatively sure this will build on every platform; if it doesn't have native support for some desired width, then it's provided by -latomic from the compiler distribution (which needs it anyway to comply with C11). This is (apparently) why ppc64el fails right now.

I can also build-test on s390x (provided I manage to convince that VM that booting is in fact preferable to panicking), this leaves mips (i don't foresee having a good time configuring openzfs on a router) and sparc64 (which is a ports arch, and I have neither the patience nor the expertise to try to install it).

In that case, I'll purge the assembly implementations.

@nabijaczleweli nabijaczleweli force-pushed the tomb branch 2 times, most recently from 082492b to e3a3997 Compare April 15, 2021 17:48
@behlendorf
Copy link
Contributor

From what I can tell it looks like all of the architectures I listed are supported. Although I wasn't able to find a comprehensive list documented anywhere. Also the oldest distribution we support is RHEL 7 which ships with gcc 4.8 as the default compiler, which is new enough. So from what I can gather we should be in good shape to drop this code entirely. If we run in to issues with one of the architectures not covered by the CI we can address it then. So I wouldn't worry to much about spinning up a s390x or mips virtual machine. Unless it sounds like fun!

@nabijaczleweli nabijaczleweli force-pushed the tomb branch 3 times, most recently from 7414be1 to aa306f1 Compare April 15, 2021 22:07
@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Apr 15, 2021

FreeBSD 13 failed because it couldn't fetch the src set, but not before building the problematic bit correctly, so if nothing else fails and the fences in the membars look fine then this should be g2g.

And, yeah, the last time I installed s390x I ended up reporting two separate debian-installer bugs, it was quite fun.

config/user-libatomic.m4 Show resolved Hide resolved
lib/libspl/asm-generic/atomic.c Outdated Show resolved Hide resolved
lib/libspl/asm-generic/atomic.c Outdated Show resolved Hide resolved
lib/libspl/asm-generic/atomic.c Show resolved Hide resolved
lib/libspl/asm-generic/atomic.c Show resolved Hide resolved
@nabijaczleweli nabijaczleweli force-pushed the tomb branch 4 times, most recently from a9f733b to 1eff440 Compare April 16, 2021 19:43
Currently builds on most arches, despite using atomics,
but may break if the arch doesn't support native 2-byte atomics
that therefore need to be served by library code

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
On default-configured amd64 with clang
13.0.0-++20210405110107+e2a0f512eaca-1~exp1~20210405210820.1423
this now generates equivalent code (though with depeer prologues)
to the assembly implementation (though, notably, the _nv variants
are branchless, but atomic_{clear,set}_long_excl are pretty bad
(still infinitely better than the garbage pthread mutexes generate))

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
These are only used by ztest and zdb, absolute speed isn't too big of a
concern

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 16, 2021
@behlendorf behlendorf merged commit fef8bd4 into openzfs:master Apr 19, 2021
@mmatuska
Copy link
Contributor

After this commit, clang complains on FreeBSD on almost all architectures that are not amd64 about misaligned atomics:

error: large atomic operation may incur significant performance penalty [-Werror,-Watomic-alignment]

Is this fine to just ignore the error?

@nabijaczleweli
Copy link
Contributor Author

They're still likely faster than locking a global pthread mutex, so yeah

behlendorf pushed a commit that referenced this pull request Jun 22, 2021
This replaces the generic libspl atomic.c atomics implementation
with one based on builtin gcc atomics.  This functionality was added
as an experimental feature in gcc 4.4.  Today even CentOS 7 ships
with gcc 4.8 as the default compiler we can make this the default.

Furthermore, the builtin atomics are as good or better than our
hand-rolled implementation so it's reasonable to drop that custom code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11904
Closes #12252
Closes #12244
@Bronek
Copy link

Bronek commented Jul 2, 2021

After this commit, clang complains on FreeBSD on almost all architectures that are not amd64 about misaligned atomics:

error: large atomic operation may incur significant performance penalty [-Werror,-Watomic-alignment]

I think unaligned atomic operation may actually crash your program on some platforms (notably intel is not one of these, which is why such atomic operations are less readily detected/fixed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants