More aggsum optimizations. #12145
Conversation
/*
 * Atomically read variable.
 */
#define	atomic_load_char(p)	(*(volatile uchar_t *)(p))
I know libspl's versions of these functions are okay being a bit loose, but I don't believe that volatile guarantees atomicity; that's just a common implementation side-effect.
How about GCC's __atomic builtins (also provided by Clang)?
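For reference, a minimal sketch of what builtin-based load/store wrappers could look like; the names, types, and the relaxed ordering here are illustrative assumptions, not the actual patch:

#include <stdint.h>

/*
 * Hypothetical wrappers built on the GCC/Clang __atomic builtins.
 * __ATOMIC_RELAXED gives "no ordering, just no tearing" semantics,
 * which is all that is being asked for in this thread.
 */
static inline uint64_t
atomic_load_64_sketch(const volatile uint64_t *p)
{
	return (__atomic_load_n(p, __ATOMIC_RELAXED));
}

static inline void
atomic_store_64_sketch(volatile uint64_t *p, uint64_t v)
{
	__atomic_store_n(p, v, __ATOMIC_RELAXED);
}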
volatile does not provide atomicity; it only keeps the compiler from optimizing the memory accesses. Atomicity comes from the value fitting into a machine register and being copied with a single instruction. That is why there are the #ifdef _LP64 blocks. This is a copy/paste from FreeBSD's atomic_common.h, used on all supported architectures, so I believe it to be correct. I have no opinion about the __atomic builtins. When I wrote this, I hadn't seen наб's recent commit that starts using them in lib/libspl/atomic.c. It actually makes me wonder why they were not inlined here, but that is a different topic.
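To make the register-width argument concrete, here is a sketch of the FreeBSD-style pattern being described; it is simplified, and the exact names and declarations in the patch may differ:

#include <stdint.h>

#ifdef _LP64
/*
 * On 64-bit machines a naturally aligned 64-bit load or store is a
 * single instruction, so a volatile access cannot be torn; volatile
 * only stops the compiler from caching or eliding the access.
 */
#define	atomic_load_64(p)	(*(volatile uint64_t *)(p))
#define	atomic_store_64(p, v)	(*(volatile uint64_t *)(p) = (v))
#else
/*
 * On 32-bit machines a 64-bit access may be split into two 32-bit
 * ones, so a real atomic primitive is needed to avoid torn values.
 */
extern uint64_t atomic_load_64(volatile uint64_t *p);
extern void atomic_store_64(volatile uint64_t *p, uint64_t v);
#endif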
It'll be nice to finally have proper wrappers for atomic loads and stores! This is functionality I know would be helpful elsewhere with some kstats which are a bit dodgy in this regard (though pretty harmless).
Since it sounds like these are all well tested already on FreeBSD this seems fine to me. Though I wouldn't object to updating them to use the GCC atomic builtins for consistency in a similar fashion as was done for the other atomics.
I am not sure what the point is now of having all the atomics in a separate C file instead of inlined in the header. Is it supposed to improve compatibility when the library is built with a compiler supporting atomics but the header is used by a less capable one? Previously it made sense, since the alternative implementations were written in assembler. But now, with the compiler expected to implement all of that, what's the point? To trouble the compiler/CPU with function calls?
PS: On second thought, some operations may require a more complicated implementation where a function call could make sense, but since the compiler hides it from us, we have no way to differentiate. For trivial load/store/etc. I'd prefer not to call functions for a single machine instruction.
Well, since this is only for libzpool.so in user space it's probably not a huge deal, since only zdb and ztest would use these (at least until there's a FUSE or BUSE implementation). The only reason I can think of not to move everything to the header (to be inlined) would be if we wanted a single shared user/kernel space header. But we don't have that today, and I don't think we'd gain much from it. We're already inlining these on the kernel side; doing the same in user space would make sense.
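To illustrate the trade-off being discussed, a hypothetical header-inlined wrapper compiles down to the same single instruction as an open-coded access, while an out-of-line definition in a C file makes every caller pay for a call; the names here are made up for the sketch:

#include <stdint.h>

/* Inlined via the header: the compiler emits a single load instruction. */
static inline uint64_t
atomic_load_64_inline(volatile uint64_t *p)
{
	return (__atomic_load_n(p, __ATOMIC_RELAXED));
}

/*
 * Defined out of line in a C file: every use goes through a function
 * call, even though the body is still one instruction on most targets.
 */
uint64_t
atomic_load_64_outofline(volatile uint64_t *p)
{
	return (__atomic_load_n(p, __ATOMIC_RELAXED));
}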
@nabijaczleweli Would you take a look at the atomics side, since you've touched them recently?
I, uh, don't have much to say, really. I just replaced the libspl atomics with versions that were as strong as the pthreads implementation, and didn't touch anything else (even if I kinda wanted to; these 100% should just be statics). The strength of an atomic operation is to be determined by the call site and the ordering you want there, and these don't have that luxury (you could say they're therefore terminally broken, and you wouldn't be wrong).
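For what "ordering determined at the call site" looks like in practice, here is a small C11 sketch; the variable and function names are purely illustrative:

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t counter;

void
example(void)
{
	/* Relaxed: no ordering, only freedom from torn reads/writes. */
	uint64_t v = atomic_load_explicit(&counter, memory_order_relaxed);

	/* Release store, meant to pair with an acquire load elsewhere. */
	atomic_store_explicit(&counter, v + 1, memory_order_release);
}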
- Avoid atomic_add() when updating as_lower_bound/as_upper_bound. Previous code was excessively strong on 64-bit systems while not strong enough on 32-bit ones. Instead introduce and use real atomic_load() and atomic_store() operations, just plain assignments on 64-bit machines, but proper atomics on 32-bit ones to avoid torn reads/writes.
- Reduce the number of buckets on large systems. Extra buckets do not improve add speed as much as they hurt reads. Unlike wmsum, for aggsum reads are still important.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12145
Avoid atomic_add() when updating as_lower_bound/as_upper_bound. Previous code
was excessively strong on 64-bit systems while not strong enough on 32-bit
ones. Instead introduce and use real atomic_load() and atomic_store()
operations, just plain assignments on 64-bit machines, but proper atomics on
32-bit ones to avoid torn reads/writes.
Reduce the number of buckets on large systems. Extra buckets do not improve
add speed as much as they hurt reads. Unlike wmsum, for aggsum reads are
still important.
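A hedged sketch of the update/read pattern this describes (not the actual OpenZFS code; aggsum_t, as_lock, ASSERT/MUTEX_HELD, and the atomic_load_64()/atomic_store_64() helpers are taken as given from the tree, and the field types are assumptions): the bounds are only written while the aggsum mutex is held, so the writer merely needs a store that cannot tear, while lock-free readers need a matching atomic load on 32-bit machines.

/* Writer side: as_lock is held, so a single untorn store is enough. */
static void
aggsum_update_bounds(aggsum_t *as, int64_t lower, int64_t upper)
{
	ASSERT(MUTEX_HELD(&as->as_lock));
	atomic_store_64((volatile uint64_t *)&as->as_lower_bound,
	    (uint64_t)lower);
	atomic_store_64((volatile uint64_t *)&as->as_upper_bound,
	    (uint64_t)upper);
}

/*
 * Reader side: no lock is taken, so an atomic load avoids torn
 * 64-bit reads on 32-bit machines.
 */
int64_t
aggsum_lower_bound(aggsum_t *as)
{
	return ((int64_t)atomic_load_64(
	    (volatile uint64_t *)&as->as_lower_bound));
}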
How Has This Been Tested?
On a 40-thread FreeBSD system with heavy uncached 4KB ZVOL reads, the profiler shows a reduction of time spent in all aggsum_*() functions from 3.2% to 1.8% globally, and from 15.6% to 8.8% in the ARC evict thread. The nested lock contention sometimes visible in profiles before has vanished.