-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add aligned_accessor example #176
Conversation
402723b
to
191d549
Compare
Fixes for Clang, MSVC, and Intel here: https://godbolt.org/z/vhGz3zcYn I still need to incorporate these into the PR. |
89fd2f7
to
5c66f08
Compare
@crtrott @dalg24 @nliber I updated the PR to fix Clang, MSVC, and Intel builds. Here is the corresponding Compiler Explorer link: https://godbolt.org/z/47dEf59rs |
bc54039
to
1b7b137
Compare
Test with C++14, C++17, and C++20 builds: https://godbolt.org/z/raYsGTq3P |
374ba93
to
d2e8a63
Compare
Latest update (just pushed) includes OpenMP fixes: https://godbolt.org/z/PbMWbKMj7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few drive by comments
ec9872d
to
1c939a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good
Only question I have is do you care to not cast within the outer repetition loop?
Because if that is acceptable you would be able to get rid of a lot of boiler plate code
@dalg24 wrote:
The "casts" in the various This is an example, so the small amount of repetition has educational value. Users can copy and paste as a start to adding their own benchmarks. |
6dbb53f
to
dd60489
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to change the non-aligned element returning & to element 0 too, I am also wondering if we can reduce code duplication here. For example have the same benchmark function templated on a tag and the casted to type:
template<class ArrayType, class constArrayType, class ElementType, class BenchmarkTag, class ... ConstructorArgs>
auto benchmark_add_1d(const BenchMarkTag& tag, const std::size_t num_trials,
const index_type n,
const ElementType x[],
const ElementType y[],
ElementType z[],
ConstructorArgs ... cargs)
{
TICK();
constArrayType x2{x, cargs};
constArrayType y2{x, cargs};
ArrayType z2{x, cargs};
for (std::size_t trial = 0; trial < num_trials; ++trial) {
add(tag, x2, y2, z2);
}
return TOCK();
}
// | ||
// This exists so that we can compare performance | ||
// of aligned and unaligned access. | ||
ElementType* data_unaligned() const noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is what we should do here instead of returning also address to element 0. After all you are now measuring something different, like the number of cachelines/pages touched by each thread might be different. I'd rather like to see that the attribute in and of itself does something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crtrott My concern there was that the compiler has complete code visibility, so it could deduce that the allocation is aligned. For the benchmark, I wanted to be able to give it an allocation that I know is aligned to no more than sizeof(ElementType)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crtrott I like the tag idea for the benchmarks, though! I was a bit worried about possibly needing to do type conversion of x, y, and z on every invocation of the inner loop, but I think I could restructure the code to avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crtrott I tried out the tag idea, but I don't think it saves much code. Each benchmark invocation still needs to name the const and nonconst array types and the tag types explicitly.
Perhaps y'all are missing the point a bit -- this is an example. It's code for users to read, that teaches them how to write a custom accessor. It's something more like a tutorial. Reducing code duplication isn't really the point here; the point is simple, readable code that users could easily copy and paste.
aligned_accessor uses compiler identification macros to pick from various, possibly nonstandard ways to mark a pointer (its value, and/or its type) as having a specific alignment. "Aligned" benchmarks use overaligned allocation and the aforementioned way to mark a pointer as aligned. "Unaligned' benchmarks access memory aligned only to the element type, by adding 1 (sizeof(ElementType) bytes) to the array start. If building with OpenMP (i.e., if _OPENMP is defined), more benchmarks will build and run. All of them use "#pragma omp simd", and some add an "aligned" clause. The example builds with C++14, C++17, and C++20. I've tested it with the following compilers and options: * x86-64 GCC 12.1: -std=c++2b -O3 -Wall * x86-64 clang 14.0.0: -std=c++14 -O3 -Wall * x86-64 icc 2021.5.0: -std=c++17 -O3 -qopenmp * x86 msvc v19.32: /std:c++latest * x86-64 GCC 12.1: -std=c++14 -Wall * x86-64 GCC 12.1: -std=c++20 -Wall * x86-64 GCC 9.4.0: -std=c++17
1. Remove unused parameter of valid_byte_alignment 2. Make valid_byte_alignment use C++20 function has_single_bit if available 3. Make aligned_pointer_t use valid_byte_alignment to check alignment at compile time. valid_byte_alignment was unused before. 4. Remove unnecessary parameter from add_aligned_raw_1d. 5. Remove unneeded casts and assignments.
dd60489
to
4e4412c
Compare
Don't use a single allocation for aligned and unaligned access any more. Instead, separate those allocations, and encapsulate them more. Add comments to make it clear why we want a "deliberately unaligned array" for benchmarks and for studying code generation. Also, make sure that the elements of the arrays are filled before we start computing with them.
@crtrott wrote:
I have addressed this in my latest revision. |
aligned_accessor
uses compiler identification macros to pick from various, possibly nonstandard ways to mark a pointer (its value, and/or its type) as having a specific alignment."Aligned" benchmarks use overaligned allocation and the aforementioned way to mark a pointer as aligned.
"Unaligned' benchmarks access memory aligned only to the element type, by adding 1 (
sizeof(ElementType)
bytes) to the array start.