Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aligned_accessor example #176

Merged
merged 4 commits into from
Aug 30, 2022

Conversation

mhoemmen
Copy link
Contributor

  1. aligned_accessor uses compiler identification macros to pick from various, possibly nonstandard ways to mark a pointer (its value, and/or its type) as having a specific alignment.

  2. "Aligned" benchmarks use overaligned allocation and the aforementioned way to mark a pointer as aligned.

  3. "Unaligned' benchmarks access memory aligned only to the element type, by adding 1 (sizeof(ElementType) bytes) to the array start.

@mhoemmen mhoemmen force-pushed the Add-aligned-accessor-example branch from 402723b to 191d549 Compare August 18, 2022 16:00
@mhoemmen
Copy link
Contributor Author

Fixes for Clang, MSVC, and Intel here: https://godbolt.org/z/vhGz3zcYn

I still need to incorporate these into the PR.

@mhoemmen mhoemmen force-pushed the Add-aligned-accessor-example branch from 89fd2f7 to 5c66f08 Compare August 18, 2022 20:25
@mhoemmen
Copy link
Contributor Author

@crtrott @dalg24 @nliber I updated the PR to fix Clang, MSVC, and Intel builds. Here is the corresponding Compiler Explorer link: https://godbolt.org/z/47dEf59rs

@mhoemmen
Copy link
Contributor Author

Test with C++14, C++17, and C++20 builds: https://godbolt.org/z/raYsGTq3P

@mhoemmen mhoemmen force-pushed the Add-aligned-accessor-example branch 2 times, most recently from 374ba93 to d2e8a63 Compare August 18, 2022 22:52
@mhoemmen
Copy link
Contributor Author

Latest update (just pushed) includes OpenMP fixes: https://godbolt.org/z/PbMWbKMj7

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few drive by comments

examples/aligned_accessor/aligned_accessor.cpp Outdated Show resolved Hide resolved
examples/aligned_accessor/aligned_accessor.cpp Outdated Show resolved Hide resolved
examples/aligned_accessor/aligned_accessor.cpp Outdated Show resolved Hide resolved
@mhoemmen mhoemmen force-pushed the Add-aligned-accessor-example branch from ec9872d to 1c939a6 Compare August 19, 2022 18:02
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good
Only question I have is do you care to not cast within the outer repetition loop?
Because if that is acceptable you would be able to get rid of a lot of boiler plate code

examples/aligned_accessor/aligned_accessor.cpp Outdated Show resolved Hide resolved
examples/aligned_accessor/aligned_accessor.cpp Outdated Show resolved Hide resolved
@mhoemmen
Copy link
Contributor Author

mhoemmen commented Aug 19, 2022

@dalg24 wrote:

Only question I have is do you care to not cast within the outer repetition loop?

The "casts" in the various benchmark_ functions don't all have the same form. Some of them actually create an mdspan from a pointer. Others call bless on the pointer, which both casts and calls assume_aligned. Performing the casts or blessings explicitly, outside the outer loop, lets each add_* function be as simple as possible. This hopefully removes any unnecessary barriers to the compiler optimizing it.

This is an example, so the small amount of repetition has educational value. Users can copy and paste as a start to adding their own benchmarks.

@mhoemmen mhoemmen force-pushed the Add-aligned-accessor-example branch from 6dbb53f to dd60489 Compare August 22, 2022 15:26
Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to change the non-aligned element returning & to element 0 too, I am also wondering if we can reduce code duplication here. For example have the same benchmark function templated on a tag and the casted to type:

template<class ArrayType, class constArrayType, class ElementType,  class BenchmarkTag, class ... ConstructorArgs>
auto benchmark_add_1d(const BenchMarkTag& tag, const std::size_t num_trials,
				      const index_type n,
				      const ElementType x[],
				      const ElementType y[],
				      ElementType z[],
                                      ConstructorArgs ... cargs)
{
  TICK();
  constArrayType x2{x, cargs};
  constArrayType y2{x, cargs};
  ArrayType z2{x, cargs};
  for (std::size_t trial = 0; trial < num_trials; ++trial) {
    add(tag, x2, y2, z2);
  }
  return TOCK();
}

//
// This exists so that we can compare performance
// of aligned and unaligned access.
ElementType* data_unaligned() const noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is what we should do here instead of returning also address to element 0. After all you are now measuring something different, like the number of cachelines/pages touched by each thread might be different. I'd rather like to see that the attribute in and of itself does something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crtrott My concern there was that the compiler has complete code visibility, so it could deduce that the allocation is aligned. For the benchmark, I wanted to be able to give it an allocation that I know is aligned to no more than sizeof(ElementType).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crtrott I like the tag idea for the benchmarks, though! I was a bit worried about possibly needing to do type conversion of x, y, and z on every invocation of the inner loop, but I think I could restructure the code to avoid that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crtrott I tried out the tag idea, but I don't think it saves much code. Each benchmark invocation still needs to name the const and nonconst array types and the tag types explicitly.

Perhaps y'all are missing the point a bit -- this is an example. It's code for users to read, that teaches them how to write a custom accessor. It's something more like a tutorial. Reducing code duplication isn't really the point here; the point is simple, readable code that users could easily copy and paste.

aligned_accessor uses compiler identification macros to pick
from various, possibly nonstandard ways to mark a pointer
(its value, and/or its type) as having a specific alignment.

"Aligned" benchmarks use overaligned allocation
and the aforementioned way to mark a pointer as aligned.

"Unaligned' benchmarks access memory aligned only
to the element type, by adding 1 (sizeof(ElementType) bytes)
to the array start.

If building with OpenMP (i.e., if _OPENMP is defined),
more benchmarks will build and run.  All of them use
"#pragma omp simd", and some add an "aligned" clause.

The example builds with C++14, C++17, and C++20.
I've tested it with the following compilers and options:

* x86-64 GCC 12.1: -std=c++2b -O3 -Wall
* x86-64 clang 14.0.0: -std=c++14 -O3 -Wall
* x86-64 icc 2021.5.0: -std=c++17 -O3 -qopenmp
* x86 msvc v19.32: /std:c++latest
* x86-64 GCC 12.1: -std=c++14 -Wall
* x86-64 GCC 12.1: -std=c++20 -Wall
* x86-64 GCC 9.4.0: -std=c++17
1. Remove unused parameter of valid_byte_alignment

2. Make valid_byte_alignment use C++20 function has_single_bit
   if available

3. Make aligned_pointer_t use valid_byte_alignment to check alignment
   at compile time.  valid_byte_alignment was unused before.

4. Remove unnecessary parameter from add_aligned_raw_1d.

5. Remove unneeded casts and assignments.
@mhoemmen mhoemmen force-pushed the Add-aligned-accessor-example branch from dd60489 to 4e4412c Compare August 26, 2022 15:48
Don't use a single allocation for aligned and unaligned access
any more.  Instead, separate those allocations, and encapsulate
them more.  Add comments to make it clear why we want a
"deliberately unaligned array" for benchmarks and for studying
code generation.  Also, make sure that the elements of the arrays
are filled before we start computing with them.
@mhoemmen
Copy link
Contributor Author

@crtrott wrote:

I think we need to change the non-aligned element returning & to element 0 too....

I have addressed this in my latest revision.

@crtrott crtrott merged commit 2450c1a into kokkos:stable Aug 30, 2022
@mhoemmen mhoemmen deleted the Add-aligned-accessor-example branch August 30, 2022 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants