Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

help compiler folding for divideFloor/moduloFloor #4512

Merged
merged 1 commit into from
Mar 12, 2024

Conversation

grg
Copy link
Contributor

@grg grg commented Mar 7, 2024

Add additional test to assist compiler folding to perform these replacements when the divisor is a power-of-two compile-time constant:

  • divideFloor -> right-shift
  • moduloFloor -> bitwise and

This is in response to this PR comment from @ChrisDodd:

Does this get properly folded to a shift when divisor is a constant power of 2? Similarly, moduloFloor should get folded to an and

The proposed solution adds a test that helps the compiler with optimization for a power-of-two compile-time constant:

if (__builtin_constant_p(divisor) && isPowerOfTwo(divisor))

__builtin_constant_p() and __builtin_clz are not supported by all compilers and so are protected by #if guards for gcc/clang.

This Compiler Explorer test shows the impact of the extra test on both clang 15.0.0 and gcc 12.3. (These versions match those available on my test system.) The code contains multiple divideFloor/moduloFloor functions:

  • divisor = 4 without optimization
  • divisor = 4 with optimization above
  • divisor = 5 without optimization
  • divisor = 5 with optimization above

@grg grg force-pushed the grgibb/bitrange_compiler_optimization branch 2 times, most recently from cf13472 to 6d444bd Compare March 7, 2024 23:30
@grg grg marked this pull request as ready for review March 8, 2024 00:17
@grg grg requested review from ChrisDodd, asl and vlstill March 8, 2024 00:17
Copy link
Contributor

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments about possible simplifications and C++20-related usage

lib/bitrange.h Outdated
*
* This function is declared constexpr to allow it to be used in tests to optimize code generation.
*/
constexpr bool isPowerOfTwo(int value) { return value && !(value & (value - 1)); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the intended behavior wrt negative numbers? This currently returns false for e.g. -2, but it takes int as an argument. If negatives are not expected, I'd go with explicit unsigned here, or maybe even something like this?

template <typename T, typename = std::enable_if_t<std::is_unsigned_v<T>>>
[[nodiscard]] constexpr inline bool isPowerOfTwo(T value) noexcept {
 return (value != 0) && ((value & (value - 1)) == 0);
}

Copy link
Contributor Author

@grg grg Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For divideFloor/moduloFloor, I think it makes sense to restrict the divisor to unsigned numbers. The dividend does need to be signed, but I don't call isPowerOfTwo or log2ForPowerOfTwo on the dividend.

The test code at least has a few tests for negative divisors, so I can't restrict divideFloor/moduloFloor to unsigned.

I've switched to absl::has_single_bit, which has forced me to do an explicit type cast when calling that function.

lib/bitrange.h Outdated
*/
constexpr bool isPowerOfTwo(int value) { return value && !(value & (value - 1)); }

constexpr int log2ForPowerOfTwo(int value) { return sizeof(value) * 8 - __builtin_clz(value) - 1; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explicitly assumes that sizeof(value) == 4 as __builtin_clz accepts unsigned argument. Also for the clarity I'd change the argument to unsigned.

In fact, we can use abseil implementations of C++20 <bit> routines: absl::countl_zero and their friends. With FIXME to replace the implementation with standard ones.

Instead of isPowerOfTwo above we could use absl::has_single_bit.

Even if we'd use our own implementations I'd use C++20 names to ensure we can just drop them in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I'd known about the Abseil alternatives earlier: would have saved my attempting to roll my own equivalents to the STL functions that exist in C++20 and beyond.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry. I would probably need to add some short summary to docs of what is available....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do you know of a more portable alternative to __builtin_constant_p?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do you know of a more portable alternative to __builtin_constant_p?

Unfortunately, not. I believe the question was raised few times, but there is no direct alternative. Maybe if consteval from C++23 is the closest alternative.

lib/bitrange.h Outdated Show resolved Hide resolved
@grg grg requested a review from asl March 8, 2024 23:21
@grg
Copy link
Contributor Author

grg commented Mar 8, 2024

@asl Do you happen to know what I need to adjust in cmake to get it to build p4ctoolkit (lib directory) correctly? It currently fails with:

In file included from /home/runner/work/p4c/p4c/lib/bitrange.cpp:17,
                 from /home/runner/work/p4c/p4c/build/lib/CMakeFiles/p4ctoolkit.dir/Unity/unity_0_cxx.cxx:10:
/home/runner/work/p4c/p4c/lib/bitrange.h:26:10: fatal error: absl/numeric/bits.h: No such file or directory
   26 | #include "absl/numeric/bits.h"
      |          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [lib/CMakeFiles/p4ctoolkit.dir/build.make:104: lib/CMakeFiles/p4ctoolkit.dir/Unity/unity_0_cxx.cxx.o] Error 1

@asl
Copy link
Contributor

asl commented Mar 9, 2024

@asl Do you happen to know what I need to adjust in cmake to get it to build p4ctoolkit (lib directory) correctly? It currently fails with:

I think you'd need to do target_link_library(lib absl::bits) to have includes / paths propagated properly.

@grg grg force-pushed the grgibb/bitrange_compiler_optimization branch 4 times, most recently from f4927e4 to 67ac50b Compare March 9, 2024 02:59
lib/bitrange.h Outdated
*
* This function is declared constexpr to allow it to be used in tests to optimize code generation.
*/
constexpr unsigned ffs(unsigned value) { return sizeof(value) * 8 - absl::countl_zero(value) - 1; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asl I think this comment was on this line (I force-pushed a few times and it seemed to result in losing the comment in github):

Isn't it countr_zero essentially?

It is as long as value is a power of 2 🙂
I'm only using it in the case where value is a power of 2, so maybe I will switch. But I'll wait until I've got my build issues sorted out.

@@ -93,3 +93,4 @@ add_library(p4ctoolkit STATIC ${LIBP4CTOOLKIT_SRCS})

# Disable libcall (realloc, malloc) optimizations which may cause infinite loops.
set_target_properties(p4ctoolkit PROPERTIES COMPILE_FLAGS -fno-builtin)
target_link_libraries(p4ctoolkit absl::bits)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asl Another comment I lost due to force-pushing:

It should not be private as bits are used in the header. If you'd make it PUBLIC, then you'd not need to modify every user (like gtest below)

I copied and pasted from one of the other places where we were depending upon an Abseil library (before changing it to bits). That usage had the PRIVATE, which is why I copied it here originally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, right. In few other places we only depend inside .cpp, so PRIVATE is a way to reduce visibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this seems to work, is there a "more correct" solution here?

My concern is that I'm trying to fix the problem of the .h file not being present at compile time. Doesn't target_link_libraries only impact linking: couldn't cmake still try the cpp compilation before the header files are available?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't target_link_libraries only impact linking:

Despite the name – it impacts compile time as well. As it pulls the interface include paths and other properties. So, this is the correct way of doing things. See https://cmake.org/cmake/help/latest/manual/cmake-buildsystem.7.html#target-usage-requirements:

Generally, a dependency should be specified in a use of target_link_libraries() with the PRIVATE keyword if it is used by only the implementation of a library, and not in the header files. If a dependency is additionally used in the header files of a library (e.g. for class inheritance), then it should be specified as a PUBLIC dependency. A dependency which is not used by the implementation of a library, but only by its headers should be specified as an INTERFACE dependency. The target_link_libraries() command may be invoked with multiple uses of each keyword:

@grg
Copy link
Contributor Author

grg commented Mar 9, 2024

@asl Do you happen to know what I need to adjust in cmake to get it to build p4ctoolkit (lib directory) correctly? It currently fails with:

I think you'd need to do target_link_library(lib absl::bits) to have includes / paths propagated properly.

That seems to do the trick.

@grg grg force-pushed the grgibb/bitrange_compiler_optimization branch from 46ecd9c to cf1fbd9 Compare March 12, 2024 15:51
@grg
Copy link
Contributor Author

grg commented Mar 12, 2024

@asl I think I've got it down to a minimal set of changes using Abseil functionality as much as possible.

Copy link
Contributor

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@grg grg force-pushed the grgibb/bitrange_compiler_optimization branch from cf1fbd9 to fbe4fb8 Compare March 12, 2024 18:35
Add additional test to assist compiler folding to perform these
replacements when the divisor is a power-of-two compile-time constant:
 - divideFloor -> right-shift
 - moduloFloor -> bitwise and

---------

Co-authored-by: Anton Korobeynikov <anton@korobeynikov.info>
@grg grg force-pushed the grgibb/bitrange_compiler_optimization branch from fbe4fb8 to 72b535e Compare March 12, 2024 18:36
@grg grg added this pull request to the merge queue Mar 12, 2024
Merged via the queue into main with commit d5129a4 Mar 12, 2024
17 checks passed
@grg grg deleted the grgibb/bitrange_compiler_optimization branch March 12, 2024 22:15
@fruffy fruffy added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Topics concerning the core segments of the compiler (frontend, midend, parser)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants