Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align benchmark::State to a cacheline. #1230

Merged
merged 7 commits into from
Aug 16, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 41 additions & 2 deletions include/benchmark/benchmark.h
Original file line number Diff line number Diff line change
Expand Up @@ -290,11 +290,50 @@ BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
#define BENCHMARK_OVERRIDE
#endif

#if defined(__GNUC__)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure if this outermost guard is needed, but then i don't know what set of per-cpu defines MSVC provides.

// Determine the cacheline size based on architecture
#if defined(__i386__) || defined(__x86_64__)
#define BENCHMARK_INTERNAL_CACHELINE_SIZE 64
#elif defined(__powerpc64__)
#define BENCHMARK_INTERNAL_CACHELINE_SIZE 128
#elif defined(__aarch64__)
#define BENCHMARK_INTERNAL_CACHELINE_SIZE 64
#elif defined(__arm__)
// Cache line sizes for ARM: These values are not strictly correct since
// cache line sizes depend on implementations, not architectures. There
// are even implementations with cache line sizes configurable at boot
// time.
#if defined(__ARM_ARCH_5T__)
#define BENCHMARK_INTERNAL_CACHELINE_SIZE 32
#elif defined(__ARM_ARCH_7A__)
#define BENCHMARK_INTERNAL_CACHELINE_SIZE 64
#endif // ARM_ARCH
#endif // arches
#endif // __GNUC__

#ifndef BENCHMARK_INTERNAL_CACHELINE_SIZE
// A reasonable default guess. Note that overestimates tend to waste more
// space, while underestimates tend to waste more time.
#define BENCHMARK_INTERNAL_CACHELINE_SIZE 64
#endif

#if defined(__GNUC__)
// Indicates that the declared object be cache aligned using
// `BENCHMARK_INTERNAL_CACHELINE_SIZE` (see above).
#define BENCHMARK_INTERNAL_CACHELINE_ALIGNED \
__attribute__((aligned(BENCHMARK_INTERNAL_CACHELINE_SIZE)))
#elif defined(_MSC_VER)
#define BENCHMARK_INTERNAL_CACHELINE_ALIGNED \
__declspec(align(BENCHMARK_INTERNAL_CACHELINE_SIZE))
#else
#define BENCHMARK_INTERNAL_CACHELINE_ALIGNED
#endif

#if defined(_MSC_VER)
#pragma warning(push)
// C4251: <symbol> needs to have dll-interface to be used by clients of class
#pragma warning(disable : 4251)
#endif
#endif // _MSC_VER_

namespace benchmark {
class BenchmarkReporter;
Expand Down Expand Up @@ -759,7 +798,7 @@ enum Skipped

// State is passed to a running Benchmark and contains state for the
// benchmark to use.
class BENCHMARK_EXPORT State {
class BENCHMARK_EXPORT BENCHMARK_INTERNAL_CACHELINE_ALIGNED State {
public:
struct StateIterator;
friend struct StateIterator;
Expand Down