crc32c benchmark is 2X faster for gcc builds vs clang #9891

Closed
mdcallag opened this issue Apr 22, 2022 · 7 comments

@mdcallag
Contributor

mdcallag commented Apr 22, 2022

Expected behavior

I expect similar performance from this benchmark whether db_bench is built with gcc or clang:
db_bench --benchmarks=crc32c

Actual behavior

Results for gcc builds are more than 2X faster than clang builds.

Steps to reproduce the behavior

Using: Ubuntu 20.04, gcc 9.4.0 and clang 10.0.0-4ubuntu1

Create 3 binaries:

  • gcc = make DEBUG_LEVEL=0 V=1 VERBOSE=1 db_bench
  • clang.0 = CC=/usr/bin/clang CXX=/usr/bin/clang++ make DEBUG_LEVEL=0 V=1 VERBOSE=1 db_bench
  • clang.1 = CC=/usr/bin/clang CXX=/usr/bin/clang++ USE_CLANG=1 make DEBUG_LEVEL=0 V=1 VERBOSE=1 db_bench

Then run ./db_bench --benchmarks=crc32c

On an Intel NUC I have at home (see here) the results are 19317 MB/s for gcc vs ~8900 MB/s for the clang builds. I can also reproduce this on larger servers but won't share the details here.
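For anyone who wants to sanity-check what a crc32c throughput number means, here is a minimal sketch in Python. It is not the RocksDB implementation (which uses hardware instructions when they are available); it is a plain table-driven CRC32C, and it assumes the benchmark amounts to checksumming a fixed-size block repeatedly and dividing the bytes processed by the elapsed time. The names crc32c and throughput_mb_per_s, the 4096-byte block and the 1 MiB total are all made up for illustration.

```python
import time

# Reflected form of the Castagnoli polynomial used by CRC32C.
CASTAGNOLI_REFLECTED = 0x82F63B78


def _make_table():
    """Build the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CASTAGNOLI_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC32C of data, optionally continuing from a previous CRC."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def throughput_mb_per_s(block_size: int = 4096, total_bytes: int = 1 << 20) -> float:
    """Checksum a zero-filled block repeatedly and report MB/s (hypothetical harness)."""
    block = bytes(block_size)
    done = 0
    start = time.perf_counter()
    while done < total_bytes:
        crc32c(block)
        done += block_size
    elapsed = time.perf_counter() - start
    return done / (1024 * 1024) / elapsed


if __name__ == "__main__":
    print(f"{throughput_mb_per_s():.1f} MB/s (pure Python, orders of magnitude slower than db_bench)")
```

The absolute number from this sketch says nothing about gcc vs clang. It only shows the shape of the measurement, and the checksum function can be used to cross-check values produced by a native build.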

A diff of the compiler command lines for gcc vs clang.0

1c1
< g++
---
> /usr/bin/clang++
36a37
> -Wshorten-64-to-32

And a diff for gcc vs clang.1

1c1
< g++
---
> /usr/bin/clang++
12a13
> -Wshift-sign-overflow
23d23
< -fno-builtin-memcmp
36a37
> -Wshorten-64-to-32
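The two diffs above can also be reproduced with a set comparison of the command-line tokens. This is only a sketch: flag_diff is a made-up helper, it ignores the compiler name in the first token, and a set comparison loses flag order and duplicates, which a real diff would show.

```python
import shlex


def flag_diff(cmd_a: str, cmd_b: str):
    """Return (flags only in cmd_a, flags only in cmd_b), ignoring the compiler name."""
    flags_a = set(shlex.split(cmd_a)[1:])
    flags_b = set(shlex.split(cmd_b)[1:])
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)


if __name__ == "__main__":
    # Shortened command lines for illustration; paste the full ones to compare them.
    gcc = "g++ -fno-rtti -fno-builtin-memcmp -march=native -O2 -c util/crc32c.cc"
    clang_1 = ("/usr/bin/clang++ -fno-rtti -Wshift-sign-overflow -Wshorten-64-to-32 "
               "-march=native -O2 -c util/crc32c.cc")
    only_gcc, only_clang = flag_diff(gcc, clang_1)
    print("only in gcc:  ", only_gcc)
    print("only in clang:", only_clang)
```

The only flags that differ are two warning options and -fno-builtin-memcmp, and clang.0 keeps -fno-builtin-memcmp yet is just as slow as clang.1, so the flag differences do not look like the cause.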

Compiler command lines for the crc32c code (util/crc32c.cc). Note that -march=native -DHAVE_SSE42 -DHAVE_PCLMUL -DHAVE_AVX2 are used in all three:

g++  -fno-rtti   -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wunused-parameter -I. -I./include -std=c++17  -faligned-new -DHAVE_ALIGNED_NEW -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX  -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=1 -DZLIB -DLZ4 -DZSTD -DNUMA -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_AUXV_GETAUXVAL_PRESENT -march=native   -DHAVE_SSE42  -DHAVE_PCLMUL  -DHAVE_AVX2  -DHAVE_BMI  -DHAVE_LZCNT -DHAVE_UINT128_EXTENSION -DROCKSDB_SUPPORT_THREAD_LOCAL -DROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE  -isystem third-party/gtest-1.8.1/fused-src -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNDEBUG -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-invalid-offsetof -c util/crc32c.cc -o util/crc32c.o

/usr/bin/clang++  -fno-rtti   -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wunused-parameter -I. -I./include -std=c++17  -faligned-new -DHAVE_ALIGNED_NEW -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX  -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=1 -DZLIB -DLZ4 -DZSTD -DNUMA -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_AUXV_GETAUXVAL_PRESENT -Wshorten-64-to-32 -march=native   -DHAVE_SSE42  -DHAVE_PCLMUL  -DHAVE_AVX2  -DHAVE_BMI  -DHAVE_LZCNT -DHAVE_UINT128_EXTENSION -DROCKSDB_SUPPORT_THREAD_LOCAL -DROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE  -isystem third-party/gtest-1.8.1/fused-src -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNDEBUG -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-invalid-offsetof -c util/crc32c.cc -o util/crc32c.o

/usr/bin/clang++  -fno-rtti   -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wunused-parameter -Wshift-sign-overflow -I. -I./include -std=c++17  -faligned-new -DHAVE_ALIGNED_NEW -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX  -DOS_LINUX -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=1 -DZLIB -DLZ4 -DZSTD -DNUMA -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_AUXV_GETAUXVAL_PRESENT -Wshorten-64-to-32 -march=native   -DHAVE_SSE42  -DHAVE_PCLMUL  -DHAVE_AVX2  -DHAVE_BMI  -DHAVE_LZCNT -DHAVE_UINT128_EXTENSION -DROCKSDB_SUPPORT_THREAD_LOCAL -DROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE  -isystem third-party/gtest-1.8.1/fused-src -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNDEBUG -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-invalid-offsetof -c util/crc32c.cc -o util/crc32c.o
@mdcallag mdcallag added the performance Issues related to performance that may or may not be bugs label Apr 22, 2022
@mdcallag mdcallag self-assigned this Apr 22, 2022
@mdcallag
Contributor Author

Results for all of the CPU-intensive microbenchmarks from db_bench

All numbers are MB/s

gcc     clang.0 clang.1 benchmark
19306   8923    8924    crc32c
5032    5021    5038    xxhash
9792    9806    9835    xxhash64
26766   29306   29306   xxh3
660     658     656     compress
5321    5247    5252    uncompress
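To make the gap easier to read, the table above can be turned into gcc/clang ratios. A quick sketch, with the numbers copied from the table; gcc_speedup is a made-up helper name.

```python
# MB/s from the table above: (gcc, clang.0, clang.1)
RESULTS_MB_S = {
    "crc32c": (19306, 8923, 8924),
    "xxhash": (5032, 5021, 5038),
    "xxhash64": (9792, 9806, 9835),
    "xxh3": (26766, 29306, 29306),
    "compress": (660, 658, 656),
    "uncompress": (5321, 5247, 5252),
}


def gcc_speedup(results):
    """Map each benchmark to (gcc / clang.0, gcc / clang.1); values above 1 mean gcc is faster."""
    return {
        name: (gcc / clang0, gcc / clang1)
        for name, (gcc, clang0, clang1) in results.items()
    }


if __name__ == "__main__":
    for name, (vs0, vs1) in gcc_speedup(RESULTS_MB_S).items():
        print(f"{name:12s} {vs0:5.2f} {vs1:5.2f}")
```

Only crc32c stands out, at roughly 2.16X in favor of gcc. The other benchmarks are within about 10% of each other, and xxh3 is actually a bit faster with clang.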

@pdillinger
Contributor

CC: @Cyan4973

@mdcallag
Contributor Author

Bug filed for llvm
llvm/llvm-project#55153

@siying
Contributor

siying commented May 23, 2022

Since it is not a RocksDB issue and llvm already made changes to fix it, should we close it?

@pdillinger
Contributor

Probably worth verifying the LLVM fix in next release

@mdcallag
Contributor Author

I don't mind owning it until a fix arrives in a clang release that we use. AFAIK the fix was pushed upstream recently.

@mdcallag
Contributor Author

mdcallag commented Sep 2, 2022

fixed in upstream clang
