Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib: Add support for building with Chromium's zlib implementation #5748

Merged
merged 2 commits into from
Jan 5, 2021

Conversation

lhchavez
Copy link
Contributor

@lhchavez lhchavez commented Dec 21, 2020

This change builds libgit2 using Chromium's zlib implementation by
invoking cmake with -DUSE_BUNDLED_ZLIB=Chromium,
which is ~10% faster than the bundled zlib for the core::zstream suite.

This version of zlib has some optimizations:

a) Decompression (Intel+ARM): inflate_fast, adler32, crc32, etc.
b) Compression (Intel): fill_window, longest_match, hash function, etc.

Due to the introduction of SIMD optimizations, and to get the maximum
performance out of this fork of zlib, this requires an x86_64 processor
with SSE4.2 and CLMUL (anything Westmere or later, ~2010). The Chromium
zlib implementation also supports ARM with NEON, but it has not been
enabled in this patch.

Performance

TL;DR: Running just ./libgit2_clar -score::zstream 100 times in a loop
took 0:56.30 before and 0:50.67 after (~10% reduction!).

The bundled and system zlib implementations on an Ubuntu Focal system
perform relatively similar (the bundled one is marginally better due to
the compiler being able to inline some functions), so only the bundled
and Chromium zlibs were compared.

For a more balanced comparison (to ensure that nothing regressed
overall), libgit2_clar under perf was also run, and the zlib-related
functions were compared.

Bundled

cmake \
  -DUSE_BUNDLED_ZLIB=ON \
  -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
  -DCMAKE_C_FLAGS="-fPIC -fno-omit-frame-pointer" \
  -GNinja \
  ..
ninja
perf record --call-graph=dwarf ./libgit2_clar
perf report --children
Samples: 87K of event 'cycles', Event count (approx.): 75923450603
  Children      Self  Command       Shared Objec  Symbol
+    4.14%     0.01%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output_chunk
+    2.91%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output
+    0.69%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output (inlined)
     0.17%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_init
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_reset
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_eos
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_done
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_free (inlined)

Samples: 87K of event 'cycles', Event count (approx.): 75923450603
  Children      Self  Command       Shared Objec  Symbol
+    3.12%     0.01%  libgit2_clar  libgit2_clar  [.] deflate
+    2.65%     1.48%  libgit2_clar  libgit2_clar  [.] deflate_slow
+    1.60%     0.55%  libgit2_clar  libgit2_clar  [.] inflate
+    0.53%     0.00%  libgit2_clar  libgit2_clar  [.] write_deflate
     0.49%     0.36%  libgit2_clar  libgit2_clar  [.] inflate_fast
     0.46%     0.02%  libgit2_clar  libgit2_clar  [.] deflate_fast
     0.19%     0.19%  libgit2_clar  libgit2_clar  [.] inflate_table
     0.16%     0.01%  libgit2_clar  libgit2_clar  [.] inflateInit_
     0.15%     0.00%  libgit2_clar  libgit2_clar  [.] inflateInit2_ (inlined)
     0.10%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit_
     0.10%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit2_
     0.03%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset (inlined)
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] inflateEnd
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] deflateEnd
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] deflateResetKeep
     0.01%     0.01%  libgit2_clar  libgit2_clar  [.] inflateReset2
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateReset (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] deflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateResetKeep (inlined)

Chromium

cmake \
  -DUSE_BUNDLED_ZLIB=Chromium \
  -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
  -DCMAKE_C_FLAGS="-fPIC -fno-omit-frame-pointer" \
  -GNinja \
  ..
ninja
perf record --call-graph=dwarf ./libgit2_clar
perf report --children
Samples: 97K of event 'cycles', Event count (approx.): 80862210917
  Children      Self  Command       Shared Objec  Symbol
+    3.31%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output_chunk
+    2.27%     0.01%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output
+    0.55%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output (inlined)
     0.18%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_init
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_reset
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_free (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_done
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_free

Samples: 97K of event 'cycles', Event count (approx.): 80862210917
  Children      Self  Command       Shared Objec  Symbol
+    2.55%     0.01%  libgit2_clar  libgit2_clar  [.] deflate
+    2.25%     1.41%  libgit2_clar  libgit2_clar  [.] deflate_slow
+    1.10%     0.52%  libgit2_clar  libgit2_clar  [.] inflate
     0.36%     0.00%  libgit2_clar  libgit2_clar  [.] write_deflate
     0.30%     0.03%  libgit2_clar  libgit2_clar  [.] deflate_fast
     0.28%     0.15%  libgit2_clar  libgit2_clar  [.] inflate_fast_chunk_
     0.19%     0.19%  libgit2_clar  libgit2_clar  [.] inflate_table
     0.17%     0.01%  libgit2_clar  libgit2_clar  [.] inflateInit_
     0.16%     0.00%  libgit2_clar  libgit2_clar  [.] inflateInit2_ (inlined)
     0.15%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit_
     0.15%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit2_
     0.11%     0.01%  libgit2_clar  libgit2_clar  [.] adler32_z
     0.09%     0.09%  libgit2_clar  libgit2_clar  [.] adler32_simd_
     0.05%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset (inlined)
     0.05%     0.00%  libgit2_clar  libgit2_clar  [.] deflate_read_buf
     0.03%     0.00%  libgit2_clar  libgit2_clar  [.] inflateEnd
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] deflateEnd
     0.01%     0.01%  libgit2_clar  libgit2_clar  [.] inflateReset2
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] inflateReset (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] adler32
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateResetKeep (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] deflateResetKeep
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] deflateStateCheck (inlined)

@ethomson
Copy link
Member

Wow, I'm very much torn by this. On one hand, speed! One the other, submodules. And two different bundled zlibs.

It feels like not bundling this and letting the caller deal with it would be the easiest solution for us. But it also looks like that's actually quite difficult because there's no (easy or out of the box) way to build the chromium zlib fork, is that right?

@lhchavez
Copy link
Contributor Author

Wow, I'm very much torn by this. On one hand, speed! One the other, submodules. And two different bundled zlibs.

It feels like not bundling this and letting the caller deal with it would be the easiest solution for us. But it also looks like that's actually quite difficult because there's no (easy or out of the box) way to build the chromium zlib fork, is that right?

i just found out about https://cmake.org/cmake/help/git-stage/module/ExternalProject.html . with that we can achieve the same goal without submodules!

This change builds libgit2 using Chromium's zlib implementation by
invoking cmake with `-DUSE_BUNDLED_ZLIB=ON -DUSE_CHROMIUM_ZLIB=ON`,
which is ~10% faster than the bundled zlib for the core::zstream suite.

This version of zlib has some optimizations:

a) Decompression (Intel+ARM): inflate_fast, adler32, crc32, etc.
b) Compression (Intel): fill_window, longest_match, hash function, etc.

Due to the introduction of SIMD optimizations, and to get the maximum
performance out of this fork of zlib, this requires an x86_64 processor
with SSE4.2 and CLMUL (anything Westmere or later, ~2010). The Chromium
zlib implementation also supports ARM with NEON, but it has not been
enabled in this patch.

Performance
===========

TL;DR: Running just `./libgit2_clar -score::zstream` 100 times in a loop
took 0:56.30 before and 0:50.67 after (~10% reduction!).

The bundled and system zlib implementations on an Ubuntu Focal system
perform relatively similar (the bundled one is marginally better due to
the compiler being able to inline some functions), so only the bundled
and Chromium zlibs were compared.

For a more balanced comparison (to ensure that nothing regressed
overall), `libgit2_clar` under `perf` was also run, and the zlib-related
functions were compared.

Bundled
-------

```shell
cmake \
  -DUSE_BUNDLED_ZLIB=ON \
  -DUSE_CHROMIUM_ZLIB=OFF \
  -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
  -DCMAKE_C_FLAGS="-fPIC -fno-omit-frame-pointer" \
  -GNinja \
  ..
ninja
perf record --call-graph=dwarf ./libgit2_clar
perf report --children
```

```
Samples: 87K of event 'cycles', Event count (approx.): 75923450603
  Children      Self  Command       Shared Objec  Symbol
+    4.14%     0.01%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output_chunk
+    2.91%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output
+    0.69%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output (inlined)
     0.17%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_init
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_reset
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_eos
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_done
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_free (inlined)

Samples: 87K of event 'cycles', Event count (approx.): 75923450603
  Children      Self  Command       Shared Objec  Symbol
+    3.12%     0.01%  libgit2_clar  libgit2_clar  [.] deflate
+    2.65%     1.48%  libgit2_clar  libgit2_clar  [.] deflate_slow
+    1.60%     0.55%  libgit2_clar  libgit2_clar  [.] inflate
+    0.53%     0.00%  libgit2_clar  libgit2_clar  [.] write_deflate
     0.49%     0.36%  libgit2_clar  libgit2_clar  [.] inflate_fast
     0.46%     0.02%  libgit2_clar  libgit2_clar  [.] deflate_fast
     0.19%     0.19%  libgit2_clar  libgit2_clar  [.] inflate_table
     0.16%     0.01%  libgit2_clar  libgit2_clar  [.] inflateInit_
     0.15%     0.00%  libgit2_clar  libgit2_clar  [.] inflateInit2_ (inlined)
     0.10%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit_
     0.10%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit2_
     0.03%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset (inlined)
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] inflateEnd
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] deflateEnd
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] deflateResetKeep
     0.01%     0.01%  libgit2_clar  libgit2_clar  [.] inflateReset2
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateReset (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] deflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateResetKeep (inlined)
```

Chromium
--------

```shell
cmake \
  -DUSE_BUNDLED_ZLIB=ON \
  -DUSE_CHROMIUM_ZLIB=ON \
  -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
  -DCMAKE_C_FLAGS="-fPIC -fno-omit-frame-pointer" \
  -GNinja \
  ..
ninja
perf record --call-graph=dwarf ./libgit2_clar
perf report --children
```

```
Samples: 97K of event 'cycles', Event count (approx.): 80862210917
  Children      Self  Command       Shared Objec  Symbol
+    3.31%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output_chunk
+    2.27%     0.01%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output
+    0.55%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_get_output (inlined)
     0.18%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_init
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_reset
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_free (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_done
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] git_zstream_free

Samples: 97K of event 'cycles', Event count (approx.): 80862210917
  Children      Self  Command       Shared Objec  Symbol
+    2.55%     0.01%  libgit2_clar  libgit2_clar  [.] deflate
+    2.25%     1.41%  libgit2_clar  libgit2_clar  [.] deflate_slow
+    1.10%     0.52%  libgit2_clar  libgit2_clar  [.] inflate
     0.36%     0.00%  libgit2_clar  libgit2_clar  [.] write_deflate
     0.30%     0.03%  libgit2_clar  libgit2_clar  [.] deflate_fast
     0.28%     0.15%  libgit2_clar  libgit2_clar  [.] inflate_fast_chunk_
     0.19%     0.19%  libgit2_clar  libgit2_clar  [.] inflate_table
     0.17%     0.01%  libgit2_clar  libgit2_clar  [.] inflateInit_
     0.16%     0.00%  libgit2_clar  libgit2_clar  [.] inflateInit2_ (inlined)
     0.15%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit_
     0.15%     0.00%  libgit2_clar  libgit2_clar  [.] deflateInit2_
     0.11%     0.01%  libgit2_clar  libgit2_clar  [.] adler32_z
     0.09%     0.09%  libgit2_clar  libgit2_clar  [.] adler32_simd_
     0.05%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset (inlined)
     0.05%     0.00%  libgit2_clar  libgit2_clar  [.] deflate_read_buf
     0.03%     0.00%  libgit2_clar  libgit2_clar  [.] inflateEnd
     0.02%     0.00%  libgit2_clar  libgit2_clar  [.] deflateReset
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] deflateEnd
     0.01%     0.01%  libgit2_clar  libgit2_clar  [.] inflateReset2
     0.01%     0.00%  libgit2_clar  libgit2_clar  [.] inflateReset (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] adler32
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateResetKeep (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] deflateResetKeep
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] inflateStateCheck (inlined)
     0.00%     0.00%  libgit2_clar  libgit2_clar  [.] deflateStateCheck (inlined)
```
@lhchavez
Copy link
Contributor Author

Wow, I'm very much torn by this. On one hand, speed! One the other, submodules. And two different bundled zlibs.

It feels like not bundling this and letting the caller deal with it would be the easiest solution for us. But it also looks like that's actually quite difficult because there's no (easy or out of the box) way to build the chromium zlib fork, is that right?

Done with 100% fewer submodules (although folks that opt into the Chromium flavor are now required to use CMake 3.11. there's always a trade-off u_u).

@carlosmn
Copy link
Member

carlosmn commented Jan 4, 2021

Could this be a backend selection like we do for TLS instead of adding more options, so you'd do like -DUSE_BUNDLED_ZLIB=chromium instead of turning on two knobs? "bundled" might be a misnomer then, but I dunno, seems like a less confusing set of options.

Now `USE_BUNDLED_ZLIB` can be set to the string `Chromium` to enable the
Chromium implementation of zlib.
@lhchavez
Copy link
Contributor Author

lhchavez commented Jan 4, 2021

Could this be a backend selection like we do for TLS instead of adding more options, so you'd do like -DUSE_BUNDLED_ZLIB=chromium instead of turning on two knobs? "bundled" might be a misnomer then, but I dunno, seems like a less confusing set of options.

sure thing!

@ethomson
Copy link
Member

ethomson commented Jan 5, 2021

Could this be a backend selection like we do for TLS instead of adding more options, so you'd do like -DUSE_BUNDLED_ZLIB=chromium instead of turning on two knobs? "bundled" might be a misnomer then, but I dunno, seems like a less confusing set of options.

Ooh, this is nice. Thanks @lhchavez for not adding submodules to our tree 😀 and @carlosmn for this clever idea.

@ethomson ethomson merged commit fa01786 into libgit2:master Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants