-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-3128: [C++] Support system shared zlib #2483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'll work on changing .deb/.rpm to use this features after this change is merged. |
Codecov Report
@@ Coverage Diff @@
## master #2483 +/- ##
==========================================
- Coverage 87.69% 87.67% -0.02%
==========================================
Files 372 372
Lines 57915 57915
==========================================
- Hits 50788 50778 -10
- Misses 7057 7063 +6
- Partials 70 74 +4
Continue to review full report at Codecov.
|
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments below.
cpp/CMakeLists.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring needs fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think using the system zlib should be the default except if not found. There's no reason to vendor such a stable library (except on Windows), IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... Sorry for my low quality pull request...
I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the system zlib should be the default except if not found.
I want to do so. But I keep the current behavior for backward compatibility.
If nobody objects it, I'll do so.
cpp/CMakeLists.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what "where relevant" means? Also, do we need an option for this? Can't we always use a shared zlib? (the zlib ABI is probably very stable...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @pitrou comment above, I would be OK with making the default "ON", though it will require some changes to our packaging scripts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll remove this option and fix our packaging scripts.
It'll make our build system simpler.
cpp/cmake_modules/FindZLIB.cmake
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comments explaining why all this exists? Write-only configuration code is not nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
I'll add a comment that zlib uses zlib.lib on Windows.
FYI: https://github.com/madler/zlib/blob/master/win32/README-WIN32.txt#L50
cpp/cmake_modules/FindZLIB.cmake
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses pkg-config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make things clearer, perhaps we could use a different prefix for the pkg-config results, for example PKG_ZLIB. That would make the below more readable, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. I'll do.
cpp/cmake_modules/FindZLIB.cmake
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ZLIB_LIBS lists directories? Uh :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We define the following API in the header of this file...
# ZLIB_LIBS, directory containing zlib libraries
I think that nobody uses ZLIB_LIBS. Can we remove ZLIB_LIBS?
cpp/cmake_modules/FindZLIB.cmake
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PARQUET_MINIMAL_DEPENDENCY isn't used anymore above, it seems, so perhaps we needn't test it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be used when we merge perquet-cpp into Arrow repository.
But "found the ZLIB shared library" message will be enough. We'll not need "Found the ZLIB header" message.
I'll remove this condition.
@xhochy Do you have any concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 (no concerns)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
cpp/cmake_modules/FindZLIB.cmake
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a warning message if the zlib isn't found? Especially as we vendor it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be done in ThirdpartyToolchain.cmake not in FindZLIB.cmake.
We don't care whether vendored or not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, we should make it an error to set both ZLIB_HOME and ARROW_ZLIB_VENDORED. It's already annoying enough to debug configuration issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed ARROW_ZLIB_VENDORED to use system zlib by default and vendored zlib as fallback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, we should make it an error if ARROW_ZLIB_VENDORED is false and a third-party zlib couldn't be found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this sounds reasonable since building needs the development headers, which aren't always installed (as opposed to the library itself, which is pretty much everywhere).
|
We need more works for CI. |
00e4e34 to
bc32ff3
Compare
|
Now, CI is green again. Can someone review this? Summary:
|
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we deploy Python packages on systems which may not have libz available in $PATH / %PATH%, we either need to
- statically link in those packages or
- bundle libz.so/dll
In case case of conda packages, we should definitely switch to dynamic linking. So that will require changes in https://github.com/conda-forge/arrow-cpp-feedstock/blob/master/recipe/meta.yaml#L40
The question really is the Python wheels. Is zlib shipped with CPython? I guess we should experiment to find out
cpp/CMakeLists.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need a flag to link statically since we deploy Python and C++ packages on platforms which may not have libz in the dynamic loader path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this static linking required on Windows? OSX and Linux don't need it in wheels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at a default install of Python 3.6 on Windows 64 and it doesn't appear to include zlib. I tried doing ctypes.CDLL('z.dll') and CDLL('libz.dll') and neither work. So either we need to bundle the shared library in wheels or statically link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
I'll try to bundle zlib as shared library for wheels.
If it's succeeded, I'll remove static link support for ZLIB_ROOT case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need changing per comment above
cpp/src/arrow/util/CMakeLists.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be exposed as a transitive dependency of arrow_static/arrow_shared? I think there's some way to get CMake to automatically include transitive link dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it too but I couldn't...
I try again with ADD_ARROW_TEST(compression-test EXTRA_LINK_LIBS zlib), it works well. Oh.
|
zlib is not part of the |
I had missed that. It's a bit surprising, I wonder if there are GNU/Linux systems without even zlib installed. |
|
I've asked a question about zlib here: https://mail.python.org/mm3/archives/list/distutils-sig@python.org/thread/ZZG6GL3XTBLBJXSITYHEXMFKN43EREB7/ |
e4ec03c to
ae7e277
Compare
wheels bundle libz.so, libz.dylib or zlib.dll. TODO: manylinux1
298a870 to
62858dc
Compare
|
CI is green again. Can someone review this again? Summary:
|
|
Thanks @kou for working on this! I will review today or tomorrow. I added your summary to the PR description so it will end up in the commit message |
ci/travis_script_python.sh
Outdated
| conda create -y -q -p $CONDA_ENV_DIR python=$PYTHON_VERSION cmake curl | ||
| conda activate $CONDA_ENV_DIR | ||
|
|
||
| # We should zlib in the target Python directory to avoid loading wrong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a word missing: "We should zlib".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I've fixed this.
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this. A couple of last questions before merging. I was confused in particular by the amount of code involved with bundling zlib in the python/CMakeLists.txt, but I haven't looked closely enough to see why it's needed.
cpp/src/arrow/util/CMakeLists.txt
Outdated
| ADD_ARROW_TEST(bit-util-test) | ||
| ADD_ARROW_TEST(checked-cast-test) | ||
| ADD_ARROW_TEST(compression-test) | ||
| ADD_ARROW_TEST(compression-test EXTRA_LINK_LIBS zlib) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should transitively linked as a result of linking to libarrow.a. We should make a follow up to make sure that CMake is set to include transitive dependencies when statically linking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add zlib to ARROW_STATIC_LINK_LIBS even when we use zlib as shared library, we can remove this: db17bdb
I'm not sure that it's a good manner in this project...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| endfunction() | ||
|
|
||
| function(bundle_zlib zlib_home) | ||
| if (MSVC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can FindZLIB.cmake not be used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use FindZLIB.cmake here.
It finds zlib.lib to link zlib as shared library. But we need zlib.dll here.
Anyway, I've removed unused codes by 3db0830
The removed codes are for non Windows. I kept them because we may use them later. But we can remove unused codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, no problem. Thank you
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Not sure what's up with the Travis CI failure but it doesn't appear related to this
|
Thanks. |
Debian package recommends not static linking zlib for security reason. If we use zlib as a static library, Debian package lint reports the following error:
embedded-library error detail: https://lintian.debian.org/tags/embedded-library.html
zlib detection by pkg-config is also added because system zlib provides zlib.pc.
Detect order (default):
Detect order with ZLIB_HOME environment variable:
This is not backward compatible. We used vendored zlib by default.
Summary:
pkg-configor from the default system pathZLIB_HOMEis specifiedZLIB_HOMEis specified, system zlib isn't searched bypkg-configZLIB_HOMEand no system zlib)ZLIB_HOMEzlib as shared libraryARROW_BUILD_TOOLCHAINon Travis CI.ARROW_BUILD_TOOLCHAINcauses wronglibpython.dylibload on macOS. See comment inci/travis_script_python.shfor details.after_failurechange in.travis.ymlis a by-product of debugging crash on macOS.ZLIB_LIBShas been removed.