Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] ARROW-1669: [C++] Add Abseil to toolchain #2501

Closed
wants to merge 1 commit into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Aug 30, 2018

For now, this is blocked by manylinux1 which specifies a too old glibc:

_deps/abseil-src/absl/base/policy_checks.h:91:2: error: #error "Minimum required version of glibc is 2.12."
 #error "Minimum required version of glibc is 2.12."
  ^

(example: https://travis-ci.org/apache/arrow/jobs/422680091)

Note: glibc 2.12 is exactly the minimum version specified by manylinux2010. See ARROW-2461

A caveat: we need to remove the ARROW_USE_CLCACHE parameter because setting CMAKE_CXX_COMPILER is poorly supported (it can result in infinite loops). Instead the user has to pass -DCMAKE_CXX_COMPILER=clcache to the cmake invocation.

@wesm
Copy link
Member

wesm commented Aug 30, 2018

What do you think about using git submodules for this? It would require writing "git submodules update --init"

@pitrou
Copy link
Member Author

pitrou commented Aug 30, 2018

I don't really like git submodules, but I agree that might be better than importing all source files in the git history. Another possibility is a tarball as we do for jemalloc (but I'm not sure how to combine that with cmake's add_subdirectory).

Edit: apparently we could use FetchContent.

@pitrou pitrou force-pushed the ARROW-1669-abseil branch from 05e6753 to adbff76 Compare August 30, 2018 17:47
@pitrou
Copy link
Member Author

pitrou commented Aug 30, 2018

Apparently FetchContent is too recent (it seems to have appeared around CMake 3.11)... I'm not sure it would be ok to mandate such a recent version.

@wesm
Copy link
Member

wesm commented Aug 30, 2018

I see, hmm. I'd be in favor of biting the bullet on git submodules; relatively easy to maintain and if we want to add more submodules in the future, the incremental cost to development workflow will not be high. I just interacted with a project using this approach (gRPC https://github.com/grpc/grpc/tree/master/third_party) and it seemed fine to me.

@pitrou
Copy link
Member Author

pitrou commented Aug 30, 2018

Yes, git submodules sound like the least painful approach for now.

@pitrou pitrou force-pushed the ARROW-1669-abseil branch 7 times, most recently from 63585d0 to 56c0319 Compare September 3, 2018 17:48
@pitrou
Copy link
Member Author

pitrou commented Sep 3, 2018

I've switched to a git submodules approach and it seems to work fine.

However, I've now discovered a larger problem: we're willing to use some Abseil functionality (such as the useful string_view) in Arrow headers (such as BinaryArray or BinaryBuilder). However, the Abseil headers aren't installed by the Arrow build procedure, so building anything else, such as parquet-cpp, currently fails. What should we do?

@wesm
Copy link
Member

wesm commented Sep 3, 2018

Indeed. So far we have made a policy of not exposing any transitive dependencies in our public API.

It seems like Abseil isn't so difficult to install; we could bundle the headers and libraries with our Python packages (or have them as a runtime dependency in conda-forge), but in the case of Linux packages it's unclear we'd want to bundle Abseil in an arrow-dev package. I think we need feedback from some others (e.g. @kou)

@pitrou
Copy link
Member Author

pitrou commented Sep 3, 2018

I don't think installing the headers is technically difficult. The main issue is potential conflicts with versions of Abseil installed by other projects.

As for the libraries, we should link statically (as is done in this PR), so no shared lib needs installing.

@kou
Copy link
Member

kou commented Sep 3, 2018

For Linux packages, I'll create packages for Abseil and change Arrow packages to depend on them.
So I want Arrow to support system Abseil. If nobody works on it, I'll work on later as a follow-up pull request.

@pitrou
Copy link
Member Author

pitrou commented Sep 4, 2018

@kou Abseil doesn't support installing, which is the problem here. See abseil/abseil-cpp#38 and abseil/abseil-cpp#111

@pitrou pitrou force-pushed the ARROW-1669-abseil branch 3 times, most recently from 00c964f to f56c80b Compare September 4, 2018 10:26
@pitrou
Copy link
Member Author

pitrou commented Sep 4, 2018

I'm now installing the Abseil headers as part of the build process. It seems to solve issues.

@pitrou
Copy link
Member Author

pitrou commented Sep 4, 2018

Note this seems to increase AppVeyor build times significantly.

@pitrou pitrou force-pushed the ARROW-1669-abseil branch 3 times, most recently from e1070ae to 9bb567c Compare September 4, 2018 12:17
@wesm
Copy link
Member

wesm commented Sep 4, 2018

Unit tests are roughly half of the LOCs in Abseil -- are these being built here? If so we should figure out a way to skip them

@pitrou
Copy link
Member Author

pitrou commented Sep 4, 2018

No, they shouldn't be.

@pitrou
Copy link
Member Author

pitrou commented Sep 4, 2018

Perhaps it's a false alarm: AppVeyor seems generally slower these days.

@wesm
Copy link
Member

wesm commented Sep 4, 2018

I've noticed it's taking upwards 10 minutes until Appveyor begins even compiling the project (even though there's only ~100 MB in conda packages). Maybe wherever these VMs are hosted (Europe?) has slow connectivity to wherever the conda packages are coming from

@pitrou
Copy link
Member Author

pitrou commented Sep 4, 2018

My experience is that Anaconda packages (which are served by a CDN) are generally faster to fetch than conda-forge packages. Also, not involving conda-forge in dependency resolution also saves time.

@kou
Copy link
Member

kou commented Sep 5, 2018

Abseil doesn't support installing, which is the problem here.

Ah, sorry. I didn't know it.
But we will be able to create Linux packages by just using cp until the official CMake install support is implemented.
So, there is no serious problem on Linux packages view.

@pitrou
Copy link
Member Author

pitrou commented Sep 17, 2018

I wonder if Abseil could be made to work on older glibcs simply by removing the check here:
https://github.com/abseil/abseil-cpp/blob/master/absl/base/policy_checks.h#L81

@pitrou pitrou force-pushed the ARROW-1669-abseil branch 2 times, most recently from ad742e1 to ee019c3 Compare September 25, 2018 16:13
@fsaintjacques
Copy link
Contributor

Reading both PEP, it seems we could be breaking the requirements by using gcc from devtoolset. From PEP571 (manylinux2010).

If the wheel contains binary executables or shared objects linked against any whitelisted libraries that also export versioned symbols, they may only depend on the following maximum versions:

GLIBC_2.12
CXXABI_1.3.3
GLIBCXX_3.4.13
GCC_4.3.0

AFAIK, you can't even use C++11 with gcc-4.3 (full support was added in 4.8, which is the lower version required by abseil).

@kszucs could you point me how I could run ldd -r -v on the libarrow.so produced for wheels to see if we even break this requirement? Can I replicate this locally with docker?

@wesm
Copy link
Member

wesm commented Nov 21, 2018

manylinux1 uses devtoolset-2, gcc 4.8 based

@pitrou
Copy link
Member Author

pitrou commented Nov 21, 2018

@fsaintjacques The problem is the glibc version on the system where the Python packages are compiled on: it's too old to support Abseil.

@fsaintjacques
Copy link
Contributor

Yep, I was just wondering if also we're breaking the requirement of pip513, we're not:

# auditwheel show ~/pyarrow-0.11.1-cp36-cp36m-manylinux1_x86_64.whl 

pyarrow-0.11.1-cp36-cp36m-manylinux1_x86_64.whl is consistent with the
following platform tag: "manylinux1_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libgcc_s.so.1 with versions {'GCC_3.4',
'GCC_3.0'}, libc.so.6 with versions {'GLIBC_2.2.5', 'GLIBC_2.3.4',
'GLIBC_2.4', 'GLIBC_2.3.2'}, libstdc++.so.6 with versions
{'CXXABI_1.3', 'GLIBCXX_3.4'}, libm.so.6 with versions
{'GLIBC_2.2.5'}, libpthread.so.0 with versions {'GLIBC_2.2.5',
'GLIBC_2.3.2'}, librt.so.1 with versions {'GLIBC_2.2.5'}, libdl.so.2
with versions {'GLIBC_2.2.5'}

@xhochy
Copy link
Member

xhochy commented Nov 22, 2018

@fsaintjacques be aware that we also run auditwheel as part of our build so we always ensure that our wheel don't violate PEP513.

On a related note, manylinux2010 seems to be progressing in the last days, so we might be soon be able to also to use a newer glibc.

@fsaintjacques
Copy link
Contributor

@xhochy good to know, thank!

@emkornfield
Copy link
Contributor

I'm not sure if this is the correct issue but it looks like manylinux 2010 is now very close pypa/manylinux#179?

@xhochy
Copy link
Member

xhochy commented Mar 18, 2019

@emkornfield manylinux2010 is sadly already quite close since some time. It is just missing someone who knows how to bring it over the last mile :(

@wesm
Copy link
Member

wesm commented May 30, 2019

Closing until we are able to raise our glibc requirement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants