Should we use gcc from the default channel for Linux (and maybe OS X)? #29
Comments
I have no idea about most of this, but: We should build with the same toolchain that Anaconda is built with as much as possible, which is clang on OS X, I think. And the "manylinux" folks have been working on a Docker image for building manylinux wheels, which is derived from the Anaconda experience -- so that might be a good place to go for Linux: |
I'm pretty happy with the reach of our existing binaries. @ocefpaf - I know there is no time like the present to get this right, but I don't really have any experience of it going wrong. My hunch therefore would be to stick with what we have until we find a problem with it. 👍 / 👎? |
👍 |
Well, it is not a matter of right and wrong. I am pretty happy too. The
+1 let's just document that people should install build_essentials and |
Ah OK. I've not seen these. I'm happy to tighten that requirement down somewhat - it sounds like quite a big ask to install |
My bad, I am on the phone and the # refs above should point to the ioos conda recipe repo. |
build_essentials was a lazy solution on my part. Some cases need only |
In the few cases where I have had issues elsewhere, I find I can use |
So, this ( conda-forge/staged-recipes#164 ) might be such a case where we would want to use |
Here's what I understand: If you ship libgcc (more importantly, libstdc++, which comes with it) and shadow the system libstdc++, and the system libstdc++ is newer than the one you ship, you'll run into unresolved symbol errors at runtime and crash or fail to run. This has been a huge motivator for me to get GCC 5.2 running in our docker build image. I have argued very strongly internally against using the gcc that is in defaults. My main argument against even having this package is that people will use it on unknown platforms - and this means their packages will have an unknown version dependency on GLibC. IMHO, Continuum should just ship all the runtimes, the same way we do with Windows. They are much more nicely backwards/forwards compatible on Linux, but I don't see harm in keeping them controlled on Linux. |
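To make the failure mode above concrete, here is a minimal sketch of how one might compare the libstdc++ symbol version tags a binary needs against the tags a shipped libstdc++ provides. The helper names (`parse_tag`, `missing_tags`) and the tag lists are hypothetical illustrations, not part of any conda tooling; in practice the tags would have to be extracted from the actual libraries.

```python
import re

# Matches libstdc++ symbol version tags such as "GLIBCXX_3.4.21".
_TAG = re.compile(r"^GLIBCXX_(\d+(?:\.\d+)*)$")


def parse_tag(tag):
    """Turn 'GLIBCXX_3.4.21' into the tuple (3, 4, 21)."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a GLIBCXX version tag: {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def missing_tags(required, provided):
    """Return the tags a binary needs that the loaded libstdc++ lacks."""
    available = {parse_tag(tag) for tag in provided}
    lacking = {tag for tag in required if parse_tag(tag) not in available}
    return sorted(lacking, key=parse_tag)


# Hypothetical example: an older shipped libstdc++ shadows the system one,
# while a system-built library expects a newer tag.
shipped = ["GLIBCXX_3.4", "GLIBCXX_3.4.18", "GLIBCXX_3.4.19"]
needed = ["GLIBCXX_3.4.19", "GLIBCXX_3.4.21"]
print(missing_tags(needed, shipped))
```

A non-empty result corresponds to the unresolved-symbol situation described in the comment: the library that gets loaded is older than what some binary was linked against.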
I think an argument can also be made that you should ship no gcc, libstdc++, and similar runtimes and instead always depend on the system provided ones. This seems to be what the manylinux folks are doing with wheel files. I'm not sure which option is better but I think both should be on the table. |
One of the other ideas, I was playing with in that PR is bundling only a few essential components like |
Alternatively, static linkage remains a valid option here. |
I have run into issues where a Fortran compiled extension linked against symbols in my system provided If runtimes are shipped on Linux it seems they must be the most up-to-date versions. Keeping these up to date may require significant maintenance. |
Yeah, I am liking the static option more and more. |
I'm not clear on how the Manylinux stuff works to depend on libstdc++ on the system. I'm sure they have something figured out, but I just don't understand it. This is the article that convinced me to pursue the approach I'm behind: http://www.crankuptheamps.com/blog/posts/2014/03/04/Break-The-Chains-of-Version-Dependency/ Note that this is the same approach taken by the Julia team. |
Found it. They place tight restrictions on ABI version:
|
Yup, from my understanding they are defining a base Linux system that has a set of "core" libraries which they expect to 1) exist and 2) match a minimum version. But pip does not have an effective method for providing more up-to-date runtimes like conda does. |
I'm warming more to the idea of providing the latest runtimes. Would this allow us to compile packages with the GCC 5 libstdc++ ABI and run them on systems using the GCC 4 ABI? |
I feel like Conda has a better approach here, making the assumption that we should provide it. People can conceivably pip install something without having libstdc++ installed, and end up confused. My wife had that happen with Steam on her Linux computer, for example. Good times. I never thought work would be so useful at home. FWIW, I'm pretty sure Continuum is taking this route, and you can be certain that it will be maintained as long as we're pushing it, because we'll have customers screaming otherwise.
@jjhelmus yes. Here's my understanding with GCC5:
Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" => GCC4 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC5 (abi 5).
Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=1" => GCC5 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC4.
Continuum is planning on the former setting for now, with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything. I have tried to make that ABI info readily accessible with startup scripts in the build docker image: https://github.com/ContinuumIO/docker-images/pull/20/files#diff-8320ce46adf2819c0900060bd6c14c43R16 (also see the start_c++??.sh scripts, which are meant to be simple front-ends) |
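As a rough illustration of the two ABI settings described above, the sketch below builds the corresponding preprocessor flag and applies a simple heuristic to guess which ABI a set of mangled C++ symbols was compiled against. The function names are hypothetical, and the heuristic only relies on the fact that the newer ABI places `std::string` in the `std::__cxx11` inline namespace (and marks some functions with a `cxx11` ABI tag); a library that never touches those types will show neither marker.

```python
def abi_flag(use_cxx11_abi):
    """Return the define that selects the libstdc++ ABI (as quoted in the thread)."""
    return f"-D_GLIBCXX_USE_CXX11_ABI={1 if use_cxx11_abi else 0}"


def uses_cxx11_abi(mangled_symbols):
    """Heuristic: True if any mangled name mentions the __cxx11 namespace
    or carries the cxx11 ABI tag (mangled as 'B5cxx11')."""
    return any("__cxx11" in sym or "B5cxx11" in sym for sym in mangled_symbols)


# Illustrative mangled names for std::string::append(const char*):
new_abi_symbols = ["_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6appendEPKc"]
old_abi_symbols = ["_ZNSs6appendEPKc"]

print(abi_flag(False), uses_cxx11_abi(old_abi_symbols))
print(abi_flag(True), uses_cxx11_abi(new_abi_symbols))
```

Two libraries that exchange `std::string` across their interface would need to agree on this setting, which is why a switch of the default implies rebuilding dependents.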
Alright, this clarifies the Linux stuff for me.
That's going to be fun. Hopefully, conda-forge has everything and is super fast then. 😄
Thanks. This is really useful. |
I'm on board too. Thanks for the great explanation @msarahan. It took a bit but I'm seeing the light. Of course now I'm going to have to build GCC 5 tonight. Sorry for the long tangent on this PR @jakirkham, did this answer your original concern? |
This is what I am still unclear about, are we shipping gcc on Mac too? |
My current opinion is yes. I'd like to avoid it if possible, but I see the need for OpenMP and Fortran. I'll keep you all in the loop on any discussions we have here. |
Ok. With OpenMP, maybe we can get around it by doing something similar to the Linux strategy, namely building the newest clang on our oldest Mac (10.7). Fortran remains a different problem, though. |
Thanks for being receptive, both of you! Now let's go rule the world! (or maybe just build great software) |
Thanks for keeping us in the loop.
Are they mutually exclusive? 😈 |
That sounds really cool, @tkelman. Thanks for sharing. Interesting. Yeah, I think we are staying on CentOS 6 for the present, but it is possible if we find it pressing enough that we would go back to CentOS 5. The current thought is that without more pressing reasons (people clamoring for that level of GLIBC compatibility) we will stay on CentOS 6. There was a docker container that used CentOS 5 and gcc 5.2 that @msarahan had proposed. There are some concerns, though, like having to rebuild everything on the old CentOS. Also there is an issue due to CentOS 5 being less than a year from EOL. There were some other concerns about dependencies, which are kind of up in the air (CUDA, cuDNN, etc.). I know devtoolsets do interesting things when linking libraries so that things remain portable without needing to package Having a newer How have you been building this image? Is this (and I know this is a long shot) being built on Docker Hub, Quay, or similar? One challenging aspect here has been having a shared infrastructure to do a build on an image like this. We want to avoid a developer bandwidth problem. Personally, I would be really interested in being able to share a common framework with Julia (maybe even packages 😉). So I would really love to discuss this more with you. |
The devtoolset does things in a funny way where it is set up to statically link newer pieces of libstdc++ and libgfortran that might not exist on the default CentOS system compiler versions. We initially tried to use the devtoolset for Julia, but found when building openblas with the devtoolset the openblas shared library doesn't actually end up statically linked to libgfortran. So there's still a dependency on libgfortran which we have to bundle in our binaries, but we don't want to use the system CentOS libgfortran version as that's too old. So we transitioned to doing something very similar to what the Conda folks are now doing, building our own GCC 5.x from source on CentOS 5. It was ansible based and hooked up to buildbot and I'm now updating/re-doing that in Docker form with 6.x versions. GCC 6 does break a fair amount of code. I'm mainly looking at it as slight future-proofing since Arch and unstable versions of Fedora and openSUSE are likely to upgrade to GCC 6 soon. Due to the glibc issue described in detail by @njsmith here https://sourceware.org/bugzilla/show_bug.cgi?id=19884, "generic linux binaries" need to be built on the oldest glibc version of any system a user wants to use (so old CentOS/RHEL drives this), with a gcc version at least as new as what any user has installed as their default system compiler (so Arch/Fedora/non-LTS Ubuntu drives this). I'll see whether docker hub's time limit is capable of handling this. I've only used quay a handful of times and haven't hooked it up to github hooks yet (which is really convenient when working with docker hub auto builds) but in manual quay builds it did seem way faster than docker hub. |
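The "oldest glibc, newest gcc" rule in the comment above can be sketched as a small selection over the systems one wants to support. The function name, the distro labels, and the version numbers below are all hypothetical placeholders chosen only to show the shape of the computation.

```python
def build_baseline(targets):
    """Pick the build environment implied by the rule above.

    targets maps a system name to (glibc_version, gcc_version),
    each given as a tuple of integers.
    Returns (oldest glibc to build against, newest gcc to build with).
    """
    oldest_glibc = min(glibc for glibc, _gcc in targets.values())
    newest_gcc = max(gcc for _glibc, gcc in targets.values())
    return oldest_glibc, newest_gcc


# Hypothetical target systems; the numbers are illustrative only.
targets = {
    "old_enterprise_distro": ((2, 5), (4, 1)),
    "lts_distro": ((2, 19), (4, 8)),
    "rolling_distro": ((2, 23), (6, 1)),
}
print(build_baseline(targets))
```

Note that the two halves of the answer typically come from different systems, which is why the approach ends up building a recent GCC from source on an old distribution.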
Correct. We are aware of this. What do you do to solve this? Wasn't sure if there were other weird things you noticed.
If it has all been built with a new
Do you have any examples?
I see so it is just the mad race to stay newer while supplying old GLIBC support. That makes sense.
Would be interesting to see what you discover. Yeah, I've had so many issues with Docker Hub that I might just want to use quay if for no other reason than it is a little bit more stable. |
Also, there is a similar story with OpenMP as with Fortran when using devtoolsets, if you haven't encountered that yet. |
Some of the linking to system libgfortran with devtoolset might be resolvable with openblas makefile patches to remove things like hardcoded
For the standalone Julia binaries to work on systems that might not have libgfortran installed, we need to bundle a libgfortran. The devtoolset doesn't include its own separate modern shared-library version of libgfortran. But when you build gcc from source in the normal way, you will get a shared libgfortran that you can use just fine. So that's what we do. If any users try building C/C++/Fortran libraries with newer compiler versions than what we used to build Julia, they'll need to delete or rename the runtime libraries that we bundle in the Julia binaries in order to call into them from Julia.
JuliaLang/julia#14829 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550 Here's my WIP so far: https://github.com/tkelman/c6g6/blob/master/Dockerfile |
@tkelman: I'm curious whether you've considered handling the system-vs-shipped gfortran issue the same way we are for manylinux builds, by renaming it to avoid triggering that glibc bug. |
Considering we've had to deal with blas symbol name collisions for some time even from differently named libraries, I'm not sure changing the library file name alone without also renaming all the symbols would fix matters. |
@tkelman: ah, right, you'd need to change the name and also clean up RTLD_GLOBAL usage. But those two things together should work, I think... |
AFAIK the stuff to make Linux work without RTLD_GLOBAL should be equivalent to the stuff needed to make Windows and osx work at all, since they don't support elf's weird symbol collision semantics in the first place. |
I'm still not entirely sure what visibility the automatic dlopen when you use Julia's C FFI uses by default on Linux. It might not be global at all unless you specifically call dlopen asking for it. I don't know quite the right patchelf invocations to rename all the shared libraries that we ship with Julia and keep them interlinked properly. We already use patchelf at build time for rpath modifications, so I wouldn't be opposed to testing it out. Might not even need a source build of Julia, you could try downloading our binaries and calling patchelf on them directly as a proof of concept? |
Just as an FYI. We now use devtoolset-2 (gcc 4.8.2) in our image. I have gone through and audited all existing feedstocks to make sure that they only use the In all cases, where the For all new recipes, please only use the |
I've now tried Docker Hub, Quay, Travis, Circle CI, and Shippable all building the same GCC source-build Dockerfile. I might be spoiled by having ssh access to a pretty nice server where the image takes about half an hour to build. Everywhere else I've tried takes long enough that it's hitting ~1hr timeouts on Hub and Travis, and still going for multiple hours on the others. Building and pushing locally isn't the end of the world as this shouldn't need updating too often, but it would be nicer if one of the hosted automated services were fast enough to handle this without a much longer turnaround time. edit: quay did eventually finish, it just took a really long time |
You can look at the auditwheel source code to see a fully automated script for this, but basically:
|
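As a rough illustration of the renaming idea discussed in this exchange (and not a reproduction of what auditwheel does), the sketch below only assembles patchelf command lines for giving a bundled library a new soname and pointing its dependents at the new name. It assumes patchelf's `--set-soname` and `--replace-needed` options; the function name, file names, and the renamed soname are hypothetical. Nothing is executed here.

```python
def rename_commands(old_soname, new_soname, library_path, dependents):
    """Build (but do not run) patchelf invocations for a library rename.

    library_path is the already-copied file that should carry new_soname;
    dependents are the binaries that currently list old_soname as NEEDED.
    """
    commands = [["patchelf", "--set-soname", new_soname, library_path]]
    for dependent in dependents:
        commands.append(
            ["patchelf", "--replace-needed", old_soname, new_soname, dependent]
        )
    return commands


# Hypothetical file names, for illustration only.
for command in rename_commands(
    "libgfortran.so.3",
    "libgfortran-bundled.so.3",
    "lib/libgfortran-bundled.so.3",
    ["lib/libopenblas.so", "lib/libexample.so"],
):
    print(" ".join(command))
```

As noted in the comments above, renaming the file alone would not address symbol collisions if the libraries are loaded with RTLD_GLOBAL, so this would only be one part of such a change.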
Thanks @njsmith. We're actually getting a little off topic here, maybe we should move this to an issue on JuliaLang/julia or one of the gcc-from-source dockerfile repos? In Julia's case there's a really easy workaround for running old Julia binaries on distros with newer gcc, of deleting the bundled runtime libraries so that the system versions get used instead. I'd need to be convinced renaming is worth it and won't break things, since some packages do need to be able to find Julia's libgfortran or libstdc++ for ffi purposes, linking and loading libraries that don't have rpath set right on their own, etc. I distrust the devtoolset partial static linking approach since I've seen it not work correctly in complicated examples like openblas and other Julia dependencies. The C++ partial static linking had also caused issues, if I remember correctly. On a normal build of gcc |
Just as FYI, this is complete. |
As this came up at the compiler meeting the other day, I figured I would share it (also posted on gitter). This is an ancient mailing list thread (had to get from archive) on the conditions under which |
I wouldn't trust anything written prior to gcc 5 to still be relevant on this subject. ABI tags threw an additional wrench into this issue, and have still not been entirely implemented in LLVM. There are various patches floating around that I think Arch and a few others have been using, but nothing merged and released yet AFAIK. |
Sure |
Let's close this and re-discuss once we have a |
Most of the time we are OK using the compilers installed in the CIs because we all have similar build tools pre-installed in our machines. However, every now and then someone tries to use the packages in a docker image without those tools. (For example ioos/conda-recipes#723 and ioos/conda-recipes#700.) A few questions: