figure out if we need to exclude the gcc9 builds for cuda <=10.2 #68

Closed
beckermr opened this issue Oct 9, 2020 · 13 comments

Comments

@beckermr
Member

beckermr commented Oct 9, 2020

cc @jakirkham

@leofang
Member

leofang commented Oct 9, 2020

The answer is yes. CUDA 9.2, the version we compile against for CUDA-awareness support, only supports up to GCC 7, I think.
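
As a rough illustration of the constraint, the sketch below checks a GCC major version against the newest one a CUDA release accepts. Only the 9.2 entry is mentioned in this thread (and tentatively); the other table values are assumptions that should be verified against NVIDIA's Linux installation guide. `MAX_GCC_MAJOR` and `gcc_supported` are hypothetical names.

```python
# Assumed maximum GCC major version per CUDA release; verify against
# NVIDIA's installation guide before relying on these numbers.
MAX_GCC_MAJOR = {
    "9.2": 7,
    "10.0": 7,
    "10.1": 8,
    "10.2": 8,
    "11.0": 9,
}


def gcc_supported(cuda_version: str, gcc_major: int) -> bool:
    """Return True if the GCC major version is within the assumed limit."""
    return gcc_major <= MAX_GCC_MAJOR[cuda_version]


if __name__ == "__main__":
    for cuda in MAX_GCC_MAJOR:
        print(cuda, "gcc7:", gcc_supported(cuda, 7), "gcc9:", gcc_supported(cuda, 9))
```

Under these assumed limits, a gcc9 variant would be invalid for every CUDA version up to 10.2, which is the case the issue title asks about.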

@leofang
Member

leofang commented Oct 9, 2020

What is less clear to me is whether this would work for CUDA 11, which requires a different glibc version.

@beckermr
Member Author

beckermr commented Oct 9, 2020

Thanks @leofang ! See also the comments from

conda-forge/conda-forge.github.io#1160 (comment)

by @kkraus14. They seem to indicate that if openmpi is only looking at CUDA host APIs, there is no issue.

@jakirkham
Member

With CUDA 11.0, the nvcc package effectively requires GLIBC 2.17+ at build time and adds this dependency to packages for install time (conda-forge/nvcc-feedstock#43). So there is nothing special packages need to do to get that constraint when building CUDA 11.0 support.

@leofang
Member

leofang commented Oct 9, 2020

@jakirkham But do we get the same constraint if we build openmpi only with cuda_compiler_version=="9.2"?

@jakirkham
Member

No, because that constraint is only added when building for CUDA 11.0. Otherwise we still use the default GLIBC, currently 2.12.
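
The rule described here can be sketched as a small lookup. This only models the behaviour as stated in this thread, not the nvcc package's actual logic, and `min_glibc_for_build` is a hypothetical helper.

```python
DEFAULT_GLIBC = (2, 12)  # default GLIBC at the time of this thread
CUDA11_GLIBC = (2, 17)   # constraint added when building for CUDA 11.0


def min_glibc_for_build(cuda_compiler_version: str) -> tuple:
    """Return the GLIBC floor a build variant ends up with, per the thread."""
    if cuda_compiler_version == "11.0":
        return CUDA11_GLIBC
    return DEFAULT_GLIBC


if __name__ == "__main__":
    for version in ("9.2", "10.2", "11.0"):
        print(version, min_glibc_for_build(version))
```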

@beckermr
Member Author

beckermr commented Oct 9, 2020

So what is the conclusion here? Do we understand enough about how openmpi is using CUDA to say whether the current builds are ok?

@kkraus14

kkraus14 commented Oct 9, 2020

I'm definitely not confident, but based on the build script it looks like only gcc / g++ are being used or configured: https://github.com/conda-forge/openmpi-feedstock/blob/master/recipe/build-mpi.sh

@jakirkham
Member

After digging back through things here, I think we came to the conclusion that openmpi was using dlopen for CUDA support (so might be ok), but there are some compile time checks that occur as well, which were a bit confusing. I don't think we ever came to a resolution on that.
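
For readers unfamiliar with the dlopen approach, here is a minimal sketch of the general pattern: the library is looked up at run time, so nothing has to be linked at build time. This is not Open MPI's code; `try_load_cuda_driver` is a hypothetical helper, and `libcuda.so.1` is the usual Linux name of the CUDA driver library.

```python
import ctypes


def try_load_cuda_driver(name: str = "libcuda.so.1"):
    """Try to dlopen the CUDA driver library; return the handle or None."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        # Library not present (or not loadable): run without CUDA support.
        return None


if __name__ == "__main__":
    handle = try_load_cuda_driver()
    print("CUDA driver available" if handle is not None else "CUDA driver not found")
```

With this pattern the compiler used for the host code never has to be accepted by nvcc, though it says nothing about the compile-time checks mentioned above.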

xref: open-mpi/ompi#7334

@leofang
Member

leofang commented Oct 9, 2020

Yeah, IIRC at build time we don't need nvcc, just the CUDA headers. We don't even need to link to CUDA shared libraries, as John mentioned. We settled on CUDA 9.2 because we realized Open MPI doesn't really care about recent CUDA versions.

I am just a bit worried that when we specify cuda_compiler_version=="9.2":

skip: true # [win or (linux64 and cuda_compiler_version != '9.2')]

we don't enforce the use of the latest glibc, and when we do conda install -c conda-forge openmpi cudatoolkit=11.0 we might have problems.
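
To make the effect of that selector concrete, the sketch below evaluates the same expression against a few build variants. It is a simplified model of selector evaluation, not conda-build's implementation, and `is_skipped` is a hypothetical helper.

```python
SELECTOR = "win or (linux64 and cuda_compiler_version != '9.2')"


def is_skipped(win: bool, linux64: bool, cuda_compiler_version: str) -> bool:
    """Evaluate the selector expression against one build variant."""
    namespace = {
        "win": win,
        "linux64": linux64,
        "cuda_compiler_version": cuda_compiler_version,
    }
    # Evaluate with no builtins so only the variant variables are visible.
    return bool(eval(SELECTOR, {"__builtins__": {}}, namespace))


if __name__ == "__main__":
    for version in ("None", "9.2", "10.2", "11.0"):
        print("linux64, CUDA", version, "skipped:", is_skipped(False, True, version))
```

On linux64 only the 9.2 variant survives, so the GLIBC 2.17 constraint that comes with a CUDA 11.0 build is never picked up.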

@jakirkham
Member

Well, GLIBC is backwards compatible, so libraries built with an older GLIBC can always be installed on a system with a newer GLIBC. IOW one can install openmpi on a GLIBC 2.17 system (even though it is built using GLIBC 2.12) without issues.

I should add that cudatoolkit itself requires the system to support an equivalent CUDA version. So if the driver doesn't support that version, conda won't be able to install it.

So I guess the question is whether one can configure a CentOS 6 system with a new enough driver version to support CUDA 11.0. I would think the answer is no, as the associated libraries would also require GLIBC 2.17+.
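
The compatibility argument can be written down as a one-line comparison. The sketch below is only an illustration with hypothetical helper names (`parse_version`, `can_run`); it uses the standard library's `platform.libc_ver()` to report the local GLIBC when one is detected.

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn a version string such as '2.17' into a comparable tuple (2, 17)."""
    return tuple(int(part) for part in text.split("."))


def can_run(built_against: str, system_glibc: str) -> bool:
    """A binary built against GLIBC X is expected to run on GLIBC Y when X <= Y."""
    return parse_version(built_against) <= parse_version(system_glibc)


if __name__ == "__main__":
    libc, version = platform.libc_ver()
    if libc == "glibc" and version:
        print("glibc", version, "can run a 2.12 build:", can_run("2.12", version))
```

Comparing tuples rather than strings matters here, since a plain string comparison would order "2.9" after "2.12".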

@leofang
Member

leofang commented Oct 9, 2020

Sounds good, so it looks like conda install -c conda-forge openmpi cudatoolkit=11.0 on CentOS 7 (or any OS supporting CUDA 11) will just work.

I think this issue can be closed, then? The current status is:

  • We choose cuda_compiler_version=="9.2" to build Open MPI
  • The bot did two builds, with both gcc7 and gcc9
  • Both builds are built with glibc 2.12
  • glibc is backward compatible, so using the new builds with glibc 2.17 + CUDA 11 should work, as should glibc 2.12 + CUDA <11.

@leofang
Member

leofang commented Dec 30, 2020

With conda-forge/conda-forge-pinning-feedstock#1052 all CUDA builds (9.2 - 11.0) currently fall back to gcc 7.

@leofang leofang closed this as completed Dec 30, 2020