
Avoid setting CFLAGS, CPPFLAGS, and CXXFLAGS #59

Merged 6 commits into conda-forge:master from the remove_compiler_flags branch on May 23, 2020

Conversation

@leofang (Member) commented May 16, 2020

Closes #9, closes #58.

Checklist

  • Used a fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@leofang (Member, Author) commented May 16, 2020

@conda-forge-admin, please rerender

@leofang (Member, Author) commented May 16, 2020

@isuruf @h-vetinari @jakirkham Either conda-forge/docker-images#135 or conda-forge/docker-images#136 caused all CUDA builds to fail here, see https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=163080&view=results.

In this feedstock, we rely on the fact that libcuda.so is not found in the docker image, so import cupy would fail and our test script would exit gracefully. Now it seems the driver becomes available after the changes in the docker images, so import cupy would succeed, but then calling cupy functions (which call the driver API on a system without a GPU) errors out.
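The gating logic being described can be sketched roughly as follows (a minimal sketch with assumed names, not the feedstock's actual recipe/run_test.py):

```python
# Minimal sketch of the described gating (assumed names, not the real
# recipe/run_test.py): a failed import, caused by libcuda.so being
# absent from the docker image, makes the test exit gracefully.
import importlib

def run_test(module="cupy"):
    try:
        mod = importlib.import_module(module)
    except ImportError:
        # No CUDA driver in the image: nothing to test, pass gracefully.
        print("driver not found; skipping GPU tests")
        return 0
    # With libcuda.so present, the import succeeds, but any call that
    # reaches the driver API (e.g. mod.show_config()) errors out on a
    # machine without a GPU.
    mod.show_config()
    return 0
```

With the new docker images, the import no longer raises, so control falls through to the driver-touching call, which is exactly where the builds started failing.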

@leofang (Member, Author) commented May 16, 2020

I found there are a bunch of recent efforts on conda-forge that attempt to add libcuda.so... I don't understand why that is needed 🤦🏻‍♂️ On the CIs (Azure, etc.), no GPU is available, so this effort at most makes something simple like import some_lib work, but nothing further. However, if testing locally or running on self-hosted CIs that have GPUs (we've done this recently), it'd overwrite the host driver and create chaos...

@leofang (Member, Author) commented May 16, 2020

OK, tests on CUDA 9.2 passed, because there is no libcuda.so available (see conda-forge/conda-forge-ci-setup-feedstock#93).

@leofang (Member, Author) commented May 16, 2020

@jakirkham as a workaround I kind of undid #53 in 0ad5edb.

@isuruf (Member) commented May 16, 2020

or running on self-hosted CIs that have GPU (we've done this recently), it'd overwrite the host driver and create chaos...

It wouldn't overwrite it. Do you have a log where it would overwrite?

@h-vetinari (Member) commented May 16, 2020

@leofang
Sorry that this broke your build. The presence of libcuda.so is IMO justified in a cuda image (in fact it was always there in the 10.x images, just not recognised by ldconfig), and some recipes like conda-forge/pyarrow-feedstock#101 depend on it to work.

In this feedstock, we rely on the fact that libcuda.so is not found in the docker image, so import cupy would fail and our test script would exit gracefully.

It seems like relying on a failed import does not yield much information regarding the soundness of the package - is there no other way to do this? In fact, I have the same problem (of a GPU device being necessary) in conda-forge/faiss-split-feedstock#1 to actually test the code, but that can only be solved with CI instances that have GPUs.

or running on self-hosted CIs that have GPU (we've done this recently), it'd overwrite the host driver and create chaos...

I agree with @isuruf that this wouldn't happen. The default installation for CUDA drivers is in /usr/local/nvidia/lib:/usr/local/nvidia/lib64 (corresponding to the value of LD_LIBRARY_PATH), and any driver installed there should have precedence over the one in /usr/local/cuda/compat.

@leofang (Member, Author) commented May 18, 2020

@isuruf @h-vetinari Thanks for replying!

Sorry that this broke your build. The presence of libcuda.so is IMO justified in a cuda image (in fact it was always there in the 10.x images, just not recognised by ldconfig), and some recipes like conda-forge/pyarrow-feedstock#101 depend on it to work.

Right, but as I said, this effort at most makes something simple like import some_lib (as in your case) work, but nothing further. I know it is a standard test in conda recipes, but as far as I remember none of the bugs we hit in this feedstock could be caught by this test. Feel free to ignore my ranting, though 🙂

It seems like relying on a failed import does not yield much information regarding the soundness of the package - is there no other way to do this? In fact, I have the same problem (of a GPU device being necessary) in conda-forge/faiss-split-feedstock#1 to actually test the code, but that can only be solved with CI-instance that have GPUs.

I second your request, although I have mixed views. I already left a comment in the issue conda-forge/conda-forge.github.io#1062 you opened.

I agree with @isuruf that this wouldn't happen. The default installation for cuda drivers is in /usr/local/nvidia/lib:/usr/local/nvidia/lib64 (corresponding to the value of LD_LIBRARY_PATH) and any driver installed there should have precedence of the one in /usr/local/cuda/compat.

On the machines I have access to, they are always in /usr/lib/x86_64-linux-gnu/. The one installed in /usr/local/cuda/lib64 is just a stub for cross-compiling.

@isuruf (Member) commented May 18, 2020

@leofang, in the docker images, where are the libcuda.so.1 files located?
Can you also run ldconfig -v 2>/dev/null | grep -v ^$'\t'?
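For reference, the grep -v ^$'\t' part of that command drops the tab-indented library lines that ldconfig -v prints under each directory, leaving only the directory headers. The same filter expressed in Python, with made-up sample output:

```python
# ldconfig -v prints each search directory followed by tab-indented
# library entries; removing lines that start with a tab leaves only
# the directory list. The sample output below is made up.
sample = (
    "/usr/lib64:\n"
    "\tlibcuda.so.1 -> libcuda.so.440.33.01\n"
    "/lib64:\n"
    "\tlibc.so.6 -> libc-2.17.so\n"
)

dirs = [line for line in sample.splitlines() if not line.startswith("\t")]
print(dirs)  # ['/usr/lib64:', '/lib64:']
```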

@leofang (Member, Author) commented May 18, 2020

@isuruf Not sure what this is for, but here you go.

condaforge/linux-anvil-cuda:9.2:

$ find / -name libcuda.so.1
/usr/lib64/libcuda.so.1
$
$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/local/cuda-9.2/targets/x86_64-linux/lib:
/lib:
/lib64:
/usr/lib:
/usr/lib64:
/lib64/tls: (hwcap: 0x8000000000000000)
/usr/lib64/sse2: (hwcap: 0x0000000004000000)
/usr/lib64/tls: (hwcap: 0x8000000000000000)

condaforge/linux-anvil-cuda:10.0:

$ find / -name libcuda.so.1
/usr/lib64/libcuda.so.1
/usr/local/cuda-10.0/compat/libcuda.so.1
$
$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/local/cuda-10.0/targets/x86_64-linux/lib:
/lib:
/lib64:
/usr/lib:
/usr/lib64:
/lib64/tls: (hwcap: 0x8000000000000000)
/usr/lib64/sse2: (hwcap: 0x0000000004000000)
/usr/lib64/tls: (hwcap: 0x8000000000000000)

condaforge/linux-anvil-cuda:10.1:

$ find / -name libcuda.so.1
/usr/lib64/libcuda.so.1
/usr/local/cuda-10.1/compat/libcuda.so.1
$
$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/local/cuda-10.1/targets/x86_64-linux/lib:
/lib:
/lib64:
/usr/lib:
/usr/lib64:
/lib64/tls: (hwcap: 0x8000000000000000)
/usr/lib64/sse2: (hwcap: 0x0000000004000000)
/usr/lib64/tls: (hwcap: 0x8000000000000000)

condaforge/linux-anvil-cuda:10.2:

$ find / -name libcuda.so.1
/usr/lib64/libcuda.so.1
/usr/local/cuda-10.2/compat/libcuda.so.1
$
$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/local/cuda-10.2/targets/x86_64-linux/lib:
/lib:
/lib64:
/usr/lib:
/usr/lib64:
/lib64/tls: (hwcap: 0x8000000000000000)
/usr/lib64/sse2: (hwcap: 0x0000000004000000)
/usr/lib64/tls: (hwcap: 0x8000000000000000)

one of our local machines:

$ locate libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.1
$
$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/lib/x86_64-linux-gnu/libfakeroot:
/lib/i386-linux-gnu:
/usr/lib/i386-linux-gnu:
/usr/local/lib:
/lib/x86_64-linux-gnu:
/usr/lib/x86_64-linux-gnu:
/lib:
/usr/lib:

@leofang (Member, Author) commented May 18, 2020

(By the way, how to get root access in the docker images? docker run -u root doesn't work...)

@isuruf (Member) commented May 18, 2020

@h-vetinari, how about we add the existing paths in front of the cuda compat directory.
Something like,

ldconfig -v 2>/dev/null | grep -v ^$'\t' | cut -f1 -d":" >> /etc/ld.so.conf.d/cuda-$CUDA_VER.conf ;
echo "$CUDA_HOME/compat" >> /etc/ld.so.conf.d/cuda-$CUDA_VER.conf ;

?
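Applied to the 10.2 image's directory list shown earlier, those two lines would produce a conf file roughly like the following (hypothetical contents; the exact entries depend on the image, and the hwcap variants are glossed over):

```
/usr/local/cuda-10.2/targets/x86_64-linux/lib
/lib
/lib64
/usr/lib
/usr/lib64
/usr/local/cuda-10.2/compat
```

so ldconfig would resolve libcuda.so.1 from the system directories before ever reaching the compat copy.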

@h-vetinari (Member) commented

@isuruf: @h-vetinari, how about we add the existing paths in front of the cuda compat directory.

Sure. I don't know the precedence of ldconfig vs. ld.so.conf.d, but if we're not sure that the latter comes after, then we can add the rest as well.

@leofang: (By the way, how to get root access in the docker images? docker run -u root doesn't work...)

I don't think you can. The entrypoint script switches to a user conda, and it has no elevated privileges with the exception of using yum.

@isuruf (Member) commented May 18, 2020

I don't know the precedence of ldconfig vs. ld.so.conf.d, but if we're not sure that the latter comes after, then we can add the rest as well.

Entries in ld.so.conf.d come before the preconfigured directories, and we need the cuda compat directory to come after the preconfigured directories.

@h-vetinari (Member) commented

Entries in ld.so.conf.d come before the preconfigured directories and we need the cuda-compat to come after preconfigured directories.

Soooooo, one more PR for the dockerfiles, I'm guessing.

@leofang (Member, Author) commented May 19, 2020

Noticed something weird with our CI, let me check here...

@leofang leofang closed this May 19, 2020
@leofang leofang reopened this May 19, 2020
@leofang (Member, Author) commented May 19, 2020

Never mind, I made a mistake in my local tests. This is good to go.

1. libcudart is loaded at import
2. import is tested
@leofang (Member, Author) commented May 19, 2020

Thanks for the suggestions, @isuruf! The PR is ready to go. Let's wait a little to see if @jakirkham has any feedback.

@jakirkham (Member) commented

Sorry, I was out last week, so I'm not caught up on all of the changes that went in. Could someone please summarize? 🙂

@leofang (Member, Author) commented May 20, 2020

Hi @jakirkham, in conda-forge/docker-images#134 the CUDA headers living outside of the CUDA root were symlinked back, so your CFLAGS hacks in the recipe are no longer needed.

Moreover, libcuda.so was made available in conda-forge/docker-images#135 and conda-forge/docker-images#136, so import cupy would not fail, but any actual usage like cupy.show_config() would, so the test was modified.
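The amended check can be sketched like this (illustrative only; the stand-in function and exception are assumptions, not the actual run_test.py code):

```python
# Sketch of the amended test (assumed names): since the import now
# succeeds, the graceful exit moves to the first call that actually
# touches the CUDA driver.
def run_gpu_smoke_test(show_config):
    """show_config stands in for a driver-touching call such as
    cupy.show_config(); failure on a GPU-less CI is tolerated."""
    try:
        show_config()
    except Exception as exc:
        print(f"no usable GPU ({exc}); exiting gracefully")
        return 0
    return 0

def fake_show_config():
    # Simulates the driver call failing on a machine without a GPU.
    raise RuntimeError("CUDA driver error: no device")

print(run_gpu_smoke_test(fake_show_config))  # prints the skip message, then 0
```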

Another change to the docker images was to deal with the linker discovery order, but I don't think it is critical here.

@leofang leofang merged commit 0ee1280 into conda-forge:master May 23, 2020
@leofang leofang deleted the remove_compiler_flags branch May 23, 2020 18:36
leofang added 3 commits to leofang/cupy-feedstock-1 that referenced this pull request on Jul 8, 2020
Successfully merging this pull request may close these issues:

  • Remove adding /usr/local/include
  • Update the recipe for handling CUDA 10.1 headers
5 participants