Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing libcuda.so on 9.2 #93

Merged
merged 3 commits into from
May 18, 2020
Merged

Conversation

h-vetinari
Copy link
Member

@h-vetinari h-vetinari commented May 14, 2020

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@h-vetinari h-vetinari force-pushed the master branch 2 times, most recently from ab01f50 to 34a9ee5 Compare May 14, 2020 22:06
@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@h-vetinari
Copy link
Member Author

h-vetinari commented May 15, 2020

@isuruf @beckermr
This is a necessary complement to conda-forge/docker-images#135 - any comments?

Copy link

@pearu pearu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: allow future extensions.

recipe/meta.yaml Outdated Show resolved Hide resolved
@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@github-actions
Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.

@h-vetinari
Copy link
Member Author

Ping @isuruf

@h-vetinari
Copy link
Member Author

@isuruf
This is green and should be ready to go.

@isuruf
Copy link
Member

isuruf commented May 16, 2020

error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Permission denied)

@h-vetinari
Copy link
Member Author

OK, clearly I should stop trying to do these PRs late at night - sorry for the cumulation of oversights.

In any case, if we don't have the rights to register another rpm, then I guess we have to download directly?

@h-vetinari h-vetinari force-pushed the master branch 5 times, most recently from 22f72d2 to ee896eb Compare May 16, 2020 01:55
@h-vetinari
Copy link
Member Author

OK, I give up. Even with sudo, I'm not getting past this

++ sudo mv /usr/local/cuda-10.0/compat /usr/local/cuda-9.2

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

rm cuda-repo-*.rpm && \
sudo yum install -y cuda-compat-10-0.x86_64 && \
# note that $CUDA_HOME is just a symlink to /usr/local/cuda-${CUDA_VER}, and yum installs in that folder
sudo mv /usr/local/cuda-10.0/* /usr/local/cuda-9.2 && \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try the following approach:

  • keep /usr/local/cuda-10.0/compat as it is
  • in linux-anvil-cuda/Dockerfile append /usr/local/cuda-10.0/compat to /etc/ld.so.conf.d/cuda-$CUDA_VER.conf only when CUDA_VER=9.2
    ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isuruf
What do you think about this idea? I dislike not being able to fix this correctly here, but without proper elevation - maybe we could add these two specific commands to the sudoer-list in the image?

Copy link
Member Author

@h-vetinari h-vetinari May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sudoers-route works (I tested it; see conda-forge/docker-images#138), and should lead to IMO the cleanest solution all-around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is safe to do. If a user builds a CUDA 9.2 package with this CUDA 10.0 compat libcuda.so this can potentially break things on a driver 396 install for example as this compat I believe is based on driver 410. I could be mistaken about this though.

@teju85 @harrism do you know if this is the case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user is not building against CUDA 10.0. It's just loading the application using cuda compat 10.0. This is useful for checking that packages can be loaded on CI without GPUs.

Copy link
Member

@isuruf isuruf May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's added to ld.conf which has the almost the same effect as LD_LIBRARY_PATH

@h-vetinari, how about not moving that and adding /usr/local/cuda-10.0/compat to ld.conf in cuda 9.2 image?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's added to ld.conf which has the almost the same effect as LD_LIBRARY_PATH

@h-vetinari, how about not moving that and adding /usr/local/cuda-10.2/compat to ld.conf in cuda 9.2 image?

Won't this basically cause symbols from a newer driver library to be used during building packages which aren't backwards compatible? I believe CUDA 10.0 compat is based on driver 410, CUDA 10.2 compat is based on driver 440, but CUDA 9.2 can use driver 396 which presents possible issues from lack of backwards compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. We are using the stub library from 9.2 to build and the 10.0 compat is used only for loading

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vetinari, how about not moving that and adding /usr/local/cuda-10.0/compat to ld.conf in cuda 9.2 image?

This is what @pearu suggested at the top of this review thread. Fine by me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vetinari
Copy link
Member Author

h-vetinari commented May 16, 2020

This needs to be restarted if/once conda-forge/docker-images#138 is in.

This need to be combined with conda-forge/docker-images#139 to have its desired effect.

@h-vetinari
Copy link
Member Author

@isuruf
Together with the now-merged conda-forge/docker-images#139, this PR should unblock pyarrow. LGTY?

@isuruf
Copy link
Member

isuruf commented May 18, 2020

@kkraus14, any objections?

@kkraus14
Copy link
Contributor

@kkraus14, any objections?

I'm still not clear on how this being added to ld.conf can't cause problems with symbols during build, but seems like you have a solution here that I'm just not understanding 😄 so this is good to go from my perspective.

@isuruf
Copy link
Member

isuruf commented May 18, 2020

ld.conf is for runtime dynamic linker which is different to the linker ld used during build.

@isuruf isuruf merged commit c0801d2 into conda-forge:master May 18, 2020
@isuruf
Copy link
Member

isuruf commented May 18, 2020

Thanks @h-vetinari for the PR and @kkraus14 for the review.

@jakirkham jakirkham mentioned this pull request Sep 26, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants