
Using system versions of cuda, cudatoolkit, cudnn #81

Open
evanr70 opened this issue Jul 22, 2022 · 14 comments
Labels
question Further information is requested

Comments

@evanr70

evanr70 commented Jul 22, 2022

Comment:

I'm installing tensorflow on an HPC system. The correct versions of all CUDA-relevant software and drivers are installed via Modules. I don't have enough disk space to install my own conda copy of CUDA.

Is it possible to install the GPU-compatible build of tensorflow, but without the dependency on these packages?

I know this goes against the ethos of conda managing everything for reproducibility, and I can achieve what I want using pip. However, I'm creating a feedstock for my own package, which depends on tensorflow, so I want to stay within the conda ecosystem. I believe PyTorch used(?) to have mock CUDA packages that let users supply their own CUDA while still satisfying the solver.

Thanks!

@evanr70 evanr70 added the question Further information is requested label Jul 22, 2022
@jakirkham jakirkham transferred this issue from conda-forge/tensorflow-feedstock Jul 22, 2022
@jakirkham
Member

Moved to the cudatoolkit repo. If we were to do this, we would want to do it for all packages (not just tensorflow).

@leofang
Member

leofang commented Jul 22, 2022

As a workaround, we can use conda install --no-deps ... to force-install only the selected packages.
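A minimal sketch of that workaround (package names illustrative; the system CUDA loaded via Modules must then satisfy the skipped dependencies at runtime):

```shell
# Install the GPU build of tensorflow itself, skipping its declared
# dependencies (cudatoolkit, cudnn, ...). Nothing checks that the
# system CUDA actually matches the version tensorflow was built against.
conda install --no-deps tensorflow
```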

@leofang
Member

leofang commented Jul 22, 2022

But, yeah, maybe we need a cudatoolkit[external] variant, similar to what we do for mpich/openmpi.

@jaimergp
Member

My recommendation would be to publish the shim packages under a separate label, so people opt in via channels rather than adding more complexity to the conda-forge graph.

Example usage:

$ conda install -c conda-forge/label/cudatoolkit-system -c conda-forge tensorflow

@leofang
Member

leofang commented Jul 22, 2022

cc: @beckermr (since you started the "external" MPI packages 🙂)

@leofang
Member

leofang commented Jul 22, 2022

@jaimergp do you know how the solver determines priority in the presence of multiple labels/channels? In the MPI and other cases, it's determined by the package's build number. Another question is how this helps reduce/maintain the graph complexity.

@jaimergp
Member

With strict channel priority, the packages in the shim channel would completely shadow the ones in conda-forge, so there's no optimization for the solver to perform.

Keeping variants separate in channels would help because unless the channel is added, a regular user not interested in the shims will never see those packages in the requests. If the shims are on conda-forge, the solver has to choose among variants, which includes: versions, build numbers, track_features, timestamp, noarch vs non-noarch. The more variants we add for the same package, the more work we ask from the solver.
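A minimal sketch of the configuration this relies on (standard conda settings; the shim label name is the hypothetical one from the usage example above):

```yaml
# ~/.condarc
channel_priority: strict
channels:
  - conda-forge/label/cudatoolkit-system  # shim channel shadows conda-forge
  - conda-forge
```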

See discussion in related/duplicate issue: #61

@jakirkham
Member

Is that still the case if we down-prioritize the system variant with a track_feature?

Also wondering if we could use symlinks or something to use the system copy (without needing parallel builds of packages based on system/non-system variants) to keep things simpler and more maintainable.

@jaimergp
Member

The system copy might be in a different location than standard, or not available at install time (think login node vs actual calculation node). I think we only need one branch that publishes all the cudatoolkit versions. It doesn't need to stay up-to-date with build numbers; just one empty package per major.minor. Example here (unmaintained, as the label indicates :D): https://anaconda.org/jaimergp/cudatoolkit
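A rough sketch of what one such empty shim recipe might look like (the build string and summary are illustrative, not an agreed convention; the real packages in the linked channel may differ):

```yaml
# meta.yaml -- hypothetical empty "external" cudatoolkit shim,
# one package per major.minor, never updated afterwards
package:
  name: cudatoolkit
  version: "11.2"

build:
  number: 0
  string: external_shim

about:
  summary: Empty shim; cudatoolkit is expected to be provided by the system
```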

@jakirkham
Member

Wouldn't we still want a way to affect the CUDA version used by other libraries? How would we do that if not through a versioned cudatoolkit library?

@jaimergp
Member

The shim would be versioned, yes.

@jakirkham
Member

I worry a little bit about creating a package we don't plan to maintain. Guessing people will still want us to fix things with it.

If we really don't want to maintain it, I wonder if we are better off just explaining to people that they can force-remove the package (with the warnings conda will naturally raise when doing this). At least that way they know they are doing something non-standard and are on their own to get things working.
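That alternative might look like this (a sketch, assuming the conda build of tensorflow is already installed):

```shell
# Force-remove the conda cudatoolkit without removing its dependents,
# relying on the system CUDA instead. Conda will warn that packages
# depending on it (e.g. tensorflow) may break -- that warning is the point.
conda remove --force cudatoolkit
```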

@jaimergp
Member

That was my concern as well, hence why I ended up with that label unsupported-... in my personal channel. We can give it a try and if it becomes a real problem support-wise, mark it as broken? Users who really need it can always copy them to their channel if needed (and I am assuming they know what they are doing).

@ngam

ngam commented Aug 15, 2022

I just want to add that the cudatoolkit package we currently ship is incomplete: some parts of tensorflow and jax require ptxas, which is not included, and would thus fail without either #62 or system CUDA... 😢
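A quick way to check whether the system CUDA can cover that gap (the module name is hypothetical; it varies per cluster):

```shell
# ptxas must be findable for XLA-based GPU compilation in jax/tensorflow;
# alternatively, XLA can be pointed at a CUDA install directory via
# XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda
module load cuda   # hypothetical module name
which ptxas || echo "ptxas not found on PATH"
```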
