
Using system versions of cuda, cudatoolkit, cudnn #81

Open
evanr70 opened this issue Jul 22, 2022 · 14 comments
Labels
question Further information is requested

Comments

@evanr70

evanr70 commented Jul 22, 2022

Comment:

I'm installing tensorflow on an HPC system. The correct versions of all CUDA-relevant software and drivers are installed via Modules. I don't have enough disk space to install my own conda copy of CUDA.

Is it possible to install the GPU-compatible build of tensorflow, but without the dependency on these packages?

I know this goes against the ethos of conda managing everything for reproducibility, and I can achieve what I want using pip. However, I'm creating a feedstock for my own package, which depends on tensorflow, so I want to stay within the conda ecosystem. I believe PyTorch used(?) to have mock CUDA packages that let users supply their own CUDA while still satisfying the solver.

Thanks!

@evanr70 evanr70 added the question Further information is requested label Jul 22, 2022
@jakirkham jakirkham transferred this issue from conda-forge/tensorflow-feedstock Jul 22, 2022
@jakirkham
Member

Moved to the cudatoolkit repo. If we were to do this, we would want to do it for all packages (not just tensorflow).

@leofang
Member

leofang commented Jul 22, 2022

As a workaround, we can use conda install --no-deps ... to force-install only the selected packages.
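A minimal sketch of that workaround (package names illustrative; the system CUDA loaded via Modules must then satisfy the skipped dependencies at runtime):

```shell
# Install the GPU build of tensorflow itself, skipping its declared
# dependencies (cudatoolkit, cudnn, ...). Nothing checks that the
# system CUDA actually matches the version tensorflow was built against.
conda install --no-deps tensorflow
```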

@leofang
Member

leofang commented Jul 22, 2022

But, yeah, maybe we need a cudatoolkit[external] variant, similar to what we do for mpich/openmpi.

@jaimergp
Member

My recommendation would be to publish the shim packages under a separate label, so people opt in via channels rather than adding more complexity to the conda-forge graph.

Example usage:

$ conda install -c conda-forge/label/cudatoolkit-system -c conda-forge tensorflow

@leofang
Member

leofang commented Jul 22, 2022

cc: @beckermr (since you started the "external" MPI packages 🙂)

@leofang
Member

leofang commented Jul 22, 2022

@jaimergp do you know how the solver determines priority in the presence of multiple labels/channels? In the MPI and other cases, it's determined by the package's build number. Another question is how this helps reduce/maintain the graph complexity.

@jaimergp
Member

With strict channel priority, the packages in the shim channel would completely shadow the ones in conda-forge, so there's no optimization for the solver to perform.

Keeping variants separate in channels would help because unless the channel is added, a regular user not interested in the shims will never see those packages in the requests. If the shims are on conda-forge, the solver has to choose among variants, which includes: versions, build numbers, track_features, timestamp, noarch vs non-noarch. The more variants we add for the same package, the more work we ask from the solver.
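A minimal sketch of the configuration this relies on (standard conda settings; the shim label name is the hypothetical one from the usage example above):

```yaml
# ~/.condarc
channel_priority: strict
channels:
  - conda-forge/label/cudatoolkit-system  # shim channel shadows conda-forge
  - conda-forge
```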

See discussion in related/duplicate issue: #61

@jakirkham
Member

Is that still the case if we down-prioritize the system variant with a track_feature?

Also wondering if we could use symlinks or something to use the system copy (without needing parallel builds of packages based on system/non-system variants) to keep things simpler and more maintainable.

@jaimergp
Member

The system copy might be in a different location than standard, or not available at install time (think login node vs actual calculation node). I think we only need one branch that publishes all the cudatoolkit versions. It doesn't need to stay up-to-date with build numbers; just one empty package per major.minor. Example here (unmaintained, as the label indicates :D): https://anaconda.org/jaimergp/cudatoolkit
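A rough sketch of what one such empty shim recipe might look like (the build string and summary are illustrative, not an agreed convention; the real packages in the linked channel may differ):

```yaml
# meta.yaml -- hypothetical empty "external" cudatoolkit shim,
# one package per major.minor, never updated afterwards
package:
  name: cudatoolkit
  version: "11.2"

build:
  number: 0
  string: external_shim

about:
  summary: Empty shim; cudatoolkit is expected to be provided by the system
```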

@jakirkham
Member

Wouldn't we still want a way to affect the CUDA version used by other libraries? How would we do that if not through a versioned cudatoolkit library?

@jaimergp
Member

The shim would be versioned, yes.

@jakirkham
Member

I worry a little bit about creating a package we don't plan to maintain. Guessing people will still want us to fix things with it.

If we really don't want to maintain it, I wonder if we are better off just explaining to people that they can force-remove the package (with the warnings conda will naturally raise when doing this). At least that way they know they are doing something non-standard and are on their own to get things working.
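That alternative might look like this (a sketch, assuming the conda build of tensorflow is already installed):

```shell
# Force-remove the conda cudatoolkit without removing its dependents,
# relying on the system CUDA instead. Conda will warn that packages
# depending on it (e.g. tensorflow) may break -- that warning is the point.
conda remove --force cudatoolkit
```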

@jaimergp
Member

That was my concern as well, hence why I ended up with that label unsupported-... in my personal channel. We can give it a try and if it becomes a real problem support-wise, mark it as broken? Users who really need it can always copy them to their channel if needed (and I am assuming they know what they are doing).

@ngam

ngam commented Aug 15, 2022

I just want to add that the cudatoolkit package we currently ship is incomplete: some parts of tensorflow and jax require ptxas, which is not included, and would thus fail without either #62 or system CUDA... 😢
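A quick way to check whether the system CUDA can cover that gap (the module name is hypothetical; it varies per cluster):

```shell
# ptxas must be findable for XLA-based GPU compilation in jax/tensorflow;
# alternatively, XLA can be pointed at a CUDA install directory via
# XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda
module load cuda   # hypothetical module name
which ptxas || echo "ptxas not found on PATH"
```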
