ptxas executable #72

Closed
hmaarrfk opened this issue Nov 22, 2021 · 18 comments

@hmaarrfk

I believe that the ptxas executable should be available here.

It seems that tensorflow (at least v1) attempts to use it.

However, when I create a fresh environment with cudatoolkit 11.2 it doesn't seem to be included.
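
For reference, a minimal reproduction (the environment name matches the listing below; the checks are just one way to confirm that no ptxas binary ships with the package):

$ conda create -n cudatoolkit -c conda-forge cudatoolkit=11.2
$ conda activate cudatoolkit
$ which ptxas                          # no output: not on PATH
$ find $CONDA_PREFIX -name "ptxas*"    # no output: not anywhere in the environment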

Issue:


Environment (conda list):
$ conda list
# packages in environment at /home/mark/miniforge3/envs/cudatoolkit:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
cudatoolkit               11.2.0               h73cb219_9    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge

Details about conda and system (conda info):
$ conda info

     active environment : cudatoolkit
    active env location : /home/mark/miniforge3/envs/cudatoolkit
            shell level : 3
       user config file : /home/mark/.condarc
 populated config files : /home/mark/miniforge3/.condarc
                          /home/mark/.condarc
          conda version : 4.10.3
    conda-build version : 3.21.4
         python version : 3.8.12.final.0
       virtual packages : __cuda=11.2=0
                          __linux=5.11.0=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/mark/miniforge3  (writable)
      conda av data dir : /home/mark/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/mark/miniforge3/pkgs
                          /home/mark/.conda/pkgs
       envs directories : /home/mark/miniforge3/envs
                          /home/mark/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.3 requests/2.26.0 CPython/3.8.12 Linux/5.11.0-40-generic ubuntu/20.04.3 glibc/2.31
                UID:GID : 1002:1002
             netrc file : None
           offline mode : False

xref: conda-forge/tensorflow-feedstock#170

@leofang (Member) commented Nov 22, 2021

Currently cudatoolkit does not contain any executables or headers from the CUDA Toolkit because of the EULA limitations. The new package format (#62) will address this, but I am not aware of any timeline for it.

@hmaarrfk (Author)

Understood. Thank you for the explanation.

@ngam commented May 25, 2022

@leofang, I am a little confused about what is and is not available through the main cudatoolkit package (the one in this feedstock). In particular:

  • is nvcc compiler available?
  • are things like cublas available?

I am primarily interested in essentially recreating what NVIDIA offers in their NGC containers in our packaging of tensorflow and pytorch; one of the key missing items I've been working on is activating XLA for tensorflow, which requires the correct compilers etc. to be available. Do you have any advice?
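
(For concreteness, this is roughly the kind of thing XLA needs, as I understand it. It's a sketch only: /usr/local/cuda and train.py are placeholders, and the flags are what I believe XLA/TF expect, not something this package provides today.)

# XLA's GPU backend wants to find ptxas and libdevice; a common workaround is to
# point it at a full CUDA installation explicitly:
$ export XLA_FLAGS="--xla_gpu_cuda_data_dir=/usr/local/cuda"
# then turn on XLA JIT compilation in TensorFlow for the run:
$ TF_XLA_FLAGS="--tf_xla_auto_jit=2" python train.py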

@ngam commented May 25, 2022

If they are not available, where are these things available in conda-forge? For example, I saw that some things are available through cudatoolkit-dev, and I believe we have an nvcc feedstock... Are we supposed to be using them that way? My understanding has been that these things should be bundled in cudatoolkit (as the first sentence above asserts).

@ngam commented May 25, 2022

cc @jakirkham for viz and comment

@leofang (Member) commented May 25, 2022

  • the offline compiler (nvcc) and headers are not available
    • on conda-forge, they are only available in the conda-forge's docker images that are deployed to the CIs for building packages
    • nvcc-feedstock is just a thin wrapper on top of the image's nvcc, no real use case outside of the CI AFAIK
    • to me cudatoolkit-dev technically violates the CUDA EULA, I don't know how it still exists, but perhaps it's not my business to ask 🙂
  • runtime shared libraries for cuBLAS etc. are available (a quick way to check both points is sketched below)
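
A quick way to check both points in a fresh cudatoolkit environment (a sketch; exact library file names depend on the CUDA version):

$ conda activate cudatoolkit
# runtime shared libraries (cuBLAS, cuFFT, ...) are shipped in the environment's lib directory
$ ls $CONDA_PREFIX/lib/libcublas*
# but no compiler binaries or headers are shipped
$ ls $CONDA_PREFIX/bin/nvcc $CONDA_PREFIX/bin/ptxas $CONDA_PREFIX/include/cuda.h
# ls: cannot access ...: No such file or directory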

@ngam commented May 25, 2022

  • runtime shared libraries for cuBLAS etc are available

Where? Here?

  • the offline compiler (nvcc) and headers are not available

Okay, let me try to see what exactly is needed for tensorflow and pytorch and we can work on addressing these issues as they come.

on conda-forge, they are only available in the conda-forge's docker images that are deployed to the CIs for building packages

Since you talk about the EULA, etc. --- is using that conda-forge CI image for someone's production work okay or is it only for CI? I believe I saw it was based on the cuda-devel Docker images, so the licensing might be exactly the same as that (those cuda-devel images are the main building blocks for all NGC containers as far as I understand...)

@ngam commented May 25, 2022

Thanks for the prompt and clear answer by the way, 👍 @leofang

@ngam commented May 27, 2022

Btw, as far as I could tell, we are good for the XLA implementation, though I need to do more local testing to see if there are additional issues to resolve. @leofang, if you're interested in having a look, see conda-forge/tensorflow-feedstock#246

@leofang (Member) commented May 27, 2022

Where? Here?

Yes

Since you talk about the EULA, etc. --- is using that conda-forge CI image for someone's production work okay or is it only for CI?

I think it's OK. The CUDA images and their derivatives (including conda-forge's) are permissive. By using them users acknowledge the terms and conditions.

@JulianSMoore commented May 28, 2022

I have just encountered an issue: a particular TF model built from TF Hub and run in JupyterLab in an Anaconda environment, in which I had installed cuDNN and cudatoolkit, raised errors because ptxas is missing from the conda-forge package.

I confess I cannot follow the discussion above. Can someone explain in plainer language why this happened with CUDA toolkit 11.2 and what the future holds?

It doesn't make sense to someone like me: if the DLLs etc. are available, then why not ptxas.exe? I'm a user... I just want it to run. (NB: finding out why ptxas was an issue and what to do about it was a PITA - now that I have an installation, can I just copy ptxas into the environment somewhere appropriate?)

I had to install cuda in the OS, which undermines the value of having conda environments with the CUDA stuff in them.

Setup: Windows 10 Home, 64-bit, 21H2.

UPDATE

Issue finally resolved by conda install -y -q -c nvidia cuda-nvcc to bring in ptxas.exe

But... it took hours to work this out. I'm going to give the TF crew a fair share of the blame for not giving enough info about ptxas, where TF looks for it, etc. Not their first offense... XLA_FLAGS is also screwed up.
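
For anyone else hitting this, a quick way to confirm the fix took (a sketch; shutil.which just reports whether ptxas is now resolvable from the active environment):

$ conda install -y -q -c nvidia cuda-nvcc
$ python -c "import shutil; print(shutil.which('ptxas'))"
# should print a path inside the active environment rather than None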

@JulianSMoore commented Jun 2, 2022 via email

@JulianSMoore

It's not obvious to me that the notebook attached to the email is available here, so I'm attaching it separately as a zip.

classification_ViT-checkpoint - Copy.zip

@hmaarrfk (Author) commented Jun 5, 2022

@JulianSMoore sorry we can't be of more help. A few things:

  1. The anaconda channel is different from the conda-forge channel
  2. The conda-forge channel still doesn't have many packages for cuda+windows
  3. The conda-forge channel doesn't have tensorflow for Windows
  4. conda-forge is different from pypi and as such, the knowledge you gain here may not translate.

It seems that you are installing things through the TensorFlow-recommended way (pypi), and as such I would recommend you ask a question on their forums. We simply don't have the knowledge to help you troubleshoot your system on Windows.

@JulianSMoore

@hmaarrfk Perfectly understood (wasn't expecting you to troubleshoot!) & your info will be helpful.

For the benefit of others: some required software support for TensorFlow (e.g. ptxas, for the ViT model from TF Hub) seems to lie outside cudatoolkit. If you encounter a similar issue, first check your paths, then think about the libraries used/needed, and finally consider packages from different channels. Hard to be more specific than that, unfortunately. (I use a conda installation for the CUDA toolkit and cuDNN because that is the only way I know to install CUDA in an anaconda env (rather than the OS) - everything else I do with pip.)
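
As a rough starting point for that checklist (a sketch; it assumes the environment is active, and the grep pattern is just the set of names that tripped me up):

# 1. is ptxas visible at all, and if so from where?
$ python -c "import shutil; print(shutil.which('ptxas'))"
# 2. which CUDA-related packages does the environment actually provide, and from which channels?
$ conda list | grep -i -E "cuda|cudnn|nvcc"
# (on Windows, just scan the full "conda list" output instead of piping to grep)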

@ngam commented Aug 16, 2022

Any update on this?

We need ptxas for tensorflow and jax going forward. I am not sure there is any point in continuing our crazy efforts in maintaining cuda builds if we are not going to have access to ptxas: it is simply needed. I am personally not going to participate in any cuda builds of tensorflow and jaxlib until this is fixed. (I have been the primary pusher for the latest tensorflow and jaxlib builds, as others are busier than usual.)

At the end of the day, if someone has to install system cudatoolkit anyway, there is no point in getting it from conda-forge. I would be more inclined to pursue lighter builds along the lines of #81 instead.

@conda-forge/cudatoolkit could we please get some clarity on this soon? Or at least a response about what is stopping us from resolving it?

cc @conda-forge/core

@jaimergp (Member)

Isn't the main problem that the NVIDIA EULA prevents us from distributing ptxas and other binaries? We can't just ignore that. This will all go away when #62 lands and NVIDIA officially distributes their packages on conda, which I assume would have a permissive enough license for us to redistribute. Until then... we can't do much, sorry.

@leofang (Member) commented Dec 16, 2023

Let's close this issue now that it is resolved with CUDA 12. Thanks everyone for the discussion and request.

@leofang closed this as completed Dec 16, 2023