ptxas executable #72

Closed
hmaarrfk opened this issue Nov 22, 2021 · 18 comments

@hmaarrfk

I believe that the ptxas executable should be available here.

It seems that tensorflow (at least v1) attempts to use it.

However, when I create a fresh environment with cudatoolkit 11.2 it doesn't seem to be included.
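
For reference, a minimal reproduction (the environment name matches the listing below; the checks are just one way to confirm that no ptxas binary ships with the package):

$ conda create -n cudatoolkit -c conda-forge cudatoolkit=11.2
$ conda activate cudatoolkit
$ which ptxas                          # no output: not on PATH
$ find $CONDA_PREFIX -name "ptxas*"    # no output: not anywhere in the environment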

Issue:


Environment (conda list):
$ conda list
# packages in environment at /home/mark/miniforge3/envs/cudatoolkit:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
cudatoolkit               11.2.0               h73cb219_9    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge

Details about conda and system (conda info):
$ conda info

     active environment : cudatoolkit
    active env location : /home/mark/miniforge3/envs/cudatoolkit
            shell level : 3
       user config file : /home/mark/.condarc
 populated config files : /home/mark/miniforge3/.condarc
                          /home/mark/.condarc
          conda version : 4.10.3
    conda-build version : 3.21.4
         python version : 3.8.12.final.0
       virtual packages : __cuda=11.2=0
                          __linux=5.11.0=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/mark/miniforge3  (writable)
      conda av data dir : /home/mark/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/mark/miniforge3/pkgs
                          /home/mark/.conda/pkgs
       envs directories : /home/mark/miniforge3/envs
                          /home/mark/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.3 requests/2.26.0 CPython/3.8.12 Linux/5.11.0-40-generic ubuntu/20.04.3 glibc/2.31
                UID:GID : 1002:1002
             netrc file : None
           offline mode : False

xref: conda-forge/tensorflow-feedstock#170

@leofang (Member) commented Nov 22, 2021

Currently cudatoolkit does not contain any executables or headers from the CUDA Toolkit because of the EULA limitations. The new package format (#62) will address this, but I am not aware of any timeline for it.

@hmaarrfk (Author)

Understood. Thank you for the explanation.

@ngam commented May 25, 2022

@leofang, I am a little confused about what is and is not available through the main cudatoolkit package (the one in this feedstock). In particular:

  • is nvcc compiler available?
  • are things like cublas available?

I am primarily interested in essentially recreating what NVIDIA offers in their NGC containers in our packaging of tensorflow and pytorch; one of the key missing items I've been working on is activating XLA for tensorflow, which requires the correct compilers etc. to be available. Do you have any advice?
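
(For concreteness, this is roughly the kind of thing XLA needs, as I understand it. It's a sketch only: /usr/local/cuda and train.py are placeholders, and the flags are what I believe XLA/TF expect, not something this package provides today.)

# XLA's GPU backend wants to find ptxas and libdevice; a common workaround is to
# point it at a full CUDA installation explicitly:
$ export XLA_FLAGS="--xla_gpu_cuda_data_dir=/usr/local/cuda"
# then turn on XLA JIT compilation in TensorFlow for the run:
$ TF_XLA_FLAGS="--tf_xla_auto_jit=2" python train.py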

@ngam commented May 25, 2022

If they are not available, where are these things available in conda-forge? For example, I saw that some things are available through cudatoolkit-dev, and I believe we have an nvcc feedstock... Are we supposed to be using them that way? My understanding has been that these things should be bundled in cudatoolkit (as the first sentence above asserts).

@ngam commented May 25, 2022

cc @jakirkham for viz and comment

@leofang (Member) commented May 25, 2022

  • the offline compiler (nvcc) and headers are not available
    • on conda-forge, they are only available in the conda-forge's docker images that are deployed to the CIs for building packages
    • nvcc-feedstock is just a thin wrapper on top of the image's nvcc, no real use case outside of the CI AFAIK
    • to me cudatoolkit-dev technically violates the CUDA EULA, I don't know how it still exists, but perhaps it's not my business to ask 🙂
  • runtime shared libraries for cuBLAS etc. are available (a quick way to check both points is sketched below)
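
A quick way to check both points in a fresh cudatoolkit environment (a sketch; exact library file names depend on the CUDA version):

$ conda activate cudatoolkit
# runtime shared libraries (cuBLAS, cuFFT, ...) are shipped in the environment's lib directory
$ ls $CONDA_PREFIX/lib/libcublas*
# but no compiler binaries or headers are shipped
$ ls $CONDA_PREFIX/bin/nvcc $CONDA_PREFIX/bin/ptxas $CONDA_PREFIX/include/cuda.h
# ls: cannot access ...: No such file or directory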

@ngam commented May 25, 2022

  • runtime shared libraries for cuBLAS etc are available

Where? Here?

  • the offline compiler (nvcc) and headers are not available

Okay, let me try to see what exactly is needed for tensorflow and pytorch and we can work on addressing these issues as they come.

on conda-forge, they are only available in the conda-forge's docker images that are deployed to the CIs for building packages

Since you talk about the EULA, etc. --- is using that conda-forge CI image for someone's production work okay or is it only for CI? I believe I saw it was based on the cuda-devel Docker images, so the licensing might be exactly the same as that (those cuda-devel images are the main building blocks for all NGC containers as far as I understand...)

@ngam commented May 25, 2022

Thanks for the prompt and clear answer by the way, 👍 @leofang

@ngam commented May 27, 2022

Btw, as far as I could tell, we are good for the XLA implementation, though I need to do more local testing to see if there are additional issues to resolve. @leofang, if you're interested in having a look, see conda-forge/tensorflow-feedstock#246

@leofang (Member) commented May 27, 2022

Where? Here?

Yes

Since you talk about the EULA, etc. --- is using that conda-forge CI image for someone's production work okay or is it only for CI?

I think it's OK. The CUDA images and their derivatives (including conda-forge's) are permissive. By using them users acknowledge the terms and conditions.

@JulianSMoore commented May 28, 2022

I have just encountered an issue: a particular TF model built from TF Hub and run in JupyterLab in an Anaconda environment, in which I had installed cuDNN and cudatoolkit, raised errors because ptxas is missing from the conda-forge package.

I confess I cannot follow the discussion above. Can someone explain in plainer language why this happened with CUDA toolkit 11.2 and what the future holds?

It doesn't make sense to someone like me: if the DLLs etc. are available, then why not ptxas.exe? I'm a user... I just want it to run. (NB: finding out why ptxas was an issue and what to do about it was a PITA - now that I have an installation, can I just copy ptxas into the environment somewhere appropriate?)

I had to install cuda in the OS, which undermines the value of having conda environments with the CUDA stuff in them.

Setup: Windows 10 Home, 64-bit, 21H2.

UPDATE

Issue finally resolved by conda install -y -q -c nvidia cuda-nvcc to bring in ptxas.exe

But... it took hours to work this out. I'm going to give the TF crew a fair share of the blame for not giving enough info about ptxas, where TF looks for it, etc. Not their first offense... XLA_FLAGS is also screwed up.
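
For anyone else hitting this, a quick way to confirm the fix took (a sketch; shutil.which just reports whether ptxas is now resolvable from the active environment):

$ conda install -y -q -c nvidia cuda-nvcc
$ python -c "import shutil; print(shutil.which('ptxas'))"
# should print a path inside the active environment rather than None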

@JulianSMoore commented Jun 2, 2022 via email

@JulianSMoore

It's not obvious to me that the notebook attached to the email is available here, so I'm attaching it separately as a zip.

classification_ViT-checkpoint - Copy.zip

@hmaarrfk (Author) commented Jun 5, 2022

@JulianSMoore sorry we can't be of more help. A few things:

  1. The anaconda channel is different from the conda-forge channel
  2. The conda-forge channel still doesn't have many packages for cuda+windows
  3. The conda-forge channel doesn't have tensorflow for Windows
  4. conda-forge is different from pypi and as such, the knowledge you gain here may not translate.

It seems that you are installing things through the TensorFlow-recommended way (pypi), and as such I would recommend you ask a question on their forums. We simply don't have the knowledge to help you troubleshoot your system on Windows.

@JulianSMoore

@hmaarrfk Perfectly understood (wasn't expecting you to troubleshoot!) & your info will be helpful.

For the benefit of others: some required software support for TensorFlow (e.g. ptxas, for the ViT model from TF Hub) seems to lie outside cudatoolkit. If you encounter a similar issue, first check your paths, then think about the libraries used/needed, and finally consider packages from different channels. Hard to be more specific than that, unfortunately. (I use a conda installation for the CUDA toolkit and cuDNN because that is the only way I know to install CUDA in an anaconda env (rather than the OS) - everything else I do with pip.)
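
As a rough starting point for that checklist (a sketch; it assumes the environment is active, and the grep pattern is just the set of names that tripped me up):

# 1. is ptxas visible at all, and if so from where?
$ python -c "import shutil; print(shutil.which('ptxas'))"
# 2. which CUDA-related packages does the environment actually provide, and from which channels?
$ conda list | grep -i -E "cuda|cudnn|nvcc"
# (on Windows, just scan the full "conda list" output instead of piping to grep)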

@ngam commented Aug 16, 2022

Any update on this?

We need ptxas for tensorflow and jax going forward. I am not sure there is any point in continuing our crazy efforts in maintaining cuda builds if we are not going to have access to ptxas: it is simply needed. I am personally not going to participate in any cuda builds of tensorflow and jaxlib until this is fixed. (I have been the primary pusher for the latest tensorflow and jaxlib builds, as others are busier than usual.)

At the end of the day, if someone has to install system cudatoolkit anyway, there is no point in getting it from conda-forge. I would be more inclined to pursue lighter builds along the lines of #81 instead.

@conda-forge/cudatoolkit could we please get some clarity on this soon? Or at least a response about what is stopping us from resolving it?

cc @conda-forge/core

@jaimergp (Member)

Isn't the main problem that the NVIDIA EULA prevents us from distributing ptxas and other binaries? We can't just ignore that. This will all go away when #62 lands and NVIDIA officially distributes their packages on conda, which I assume would have a permissive enough license for us to redistribute. Until then... we can't do much, sorry.

@leofang (Member) commented Dec 16, 2023

Let's close this issue now that it is resolved with CUDA 12. Thanks everyone for the discussion and request.

@leofang closed this as completed Dec 16, 2023