Add support for Jetson Orin embedded cards (i.e. add "8.7" to TORCH_CUDA_ARCH_LIST in linux-aarch64)? #303
Comments
The change I am suggesting to make is traversaro@bbc22e0. I am not opening the PR to avoid overloading the CI unless necessary.
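For readers not familiar with the knob being discussed: `TORCH_CUDA_ARCH_LIST` is the list of compute capabilities a PyTorch build compiles device code for, and it is read from the environment at build time. A minimal sketch of what the proposed change amounts to, using a placeholder architecture list rather than the feedstock's real one:

```python
# Sketch only: the concrete architecture list below is a placeholder, not the
# feedstock's actual configuration. The point is just that enabling Orin support
# means adding "8.7" (sm_87) to TORCH_CUDA_ARCH_LIST on linux-aarch64 before building.
import os

arch_list = ["5.0", "6.0", "7.0", "8.0", "9.0+PTX"]  # hypothetical existing list
arch_list.insert(arch_list.index("9.0+PTX"), "8.7")  # Jetson Orin is compute capability 8.7
os.environ["TORCH_CUDA_ARCH_LIST"] = ";".join(arch_list)
print(os.environ["TORCH_CUDA_ARCH_LIST"])  # 5.0;6.0;7.0;8.0;8.7;9.0+PTX
```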
The problem is that you also need …
+1 for this request; I can strongly vouch for the need from the robotics side.
Just an update: by installing at the system level the consistent versions of …
+1 from me.
Related: conda-forge/arm-variant-feedstock#3.
I discovered some interesting stuff about support in conda-forge for Jetson-like boards (probably obvious to Nvidia/CUDA people, but I write it down for non-expert people like me): basically, the CUDA packages for … and this is due to the fact that … Based on conda-forge/cusparselt-feedstock#48 (comment), at least for cudnn the compute architectures enabled in the two cases are: …
The differences are not so big. From the point of view of conda-forge, I think we can safely ignore …
Ok, I think at this point I have a clearer idea of the situation: …
So, given all of this, my proposal is (that also works as …
Given that we are not particularly in a hurry, I think it does not make a lot of sense to open a PR that will use a lot of CI resources and users' disk space just to do a rebuild for the traversaro@bbc22e0 change. Instead, could it make sense to add this change to one of the upcoming PRs? Thanks a lot in advance!
We can support CUDA 11, but we need a champion for it.
Sure, but until (and unless) that happens, if we only build for CUDA >= 12 at the Jetson Orin level we just need to enable the …
So it will compile it, and users will have to do gymnastics to get it to work? I'm OK with that if you can help document the gymnastics in a short summary.
I'm not sure I understand the need here. CUDA binaries created for compute capability …
Good point. To be honest I am always a bit confused about how this works. I will try (once I again have access to an Orin) using a stock conda-forge pytorch with cudnn and cublas with sm87, and report the result.
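A sketch of the kind of check that could be run on the Orin for that report; `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are standard PyTorch calls, while the comments about expected output are only illustrative:

```python
# Compare the device's compute capability against the architectures the
# installed pytorch package was compiled for.
import torch

print("torch:", torch.__version__, "CUDA:", torch.version.cuda)
print("compiled arch list:", torch.cuda.get_arch_list())  # e.g. ['sm_50', ..., 'sm_90']
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"device capability: sm_{major}{minor}")  # sm_87 on Jetson Orin
    # A trivial kernel launch; if no compatible SASS or PTX target was built
    # (or the sbsa/tegra caveat discussed below applies), this typically fails
    # with a "no kernel image is available" error.
    x = torch.randn(4, 4, device="cuda")
    print((x @ x).sum().item())
```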
As @isuruf mentioned, normally we have a binary compatibility guarantee that means packages built for any compute capability >=8.0,<=8.7 should be compatible with a device of compute capability 8.7. However, the documentation has the following admonition: …
I'm not sure whether this is due to incompatibility with host code, device code, or both. Thus, the solution is probably to target mutually exclusive capabilities for sbsa and tegra instead of trying to add 8.7 to the sbsa build, since it would never be used.
Solution to issue cannot be found in the documentation.
Issue
Yesterday at work I tried to use the latest CUDA-enabled `pytorch` conda-forge package on a `linux-aarch64` platform, the Jetson Orin (https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/, https://developer.nvidia.com/embedded/jetson-modules), which is a really popular board in robotics (at least for R&D), as it is a relatively compact embedded board with a CUDA-enabled GPU. However, running this minimal snippet:

…

resulted in the following error:

…

The reason for the error is quite clear: the compute architecture used by the Jetson Orin is 8.7 (see https://developer.nvidia.com/cuda-gpus), and this architecture is not part of `TORCH_CUDA_ARCH_LIST` for `linux-aarch64`. Indeed, I tried to quickly generate packages in which the architecture is enabled (code: https://github.com/traversaro/pytorch-cpu-traversaro-fork/tree/patch-1, binary packages: https://github.com/traversaro/pytorch-cpu-traversaro-fork/releases/tag/test-builds-on-aarch64-with-sm87), and that snippet works fine.
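The original snippet and error output are not reproduced in this thread; as a hypothetical stand-in for the kind of minimal check described above, any small CUDA operation exercises the same path:

```python
# Hypothetical minimal example, not the reporter's exact snippet: a single
# CUDA kernel launch is enough to hit the missing-sm_87-kernel-image problem
# on an Orin if the build does not target that architecture.
import torch

x = torch.rand(3, 3, device="cuda")
print(x + x)
```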
Given how widespread the use of the Jetson Orin is (the whole Jetson series sold ~1 million as of 2023, see https://blogs.nvidia.com/blog/million-jetson-developers-gtc/), I would argue that it could make sense to add `8.7` to `TORCH_CUDA_ARCH_LIST` only on `linux-aarch64`, as `8.7` is a compute architecture that is not used by discrete GPUs on `linux-64` systems (at least as far as I know). While I understand that the set of compute architectures that are enabled is always a tradeoff between maintenance effort, time taken in CI, and package size, I wonder whether the `8.7` architecture used in Jetson is more widespread on `linux-aarch64` than (for example) 5.0, as I doubt there are a lot of (for example) GTX 750 GPUs installed in systems with an ARM CPU (even if that card is indeed supported by `linux-aarch64` drivers, see "Supported Products" in https://www.nvidia.com/en-us/drivers/details/237590/).

Installed packages
Environment info