🐛 [Bug] InvalidVersion error when trying to use torch_tensorrt that was compiled in a Jetson container image #2112

Closed
airalcorn2 opened this issue Jul 13, 2023 · 3 comments · Fixed by #2118
Labels: bug (Something isn't working)

Comments

airalcorn2 commented Jul 13, 2023

Bug Description

import torch_tensorrt

raises an InvalidVersion error when torch_tensorrt has been installed with:

python3 setup.py install --use-cxx11-abi

on a Jetson Xavier AGX using the dustynv/ros:humble-pytorch-l4t-r35.3.1 container base image found here. The issue appears to be caused by the fact that torch.__version__ == "2.0.0.nv23.05", which is not a valid PEP 440 version string, so newer releases of packaging (22 and later) refuse to parse it. Changing the line here:

if version.parse(torch.__version__) >= version.parse("2.dev"):

to:

if version.parse(".".join(torch.__version__.split(".")[:3])) >= version.parse("2.dev"):

makes the error go away.
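
A minimal sketch of the failure and the workaround, assuming packaging >= 22 (where LegacyVersion support was removed and non-PEP 440 strings raise InvalidVersion):

from packaging import version

nv_version = "2.0.0.nv23.05"  # torch.__version__ in this L4T container

try:
    version.parse(nv_version)  # raises InvalidVersion on packaging >= 22
except version.InvalidVersion as err:
    print(f"InvalidVersion: {err}")

# Keeping only the first three dot-separated fields yields a valid PEP 440
# version, so the comparison against "2.dev" succeeds.
trimmed = ".".join(nv_version.split(".")[:3])  # "2.0.0"
print(version.parse(trimmed) >= version.parse("2.dev"))  # True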

To Reproduce

Start a container using the dustynv/ros:humble-pytorch-l4t-r35.3.1 base image found here.

apt update
apt install default-jdk unzip zip

PYTHON_VERSION=$(python --version | sed -n 's/^Python \([0-9]\+\.[0-9]\+\).*$/\1/p')

CUDA_VERSION=$(nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')

CUDNN_MAJOR=$(cat /usr/include/cudnn_version.h | sed -n 's/^.*CUDNN_MAJOR \([0-9]\+\).*$/\1/p')
CUDNN_MINOR=$(cat /usr/include/cudnn_version.h | sed -n 's/^.*CUDNN_MINOR \([0-9]\+\).*$/\1/p')
CUDNN_VERSION=${CUDNN_MAJOR}.${CUDNN_MINOR}

TENSORRT_MAJOR=$(cat /usr/include/aarch64-linux-gnu/NvInferVersion.h | sed -n 's/^.*NV_TENSORRT_MAJOR \([0-9]\+\).*$/\1/p')
TENSORRT_MINOR=$(cat /usr/include/aarch64-linux-gnu/NvInferVersion.h | sed -n 's/^.*NV_TENSORRT_MINOR \([0-9]\+\).*$/\1/p')
TENSORRT_VERSION=${TENSORRT_MAJOR}.${TENSORRT_MINOR}

git clone -b v1.4.0 https://github.com/pytorch/TensorRT.git

cd TensorRT
cp toolchains/jp_workspaces/WORKSPACE.jp50 WORKSPACE
sed -i "s/python[0-9]\+.[0-9]\+/python${PYTHON_VERSION}/g" WORKSPACE
sed -i "s/cuda-[0-9]\+.[0-9]\+/cuda-${CUDA_VERSION}/g" WORKSPACE

BAZEL_VERSION=$(cat .bazelversion)
mkdir bazel
cd bazel
curl -fSsL -O https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-dist.zip
unzip bazel-${BAZEL_VERSION}-dist.zip
bash ./compile.sh
cp output/bazel /usr/local/bin/

cd ../py
TORCH_TENSORRT_VERSION=$(cat versions.py | sed -n 's/^__version__ = "\(.*\)"/\1/p')
echo "__version__ = \"${TORCH_TENSORRT_VERSION}\"" > versions.py
echo "__cuda_version__ = \"${CUDA_VERSION}\"" >> versions.py
echo "__cudnn_version__ = \"${CUDNN_VERSION}\"" >> versions.py
echo "__tensorrt_version__ = \"${TENSORRT_VERSION}\"" >> versions.py

python3 setup.py install --use-cxx11-abi

cd
python3 -c "import torch_tensorrt; print(torch_tensorrt.__version__)"

Expected behavior

No error.

Environment

A Jetson Xavier AGX and the dustynv/ros:humble-pytorch-l4t-r35.3.1 container base image found here.

Additional context

airalcorn2 added the bug label on Jul 13, 2023

airalcorn2 (Author) commented:

I'm realizing now that this is probably not the correct workflow, and that I should instead use Torch-TensorRT outside of the container to compile the model and then provide the compiled model to the container.

airalcorn2 (Author) commented:

Re-opening, since the fact that the repository points to these containers suggests this should be an acceptable strategy.

gs-olive (Collaborator) commented:

Hi - thanks for the report. I am able to reproduce the issue, and I've linked a PR that will address this versioning check.
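
For illustration only (this is not necessarily what the linked PR does), one tolerant approach is to keep just the leading numeric release segments of torch.__version__ before comparing; the helper name below is made up for this sketch:

import re
from packaging import version

def public_torch_version(raw: str) -> str:
    # Hypothetical helper, not part of torch_tensorrt: keep only the leading
    # numeric release segments, e.g. "2.0.0" from "2.0.0.nv23.05".
    match = re.match(r"\d+(\.\d+)*", raw)
    return match.group(0) if match else raw

print(version.parse(public_torch_version("2.0.0.nv23.05")) >= version.parse("2.dev"))  # True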
