
Request for Docker images of tensorrt_llm #605

Open
nmq45698 opened this issue Aug 30, 2024 · 11 comments

@nmq45698

No description provided.

@nmq45698 nmq45698 changed the title from "Image" to "Request for Docker images of tensorrt_llm" on Aug 30, 2024
@nmq45698
Author

Hi there!
Currently I'm trying to deploy some small models (say, GPT-2) with SmoothQuant on a Jetson AGX Orin.
However, searching for the image "tensorrt_llm:35.2.1" on Docker Hub turns up no results.
I've noticed your comment on #564 that trtllm will be available soon.
So, is there an updated Docker image that supports "docker run" for tensorrt_llm on Orin?
Thanks! :P
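
For reference, roughly what I tried (the dustynv-namespaced tag below is a hypothetical guess based on the L4T version; neither name turned anything up):

```sh
# Search Docker Hub for a prebuilt TensorRT-LLM image
docker search tensorrt_llm

# Try pulling a JetPack 5 (L4T r35.2.1) style tag -- both come back "not found"
docker pull tensorrt_llm:35.2.1
docker pull dustynv/tensorrt_llm:r35.2.1   # hypothetical tag
```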

@nmq45698 nmq45698 reopened this Aug 30, 2024
@dusty-nv
Owner

@nmq45698 still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch as their repo shows, or use AWQ TinyChat, or MLC/TVM.

@johnnynunez
Contributor

> @nmq45698 still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch as their repo shows, or use AWQ TinyChat, or MLC/TVM.

I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0
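
For x86_64, a minimal sketch of picking up that release (the NVIDIA PyPI index is the documented install path for TensorRT-LLM; Jetson/aarch64 still needs a source build at this point):

```sh
# Install the 0.12.0 release from NVIDIA's package index (x86_64 only)
pip3 install tensorrt_llm==0.12.0 --extra-index-url https://pypi.nvidia.com

# Verify the installed version
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```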

@nmq45698
Author

nmq45698 commented Sep 6, 2024

> > @nmq45698 still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch as their repo shows, or use AWQ TinyChat, or MLC/TVM.
>
> I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0

Thanks for the reminder! :P
Actually, I had already found that TensorRT-LLM repo, and engines build successfully there on an amd64 GPU machine.

On the Jetson AGX Orin, though, I first tried some of @dusty-nv's dustynv/... r35.x.x images, which do not seem to support TensorRT 10.3 because of the older GLIBC on Ubuntu 20.04: the kINT64 and kBF16 data types in TensorRT-LLM could not be compiled.

Then I tried the dustynv/... r36.x.x images (say nano_llm), which are based on Ubuntu 22.04, and TensorRT 10.3 installed successfully.
However, another error about libnvinfer.so occurred when running /scripts/build_wheels.py, and also on import tensorrt in Python:
(Pdb) import tensorrt
*** ImportError: /lib/aarch64-linux-gnu/libnvinfer.so.10: undefined symbol: _ZN5nvdla8IProfile36setCanGenerateDetailedLayerwiseStatsEb

Have you ever encountered that while building TensorRT-LLM on Orin? ^v^
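
In case it helps to debug, a quick sanity check for a mismatch like this (paths are illustrative and vary by image and L4T release):

```sh
# Which libnvinfer is the loader actually picking up?
ldconfig -p | grep libnvinfer

# The unresolved nvdla symbol is normally supplied by the host's DLA compiler
# library, which the NVIDIA runtime mounts into the container; find it first:
find /usr/lib/aarch64-linux-gnu -name 'libnvdla_compiler.so*' 2>/dev/null

# Then check whether that host library exports the symbol TensorRT 10.3 expects
# (substitute the path found above):
nm -D /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so | grep setCanGenerateDetailedLayerwiseStats
```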

@johnnynunez
Contributor

> > > @nmq45698 still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch as their repo shows, or use AWQ TinyChat, or MLC/TVM.
> >
> > I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0
>
> Thanks for the reminder! :P Actually, I had already found that TensorRT-LLM repo, and engines build successfully there on an amd64 GPU machine.
>
> On the Jetson AGX Orin, though, I first tried some of @dusty-nv's dustynv/... r35.x.x images, which do not seem to support TensorRT 10.3 because of the older GLIBC on Ubuntu 20.04: the kINT64 and kBF16 data types in TensorRT-LLM could not be compiled.
>
> Then I tried the dustynv/... r36.x.x images (say nano_llm), which are based on Ubuntu 22.04, and TensorRT 10.3 installed successfully. However, another error about libnvinfer.so occurred when running /scripts/build_wheels.py, and also on import tensorrt in Python: (Pdb) import tensorrt *** ImportError: /lib/aarch64-linux-gnu/libnvinfer.so.10: undefined symbol: _ZN5nvdla8IProfile36setCanGenerateDetailedLayerwiseStatsEb
>
> Have you ever encountered that while building TensorRT-LLM on Orin? ^v^

Do you use JetPack 5?
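
(A quick way to check, for anyone unsure; the nvidia-jetpack meta-package assumes a standard JetPack install:)

```sh
# L4T release of the host BSP (r35.x = JetPack 5, r36.x = JetPack 6)
cat /etc/nv_tegra_release

# Version of the JetPack meta-package, if installed
apt-cache show nvidia-jetpack | grep Version
```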

@Hrishikeshh-nd

> > @nmq45698 still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch as their repo shows, or use AWQ TinyChat, or MLC/TVM.
>
> I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0

@johnnynunez Does this release support Jetson Orin or AGX? The official documentation page still states that only the x86_64 architecture is supported at the moment, and says nothing about aarch64.

@johnnynunez
Contributor

johnnynunez commented Nov 14, 2024

> > > @nmq45698 still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch as their repo shows, or use AWQ TinyChat, or MLC/TVM.
> >
> > I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0
>
> @johnnynunez Does this release support Jetson Orin or AGX? The official documentation page still states that only the x86_64 architecture is supported at the moment, and says nothing about aarch64.

https://www.jetson-ai-lab.com/tensorrt_llm.html

@dusty-nv in the first example, change the name to dustynv/tensorrt_llm:0.12-r36.4.0
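
For anyone following along, a minimal sketch of running that image on an Orin (the mount and network flags here are just illustrative defaults):

```sh
# Pull and start the JetPack 6 (L4T r36.4.0) TensorRT-LLM container
docker pull dustynv/tensorrt_llm:0.12-r36.4.0
docker run --runtime nvidia -it --rm \
    --network host \
    -v $(pwd)/models:/models \
    dustynv/tensorrt_llm:0.12-r36.4.0
```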

@dusty-nv
Owner

> @dusty-nv in the first example, change the name to dustynv/tensorrt_llm:0.12-r36.4.0

Ahh thanks @johnnynunez, just fixed that in NVIDIA-AI-IOT/jetson-generative-ai-playground#229

@Hrishikeshh-nd the Orin support is in this TRT-LLM branch: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.12.0-jetson
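
A minimal sketch of building from that branch (illustrative; the branch README has the authoritative steps and build flags):

```sh
# Clone the Jetson-enabled branch of TensorRT-LLM
git clone --branch v0.12.0-jetson --depth 1 https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive

# Build the aarch64 wheel (flags omitted; see the branch docs)
python3 scripts/build_wheel.py
```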

@shahizat

Hi @dusty-nv, @johnnynunez, if I am not mistaken, we also need to add the --tokenizer path to the openai_server.py command. See this link: NVIDIA/TensorRT-LLM#2357
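
Something like this, i.e. a hypothetical invocation (the exact argument names and positions may vary by release; see the linked issue for specifics):

```sh
# Point --tokenizer at the Hugging Face tokenizer directory alongside the engine
python3 openai_server.py /path/to/engine_dir --tokenizer /path/to/hf_tokenizer_dir
```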

@shahizat

Hi @dusty-nv, I successfully ran openai_server.py with the endpoint exposed, but I cannot send requests to it; it always shows "port 8000 after 0 ms: Connection refused".
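
A couple of things worth ruling out (assuming the server runs inside a container; the request body below is a generic OpenAI-style payload, not specific to this server):

```sh
# 1. Is anything actually listening on port 8000 inside the container?
ss -tlnp | grep 8000

# 2. If the server binds to 127.0.0.1 only, or the container was started without
#    --network host or -p 8000:8000, outside connections are refused. Once fixed:
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt2", "prompt": "Hello", "max_tokens": 16}'
```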

@Hrishikeshh-nd

> > @dusty-nv in the first example, change the name to dustynv/tensorrt_llm:0.12-r36.4.0
>
> Ahh thanks @johnnynunez, just fixed that in NVIDIA-AI-IOT/jetson-generative-ai-playground#229
>
> @Hrishikeshh-nd the Orin support is in this TRT-LLM branch: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.12.0-jetson

Thank you @dusty-nv @johnnynunez !
