Request for Docker images of tensorrt_llm #605

nmq45698 · 2024-08-30T01:57:51Z

No description provided.

nmq45698 · 2024-08-30T02:13:25Z

Hi there!
Currently I'm trying to deploy some small models (say GPT-2) with SmoothQuant on Jetson AGX Orin.
However, searching Docker Hub for the image "tensorrt_llm:35.2.1" returns no results.
I noticed your comment on #564 that TRT-LLM will be available soon.
So, is there any update on a Docker image that supports "docker run" for tensorrt_llm on Orin?
Thx :P

dusty-nv · 2024-08-30T19:09:04Z

@nmq45698 still coming soon but getting closer! for now I would just run SmoothQuant inference through PyTorch like their repo shows, or use AWQ TinyChat, or MLC/TVM.

johnnynunez · 2024-09-05T00:25:11Z

I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0

nmq45698 · 2024-09-06T03:07:34Z

Thanks for the reminder! :P
Actually, I had already noticed that TensorRT-LLM repo, and I have successfully built engines with it on an amd64 GPU machine.

On Jetson AGX Orin, I first tried some of @dusty-nv's Docker images, dustynv/... r35.x.x, which do not seem to support TensorRT 10.3 because of the GLIBC version in Ubuntu 20.04. The kINT64 and kBF16 types used in TensorRT-LLM could not be compiled.

Then I tried the dustynv/... r36.x.x images (say nano_llm), based on Ubuntu 22.04, and TensorRT 10.3 installed successfully.
However, another error about libnvinfer.so occurred when running /scripts/build_wheels.py, and also on import tensorrt in Python:
(Pdb) import tensorrt
*** ImportError: /lib/aarch64-linux-gnu/libnvinfer.so.10: undefined symbol: _ZN5nvdla8IProfile36setCanGenerateDetailedLayerwiseStatsEb

Have you ever run into this when building TensorRT-LLM on Orin? ^v^
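
One way to narrow down an import failure like the one above is to check the detected C library version and try loading the TensorRT shared library directly, outside the Python bindings. The following is only a diagnostic sketch, not a fix: `try_load` is a hypothetical helper introduced here, and the library name `libnvinfer.so.10` is taken from the error message in this comment. It assumes the library is on the default loader search path inside the container.

```python
import ctypes
import platform
from typing import Tuple


def try_load(libname: str) -> Tuple[bool, str]:
    """Try to load a shared library; return (ok, loader error message)."""
    try:
        ctypes.CDLL(libname)
        return True, ""
    except OSError as exc:
        # Missing files and unresolved symbols both surface as OSError here.
        return False, str(exc)


if __name__ == "__main__":
    # C library name and version as detected for the running interpreter.
    print("libc:", platform.libc_ver())

    # Library name assumed from the ImportError text, not from any documentation.
    ok, err = try_load("libnvinfer.so.10")
    print("libnvinfer.so.10 loaded:", ok)
    if not ok:
        print("loader error:", err)
```

If the loader error names the same undefined symbol, the problem probably sits between libnvinfer and the libraries it depends on in that image, rather than in the TensorRT-LLM build scripts.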

johnnynunez · 2024-09-06T10:58:47Z

Are you using JetPack 5?

Hrishikeshh-nd · 2024-11-14T19:14:04Z

@johnnynunez Does this release support Jetson Orin or AGX? The official documentation page still states that there's only support for x86_64 architecture at the moment, and nothing about aarch64.

johnnynunez · 2024-11-14T23:02:40Z

https://www.jetson-ai-lab.com/tensorrt_llm.html

@dusty-nv in the first example, change the name to dustynv/tensorrt_llm:0.12-r36.4.0

dusty-nv · 2024-11-15T03:49:02Z

Ahh thanks @johnnynunez, just fixed that in NVIDIA-AI-IOT/jetson-generative-ai-playground#229

@Hrishikeshh-nd the Orin support is in this TRT-LLM branch: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.12.0-jetson

shahizat · 2024-11-15T14:48:52Z

Hi @dusty-nv, @johnnynunez, if I'm not mistaken, we also need to add the --tokenizer path to the openai_server.py command. See NVIDIA/TensorRT-LLM#2357

shahizat · 2024-11-15T15:18:01Z

Hi @dusty-nv, I successfully ran openai_server.py with the endpoint exposed, but I cannot send a request; it always fails with "port 8000 after 0 ms: Connection refused".
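
A "Connection refused" error generally means nothing accepted the TCP connection at the address the client used, for example because the server is bound to a different interface or the container's port is not reachable from the host. A minimal sketch for checking this from the client side follows; `port_is_open` is a hypothetical helper, and the hosts probed are assumptions (only port 8000 comes from the error message).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Probe the port from the same place the failing request is sent from.
    for host in ("127.0.0.1", "localhost"):
        print(host, 8000, "open" if port_is_open(host, 8000) else "closed")
```

Running this both inside the container and on the host can help tell whether the server is listening at all or is only unreachable from outside the container.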

Hrishikeshh-nd · 2024-11-15T21:50:02Z

Thank you @dusty-nv @johnnynunez !

nmq45698 closed this as completed Aug 30, 2024

nmq45698 changed the title ~~Image~~ Request for Docker images of tensorrt_llm Aug 30, 2024

nmq45698 reopened this Aug 30, 2024
