A Vision Transformer video model with TensorRT build and inference code

ohadravid/vit-trt

This repo contains a Vision Transformer video model based on the VideoMAE V2 paper and code, as well as examples for compiling the model using TensorRT and running inference using the built engine.

It accompanies a blog post describing an issue with compiling this model using TensorRT. To get a working engine, find the line marked "Uncomment and change this to use the desired attention module" in the code and switch to one of the working Attention layers.

Download the weights

Download the distilled checkpoint by running:

wget https://huggingface.co/OpenGVLab/VideoMAE2/resolve/main/distill/vit_s_k710_dl_from_giant.pth

How to use

After downloading the weights, you can run inference.

For the included video (a clip of pouring water into a teapot), running:

uv run python main.py infer

should print something like:

making tea: 0.81
setting table: 0.01
opening door: 0.01
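
The scores above look like class probabilities, so the following is a minimal sketch of how a top-3 listing in that format could be produced from raw model outputs. It assumes the model emits one logit per class and that a softmax is applied; the `top_k_probs` helper, the logit values, and the fourth label are hypothetical and are not taken from `main.py`.

```python
import math


def top_k_probs(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs via a softmax."""
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits and labels, for illustration only.
labels = ["making tea", "setting table", "opening door", "washing dishes"]
logits = [6.0, 1.6, 1.5, 0.2]

for label, prob in top_k_probs(logits, labels):
    print(f"{label}: {prob:.2f}")
```

The actual script may rank and format its predictions differently; this only illustrates the softmax-then-sort idea.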

Commands

Note: be sure to use an Attention layer that works with TensorRT.

# Use faster settings for torch inference (half precision, torch.compile, ...)
uv run python main.py infer --fast

# Export the model to an ONNX file
uv run python main.py export_onnx

# and run inference using ONNX runtime
uv run python main.py infer_onnx

# Build a TensorRT engine from the ONNX
uv run python build_trt.py

# and run inference using the built engine
uv run python main.py infer_trt
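
Since each step depends on the output of the previous one (the ONNX file, then the engine), the commands above can be chained from a small script. This is only a sketch: the `PIPELINE` list and the `run_pipeline` helper are hypothetical names, and the commands themselves are copied from the list above.

```python
import subprocess

# Commands copied from the list above, in dependency order.
PIPELINE = [
    ["uv", "run", "python", "main.py", "export_onnx"],
    ["uv", "run", "python", "main.py", "infer_onnx"],
    ["uv", "run", "python", "build_trt.py"],
    ["uv", "run", "python", "main.py", "infer_trt"],
]


def run_pipeline(commands=PIPELINE, dry_run=False):
    """Run each step in order and stop at the first failure."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands; pass dry_run=False to actually execute them.
    run_pipeline(dry_run=True)
```

Stopping at the first failure keeps a broken export or engine build from being masked by a later step.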

Testing with different versions of TensorRT

Different TensorRT versions can be tested using Docker and NVIDIA's PyTorch images:

$ docker run --gpus all --rm -it -v $(pwd):/code -w /code nvcr.io/nvidia/pytorch:24.12-py3 bash
root@cd60802e9604:/code# pip install "onnxruntime>=1.17.1" "pyav<14.0.0" "timm>=1.0.12"
root@cd60802e9604:/code# python ./main.py export_onnx && python ./build_trt.py && python ./main.py infer_trt
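
To repeat this check across several images, the session above can be turned into a non-interactive command. The sketch below builds that command as an argument list; `docker_command` is a hypothetical helper, `/path/to/vit-trt` is a placeholder, and the interactive `-it` flags are dropped on the assumption that the steps run unattended via `bash -c`.

```python
import shlex


def docker_command(tag, workdir):
    """Build the docker invocation for one NVIDIA PyTorch image tag."""
    # Same steps as the interactive session above, joined into one shell line.
    inner = (
        'pip install "onnxruntime>=1.17.1" "pyav<14.0.0" "timm>=1.0.12" && '
        "python ./main.py export_onnx && "
        "python ./build_trt.py && "
        "python ./main.py infer_trt"
    )
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "-v", f"{workdir}:/code", "-w", "/code",
        f"nvcr.io/nvidia/pytorch:{tag}",
        "bash", "-c", inner,
    ]


# "24.12-py3" comes from the example above; add other tags you want to test.
for tag in ["24.12-py3"]:
    print(shlex.join(docker_command(tag, "/path/to/vit-trt")))
```

The printed line can be pasted into a shell, or the list can be passed directly to `subprocess.run`.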
