Follow the instructions below to run VTimeLLM inference on your local GPU machine.
Note: Our demo requires approximately 18 GB of GPU memory.
```bash
conda create --name=vtimellm python=3.10
conda activate vtimellm

git clone https://github.com/huangb23/VTimeLLM.git
cd VTimeLLM
pip install -r requirements.txt
```
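Since the demo needs roughly 18 GB of GPU memory, it can save time to confirm that PyTorch (pulled in with the requirements above) can see a large enough device before downloading any weights. A minimal check:

```python
import torch

# The demo needs a CUDA device with roughly 18 GB of memory.
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB total")
```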
- Download the CLIP model and VTimeLLM checkpoints from the Tsinghua Cloud and place them in the `checkpoints` directory.
- Download the Vicuna v1.5 or ChatGLM3-6B weights; one way to fetch them is sketched after this list.
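If you pull the backbone weights from the Hugging Face Hub, a short script like the following works. This is a sketch assuming the `huggingface_hub` package; `lmsys/vicuna-7b-v1.5` and `THUDM/chatglm3-6b` are the official repositories for these models, and the target directories are placeholders you can change:

```python
# Sketch: fetch backbone weights from the Hugging Face Hub
# (pip install -U huggingface_hub). Any local copy works; VTimeLLM
# only needs the path you later pass via --model_base.
from huggingface_hub import snapshot_download

snapshot_download("lmsys/vicuna-7b-v1.5", local_dir="./checkpoints/vicuna-7b-v1.5")
# or, for the ChatGLM variant:
# snapshot_download("THUDM/chatglm3-6b", local_dir="./checkpoints/chatglm3-6b")
```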
```bash
python -m vtimellm.inference --model_base <path to the Vicuna v1.5 weights>
```
Alternatively, you can conduct multi-turn conversations in a Jupyter notebook. As with the command above, you need to set `args.model_base` to the path of the Vicuna v1.5 weights, as sketched below.
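The exact cells depend on the notebook, but the one edit called for here is pointing `args.model_base` at your local weights. A minimal sketch, assuming the notebook exposes an argparse-style `args` object (the path is a placeholder):

```python
# Minimal sketch: the notebook builds an argparse-style args namespace;
# SimpleNamespace stands in for it here. Only model_base needs editing.
from types import SimpleNamespace

args = SimpleNamespace()
args.model_base = "./checkpoints/vicuna-7b-v1.5"  # hypothetical local path to Vicuna v1.5
```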
If you want to run the VTimeLLM-ChatGLM version, please refer to the code in `inference_for_glm.ipynb`.
We also provide an offline Gradio demo:
```bash
cd vtimellm
python demo_gradio.py --model_base <path to the Vicuna v1.5 weights>
```
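Once the script starts, Gradio prints a local URL to the console (http://127.0.0.1:7860 by default, unless `demo_gradio.py` overrides the port in `launch()`); open it in a browser to interact with the demo.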