Running VTimeLLM Inference Offline

Please follow the instructions below to run VTimeLLM inference on your local GPU machine.

Note: Our demo requires approximately 18 GB of GPU memory.

Clone the VTimeLLM repository

conda create --name=vtimellm python=3.10
conda activate vtimellm

git clone https://github.com/huangb23/VTimeLLM.git
cd VTimeLLM
pip install -r requirements.txt
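Before moving on, you can confirm that your GPU actually offers the roughly 18 GB of memory mentioned above. A minimal check, assuming the PyTorch installed via requirements.txt has CUDA support:

import torch
# The demo needs roughly 18 GB of GPU memory; report what GPU 0 provides.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB total")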

Download weights

Run the inference code

python -m vtimellm.inference --model_base <path to the Vicuna v1.5 weights> 
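If you prefer to launch inference from a Python script (for example, to batch several runs), a thin wrapper around the same command works. This is only a sketch: the model base path is a placeholder, and any additional flags that vtimellm.inference accepts (checkpoint or video paths, for instance) are not listed here.

import subprocess
MODEL_BASE = "/path/to/vicuna-7b-v1.5"  # placeholder: your Vicuna v1.5 weights
# Invokes the same entry point as the command above; append any further
# flags that vtimellm.inference accepts for your use case.
subprocess.run(
    ["python", "-m", "vtimellm.inference", "--model_base", MODEL_BASE],
    check=True,
)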

Alternatively, you can conduct multi-turn conversations in a Jupyter notebook. There, too, you need to set 'args.model_base' to the path of the Vicuna v1.5 weights.
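In the notebook, the setup might look like the following sketch, which simply mirrors the --model_base flag with an argparse Namespace; the actual model-loading and chat cells live in the repository's notebook and are not reproduced here.

from argparse import Namespace
# Hypothetical setup cell: only mirrors the --model_base flag used above.
args = Namespace(model_base="/path/to/vicuna-7b-v1.5")  # placeholder path
# ...then continue with the notebook's remaining cells, which load the
# VTimeLLM checkpoints and run the multi-turn conversation.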

If you want to run the VTimeLLM-ChatGLM version, please refer to the code in inference_for_glm.ipynb.

Run the Gradio demo

We provide an offline Gradio demo, which you can launch as follows:

cd vtimellm
python demo_gradio.py --model_base <path to the Vicuna v1.5 weights>
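Once the demo is running, you can check that the web interface is reachable. Gradio serves on http://127.0.0.1:7860 by default; if demo_gradio.py binds a different host or port (an assumption not verified against the script), use the URL printed in its startup log instead.

import urllib.request
# Assumes the default Gradio address; use the URL printed by demo_gradio.py if it differs.
with urllib.request.urlopen("http://127.0.0.1:7860", timeout=5) as resp:
    print("Gradio demo reachable, HTTP status:", resp.status)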