torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU
Your current environment
How would you like to use vllm
I'm running an eval framework that evaluates multiple models. vLLM doesn't seem to free the GPU memory after initializing the 2nd model (even when reusing the same variable name). How can I free the GPU memory before each new engine call?
llm = LLM(new_model)
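For reference, a minimal sketch of the kind of cleanup usually attempted between engine instantiations. The model list and the `evaluate` helper here are placeholders, and whether this fully releases the memory can depend on the vLLM version:

```python
import gc

import torch
from vllm import LLM, SamplingParams

def evaluate(llm: LLM, model_name: str) -> None:
    """Placeholder for the eval framework's per-model evaluation."""
    params = SamplingParams(max_tokens=32)
    outputs = llm.generate(["Hello, my name is"], params)
    print(model_name, outputs[0].outputs[0].text)

models = ["facebook/opt-125m", "facebook/opt-350m"]  # example model list

for name in models:
    llm = LLM(model=name)
    evaluate(llm, name)

    # Drop every reference to the engine, then ask PyTorch to release its
    # cached blocks. Rebinding `llm = LLM(new_model)` alone is not enough,
    # because the old engine is still alive while the new one allocates
    # its KV cache, so both sets of weights end up on the GPU at once.
    del llm
    gc.collect()
    torch.cuda.empty_cache()

    # Depending on the vLLM version, the distributed state may also need
    # to be torn down (e.g. destroy_model_parallel()) before the memory
    # is fully returned; the exact import path varies across releases.
```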