torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU
Your current environment
How would you like to use vllm
I'm running an eval framework that evaluates multiple models. vLLM doesn't seem to free the GPU memory after initializing the 2nd model (even when reusing the same variable name). How can I free the GPU memory before each new engine call?
llm = LLM(new_model)
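For reference, a minimal sketch of the kind of cleanup usually attempted between engine instantiations. The model list and the `evaluate` helper here are placeholders, and whether this fully releases the memory can depend on the vLLM version:

```python
import gc

import torch
from vllm import LLM, SamplingParams

def evaluate(llm: LLM, model_name: str) -> None:
    """Placeholder for the eval framework's per-model evaluation."""
    params = SamplingParams(max_tokens=32)
    outputs = llm.generate(["Hello, my name is"], params)
    print(model_name, outputs[0].outputs[0].text)

models = ["facebook/opt-125m", "facebook/opt-350m"]  # example model list

for name in models:
    llm = LLM(model=name)
    evaluate(llm, name)

    # Drop every reference to the engine, then ask PyTorch to release its
    # cached blocks. Rebinding `llm = LLM(new_model)` alone is not enough,
    # because the old engine is still alive while the new one allocates
    # its KV cache, so both sets of weights end up on the GPU at once.
    del llm
    gc.collect()
    torch.cuda.empty_cache()

    # Depending on the vLLM version, the distributed state may also need
    # to be torn down (e.g. destroy_model_parallel()) before the memory
    # is fully returned; the exact import path varies across releases.
```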