unload the model #3281
Comments
You should also clear the notebook output: https://stackoverflow.com/questions/24816237/ipython-notebook-clear-cell-output-in-code
I always do (in the GUI, not in my cells).
This seems mostly solved by #1908 with:

```python
import gc

import torch
from vllm import LLM, SamplingParams
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Load the model via vLLM (model_name, saver_dir, and num_gpus are assumed
# to be defined earlier).
llm = LLM(model=model_name, download_dir=saver_dir,
          tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)

# Delete the llm object and free the memory
destroy_model_parallel()
del llm.llm_engine.driver_worker
del llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully deleted the llm pipeline and freed the GPU memory!")
```
I had already read that. My problem remains unsolved when I use Vllm from LlamaIndex; otherwise it almost works. A little memory stays in use (~1 GB), but at least I can load and unload the models. The problem is that I can't find how to access the llm_engine member of Vllm.LLM.
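One possible way to reach that member through the LlamaIndex wrapper, sketched below. The `_client` attribute and the import path are assumptions about LlamaIndex internals and may differ between versions; inspect the wrapper object to confirm:

```python
import gc

import torch
from llama_index.llms.vllm import Vllm  # import path assumed for recent llama-index versions
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

llm = Vllm(model="facebook/opt-125m")  # placeholder model for illustration

# Assumption: the LlamaIndex wrapper keeps the underlying vllm.LLM instance
# on a private attribute; inspect the object if this name does not match.
engine = llm._client

destroy_model_parallel()
del engine.llm_engine.driver_worker
del engine
del llm
gc.collect()
torch.cuda.empty_cache()
```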
@chenxu2048 The notebook output is just computed data shown to the user. The Python kernel computes it, but it's one-way communication: the output doesn't affect the kernel at all. Therefore clearing the output will have no effect on GPU memory or any other state of the kernel.
No resolute answer has been given. Can a model be unloaded from GPU RAM with vLLM? Yes or no?
Worst-case scenario: use the notebook magic %%writefile to write the script to a Python file, then run that file from within the notebook. When the vLLM run finishes, the memory will be reclaimed.
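A minimal sketch of that workaround (the file name, model, and prompt are placeholders):

```python
%%writefile run_batch.py
# This whole cell is written to run_batch.py by the cell magic above.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

```python
# In the next cell: run the script as a subprocess; all GPU memory is
# released when the subprocess exits.
!python run_batch.py
```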
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Hi,
I'm sorry, I can't find how to unload a model. I load a model, delete the object, and call the garbage collector, but it does nothing.
How are we supposed to unload a model?
I want to load a model, run a batch, then load another model and run a batch, and so on for multiple models in order to compare them. But for now I must restart Python each time.
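Putting the thread's recipe together for that use case, a minimal sketch (model names and prompts are placeholders; the import path matches the vLLM version discussed above):

```python
import gc

import torch
from vllm import LLM, SamplingParams
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

models = ["facebook/opt-125m", "facebook/opt-350m"]  # placeholder model names
prompts = ["The capital of France is"]               # placeholder prompts

for model_name in models:
    llm = LLM(model=model_name)
    outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
    print(model_name, outputs[0].outputs[0].text)

    # Tear down the engine so the next model can claim the GPU.
    destroy_model_parallel()
    del llm.llm_engine.driver_worker
    del llm
    gc.collect()
    torch.cuda.empty_cache()
    torch.distributed.destroy_process_group()
```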