release gpu vram after layer.fwd (ModelCloud#616)
Co-authored-by: LRL-ModelCloud <lrl@modelcloud.ai>
LRL-ModelCloud authored Nov 19, 2024
1 parent ee4ede5 commit 416e47f
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions gptqmodel/models/base.py
@@ -541,6 +541,8 @@ def tmp(_, inp, out):
                     additional_layer_inputs[k] = nested_move_to(v, cur_layer_device)
                 with torch.no_grad():
                     layer(*layer_input, **additional_layer_inputs)
+
+                torch.cuda.empty_cache()
             for h in handles:
                 h.remove()
 
@@ -615,6 +617,8 @@ def tmp(_, inp, out):
                 )
                 layer_outputs.append([layer_output])
 
+                torch.cuda.empty_cache()
+
             layers[i] = move_to(layer, CPU if force_layer_back_to_cpu else cur_layer_device)
             del layer
             del gptq
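
The change itself is small: during per-layer quantization, each layer's forward pass over the calibration batches leaves cached allocations on the GPU, and calling torch.cuda.empty_cache() after the forward (and again after the layer outputs are collected) returns that cached-but-unused VRAM to the driver before the next layer is processed. Below is a minimal, self-contained sketch of the same pattern; the layer stack, activations, and sizes are hypothetical placeholders, not GPTQModel's actual quantization loop.

import torch
import torch.nn as nn

# Illustrative stand-ins; GPTQModel iterates over the real model's decoder
# layers and captured calibration inputs instead.
layers = nn.ModuleList([nn.Linear(1024, 1024) for _ in range(4)])
hidden = torch.randn(8, 1024)
device = "cuda" if torch.cuda.is_available() else "cpu"

for layer in layers:
    layer.to(device)
    with torch.no_grad():
        hidden = layer(hidden.to(device))  # forward pass for this layer only

    hidden = hidden.cpu()  # keep activations off the GPU between layers
    layer.cpu()            # move the finished layer back to CPU

    if torch.cuda.is_available():
        # Return PyTorch's cached-but-unused blocks to the driver so the
        # next layer starts from a smaller VRAM footprint.
        torch.cuda.empty_cache()

Note that empty_cache() does not free tensors that are still referenced; it only releases blocks the caching allocator holds in reserve, so the benefit is a lower reserved-VRAM watermark between layers rather than faster execution.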
