Fix CUDA OOM when creating Mixtral checkpoint #1629

Open
wants to merge 1 commit into main

Conversation

VivekBits2210

Move the w1, w2, w3 tensors to the CPU before stacking.
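
For context, a minimal sketch of what the change does, assuming the expert weights are gathered per layer from a Hugging Face style Mixtral checkpoint (the key names, function name, and loop structure below are illustrative, not copied from the conversion script):

```python
import torch

def stack_expert_weights(model_params, layer_idx, num_experts):
    """Stack per-expert w1/w2/w3 weights on the CPU to avoid CUDA OOM.

    The key names follow the Hugging Face Mixtral layout and are an
    assumption, not taken verbatim from the conversion script.
    """
    stacked = {}
    for name in ("w1", "w2", "w3"):
        experts = [
            model_params[
                f"model.layers.{layer_idx}.block_sparse_moe.experts.{e}.{name}.weight"
            ].cpu()  # move each expert tensor to the CPU before stacking
            for e in range(num_experts)
        ]
        # torch.stack allocates a new tensor holding all experts; doing it on
        # the CPU keeps that temporary off the GPU, which is what ran out of
        # memory before this change.
        stacked[name] = torch.stack(experts, dim=0)
    return stacked
```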

@byshiue
Collaborator

byshiue commented May 21, 2024

Thank you for the report. However, forcing the tensors onto the CPU might lead to CPU OOM in other environments. Have you tried using --load_model_on_cpu when converting the checkpoint?
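
For reference, a hedged example of the suggested alternative, driven from a small Python wrapper; the script path, model path, and every flag other than --load_model_on_cpu are assumptions about the usual conversion workflow, not taken from this PR:

```python
import subprocess

# Hypothetical invocation of the Mixtral checkpoint conversion script with the
# model kept in host memory; paths and the extra flags are illustrative.
subprocess.run(
    [
        "python", "convert_checkpoint.py",
        "--model_dir", "./Mixtral-8x7B-v0.1",
        "--output_dir", "./mixtral_tllm_ckpt",
        "--dtype", "float16",
        "--load_model_on_cpu",
    ],
    check=True,
)
```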

@byshiue byshiue self-assigned this May 21, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label May 21, 2024
@nv-guomingz
Collaborator

Hi @VivekBits2210, did you manage to create the Mixtral checkpoint with @byshiue's suggestion?
If so, I'd like to close this PR at the current stage.

@ghost

ghost commented Jul 17, 2024

> Have you tried using --load_model_on_cpu when converting the checkpoint?

This won't work for some quantization cases where the model has to be run on a calibration dataset; in those cases the model cannot be loaded on the CPU.
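
As a rough illustration of that constraint, post-training quantization calibration runs forward passes over a calibration set, so the model has to sit on the GPU for that step; the function below is a generic sketch, not the actual quantization script:

```python
import torch

@torch.no_grad()
def calibrate(model, calib_loader, device="cuda"):
    """Run forward passes over a calibration set so quantization observers
    can record activation ranges; these passes are why the model generally
    cannot stay on the CPU during calibration."""
    model.to(device).eval()
    for batch in calib_loader:
        input_ids = batch["input_ids"].to(device)
        model(input_ids)  # observers/hooks attached elsewhere record stats here
```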

Labels: triaged (Issue has been triaged by maintainers), waiting for feedback
3 participants