Fix CUDA OOM when creating Mixtral checkpoint #1629

Open
wants to merge 1 commit into main

Conversation

VivekBits2210

Move the w1, w2, w3 tensors to the CPU before stacking.
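
For context, a minimal sketch of what the change does, assuming the expert weights are gathered per layer from a Hugging Face style Mixtral checkpoint (the key names, function name, and loop structure below are illustrative, not copied from the conversion script):

```python
import torch

def stack_expert_weights(model_params, layer_idx, num_experts):
    """Stack per-expert w1/w2/w3 weights on the CPU to avoid CUDA OOM.

    The key names follow the Hugging Face Mixtral layout and are an
    assumption, not taken verbatim from the conversion script.
    """
    stacked = {}
    for name in ("w1", "w2", "w3"):
        experts = [
            model_params[
                f"model.layers.{layer_idx}.block_sparse_moe.experts.{e}.{name}.weight"
            ].cpu()  # move each expert tensor to the CPU before stacking
            for e in range(num_experts)
        ]
        # torch.stack allocates a new tensor holding all experts; doing it on
        # the CPU keeps that temporary off the GPU, which is what ran out of
        # memory before this change.
        stacked[name] = torch.stack(experts, dim=0)
    return stacked
```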

@byshiue
Collaborator

byshiue commented May 21, 2024

Thank you for the report. However, forcing the tensors onto the CPU might lead to CPU OOM in other environments. Have you tried using --load_model_on_cpu when converting the checkpoint?
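
For reference, a hedged example of the suggested alternative, driven from a small Python wrapper; the script path, model path, and every flag other than --load_model_on_cpu are assumptions about the usual conversion workflow, not taken from this PR:

```python
import subprocess

# Hypothetical invocation of the Mixtral checkpoint conversion script with the
# model kept in host memory; paths and the extra flags are illustrative.
subprocess.run(
    [
        "python", "convert_checkpoint.py",
        "--model_dir", "./Mixtral-8x7B-v0.1",
        "--output_dir", "./mixtral_tllm_ckpt",
        "--dtype", "float16",
        "--load_model_on_cpu",
    ],
    check=True,
)
```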

@byshiue byshiue self-assigned this May 21, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label May 21, 2024
@nv-guomingz
Collaborator

Hi @VivekBits2210, did you manage to create the Mixtral checkpoint with @byshiue's suggestion?
If so, I'd like to close this PR at the current stage.

@ghost

ghost commented Jul 17, 2024

> Have you tried using --load_model_on_cpu when converting the checkpoint?

This won't work for some quantization cases where the model has to be run on a calibration dataset; in those cases the model cannot be loaded on the CPU.
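
As a rough illustration of that constraint, post-training quantization calibration runs forward passes over a calibration set, so the model has to sit on the GPU for that step; the function below is a generic sketch, not the actual quantization script:

```python
import torch

@torch.no_grad()
def calibrate(model, calib_loader, device="cuda"):
    """Run forward passes over a calibration set so quantization observers
    can record activation ranges; these passes are why the model generally
    cannot stay on the CPU during calibration."""
    model.to(device).eval()
    for batch in calib_loader:
        input_ids = batch["input_ids"].to(device)
        model(input_ids)  # observers/hooks attached elsewhere record stats here
```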

Labels: triaged (Issue has been triaged by maintainers), waiting for feedback
3 participants