Add lazy eval during conversion #127

Merged 1 commit on Nov 27, 2024

Blaizzy (Owner) commented on Nov 27, 2024

This PR enables lazy evaluation during model loading and conversion to improve memory efficiency. With MLX's lazy evaluation, memory for the weights is not allocated until they are actually needed.

Key changes:

  • Added the lazy=True parameter when fetching models from the Hugging Face Hub
  • Memory allocation now happens on-demand rather than eagerly
  • Works with both float16 (quantized) and user-specified dtype configurations
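
To illustrate what on-demand allocation means in the changes listed above, here is a minimal, library-independent sketch. It is not MLX's implementation and does not use the MLX API; `LazyWeight` and `fake_read` are hypothetical names introduced only for this example. The idea is that a lazy weight stores how to obtain its data and reads it only on first use.

```python
from typing import Callable, List, Optional


class LazyWeight:
    """Hypothetical stand-in for a lazily loaded tensor (not an MLX class)."""

    def __init__(self, name: str, loader: Callable[[], List[float]]):
        self.name = name
        self._loader = loader  # how to get the data, not the data itself
        self._data: Optional[List[float]] = None

    @property
    def is_materialized(self) -> bool:
        return self._data is not None

    def materialize(self) -> List[float]:
        # Allocate and read only on first use; later calls reuse the result.
        if self._data is None:
            self._data = self._loader()
        return self._data


def fake_read(size: int) -> Callable[[], List[float]]:
    """Pretend to read `size` values from a checkpoint file (assumption)."""
    return lambda: [0.5] * size


weights = {
    "layer0.weight": LazyWeight("layer0.weight", fake_read(4)),
    "layer1.weight": LazyWeight("layer1.weight", fake_read(8)),
}

# Building the dictionary allocates no weight data yet.
resident_before = [w.name for w in weights.values() if w.is_materialized]

# Touching one weight materializes only that weight.
first = weights["layer0.weight"].materialize()
resident_after = [w.name for w in weights.values() if w.is_materialized]
```

In this toy model, creating the `weights` dictionary is cheap, and only the weight that is actually accessed ends up resident in memory.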

Benefits:

  • Reduced peak memory usage during model conversion
  • Smoother handling of large models, particularly when converting to float16
  • Maintains compatibility with existing model loading workflows
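
One plausible way to see why peak memory drops is to compare loading every tensor up front with loading each tensor only when it is converted. The sketch below is a simplified model under stated assumptions, not the PR's code: the checkpoint layout, `convert`, `eager_peak`, and `lazy_peak` are all hypothetical, memory is counted as the number of resident source values, and converted tensors are assumed to be written out immediately rather than kept.

```python
from typing import Callable, Dict, List

Loader = Callable[[], List[float]]


def make_checkpoint() -> Dict[str, Loader]:
    """Hypothetical checkpoint: three tensors of 1000 values each."""
    return {f"layer{i}.weight": (lambda: [1.0] * 1000) for i in range(3)}


def convert(values: List[float]) -> List[float]:
    """Stand-in for a dtype cast; here it just rounds each value."""
    return [round(v, 3) for v in values]


def eager_peak(checkpoint: Dict[str, Loader]) -> int:
    # Load everything first, so all source tensors are resident at once.
    loaded = {name: load() for name, load in checkpoint.items()}
    peak = sum(len(values) for values in loaded.values())
    for values in loaded.values():
        convert(values)  # result assumed to be saved and discarded
    return peak


def lazy_peak(checkpoint: Dict[str, Loader]) -> int:
    # Load each tensor only when it is converted, then release it.
    peak = 0
    for load in checkpoint.values():
        values = load()
        peak = max(peak, len(values))
        convert(values)  # result assumed to be saved and discarded
        del values
    return peak


eager = eager_peak(make_checkpoint())
lazy = lazy_peak(make_checkpoint())
```

Under these assumptions the eager path holds all three tensors at its peak, while the lazy path holds one tensor at a time.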

Read more here: https://ml-explore.github.io/mlx/build/html/usage/lazy_evaluation.html

Blaizzy merged commit 61e0503 into main on Nov 27, 2024
1 check passed
Blaizzy deleted the pc/lazy-convert branch on November 27, 2024, 02:31