Add lazy eval during conversion #127

Blaizzy · 2024-11-27T02:18:49Z

This PR implements lazy evaluation during the model loading and conversion process to improve memory efficiency. By utilizing MLX's lazy evaluation feature, we defer actual memory allocation until weights are needed.

Key changes:

Added the lazy=True parameter when fetching models from the Hugging Face Hub
Memory allocation now happens on-demand rather than eagerly
Works with both the float16 dtype used for quantized conversion and user-specified dtype configurations
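
The effect of the changes listed above can be illustrated without MLX itself. The sketch below is a pure-Python stand-in, not the code from this PR: `LazyWeight`, `load_weights`, and `to_float16` are hypothetical names introduced only for illustration, and the `lazy` flag here merely mimics the idea of the lazy=True parameter. It shows that with lazy loading nothing is read until a weight is evaluated, and that a dtype cast can be recorded up front and run later.

```python
import struct
from typing import Callable, Dict, List, Optional


def to_float16(x: float) -> float:
    """Round-trip a Python float through IEEE half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]


class LazyWeight:
    """Hypothetical stand-in for a lazily evaluated array (not MLX's API)."""

    def __init__(self, loader: Callable[[], List[float]]):
        self._loader = loader                      # runs only when values are needed
        self._data: Optional[List[float]] = None   # nothing allocated yet

    @property
    def materialized(self) -> bool:
        return self._data is not None

    def astype(self, cast: Callable[[float], float]) -> "LazyWeight":
        # Record the cast as a new deferred computation instead of running it now.
        return LazyWeight(lambda: [cast(v) for v in self.eval()])

    def eval(self) -> List[float]:
        # Materialize on first use and cache the result.
        if self._data is None:
            self._data = self._loader()
        return self._data


def load_weights(lazy: bool = False) -> Dict[str, LazyWeight]:
    """Build a toy checkpoint; with lazy=False every tensor is read up front."""
    weights = {
        f"layer{i}.weight": LazyWeight(lambda i=i: [i + 0.1] * 4)
        for i in range(3)
    }
    if not lazy:
        for w in weights.values():
            w.eval()
    return weights


# Lazy path: loading and casting allocate nothing until eval() is called.
weights = load_weights(lazy=True)
converted = {name: w.astype(to_float16) for name, w in weights.items()}
first = converted["layer0.weight"].eval()  # only layer0 is read and cast here
```

In this toy version, evaluating one converted weight materializes only that weight and its source, which is the property that keeps peak memory down when a large checkpoint is converted tensor by tensor.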

Benefits:

Reduced peak memory usage during model conversion
Smoother handling of large models, particularly when converting to float16
Maintains compatibility with existing model loading workflows

Read more here: https://ml-explore.github.io/mlx/build/html/usage/lazy_evaluation.html

add lazy eval during conversion

78b5d89

Blaizzy merged commit 61e0503 into main Nov 27, 2024
1 check passed

Blaizzy deleted the pc/lazy-convert branch November 27, 2024 02:31
