
Qwen2.5 14b does not fit on Mac m1 32gb #1058

Closed
hhamud opened this issue Jan 12, 2025 · 2 comments
Labels
bug Something isn't working

Comments

hhamud commented Jan 12, 2025

I receive this error:

running 1 test
2025-01-12T18:12:29.336835Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.337533Z  INFO mistralrs_core::pipeline::normal: Loading `config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.472416Z  INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00008.safetensors", "model-00002-of-00008.safetensors", "model-00003-of-00008.safetensors", "model-00004-of-00008.safetensors", "model-00005-of-00008.safetensors", "model-00006-of-00008.safetensors", "model-00007-of-00008.safetensors", "model-00008-of-00008.safetensors"]
2025-01-12T18:12:29.582263Z  INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.815600Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.924620Z  INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-01-12T18:12:29.925200Z  INFO mistralrs_core::utils::log: Automatic loader type determined to be `qwen2`
thread 'llm::tests::test_spawn_llm' panicked at src/llm.rs:65:14:
called `Result::unwrap()` on an `Err` value: This model does not fit on the devices ["metal[4294969630]", "cpu"], and exceeds total capacity by 7530MB
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

when trying to run this code:

        let model = TextModelBuilder::new(model_id)
            .with_logging()
            .with_paged_attn(|| {
                PagedAttentionMetaBuilder::default()
                    //.with_gpu_memory(MemoryGpuConfig::Utilization(0.9))
                    .build()
            })
            .unwrap()
            .build()
            .await
            .unwrap();
@hhamud hhamud added the bug Something isn't working label Jan 12, 2025
EricLBuehler (Owner) commented Jan 12, 2025

Hi @hhamud! Thanks for reporting this. On my Mac, I'm seeing that the full, unquantized model takes up ~28GB (consistent with 14B at bf16). If more than 4GB are already in use on your system, this error will occur (based on the error, about 11GB are probably in use).

Perhaps you can try ISQ to reduce the size? IsqType::Q8_0 retains much of the quality but reduces the model size by about half.
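A minimal sketch of the ISQ route, assuming the `with_isq` method on `TextModelBuilder` from the mistral.rs Rust API (the builder chain otherwise matches the snippet in the original report):

```rust
// Sketch: load Qwen2.5-14B with in-situ quantization (ISQ) to Q8_0,
// which roughly halves the bf16 memory footprint (~28GB -> ~14GB).
// Assumes the mistralrs crate's TextModelBuilder / PagedAttentionMetaBuilder API.
use mistralrs::{IsqType, PagedAttentionMetaBuilder, TextModelBuilder};

let model = TextModelBuilder::new("Qwen/Qwen2.5-14B-Instruct")
    .with_isq(IsqType::Q8_0) // quantize weights to 8-bit at load time
    .with_logging()
    .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
    .build()
    .await?;
```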

Otherwise, I merged #1060, which improves the error message you received; can you please run `cargo update` to use the latest changes?

hhamud (Author) commented Jan 16, 2025

> Hi @hhamud! Thanks for reporting this. On my Mac, I'm seeing that the full, unquantized model takes up ~28GB (consistent with 14B at bf16). If more than 4GB are already in use on your system, this error will occur (based on the error, about 11GB are probably in use).
>
> Perhaps you can try ISQ to reduce the size? IsqType::Q8_0 retains much of the quality but reduces the model size by about half.
>
> Otherwise, I merged #1060, which improves the error message you received; can you please run `cargo update` to use the latest changes?

You are correct, I should have quantised it.

@hhamud hhamud closed this as completed Jan 16, 2025