
Qwen2.5 14b does not fit on Mac m1 32gb #1058

Closed
hhamud opened this issue Jan 12, 2025 · 2 comments
Labels
bug Something isn't working

Comments

hhamud commented Jan 12, 2025

I receive this error:

running 1 test
2025-01-12T18:12:29.336835Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.337533Z  INFO mistralrs_core::pipeline::normal: Loading `config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.472416Z  INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00008.safetensors", "model-00002-of-00008.safetensors", "model-00003-of-00008.safetensors", "model-00004-of-00008.safetensors", "model-00005-of-00008.safetensors", "model-00006-of-00008.safetensors", "model-00007-of-00008.safetensors", "model-00008-of-00008.safetensors"]
2025-01-12T18:12:29.582263Z  INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.815600Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.924620Z  INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-01-12T18:12:29.925200Z  INFO mistralrs_core::utils::log: Automatic loader type determined to be `qwen2`
thread 'llm::tests::test_spawn_llm' panicked at src/llm.rs:65:14:
called `Result::unwrap()` on an `Err` value: This model does not fit on the devices ["metal[4294969630]", "cpu"], and exceeds total capacity by 7530MB
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

when trying to run this code:

        let model = TextModelBuilder::new(model_id)
            .with_logging()
            .with_paged_attn(|| {
                PagedAttentionMetaBuilder::default()
                    //.with_gpu_memory(MemoryGpuConfig::Utilization(0.9))
                    .build()
            })
            .unwrap()
            .build()
            .await
            .unwrap();
@hhamud hhamud added the bug Something isn't working label Jan 12, 2025
EricLBuehler (Owner) commented Jan 12, 2025

Hi @hhamud! Thanks for reporting this. On my Mac, I'm seeing that the full, unquantized model takes up ~28GB (consistent with 14B at bf16). If more than 4GB are already in use on your system, this error will occur (based on the error, about 11GB are probably in use).

Perhaps you can try ISQ to reduce the size? IsqType::Q8_0 retains much of the quality but reduces the model size by about half.
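A minimal sketch of the ISQ route, assuming the `with_isq` method on `TextModelBuilder` from the mistral.rs Rust API (the builder chain otherwise matches the snippet in the original report):

```rust
// Sketch: load Qwen2.5-14B with in-situ quantization (ISQ) to Q8_0,
// which roughly halves the bf16 memory footprint (~28GB -> ~14GB).
// Assumes the mistralrs crate's TextModelBuilder / PagedAttentionMetaBuilder API.
use mistralrs::{IsqType, PagedAttentionMetaBuilder, TextModelBuilder};

let model = TextModelBuilder::new("Qwen/Qwen2.5-14B-Instruct")
    .with_isq(IsqType::Q8_0) // quantize weights to 8-bit at load time
    .with_logging()
    .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
    .build()
    .await?;
```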

Otherwise, I merged #1060, which improves the error message you received; can you please run `cargo update` to use the latest changes?

hhamud (Author) commented Jan 16, 2025

> Hi @hhamud! Thanks for reporting this. On my Mac, I'm seeing that the full, unquantized model takes up ~28GB (consistent with 14B at bf16). If more than 4GB are already in use on your system, this error will occur (based on the error, about 11GB are probably in use).
>
> Perhaps you can try ISQ to reduce the size? IsqType::Q8_0 retains much of the quality but reduces the model size by about half.
>
> Otherwise, I merged #1060, which improves the error message you received; can you please run `cargo update` to use the latest changes?

You are correct, I should have quantised it.

@hhamud hhamud closed this as completed Jan 16, 2025