I receive this error:

running 1 test
2025-01-12T18:12:29.336835Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.337533Z INFO mistralrs_core::pipeline::normal: Loading `config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.472416Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00008.safetensors", "model-00002-of-00008.safetensors", "model-00003-of-00008.safetensors", "model-00004-of-00008.safetensors", "model-00005-of-00008.safetensors", "model-00006-of-00008.safetensors", "model-00007-of-00008.safetensors", "model-00008-of-00008.safetensors"]
2025-01-12T18:12:29.582263Z INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.815600Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `Qwen/Qwen2.5-14B-Instruct`
2025-01-12T18:12:29.924620Z INFO mistralrs_core::utils::normal: DType selected is BF16.
2025-01-12T18:12:29.925200Z INFO mistralrs_core::utils::log: Automatic loader type determined to be `qwen2`
thread 'llm::tests::test_spawn_llm' panicked at src/llm.rs:65:14:
called `Result::unwrap()` on an `Err` value: This model does not fit on the devices ["metal[4294969630]", "cpu"], and exceeds total capacity by 7530MB
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
when trying to run this code:

```rust
let model = TextModelBuilder::new(model_id)
    .with_logging()
    .with_paged_attn(|| {
        PagedAttentionMetaBuilder::default()
            //.with_gpu_memory(MemoryGpuConfig::Utilization(0.9))
            .build()
    })
    .unwrap()
    .build()
    .await
    .unwrap();
```
Hi @hhamud! Thanks for reporting this. On my Mac, I'm seeing that the full, unquantized model takes up ~28GB (consistent with 14B parameters at bf16). If more than 4GB is already in use on your system, this error will occur (based on the error message, maybe about 11GB is in use).
Perhaps you can try ISQ to reduce the size? IsqType::Q8_0 retains much of the quality while reducing the model size by about half.
Otherwise, I merged #1060, which improves the error message you received. Can you please run cargo update to use the latest changes?
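For what it's worth, the ~28GB figure lines up with simple arithmetic: bf16 stores two bytes per parameter. A quick sanity check, with the parameter count rounded to 14B for the estimate:

```rust
// Rough model-size estimate: parameter count x bytes per parameter.
fn model_size_gb(params: f64, bytes_per_param: f64) -> f64 {
    params * bytes_per_param / 1e9
}

fn main() {
    // bf16 uses 2 bytes/param; Q8_0 is roughly 1 byte/param plus small overhead.
    println!("bf16: ~{:.0} GB", model_size_gb(14.0e9, 2.0)); // ~28 GB
    println!("Q8_0: ~{:.0} GB", model_size_gb(14.0e9, 1.0)); // ~14 GB
}
```

This also makes the quantization suggestion below concrete: halving bytes-per-parameter roughly halves the resident weight size.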
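For reference, a minimal sketch of what an ISQ load might look like. The `with_isq(IsqType::Q8_0)` call follows the mistralrs TextModelBuilder API, but the exact builder surface can differ between versions, so treat this as an illustration rather than a drop-in:

```rust
// Sketch: load with in-situ quantization (ISQ) so weights are quantized
// to Q8_0 at load time, roughly halving the in-memory model size.
use mistralrs::{IsqType, TextModelBuilder};

async fn load_quantized() -> anyhow::Result<mistralrs::Model> {
    let model = TextModelBuilder::new("Qwen/Qwen2.5-14B-Instruct")
        .with_isq(IsqType::Q8_0) // quantize weights while loading
        .with_logging()
        .build()
        .await?;
    Ok(model)
}
```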