This repository was archived by the owner on Jun 24, 2024. It is now read-only.
Parallel loading of the model tensors #79
Open
Description
People have reported faster loading of the models in upstream when the tensors are loaded in parallel: ggml-org/llama.cpp#85
This should be pretty easy to do in Rust if we convert loading to an iterator and then use rayon's `par_iter` instead. It seems like loading should be I/O-bound, but perhaps the actual loading process has computational overhead?