Since MiniCPM3 comes with a RAG suite, we'd like to use its LoRA adapter for better performance, like this:
# Assumes the MiniCPM3-4B-GGUF and MiniCPM3-RAG-LoRA-GGUF models are already downloaded into the current directory
docker run --rm -it -p 8080:8080 -v $PWD/MiniCPM3-4B-GGUF:/models -v $PWD/MiniCPM3-RAG-LoRA-GGUF:/lora --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/minicpm3-4b-q4_k_m.gguf --host 0.0.0.0 --port 8080 --n-gpu-layers 99 -v -ub 1024 -b 4096 --lora lora/lora-adapter-fp16.gguf
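For readability, the same invocation can be split into named pieces. This is only a sketch: the variable names (MODEL_DIR, LORA_DIR, PORT, RUN) and the build_server_cmd helper are made up for illustration, while the image, mount points, and flags are copied from the command above. By default it only prints the command; it runs docker only when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: the server command above, assembled from named parts.
# MODEL_DIR, LORA_DIR, PORT, RUN and build_server_cmd are illustrative names.
set -euo pipefail

MODEL_DIR="${MODEL_DIR:-$PWD/MiniCPM3-4B-GGUF}"      # host directory holding the base GGUF model
LORA_DIR="${LORA_DIR:-$PWD/MiniCPM3-RAG-LoRA-GGUF}"  # host directory holding the GGUF LoRA adapter
PORT="${PORT:-8080}"                                 # port published on the host and used by the server

# Fills the global array SERVER_CMD with the full docker invocation.
# Flags: -m base model, --n-gpu-layers layers offloaded to the GPU,
# -v verbose logging, -ub/-b physical/logical batch size, --lora adapter file.
build_server_cmd() {
  SERVER_CMD=(
    docker run --rm -it
    -p "${PORT}:${PORT}"
    -v "${MODEL_DIR}:/models"
    -v "${LORA_DIR}:/lora"
    --gpus all
    ghcr.io/ggerganov/llama.cpp:server-cuda
    -m models/minicpm3-4b-q4_k_m.gguf
    --host 0.0.0.0 --port "${PORT}"
    --n-gpu-layers 99
    -v -ub 1024 -b 4096
    --lora lora/lora-adapter-fp16.gguf
  )
}

build_server_cmd

if [ "${RUN:-0}" = "1" ]; then
  "${SERVER_CMD[@]}"                          # actually start the container
else
  printf '%s ' "${SERVER_CMD[@]}"; printf '\n' # dry run: just show the command
fi
```

Keeping the command in an array avoids quoting problems if a directory name contains spaces.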
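However, the LoRA model cannot currently be converted to .gguf format, because ggerganov/llama.cpp#9396 has not been merged yet:
# Same assumption as above, with MiniCPM3-4B and MiniCPM3-RAG-LoRA in the current directory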
docker run -it --rm --entrypoint /app/convert_lora_to_gguf.py -v $PWD/MiniCPM3-4B:/models -v $PWD/MiniCPM3-RAG-LoRA:/lora ghcr.io/ggerganov/llama.cpp:full --outtype q8_0 --base /models /lora
The conversion script reported:
The repository for /models contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//models.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Alternatively, could you give us some tips for converting it? Thanks a lot!
MiniCPM3 is, de facto, an ideal edge-side LLM for small companies.
Feature request
Hey my dear bros, we're building a RAG application (especially for one of our products) using MiniCPM3. Below is our stack:
It's almost done.