
llama.cpp Integration to Support Low-End Hardware Compatibility #11

Open

efelem opened this issue Dec 5, 2023 · 1 comment

efelem commented Dec 5, 2023

Request for llama.cpp Integration to Support Low-End Hardware Compatibility

Description

I'm trying to integrate llama.cpp with Meditron so that the models can run on lower-end hardware. Meditron is based on Llama, so in theory this should be possible. However, I'm encountering errors when attempting to convert the Meditron model with llama.cpp.

Steps to Reproduce

  1. Either convert the model with:

    python3 convert-hf-to-gguf.py ../meditron-7b/

    • Output:
      Loading model: meditron-7b
      Traceback (most recent call last):
      ...
      NotImplementedError: Architecture "LlamaForCausalLM" not supported!
      
  2. Or launch the model directly with llama.cpp (see also the sketch after these steps):

    ./build/bin/main --rope-freq-scale 8.0 -m ../meditron-7b/pytorch_model-00008-of-00008.bin -p "I have pain in my leg from toes to hip"
    
    • Output:
      Log start
      ...
      error loading model: llama_model_loader: failed to load model from ../meditron-7b/pytorch_model-00008-of-00008.bin
      

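Both failures above are consistent with how llama.cpp's conversion tooling was organized at the time, though I have not verified this against Meditron specifically: convert-hf-to-gguf.py covered non-Llama architectures, while Llama-family checkpoints (LlamaForCausalLM) went through the separate convert.py script, and main only loads GGUF files, so a raw pytorch_model-*.bin shard cannot be loaded directly. A minimal sketch of the path I would expect to work, with illustrative output file names:

    # Hedged sketch, not verified against Meditron specifically.
    # convert.py (not convert-hf-to-gguf.py) handled LlamaForCausalLM
    # checkpoints in llama.cpp at this time:
    python3 convert.py ../meditron-7b/ --outtype f16 --outfile meditron-7b-f16.gguf

    # main only loads GGUF files, so point it at the converted file
    # rather than a pytorch_model-*.bin shard:
    ./build/bin/main -m meditron-7b-f16.gguf -p "I have pain in my leg from toes to hip"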
Expected Behavior

Successful integration of llama.cpp with Meditron, allowing the model to run on lower-end hardware.

Actual Behavior

The conversion script raises a NotImplementedError for the architecture "LlamaForCausalLM", and the model fails to load when launched directly with llama.cpp.

Possible Solution

Adjustments in llama.cpp to support the "LlamaForCausalLM" architecture used by Meditron. This could involve modifying the model conversion script or the model loading mechanism in llama.cpp.
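
For the low-end-hardware goal specifically, the usual follow-up once a GGUF conversion succeeds is quantization with llama.cpp's stock quantize tool. A minimal sketch, assuming the f16 GGUF from the conversion sketch above (file names are illustrative):

    # Shrink the f16 GGUF to 4-bit; Q4_K_M is a common quality/size trade-off:
    ./build/bin/quantize meditron-7b-f16.gguf meditron-7b-Q4_K_M.gguf Q4_K_M

    # The quantized model needs roughly a quarter of the f16 memory:
    ./build/bin/main -m meditron-7b-Q4_K_M.gguf -p "I have pain in my leg from toes to hip"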

Additional Context

llama.cpp: https://github.com/ggerganov/llama.cpp

Request

I kindly request that the team consider adding support for llama.cpp integration with Meditron, or give advice on how to implement it. This would be a significant enhancement, enabling Meditron models to run on more diverse hardware setups, especially those at the lower end.

martinjaggi (Contributor) commented
Related: did you also try these quantized models?
https://huggingface.co/TheBloke/meditron-70B-GGUF
https://huggingface.co/TheBloke/meditron-7B-GGUF
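
For reference, a typical way to fetch and run one of those prebuilt files (check the exact file name on the model page; the Q4_K_M variant below is an assumed example of TheBloke's usual naming scheme):

    # Assumed file name; pick an actual variant from the repo's file list:
    huggingface-cli download TheBloke/meditron-7B-GGUF meditron-7b.Q4_K_M.gguf --local-dir .
    ./build/bin/main -m meditron-7b.Q4_K_M.gguf -p "I have pain in my leg from toes to hip"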
