
llama.cpp Integration to Support Low-End Hardware Compatibility #11

Open

efelem opened this issue Dec 5, 2023 · 1 comment

efelem commented Dec 5, 2023

Request for llama.cpp Integration to Support Low-End Hardware Compatibility

Description

I'm trying to integrate llama.cpp with Meditron so that the models can run on lower-end hardware. Meditron is based on Llama, so in theory this should be possible. However, I'm encountering errors when attempting to convert the Meditron model with llama.cpp.

Steps to Reproduce

  1. Either convert the model with:

    python3 convert-hf-to-gguf.py ../meditron-7b/

    • Output:
      Loading model: meditron-7b
      Traceback (most recent call last):
      ...
      NotImplementedError: Architecture "LlamaForCausalLM" not supported!
      
  2. Or launch the model directly with llama.cpp (see also the sketch after these steps):

    ./build/bin/main --rope-freq-scale 8.0 -m ../meditron-7b/pytorch_model-00008-of-00008.bin -p "I have pain in my leg from toes to hip"
    
    • Output:
      Log start
      ...
      error loading model: llama_model_loader: failed to load model from ../meditron-7b/pytorch_model-00008-of-00008.bin
      

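Both failures above are consistent with how llama.cpp's conversion tooling was organized at the time, though I have not verified this against Meditron specifically: convert-hf-to-gguf.py covered non-Llama architectures, while Llama-family checkpoints (LlamaForCausalLM) went through the separate convert.py script, and main only loads GGUF files, so a raw pytorch_model-*.bin shard cannot be loaded directly. A minimal sketch of the path I would expect to work, with illustrative output file names:

    # Hedged sketch, not verified against Meditron specifically.
    # convert.py (not convert-hf-to-gguf.py) handled LlamaForCausalLM
    # checkpoints in llama.cpp at this time:
    python3 convert.py ../meditron-7b/ --outtype f16 --outfile meditron-7b-f16.gguf

    # main only loads GGUF files, so point it at the converted file
    # rather than a pytorch_model-*.bin shard:
    ./build/bin/main -m meditron-7b-f16.gguf -p "I have pain in my leg from toes to hip"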
Expected Behavior

Successful integration of llama.cpp with Meditron, allowing the model to run on lower-end hardware.

Actual Behavior

The conversion script raises a NotImplementedError for the architecture "LlamaForCausalLM", and the model fails to load when launched directly with llama.cpp.

Possible Solution

Adjustments in llama.cpp to support the "LlamaForCausalLM" architecture used by Meditron. This could involve modifying the model conversion script or the model loading mechanism in llama.cpp.
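
For the low-end-hardware goal specifically, the usual follow-up once a GGUF conversion succeeds is quantization with llama.cpp's stock quantize tool. A minimal sketch, assuming the f16 GGUF from the conversion sketch above (file names are illustrative):

    # Shrink the f16 GGUF to 4-bit; Q4_K_M is a common quality/size trade-off:
    ./build/bin/quantize meditron-7b-f16.gguf meditron-7b-Q4_K_M.gguf Q4_K_M

    # The quantized model needs roughly a quarter of the f16 memory:
    ./build/bin/main -m meditron-7b-Q4_K_M.gguf -p "I have pain in my leg from toes to hip"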

Additional Context

llama.cpp: https://github.com/ggerganov/llama.cpp

Request

I kindly request that the team consider adding support for llama.cpp integration with Meditron, or give advice on how to implement it. This would be a significant enhancement, enabling Meditron models to run on more diverse hardware setups, especially those at the lower end.

martinjaggi (Contributor) commented
Related: did you also try these quantized models?
https://huggingface.co/TheBloke/meditron-70B-GGUF
https://huggingface.co/TheBloke/meditron-7B-GGUF
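
For reference, a typical way to fetch and run one of those prebuilt files (check the exact file name on the model page; the Q4_K_M variant below is an assumed example of TheBloke's usual naming scheme):

    # Assumed file name; pick an actual variant from the repo's file list:
    huggingface-cli download TheBloke/meditron-7B-GGUF meditron-7b.Q4_K_M.gguf --local-dir .
    ./build/bin/main -m meditron-7b.Q4_K_M.gguf -p "I have pain in my leg from toes to hip"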
