Falcon support? #24
I'm also waiting for that PR to be merged. Hopefully it will be merged this weekend: ggerganov/ggml#231 (comment)
Thanks, do you know if it is possible to point ctransformers to a branch of ggml for testing?
+1 to this.

However, I don't think the ggml PR is the one to implement. Instead I would use the new implementation in ggllm.cpp: https://github.com/cmp-nct/ggllm.cpp

This is now the best Falcon GGML implementation, including CUDA GPU acceleration with support for both the 7B and 40B models. I don't know if this will also end up in the GGML repo, or maybe even eventually in the llama.cpp repo (ggllm.cpp is a fork of llama.cpp). Either way, this is the Falcon implementation of interest right now. And I wonder whether there's even a need to wait for it to be fully stable? It's already useful and being used by people. I have four Falcon GGML repos now:

If ctransformers supported this, I think it would help accelerate the use of Falcon GGML.
@matthoffner It is not possible to point ctransformers to a branch of ggml, as the model code has to be modified to integrate into the common interface I provide for all models.

Thanks @TheBloke. I was waiting for the PR to be merged, but since you are already providing the files, I added experimental support for Falcon models using the ggllm fork in the latest version 0.2.10. It has CUDA support similar to the LLaMA models. I tested with the 7B model, but my machine doesn't have enough memory for the 40B model.
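For readers following along, a minimal sketch of what the CUDA offload mentioned above might look like; this is not from the thread, and it assumes ctransformers >= 0.2.10 and that the `gpu_layers` option applies to Falcon the same way it does to LLaMA models:

```python
# Sketch: load a Falcon GGML model with some layers offloaded to the GPU.
# Assumption: gpu_layers behaves for model_type="falcon" as it does for LLaMA.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/falcon-7b-instruct-GGML",
    model_type="falcon",
    gpu_layers=50,  # number of layers to offload to the GPU; 0 = CPU only
)

print(llm("Falcon is a large language model that"))
```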
Fantastic! That's great news, thank you marella. That was super quick. I will update my READMEs to mention this. @ParisNeo could you check if this works automatically in LoLLMS, and if so maybe add some Falcon GGML entries? Then I will also mention it in the README, and you will be the first UI to support Falcon GGML! :)
I am using 0.2.10. Am I missing something?
You should use `llm = AutoModelForCausalLM.from_pretrained("TheBloke/falcon-7b-instruct-GGML", model_type="falcon")`
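For completeness, a self-contained version of that call might look like the following; the `model_file` argument and the `stream=True` flag are assumptions based on how the library handles other GGML model types, and the quantized file name shown is purely illustrative:

```python
# Hypothetical end-to-end usage of the command above.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/falcon-7b-instruct-GGML",
    model_type="falcon",  # required so the Falcon backend is used
    # model_file="falcon-7b-instruct.q4_0.bin",  # optionally pin a specific quantization (hypothetical file name)
)

# Generate text; stream=True yields tokens as they are produced.
for token in llm("Write a haiku about falcons:", stream=True):
    print(token, end="", flush=True)
```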
Thank you very much for this nice work. I tested the 7B model on my PC and it is really solid, even compared to 13B models from other families.
Yeah, I was wondering about that too.
Hey, I don't have a Twitter account. I'm on LinkedIn (https://www.linkedin.com/in/ravindramarella/) but I don't post anything there.
Ok. Very nice profile by the way. Nice to meet you.
I've been tracking the Falcon PR ggerganov/ggml#231, and as I understand it, it currently won't work on a released version of `ggml`. Any suggestions on how to test it config-wise are appreciated; I'm assuming `llama` might not work based on other PRs.