Any interest in the OpenChatKit on a PowerBook? #96
Comments
GPT-NeoX is not as good as LLaMA.
@v3ss0n I'm not sure it's as simple as that. The fine-tuned GPT-NeoXT-Chat-Base-20B model that is part of OpenChatKit is focused on conversational interactions, and they are working on fine-tuning it for other use cases. Testing their model on Hugging Face looks quite promising. It is also Apache 2.0 licensed, and there is a lot of material around the project besides just running inference on the model, such as moderation and concepts for a retrieval system. So being able to run inference with their model(s) using ggml on commodity hardware would be interesting, in my opinion.
I see, I can see their benefit.
You might want to read the original paper, LLaMA: Open and Efficient Foundation Language Models, if you want to understand why this is. LLaMA was trained similarly to GPT-3 and not fine-tuned on specific tasks such as code generation and chat.
Would mixing this with code-tuned models like CodeGen give it ChatGPT-like performance? Also, we need a CodeGen CPP. This repo needs a Discussions tab opened.
OpenChatKit and Open-Assistant models are supported in Cformers, along with 10 other models. You can now interface with the models with just 3 lines of Python code (a sketch follows below).
Generation speed is competitive with llama.cpp (75 ms/token, roughly 13 tokens/s, for a 12B model on my MacBook Pro).
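For context, a minimal sketch of what such a three-line interface could look like. The module name, the `AutoInference` class, the model identifier, and the `generate` signature below are assumptions for illustration, not a confirmed API; check the Cformers README for the actual calls. The `<human>:`/`<bot>:` prompt tags follow the conversational format OpenChatKit's chat model was trained on.

```python
# Hypothetical sketch of a three-line Cformers-style interface; the module
# name, class, model ID, and generate() signature are assumptions, not the
# confirmed Cformers API. See the Cformers README for the real calls.
from cformers import AutoInference as AI

ai = AI("togethercomputer/GPT-NeoXT-Chat-Base-20B")  # assumed model identifier
out = ai.generate("<human>: What is ggml?\n<bot>:", num_tokens_to_generate=64)
print(out["token_str"])  # assumed output field holding the generated text
```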
I hope to see these two frameworks collaborate instead of compete, because together we are the only fighting chance against big tech.
Cformers and llama.cpp are efforts on two orthogonal fronts to enable fast inference of SoTA AI models on CPU. We are not competing; both are necessary, and we would love to collaborate, as we have already indicated multiple times. If you go through our Cformers README, you will see we identified three fronts.
To avoid redundant efforts, we switched away from our original int4 LLaMA implementation in C++ to llama.cpp, even though ours had similar speed on CPU and was released before the first commit to llama.cpp was even made!
Eventually, we will move away from LLaMA/Alpaca models because of their restrictive licensing, though we will keep support for them in Cformers.
It would be better if the two communities were able to merge.
This new repository (https://www.together.xyz/blog/openchatkit) might also be a good candidate for local deployment with a strong GPU, as the GPT-NeoX focus is on GPU deployments.