
any interest in the openchatkit on a power book? #96

Closed
ninjamoba opened this issue Mar 13, 2023 · 9 comments
Labels: enhancement (New feature or request), hardware (Hardware related), question (Further information is requested)

Comments

ninjamoba commented Mar 13, 2023

https://www.together.xyz/blog/openchatkit: this new repository might also be a good candidate for local deployment with a strong GPU, as the GPT-NeoX focus is on GPU deployments.

ninjamoba changed the title from "any interest in the chatkit on a power book?" to "any interest in the openchatkit on a power book?" on Mar 13, 2023
ggerganov added the question (Further information is requested) label on Mar 13, 2023
v3ss0n commented Mar 13, 2023

GPT-NeoX is not as good as LLaMA.

sburnicki commented

@v3ss0n I'm not sure it's as simple as that.

The fine-tuned GPT-NeoXT-Chat-Base-20B model that is part of OpenChatKit is focused on conversational interactions, and they are working on fine-tuning it for other use cases.

Testing their model on Hugging Face looks quite promising.
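
For anyone who wants to try this locally before a ggml port exists, here is a minimal sketch using the Hugging Face transformers library and the published model id togethercomputer/GPT-NeoXT-Chat-Base-20B (the <human>/<bot> prompt format follows the model card; the prompt text and generation settings are illustrative assumptions, and the 20B weights need on the order of 40 GB of memory):

# Minimal sketch: run GPT-NeoXT-Chat-Base-20B via Hugging Face transformers.
# Assumes transformers and torch are installed and there is enough memory
# for the 20B weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# OpenChatKit's chat format marks turns with <human>: and <bot>:.
prompt = "<human>: What is a language model?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))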

Also, it's Apache 2.0 licensed, and there is a lot around the project besides just running inference on the model, such as moderation and concepts for a retrieval system.

So being able to run inference with their model(s) using ggml on commodity hardware would be interesting, in my opinion.

v3ss0n commented Mar 14, 2023

I see, I can see their benefit.
So far the LLaMA version is quite bad at code generation, otherwise quite good.
Gonna try their HF.

gjmulder (Collaborator) commented

> So far the LLaMA version is quite bad at code generation, otherwise quite good.

You might want to read the original paper, "LLaMA: Open and Efficient Foundation Language Models", if you want to understand why this is. LLaMA was trained similarly to GPT-3 and was not fine-tuned on specific tasks such as code generation or chat.

v3ss0n commented Mar 14, 2023

Would mixing this with code-tuned models like CodeGen give it ChatGPT-like performance? Also, we need a CodegenCPP.

This repo needs a Discussions tab.

gjmulder added the enhancement (New feature or request) and hardware (Hardware related) labels on Mar 15, 2023
Ayushk4 commented Mar 25, 2023

OpenChatKit and Open-Assistant models are supported in Cformers, along with 10 other models.

You can now interface with the models with just 3 lines of Python:

from interface import AutoInference as AI

ai = AI('OpenAssistant/oasst-sft-1-pythia-12b')
# The original snippet was cut off after `num_tokens_to_generate`;
# the value here is an assumed example.
x = ai.generate("<|prompter|>What's the Earth total population<|endoftext|><|assistant|>", num_tokens_to_generate=100)

Generation speed is competitive with llama.cpp (75 ms/token, or roughly 13 tokens/second, for a 12B model on my MacBook Pro).

v3ss0n commented Mar 25, 2023

I hope to see these two frameworks collaborate instead of compete, because together we are the only fighting chance against big tech.

Ayushk4 commented Mar 25, 2023

Cformers and LLaMa.cpp are efforts on two orthogonal fronts to enable fast inference of SoTA AI models on CPU. We are not competing. Both are necessary, and we would love to collaborate, as we have already indicated multiple times. As outlined in our Cformers README, we identified three fronts:

  1. Fast C/C++ LLM inference kernels: this is what LLaMa.cpp and GGML are for -- we use these libraries as the backend. So far we use a copy-pasted version of the files; we plan to switch over to LLaMa.cpp or GGML as a git-submodule dependency.

     (To avoid redundant effort, we switched away from our original int4 LLaMA implementation in C++ to llama.cpp, even though ours had similar speed on CPU and was released before the first commit to llama.cpp was even made!)

  2. An easy-to-use API for fast AI inference in dynamically typed languages like Python: this is what Cformers is for.

  3. Machine learning research & exploration: this is what our other efforts at nolano.org are for - like int3 quantization, sparsifying models, and training an open-source LLaMA-equivalent model (the latter two will be shared soon).

Eventually, we will move away from LLaMa/Alpaca models because of their restrictive licensing, though we will keep support for them in Cformers.

v3ss0n commented Mar 25, 2023

It would be better if the two communities were able to merge.
