Any interest in the OpenChatKit on a PowerBook? #96
Comments
GPT-NeoX is not as good as LLaMA.
@v3ss0n I'm not sure it's as simple as that. The fine-tuned GPT-NeoXT-Chat-Base-20B model that is part of OpenChatKit is focused on conversational interactions, and they are working on fine-tuning it for other use cases. Testing their model on Hugging Face looks quite promising. It is also Apache 2.0 licensed, and there is a lot of material around the project besides just running inference on the model, such as moderation and concepts for a retrieval system. So being able to run inference with their model(s) using ggml on commodity hardware would be interesting, in my opinion.
I see, I can see their benefit.
You might want to read the original paper, LLaMA: Open and Efficient Foundation Language Models, if you want to understand why this is. LLaMA was trained similarly to GPT-3 and not fine-tuned on specific tasks such as code generation and chat.
Would mixing this with code-tuned models like CodeGen give it ChatGPT-like performance? Also, we need a CodeGen CPP. This repo needs a Discussions tab opened.
OpenChatKit and Open-Assistant models are supported in Cformers, along with 10 other models. You can now interface with the models with just 3 lines of Python code (a sketch follows below).
Generation speed is competitive with llama.cpp (75 ms/token, roughly 13 tokens/s, for a 12B model on my MacBook Pro).
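For context, a minimal sketch of what such a three-line interface could look like. The module name, the `AutoInference` class, the model identifier, and the `generate` signature below are assumptions for illustration, not a confirmed API; check the Cformers README for the actual calls. The `<human>:`/`<bot>:` prompt tags follow the conversational format OpenChatKit's chat model was trained on.

```python
# Hypothetical sketch of a three-line Cformers-style interface; the module
# name, class, model ID, and generate() signature are assumptions, not the
# confirmed Cformers API. See the Cformers README for the real calls.
from cformers import AutoInference as AI

ai = AI("togethercomputer/GPT-NeoXT-Chat-Base-20B")  # assumed model identifier
out = ai.generate("<human>: What is ggml?\n<bot>:", num_tokens_to_generate=64)
print(out["token_str"])  # assumed output field holding the generated text
```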
I hope to see these two frameworks collaborate instead of compete, because together we are the only fighting chance against big tech.
Cformers and llama.cpp are efforts on two orthogonal fronts to enable fast inference of SoTA AI models on CPU. We are not competing; both are necessary, and we would love to collaborate, as we have already indicated multiple times. If you go through our Cformers README, you will see we identified three fronts.
To avoid redundant efforts, we switched away from our original int4 LLaMA implementation in C++ to llama.cpp, even though ours had similar speed on CPU and was released before the first commit to llama.cpp was even made!
Eventually, we will move away from LLaMA/Alpaca models because of their restrictive licensing, though we will keep support for them in Cformers.
It would be better if the two communities were able to merge.
This new repository (https://www.together.xyz/blog/openchatkit) might also be a good candidate for local deployment with a strong GPU, as the GPT-NeoX focus is on GPU deployments.