LLaMA #1473
@conceptofmind I believe you said you were working on this?
Yes, actively working on this with a group of peers. We have successfully deployed inference with the 65B models. Working on a LangChain wrapper now.
Would have to think about how to handle the sizes of the different models, though. I could see this becoming an issue for the end user…
There is some ongoing work to use GPTQ to compress the models to 3 or 4 bits in this repo. There is also a discussion going on over at the oobabooga repo. Not sure if this is going to work, but it might be something to keep an eye on. If it works out, it could be possible to run the larger models on a single consumer-grade GPU. The original paper is available here on arXiv.
4-bit may be plausible; 8-bit should be fine. The weights are already in fp16, from my understanding. I would have to evaluate this further.
Yes, the weights are fp16. You can convert and run 4-bit using https://github.com/ggerganov/llama.cpp. I think 30B at full precision might be at least on par with 65B at 4-bit in terms of results. llama.cpp runs on CPU, including Apple Silicon, which might be a good choice for developers with recent MacBooks: they could develop and run experiments locally with LangChain without needing GPUs.
Confirmed working on a single consumer-grade 4090 here with 13B. Waiting on the 30B 4-bit weights; trying to run them at fp16 failed. :)
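To put rough numbers on the sizing discussion above, here is a back-of-the-envelope sketch (weights only; it ignores activations and the KV cache, so real requirements are higher):

```python
# Approximate memory needed just to hold LLaMA weights at different precisions.

def weight_gib(params_billions: float, bits_per_weight: int) -> float:
    """Size of the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for size in (7, 13, 30, 65):
    print(f"{size}B: ~{weight_gib(size, 16):.0f} GiB fp16, ~{weight_gib(size, 4):.0f} GiB 4-bit")

# 7B:  ~13 GiB fp16, ~3 GiB 4-bit
# 13B: ~24 GiB fp16, ~6 GiB 4-bit
# 30B: ~56 GiB fp16, ~14 GiB 4-bit
# 65B: ~121 GiB fp16, ~30 GiB 4-bit
```

That lines up with the reports above: 30B at fp16 doesn't fit on a 24 GB card, while 30B at 4-bit does.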
I am aware of all these alternatives. We are waiting to hear back from Hugging Face before the decision is made. Once we have a concrete answer from them we will proceed from there. I have some concerns about llama.cpp, since the author seems to have noted he has no interest in maintaining it. And there are other things to factor in when adding dependencies that cannot be easily installed. It needs to be a relatively effortless setup for the best user experience.
Using the GPTQ 4-bit quantized 30B model, outputs are (as far as I can tell) very good. Hope to see GPTQ 4-bit support in LangChain. GPTQ quantization appears to be better than the 4-bit RTN quantization currently used in llama.cpp. The 4-bit 30B model is confirmed working on an old Tesla P40 GPU (24 GB).
Any info on running the 7B model with LangChain?
It'd be really neat if that's going to be an option 😄
LLaMA has been added to Hugging Face: huggingface/transformers#21955. The only reason to add a specific wrapper would be to include the perf improvements from llama.cpp or GPTQ.
I think you are talking about a Python wrapper. So I'm going to write a TS wrapper for llama.cpp and alpaca.cpp for localhost private usage, if no one is working on this yet. I will try to extend the
Here you are: https://github.com/linonetwo/langchain-alpaca (https://www.npmjs.com/package/langchain-alpaca). It works on all platforms and runs fully locally. For now, I will try to make a langchain-llama package.
I'm eagerly waiting to try it for a project :D !!!
If anyone's interested, I've made a pass at wrapping the llama.cpp shared library using ctypes and deriving a custom LLM class for it.
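For anyone curious what that pattern looks like, here is a minimal sketch. The C entry point (`generate`) and library path are hypothetical placeholders, not llama.cpp's real API (which is lower level); the LangChain side is just a subclass of `LLM` that implements `_call`:

```python
import ctypes
from typing import List, Optional

from langchain.llms.base import LLM


class LlamaCtypesLLM(LLM):
    """Toy LLM backed by a ctypes-loaded shared library (illustrative only)."""

    lib_path: str = "./libllama.so"  # hypothetical build artifact
    max_tokens: int = 256

    @property
    def _llm_type(self) -> str:
        return "llama-ctypes"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        lib = ctypes.CDLL(self.lib_path)
        # Hypothetical C signature: char *generate(const char *prompt, int max_tokens)
        lib.generate.argtypes = [ctypes.c_char_p, ctypes.c_int]
        lib.generate.restype = ctypes.c_char_p
        text = lib.generate(prompt.encode("utf-8"), self.max_tokens).decode("utf-8")
        if stop:  # crude client-side stop-sequence handling
            for token in stop:
                text = text.split(token)[0]
        return text
```

The real wrappers linked in this thread do considerably more (context management, sampling parameters, streaming), but the LLM-subclass shape is the same.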
FYI: I just submitted this pull request to integrate llama.cpp into LangChain.
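If that integration lands as a `LlamaCpp` class backed by the `llama-cpp-python` bindings (my assumption here), usage from LangChain might look roughly like this, given a locally converted and quantized ggml model file:

```python
# pip install llama-cpp-python  (plus a ggml-format model produced with llama.cpp's convert/quantize tools)
from langchain import LLMChain, PromptTemplate
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/ggml-model-q4_0.bin",  # placeholder path to local quantized weights
    max_tokens=256,
    temperature=0.7,
)

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer concisely.\nQuestion: {question}\nAnswer:",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run("What is the capital of France?"))
```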
Thank you very much!! Do you think it would be possible to run LLaMA on GPU as well somehow?
You are able to load LLaMA through Hugging Face Transformers and use it in a GPU-accelerated environment: https://huggingface.co/docs/transformers/main/en/model_doc/llama
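A rough sketch of that route, assuming you already have converted LLaMA weights in the Hugging Face format at a local path (placeholder below) and `accelerate` installed so `device_map="auto"` can place the model on your GPU(s):

```python
# pip install transformers accelerate
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer, pipeline

from langchain.llms import HuggingFacePipeline

model_dir = "./llama-7b-hf"  # placeholder: locally converted LLaMA checkpoint

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # the weights ship in fp16, as noted above
    device_map="auto",          # spread layers across available GPU(s)
)

generate = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(pipeline=generate)

print(llm("Explain what LangChain does in one sentence."))
```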
I also added Kobold/text-generation-webui support, so you can run LLaMA or whatever you want locally.
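The general pattern for talking to any of those local servers from LangChain is the same custom-LLM shape as the ctypes sketch above, just over HTTP. The endpoint path and request/response shapes below are assumptions for illustration; Kobold and text-generation-webui each have their own API, so check the one you're running:

```python
from typing import List, Optional

import requests
from langchain.llms.base import LLM


class LocalServerLLM(LLM):
    """Sends prompts to a locally hosted generation server (endpoint shape assumed)."""

    endpoint: str = "http://localhost:5000/api/v1/generate"  # hypothetical route
    max_new_tokens: int = 200

    @property
    def _llm_type(self) -> str:
        return "local-http"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        resp = requests.post(
            self.endpoint,
            json={"prompt": prompt, "max_new_tokens": self.max_new_tokens},
            timeout=600,
        )
        resp.raise_for_status()
        # Assumed response shape; adapt to whatever your server actually returns.
        text = resp.json()["results"][0]["text"]
        if stop:
            for token in stop:
                text = text.split(token)[0]
        return text
```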
I've written an app to run LLaMA-based models using Docker here: https://github.com/1b5d/llm-api, thanks to llama-cpp-python and llama.cpp. To run it:
Did you happen to test this with https://github.com/oobabooga/text-generation-webui? I haven't dug into Kobold enough to know if the APIs are similar enough.
Hi, @slavakurilyak! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, this issue is a request for LangChain to integrate with LLaMA, a more powerful and efficient language model developed by Facebook Research. There has been ongoing work to use GPTQ to compress the models to 3 or 4 bits, and there has been a discussion about running LLaMA on GPUs. Additionally, a Python wrapper for llama.cpp has been created, and there are plans to create a TS wrapper as well. It's worth mentioning that LLaMA has been added to Hugging Face, and there are other alternatives like Kobold/text-generation-webui and langchain-llm-api.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository, and please don't hesitate to reach out if you have any further questions or concerns!

Best regards,
Dosu
It would be great to see LangChain integrate with LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
LLaMA was developed by Meta AI as an open, efficient alternative to much larger proprietary models such as GPT-3. Its main advantage is efficiency: according to the paper, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks despite being more than an order of magnitude smaller, and LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B. The models are trained exclusively on publicly available data, and the smaller variants are light enough to run on a single GPU, which makes them well suited to local experimentation. Overall, LLaMA offers quality comparable to far larger models at a fraction of the inference cost.
Here's the official repo by @facebookresearch: https://github.com/facebookresearch/llama. The research abstract and PDF are available at https://arxiv.org/abs/2302.13971.
Note, this project is not to be confused with LlamaIndex (previously GPT Index) by @jerryjliu.