LlaMa #1473

Closed · slavakurilyak opened this issue Mar 6, 2023 · 23 comments
Labels: enhancement (Enhancement of existing functionality)

Comments

@slavakurilyak

It would be great to see LangChain integrate with LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.

LLaMA is a language model developed to improve upon existing models such as ChatGPT and GPT-3. It has several advantages over these models, including improved accuracy, faster training times, and more robust handling of out-of-vocabulary words, and it is more efficient in terms of memory usage and computational resources. In terms of accuracy, LLaMA outperforms ChatGPT and GPT-3 on several natural language understanding tasks, including sentiment analysis, question answering, and text summarization. Additionally, LLaMA can be trained on larger datasets, enabling it to better capture the nuances of natural language. Overall, LLaMA is a more powerful and efficient language model than ChatGPT and GPT-3.

Here's the official repo by @facebookresearch. Here are the research abstract and PDF, respectively.

Note that this project is not to be confused with LlamaIndex (previously GPT Index) by @jerryjliu.

@hwchase17
Contributor

@conceptofmind I believe you said you were working on this?

@hwchase17 added the llms and enhancement labels on Mar 6, 2023
@conceptofmind
Contributor

conceptofmind commented Mar 6, 2023

> @conceptofmind I believe you said you were working on this?

Yes, I'm actively working on this with a group of peers. We have successfully deployed inference with the 65B model and are working on a LangChain wrapper now.

@conceptofmind
Contributor

I would have to think about how to handle the different model sizes, though. I could see this becoming an issue for the end user.

@Electomanic

There is some ongoing work to use GPTQ to compress the models to 3 or 4 bits in this repo, and there is also a discussion going on over at the oobabooga repo.

I'm not sure if this is going to work, but it might be something to keep an eye on. If it works out, it could be possible to run the larger models on a single consumer-grade GPU.

The original paper is available here on arXiv.

@conceptofmind
Contributor

4-bit may be plausible; 8-bit should be fine. The weights are already in fp16, from my understanding. I would have to evaluate this further.
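
For reference, a minimal sketch of what 8-bit loading looks like through Hugging Face transformers with bitsandbytes; the checkpoint path is a placeholder assumption for LLaMA weights already converted to the Hugging Face format:

```python
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b-hf"  # placeholder: LLaMA weights converted to HF format

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # quantize weights to int8 at load time via bitsandbytes
    device_map="auto",   # place layers across available devices automatically
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```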

@jooray

jooray commented Mar 12, 2023

Yes, the weights are fp16. You can convert and run 4-bit using https://github.com/ggerganov/llama.cpp. I think 30B at full precision might be at least on par with 65B at 4-bit in terms of results. llama.cpp runs on CPU, including Apple Silicon, which might be a good choice for developers with recent MacBooks: they could develop and run experiments locally with LangChain without needing GPUs.
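
For anyone scripting against llama.cpp from Python, here is a minimal sketch using the community llama-cpp-python bindings (mentioned later in this thread); the model path is a placeholder for a 4-bit GGML file produced by llama.cpp's conversion and quantization scripts:

```python
# Requires: pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path: a 4-bit GGML model converted and quantized with llama.cpp
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)

out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(out["choices"][0]["text"])
```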

@fblissjr

> There is some ongoing work to use GPTQ to compress the models to 3 or 4 bits in this repo, and there is also a discussion going on over at the oobabooga repo.
>
> I'm not sure if this is going to work, but it might be something to keep an eye on. If it works out, it could be possible to run the larger models on a single consumer-grade GPU.
>
> The original paper is available here on arXiv.

Confirmed working on a single consumer-grade 4090 here with 13B. Waiting on the 30B 4-bit weights; I failed when trying to run them at fp16. :)

@conceptofmind
Contributor

conceptofmind commented Mar 12, 2023

I am aware of all these alternatives. We are waiting to hear back from Hugging Face before the decision is made. Once we have a concrete answer from them, we will proceed from there.

I have some concerns about llama.cpp, since the author seems to have noted he has no interest in maintaining it. There are also other things to factor in when adding dependencies that cannot be easily installed; setup needs to be relatively effortless for the best user experience.

@gururise
Contributor

gururise commented Mar 13, 2023

Using the GPTQ 4-bit quantized 30B model, outputs are (as far as I can tell) very good. I hope to see GPTQ 4-bit support in LangChain. GPTQ quantization appears to be better than the 4-bit RTN quantization currently used in llama.cpp.

The 4-bit 30B model is confirmed working on an old Tesla P40 GPU (24 GB).
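
For context on the comparison: RTN (round-to-nearest) quantizes each weight independently to the nearest representable 4-bit value, while GPTQ adjusts the remaining weights to compensate for each rounding error. A toy numpy sketch of per-row RTN int4 quantization, purely illustrative and not either project's actual code:

```python
import numpy as np

def quantize_rtn_int4(w: np.ndarray):
    """Per-row symmetric round-to-nearest (RTN) quantization to 4-bit integers."""
    # One fp scale per output row; the symmetric int4 range is [-8, 7]
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_rtn_int4(w)
print("max abs reconstruction error:", np.abs(w - dequantize(q, s)).max())
```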

@DamascusGit

Any info on running the 7B model with LangChain?

@niansa

niansa commented Mar 15, 2023

> Yes, the weights are fp16. You can convert and run 4-bit using https://github.com/ggerganov/llama.cpp. I think 30B at full precision might be at least on par with 65B at 4-bit in terms of results. llama.cpp runs on CPU, including Apple Silicon, which might be a good choice for developers with recent MacBooks: they could develop and run experiments locally with LangChain without needing GPUs.

It'd be really neat if that becomes an option 😄
Sure, it's slow, but hey, you can run it on a literal laptop.

@conceptofmind
Contributor

LLaMA has been added to Hugging Face: huggingface/transformers#21955

The only reason to add a specific wrapper would be to include the performance improvements from llama.cpp or GPTQ.
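
Since the model is now in transformers, it can already be used with LangChain through the existing HuggingFacePipeline wrapper; a minimal sketch, with the checkpoint path again a placeholder for converted weights:

```python
# Requires: pip install langchain transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

model_path = "path/to/llama-7b-hf"  # placeholder: converted LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=64)
llm = HuggingFacePipeline(pipeline=pipe)  # usable in any LangChain chain

print(llm("What is the capital of France?"))
```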

@linonetwo

linonetwo commented Mar 19, 2023

I think you are talking about a Python wrapper, so I'm going to write a TS wrapper for llama.cpp and alpaca.cpp for local private usage, if no one is working on this yet.

I will try extending the BaseLLM class to do so.

@linonetwo

linonetwo commented Mar 20, 2023

Here you are:

https://github.com/linonetwo/langchain-alpaca

https://www.npmjs.com/package/langchain-alpaca

It works on all platforms and runs fully locally.

Next, I will try to make a langchain-llama package.

@wiz64

wiz64 commented Mar 22, 2023

I'm eagerly waiting to try it for a project :D !!!

@asgeir

asgeir commented Mar 24, 2023

If anyone's interested, I've made a pass at wrapping the llama.cpp shared library using ctypes and deriving a custom LLM class for it.
https://gist.github.com/asgeir/3dd75109133b218bf62bab5ddfcbb387
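
For anyone attempting something similar, the general shape of a custom LLM class in LangChain at the time looked like the sketch below; `run_local_model` is a hypothetical stand-in for whatever ctypes call, subprocess, or HTTP request actually produces text:

```python
from typing import List, Optional
from langchain.llms.base import LLM

def run_local_model(prompt: str) -> str:
    """Hypothetical stand-in for the real inference call (ctypes, subprocess, HTTP, ...)."""
    raise NotImplementedError

class LocalLlama(LLM):
    """Minimal custom LLM wrapper around a locally running LLaMA."""

    @property
    def _llm_type(self) -> str:
        return "local_llama"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        text = run_local_model(prompt)
        if stop:  # LangChain expects output truncated at the first stop sequence
            for s in stop:
                text = text.split(s)[0]
        return text
```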

@rjadr
Contributor

rjadr commented Mar 31, 2023

FYI: I just submitted this pull request to integrate llama.cpp into langchain:
#2242
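
For those following along, usage of the wrapper that PR adds looks roughly like this (a sketch; the model path is a placeholder for a locally quantized GGML file):

```python
# Requires: pip install llama-cpp-python, plus a LangChain version including PR #2242
from langchain.llms import LlamaCpp

# Placeholder path: a quantized GGML model produced with llama.cpp
llm = LlamaCpp(model_path="./models/7B/ggml-model-q4_0.bin")
print(llm("What is the capital of France?"))
```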

@juanps90

juanps90 commented Apr 5, 2023

> FYI: I just submitted this pull request to integrate llama.cpp into langchain: #2242

Thank you very much!!

Do you think it would be possible to run LLaMA on GPU as well somehow?

@conceptofmind
Contributor

> FYI: I just submitted this pull request to integrate llama.cpp into langchain: #2242
>
> Thank you very much!!
>
> Do you think it would be possible to run LLaMA on GPU as well somehow?

You are able to load LLaMA through Hugging Face and use it in a GPU-accelerated environment: https://huggingface.co/docs/transformers/main/en/model_doc/llama
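
A sketch of that route, using the Llama classes added in the transformers PR linked above; the checkpoint path is a placeholder for locally converted weights:

```python
# Requires: pip install transformers accelerate, plus locally converted LLaMA weights
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/llama-7b-hf"  # placeholder: weights converted to HF format
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision halves GPU memory use
    device_map="auto",          # place layers on available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0], skip_special_tokens=True))
```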

@kooshi

kooshi commented Apr 5, 2023

I also added Kobold/text-generation-webui support so you can run LLaMA or whatever you want locally.
I only tested it a bit, but it worked well back when I made it. I didn't intend to make a PR or maintain it though, so anyone should feel free to take it and hack on it:
master...kooshi:langchain:kobold-api

@1b5d

1b5d commented Apr 8, 2023

I've written an app to run LLaMA-based models using Docker here: https://github.com/1b5d/llm-api, thanks to llama-cpp-python and llama.cpp.
You can specify the model in the config file, and the app will download it automatically and expose it via an API.
Additionally, you can use https://github.com/1b5d/langchain-llm-api to use this exposed API with LangChain; it also supports streaming.
My goal is to easily run different models locally (and also remotely), switch between them easily, and then use these APIs to develop with LangChain.

To run it:

```shell
curl --location 'localhost:8000/generate' \
  --header 'Content-Type: application/json' \
  --data '{
      "prompt": "What is the capital of France?",
      "params": {
          ...
      }
  }'
```

Or you can play around with it using LangChain via the lib:

```shell
pip install langchain-llm-api
```

```python
from langchain_llm_api import LLMAPI

llm = LLMAPI()
llm("What is the capital of France?")
# ...
# \nThe capital of France is Paris.
```

@fblissjr

> I also added Kobold/text-generation-webui support so you can run LLaMA or whatever you want locally. I only tested it a bit, but it worked well back when I made it. I didn't intend to make a PR or maintain it though, so anyone should feel free to take it and hack on it: master...kooshi:langchain:kobold-api

Did you happen to test this with https://github.com/oobabooga/text-generation-webui? I haven't dug into Kobold enough to know whether the APIs are similar enough.

@dosubot

dosubot bot commented Sep 22, 2023

Hi, @slavakurilyak! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, this issue is a request for LangChain to integrate with LLaMA, a more powerful and efficient language model developed by Facebook Research. There has been ongoing work to use GPTQ to compress the models to 3 or 4 bits, and there has been a discussion about running LLaMA on GPUs. Additionally, a Python wrapper for llama.cpp has been created, and there are plans to create a TS wrapper as well. It's worth mentioning that LLaMA has been added to Hugging Face, and there are other alternatives like Kobold/text-generation-webui and langchain-llm-api.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository, and please don't hesitate to reach out if you have any further questions or concerns!

Best regards,
Dosu

@dosubot added the stale label on Sep 22, 2023
@dosubot closed this as not planned on Sep 29, 2023
@dosubot removed the stale label on Sep 29, 2023