Add llama 2 model #2262
Comments
Interesting to note that the model evaluation section in their paper lists a 34b model even though the site doesn't talk about it. I wonder if it'll be available. Does anyone have access to the models yet? I signed up but haven't received an e-mail. It's not super clear to me if it's meant to be instant or not. |
Interestingly, the paper talks about a 34B model, which is missing from the model card. |
@Azeirah No, I have not heard back yet either.
Also, they are available on Hugging Face if your email there matches the one you signed up with: https://huggingface.co/meta-llama |
I just got access, but the download is flaky: checksums are not matching and the auth is hit or miss. https://github.com/facebookresearch/llama/blob/main/download.sh#L24C1-L43C7 Will update if I am actually able to download these weights. |
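When checksums do not match, it can help to verify each downloaded file independently of the download script. The sketch below assumes a checklist file in the usual `md5sum` format (one `<hex digest>  <file name>` pair per line). The helper names `md5_of` and `verify_checklist` are hypothetical and not part of the official tooling.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(checklist_path):
    """Check every file listed in an md5sum-style checklist.

    Returns a dict mapping file name -> True (match) or False
    (mismatch or missing file). Paths are resolved relative to
    the directory that contains the checklist.
    """
    checklist_path = Path(checklist_path)
    results = {}
    for line in checklist_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        target = checklist_path.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Any file reported as `False` can then be re-downloaded on its own instead of restarting the whole transfer.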
The updated model code for Llama 2 is at the same facebookresearch/llama repo, diff here: meta-llama/llama@6d4c0c2 Seems codewise, the only difference is the addition of GQA on large models, i.e. the According to the paper, smaller models (i.e. the 7b/13b ones) don't have GQA, so in theory it seems it should be able to run unmodified. |
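For readers unfamiliar with GQA (grouped-query attention): several query heads share one key/value head. The mapping can be sketched as below. The function name is hypothetical, and the head counts in the usage line (64 query heads and 8 key/value heads for the 70B model) come from the published 70B configuration rather than from this thread, so treat them as an assumption.

```python
def kv_head_for(query_head, n_head, n_kv_head):
    """Return the index of the key/value head that a query head attends with."""
    if n_head % n_kv_head != 0:
        raise ValueError("n_head must be a multiple of n_kv_head")
    group_size = n_head // n_kv_head  # query heads per shared K/V head
    return query_head // group_size


# Plain multi-head attention is the special case n_kv_head == n_head,
# where every query head keeps its own K/V head.
# Assumed 70B configuration: 64 query heads sharing 8 K/V heads.
mapping_70b = [kv_head_for(h, n_head=64, n_kv_head=8) for h in range(64)]
```

With `n_kv_head == n_head` the mapping is the identity, which is consistent with the expectation that models without GQA run unmodified.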
Email below with tracking links stripped. Same as llama-1 for the most part. Now if it would actually download...
You’re all set to start building with Llama 2. The models listed below are now available to you as a commercial license holder. By downloading a model, you are agreeing to the terms and conditions of the license, acceptable use policy and Meta’s privacy policy.
Model weights available:
With each model download, you’ll receive a copy of the Llama 2 Community License and Acceptable Use Policy, and can find all other information on the model and code on GitHub.
How to download the models:
The unique custom URL provided will remain valid for model downloads for 24 hours, and requests can be submitted multiple times.
Helpful tips:
You can find additional information about how to responsibly deploy Llama models in our Responsible Use Guide.
If you need to report issues:
Subscribe to get the latest updates on Llama and Meta AI.
Meta’s GenAI Team |
anyone else also randomly getting
for the small files? but |
I tried the 7B and it seems to be working fine, with CUDA acceleration as well. |
I genuinely just think their servers are a bit overloaded given what I see posted here. It's a big release |
Yeah the GGML models are on hf now. |
TheBloke is a wizard O_O |
These worked as-is for me |
Holy heck, what is this dude's upload speed? I'm watching https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/tree/main fill in live; they're uploading gigabytes of model data per minute! |
Wouldn't be surprised if he's uploading from a service like AWS or Azure, those have insane bandwidth available. |
As in, renting a VPS or dedicated server just to quantize + upload? (actually, come to think of it, that is an official recommendation by huggingface, wouldn't be surprised...) |
Depends on if you're using the quantised or non-quantised version as well, neither of you two posted which model you're using so comparing doesn't make sense :p |
Quantized. I'm using llama-2-13b.ggmlv3.q4_1.bin |
q4_0 should be even faster for only slightly less accuracy |
IIRC q4_1 has an outdated perf/size tradeoff; use one of the k-quants instead (or q4_0). |
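The size side of this tradeoff can be worked out from the block layouts. The sketch below assumes the ggmlv3-era formats: both types store 32 weights per block as 4-bit values, with q4_0 adding one fp16 scale per block and q4_1 adding an fp16 scale plus an fp16 minimum. These layout details are assumptions to check against the ggml source, and `bits_per_weight` is a hypothetical helper.

```python
def bits_per_weight(block_weights, quant_bits, extra_bytes):
    """Average storage cost per weight for a block-quantized format."""
    payload_bytes = block_weights * quant_bits / 8
    return (payload_bytes + extra_bytes) * 8 / block_weights


# Assumed layouts: 32 weights per block, fp16 (2-byte) per-block metadata.
q4_0 = bits_per_weight(32, 4, extra_bytes=2)      # scale only
q4_1 = bits_per_weight(32, 4, extra_bytes=2 + 2)  # scale + minimum
print(f"q4_0: {q4_0} bits/weight, q4_1: {q4_1} bits/weight")
```

Under these assumptions q4_1 costs about half a bit more per weight than q4_0, which is the extra size being weighed against its accuracy.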
I was using @TheBloke's quantized 7B model. Just passed the args |
I think I have a 70B prototype here: #2276 Needs some more work and not 100% sure it is correct, but text generation looks coherent. |
Note #2276 breaks non-GQA models:
|
For clarity, it uses |
I made a simple change to main to add BOS.
diff --git a/examples/main/main.cpp b/examples/main/main.cpp
index bcbcf12..5906cde 100644
--- a/examples/main/main.cpp
+++ b/examples/main/main.cpp
@@ -605,6 +605,8 @@ int main(int argc, char ** argv) {
// replace end of text token with newline token when in interactive mode
if (id == llama_token_eos() && params.interactive && !params.instruct) {
id = llama_token_newline.front();
+ embd_inp.push_back(llama_token_bos());
+ is_interacting = true;
if (params.antiprompt.size() != 0) {
// tokenize and inject first reverse prompt
const auto first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
and run it like
./main -m "$MODEL" -c 4096 -n -1 --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -p \
"[INST] <<SYS>>
$SYSTEM
<</SYS>>
$FIRST_MESSAGE [/INST]"
I don't know if we want an argument like Regarding |
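The prompt layout passed via `-p`, `--in-prefix` and `--in-suffix` in the command above can also be assembled programmatically. This is a minimal sketch that mirrors the layout as it appears in that command. The function names are hypothetical, and the exact whitespace around the tags is an assumption that should be checked against Meta's reference code.

```python
def first_turn_prompt(system, first_message):
    """Build the initial prompt: system block plus the first user message."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n{first_message} [/INST]"


def next_turn(user_message, prefix=" [INST] ", suffix=" [/INST]"):
    """Wrap a later user message the way --in-prefix / --in-suffix do."""
    return f"{prefix}{user_message}{suffix}"


prompt = first_turn_prompt("You are a helpful assistant.", "Hello!")
prompt += " Hi there."  # placeholder standing in for the model's reply
prompt += next_turn("What is GQA?")
```

Keeping the wrapping in one place makes it easier to swap formats when a different model expects different tags.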
I think |
If you have Git Bash installed, you can run the .sh file from the Git Bash command line with: |
Those are hard coded for the instruct mode
|
[23-7-20] Global first release: architecture diagram of the llama2-map module library
@ziwang-com those are just callgraphs for the python code. I'm sorry, but the python code already is simple to read as is, we don't really need those images. (also imho they feel harder to read than the python code) |
Would it be possible to move them into the model file? That would solve the issue of different models having different prompt formats |
Is Meta's tokenizer identical to the llama_cpp tokenizer? I think it should be, but I'm having an issue while decoding/encoding. |
for llama-2-chat, #2304 |
and server, #2306 |
70B support should be ready to merge in #2276 Btw, I did some tests with 7Bv2 and the generated texts from short prompts using |
It doesn't work with the following input:
The error is |
It worked in the vanilla case for me, but I got a similar error when I ran the binary built with "make LLAMA_CLBLAST=1". "-gqa 8" was added in both cases. |
I actually do use |
@kurnevsky I am having the same problem. Were you able to fix it? |
See #3002. Known workarounds are to not use the OpenCL backend with LLaMA 2, or to not use k-quants (Q*_K). |
@tikikun What do you mean by adding the llama 2 model when this repo is about the llama model? Also, why does the main page say "Supported models:" and then list a bunch of other LLMs when this repo is just about llama? |
LLaMA v2 and many other models are currently supported by |
What do you mean, it's currently supported? Isn't llama.cpp just about llama 1?
On Wed., Oct. 18, 2023, 3:32 a.m., Georgi Gerganov wrote:
Closed #2262 as completed.
|
No, |
Meta just released the Llama 2 model, which allows commercial usage:
https://ai.meta.com/resources/models-and-libraries/llama/
I have checked the model implementation and it seems different from llama_v1, so it may need a re-implementation.