llama3 family support #6747
Comments
As far as I can see, it seems to be the same as before from the architecture point of view. There might be some extra things to optimize further. https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF |
@maziyarpanahi I am getting a "tokenizer.model" not found error. How did you resolve this? |
Calling convert-hf-to-gguf.py ends up with |
I have the latest pulled and built from a few hours ago. I am getting worried now with all these failed converts! I tested the quants, they work though. |
@maziyarpanahi Have you confirmed that instruct mode works? That's where I'm seeing issues (possibly user error). Edit: Nevermind, figured out the chat template. |
I tried using the quantized instruct model from https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct.Q6_K.gguf, but when I try using it (specifying llama2 chat template) I get odd results, which seem like an issue with the chat template:
|
@thecivilizedgamer: chat template changed, look at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/tokenizer_config.json#L2053. |
Ooooh thank you, I was looking but didn't see any info about that |
@MoonRide303 sorry to bug you, but do you know how to specify the new template format? I assume that eventually it will be added to llama.cpp as one of the predefined templates along with llama2 and chatml, but I'm not sure how to specify it in the meantime. |
In their code, the chat format is here: https://github.com/meta-llama/llama3/blob/299bfd8212fec65698c2f8c7b5970cbbb74c2a4f/llama/tokenizer.py#L202 |
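For reference, here is a minimal sketch (not from the linked code, just an illustration of the format documented by Meta) of how a Llama 3 chat prompt is assembled; the helper name, system message, and example messages are placeholders:

```python
# Sketch of the Llama 3 chat format: each message is wrapped in header tokens,
# the header is followed by a blank line, and every message ends with <|eot_id|>.
def render_llama3_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Leave an open assistant header so the model continues from here.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(render_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]))
```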
Need to add |
Just pulled latest from master. When trying to convert from HF/safetensors to GGUF using
When trying to convert from HF/safetensors to GGUF using
Hopefully this can be useful as a reference. Thanks! |
@thecivilizedgamer |
Thanks @Jipok! I neglected to mention that I'm using llama.cpp in server mode. Do you know if there is a way to manually specify the chat format in server mode? The bottom paragraph of https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template makes me think maybe that's not possible. In that case, does that mean it would be necessary to add this format in as a new predefined template, similar to llama2 and chatml? |
No, but you can use the |
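(Aside, not from the thread: until the new template is built in, one workaround in server mode is to render the prompt yourself and call the server's /completion endpoint directly, passing <|eot_id|> as a stop string. A rough sketch; the address, port, and system message are assumptions:)

```python
import requests

# Assumed local server address; adjust to however ./server was started.
SERVER = "http://127.0.0.1:8080"

# Hand-rendered Llama 3 prompt (same format as discussed above).
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

resp = requests.post(f"{SERVER}/completion", json={
    "prompt": prompt,
    "n_predict": 256,
    "stop": ["<|eot_id|>"],  # keep the model from running past its turn
})
resp.raise_for_status()
print(resp.json()["content"])
```

This sidesteps llama_chat_apply_template entirely, so it keeps working even before the Llama 3 template is added as a predefined one.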
I feel you, man. I'm also using server mode and want to integrate llama3 with my application, so I want to get the new template up and running. From what it looks like, you can implement your own chat template and then just rebuild llama.cpp. Going to try that now |
Is there any reason not to add a chat template command line argument to ./server? |
So did a zig build with cuda support (first time ever using zig and wow it's amazing) |
Template for llama 3 is:
--in-prefix " <|start_header_id|>user<|end_header_id|> "
--in-suffix " <|eot_id|><|start_header_id|>assistant<|end_header_id|> "
-p "<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability. <|eot_id|> "
also you have to add |
@mirek190
Look at my version above, it seems more correct to me. |
According to https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ my template is correct. I made a test ... llamacpp:
main.exe --model models/new3/Meta-Llama-3-8B-Instruct.Q8_0.gguf --color --threads 30 --keep -1 --batch-size 512 --n-predict -1 --repeat-penalty 1.1 --ctx-size 0 --interactive -ins -ngl 99 --simple-io --in-prefix " <|start_header_id|>user<|end_header_id|> " --in-suffix " <|eot_id|><|start_header_id|>assistant<|end_header_id|> " -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability. <|eot_id|> " --reverse-prompt "assistant"
Seriously! wtf
It even answered this question almost properly! ... forgot about 1 gold coin... |
Look at this! Insane! Only GPT4 and OPUS can answer it! |
or this ...
insane ...
THAT llama3 8b is INSANE. |
Your template does not include the double newlines between the header tokens and the message, which are required according to the page you linked. That's the main difference between your template and Jipok's. |
This can't be right, because now the model is not allowed to say the word "assistant"..... |
So, how should it look for llamacpp? PS: you are right, this one is better.
main.exe --model models/new3/Meta-Llama-3-8B-Instruct.Q8_0.gguf --color --threads 30 --keep -1 --batch-size 512 --n-predict -1 --repeat-penalty 1.1 --ctx-size 0 --interactive -ins -ngl 99 --simple-io -r '<|eot_id|>' --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"
that 8b llama3 is insane ! |
@DifferentialityDevelopment According to Meta specifications:
You are missing the |
@DifferentialityDevelopment Before that, you should add a test of this new template in the test-chat-template.cpp file. To submit the changes: fork |
I've added the missing newlines after end_header_id, thanks for spotting this. I also added the two required lines in test-chat-template.cpp. Pull request is here: #6751 |
@DifferentialityDevelopment But it would be nice if the Llama 3 chat template were supported natively in both main and server - especially taking into account the name of this project ;). |
Pretty sure the changes I made just affect the server, which is what I mainly use. I integrated llama.cpp into my C# applications using the server. |
I pulled the latest changes and recompiled, but I get an error. Is anybody else having the same problem? I'm not sure if it's related to llama 3. Full output
|
Apparently the weights from https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF load just fine! |
Sounds like a corrupted download; that model works perfectly for me |
8b works for me but same loading error with 70B |
I can confirm that https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf works fine with llama_cpp_python 0.2.62. I am not entirely sure which version of llama.cpp is being pulled in by the python wrapper, but I guess it is recent enough, as it works just fine. Answers are almost too verbose for my taste; I guess this can be tuned via parameters, but the quality of the answers is really great so far! I run the model split over 3 MI25 GPUs and it's super fast too! EDIT: it seems longer answers get cut off in the UI. I am not quite sure if it's related to a setting that I may be missing, to llama.cpp, or to the text-generation-webui... What's the best way to figure out why this is happening? |
Check this pull request :) |
I've heard lots of reports that IQ and imatrix quants are broken. |
On the llamafile project, we're using Mozilla-Ocho/llamafile@da4d780 as a workaround to the stop token issue. It fixes the issue with llama3 rambling on for 70B, but doesn't appear to work for 8B. |
Did anyone already try https://huggingface.co/QuantFactory/Meta-Llama-3-70B-Instruct-GGUF/tree/main ? It says it was reuploaded with the new end token. I am also a bit confused about the files, there are three files for Q8 - are they supposed to be concatenated after the download or will llama.cpp handle a model that is split over multiple files? I am a bit hesitant to be the first one to try, because it'd take me a few days to download on my connection and will eat a huge portion of my monthly traffic... |
python convert.py ./models/Meta-Llama-3-8B
Anyone have any ideas why I'm getting this? |
Did you check out the proposed conversion support? 🙂 #6745 |
Yes, QuantFactory's re-upload works like a charm! (I tried 70B Q8.) You just need to point llama.cpp (e.g. ./server) to the first file (e.g. Meta-Llama-3-70B-Instruct.Q8_0-00001-of-00003.gguf). It will load the others. |
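(For what it's worth, the same approach should work through llama-cpp-python, since it defers loading to llama.cpp; a sketch with assumed paths, not tested here:)

```python
from llama_cpp import Llama

# Assumed path to the first shard of the split download; the -00002- and
# -00003- parts are expected to sit next to it and be picked up automatically.
MODEL = "models/Meta-Llama-3-70B-Instruct.Q8_0-00001-of-00003.gguf"

llm = Llama(model_path=MODEL, n_ctx=8192, n_gpu_layers=-1)
out = llm(
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Hi!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    max_tokens=128,
    stop=["<|eot_id|>"],
)
print(out["choices"][0]["text"])
```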
I think we are good here, please reopen if any case remains |
Has anyone been able to successfully convert the 70B model using
and getting:
I have checked the file integrity with |
It works for me: $ ▶ python3 convert.py ~/Data/huggingface/Meta-Llama-3-70B-Instruct/ --outfile ./x.gguf --outtype f16 --vocab-type bpe
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00001-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00001-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00002-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00003-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00004-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00005-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00006-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00007-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00008-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00009-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00010-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00011-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00012-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00013-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00014-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00015-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00016-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00017-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00018-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00019-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00020-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00021-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00022-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00023-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00024-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00025-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00026-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00027-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00028-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00029-of-00030.safetensors
Loading model file /Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/model-00030-of-00030.safetensors
params = Params(n_vocab=128256, n_embd=8192, n_layer=80, n_ctx=8192, n_ff=28672, n_head=64, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('/Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct'))
Loaded vocab file PosixPath('/Users/ggerganov/Data/huggingface/Meta-Llama-3-70B-Instruct/tokenizer.json'), type 'bpe'
Vocab info: <BpeVocab with 128000 base tokens and 256 added tokens>
Special vocab info: <SpecialVocab with 280147 merges, special tokens {'bos': 128000, 'eos': 128001}, add special tokens unset>
Permuting layer 0
Permuting layer 1
Permuting layer 2
... |
I figured it out, somehow my
And then the conversion worked. Now I just have to find the RAM to run it locally 😅 |
Does anyone else get this problem? I tried to follow the steps in https://huggingface.co/IlyaGusev/saiga_llama3_8b_gguf |
Did you find a solution to this error? |
Compared to Mistral's simplicity, Meta's prompt format is not just over-engineered to the point of illegibility, but also eats up their rather small context window... (8 times smaller than new Mixtral's). But I suppose that's what's required when you are dealing with 15 trillion tokens... :) |
If you download the weights from Meta's website/download.sh, use this: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py |
In my case, after updating from a previous version to the latest llama.cpp (tag: b2797), the conversion became valid. |
Anyone know how to set up batched API calls correctly for Llama-3? I use the same preprocessing code with llama-cpp-python (which doesn't support batched inference) on my dataset and the accuracy is 40%. However, with llama.cpp it is only 25%. I use the same model.
Cmd to run the server:
Code to process and call the API:
My code to set up llama-cpp-python: |
|
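(The commands and code above are omitted, so here is only a general illustration, not the poster's setup: driving the server's OpenAI-compatible chat endpoint with several concurrent requests, assuming a local server started with parallel slots, e.g. -np 4, and a build that already includes the Llama 3 chat template so it gets applied server-side:)

```python
import concurrent.futures
import requests

# Assumed local server address and port.
SERVER = "http://127.0.0.1:8080"

def ask(question):
    resp = requests.post(f"{SERVER}/v1/chat/completions", json={
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.0,  # deterministic answers for accuracy comparisons
        "max_tokens": 64,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

questions = ["What is 2+2?", "Name the capital of France.", "Is water wet?"]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for q, a in zip(questions, pool.map(ask, questions)):
        print(q, "->", a.strip())
```

If the accuracy gap persists, comparing the exact prompt each path sends (the rendered chat template vs. the llama-cpp-python preprocessing) is usually the first thing to check.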
llama3 released
would be happy to use with llama.cpp
https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6
https://github.com/meta-llama/llama3