llama : add RWKV models support #846
Comments
Closing this in favor of ggerganov/ggml#21; also, https://github.com/saharNooby/rwkv.cpp seems to be it. |
Now that support for other models is being added directly to llama.cpp, would RWKV support be reconsidered? It would be very nice to have, since in-tree support would mean RWKV gets all the benefits that llama.cpp has over a separate, RWKV-only project. |
We should try to add it - it will probably be the most different from all the other arches we support, since it is RNN-based, so it will be a good exercise to see how easily it would fit into the existing framework |
@ggerganov Please check these :)
v4 inference: https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_in_150_lines.py
v5 inference: https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_v5_demo.py
fast v4 & v5.2 inference: https://github.com/BlinkDL/ChatRWKV/blob/main/rwkv_pip_package/src/rwkv/model.py
v5.2 1.5B demo (great for its size): https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio
v5.2 1.5B benchmarks: https://twitter.com/BlinkDL_AI/status/1717543614434402661
a few remarks: […]
|
Not sure if it helps, but I have a GGML-based Rust implementation here: https://github.com/KerfuffleV2/smolrsrwkv/blob/main/smolrwkv/src/ggml/graph.rs (that's just v4 inference). This is actually the reason I made my first contribution to the project: trying to get the map ops (now superseded) to work around what GGML didn't support. I think that's mostly still the case, so the majority of these will probably still need to use custom mapping: https://github.com/KerfuffleV2/smolrsrwkv/blob/main/smolrwkv/src/ggml/map_ops.rs |
Hi all! Maintainer of rwkv.cpp here. Indeed, having a separate repository for RWKV leads to […]. That said, I like the compactness and simplicity of […]. In the end, users will decide :)
On a more practical note: if support for RWKV is added into llama.cpp, […]. Furthermore, if support for both RWKV v4 and RWKV v5 is implemented in llama.cpp, […]. Until then, my plan is to continue supporting […]. I won't be able to help with migrating […]. |
Hi @saharNooby - great work with rwkv.cpp. I'm mainly interested to see what would […]. I'm looking forward to contributions, as I doubt I will have the time to implement it myself. So we will have to see if RWKV support ends up in llama.cpp […]. Alternatively, we should also look for other LLM architectures that would present some sort of a challenge and try to integrate them as well, in the same spirit of understanding what […]. |
Regarding […]: […]
Regarding […]: The only difference is that Attention was replaced with WKV, which can be computed in a recurrent manner. Everything else -- layer structure, MLP, embed/unembed -- is the same as in Transformers. Some early versions of RWKV even use the popular […]
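(For readers skimming the thread: below is a rough NumPy sketch of the v4 WKV time-mixing step in its recurrent form, loosely following the "RWKV in 100 lines" write-up linked further down; the variable names are illustrative and the numerical-stability tricks used by real implementations are omitted.)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def time_mixing_v4(x, last_x, num, den,
                   decay, bonus, mix_k, mix_v, mix_r,
                   Wk, Wv, Wr, Wout):
    # Token shift: interpolate between the current and previous input.
    k = Wk @ (x * mix_k + last_x * (1 - mix_k))
    v = Wv @ (x * mix_v + last_x * (1 - mix_v))
    r = Wr @ (x * mix_r + last_x * (1 - mix_r))

    # WKV: a weighted average over past values, read off from a running
    # numerator/denominator instead of attending over all previous tokens.
    wkv = (num + np.exp(bonus + k) * v) / (den + np.exp(bonus + k))
    rwkv = sigmoid(r) * wkv

    # Decay the running sums and fold in the current token; this pair of
    # vectors is the whole per-layer "attention" state.
    num = np.exp(-np.exp(decay)) * num + np.exp(k) * v
    den = np.exp(-np.exp(decay)) * den + np.exp(k)

    return Wout @ rwkv, (x, num, den)
```

The `(x, num, den)` triple plays the role the KV cache plays in a Transformer, but it stays the same size no matter how many tokens have been processed.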
Yep! |
the real difference is that RWKV (and other "linear attention" models) uses a fixed-size state instead of a growing KV cache :) so it's like the recurrence sketched after this comment:
and you can clone & save states, to make a "state cache" for various inputs to accelerate inference. |
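(A minimal sketch of what that looks like from the caller's side; `model.forward(token, state)` is a hypothetical interface, not an existing llama.cpp or rwkv.cpp API. The state is fixed-size, and cloning it after the prompt gives the "state cache" described above.)

```python
import copy

def generate(model, prompt_tokens, n_new, state=None):
    # The state is a fixed-size per-layer structure; unlike a KV cache it
    # does not grow as more tokens are consumed.
    for tok in prompt_tokens:
        logits, state = model.forward(tok, state)

    # "State cache": checkpoint the state right after the prompt, so other
    # continuations of the same prompt can skip re-processing it.
    prefix_state = copy.deepcopy(state)

    out = []
    tok = int(logits.argmax())        # greedy sampling, for simplicity
    for _ in range(n_new):
        out.append(tok)
        logits, state = model.forward(tok, state)
        tok = int(logits.argmax())

    return out, prefix_state
```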
RWKV v4 in 100 lines (using numpy): https://johanwind.github.io/2023/03/23/rwkv_details.html
another blogpost: https://fullstackdeeplearning.com/blog/posts/rwkv-explainer/
v4 details: https://ben.bolte.cc/rwkv-model
RWKV zoom talk (TUE, NOV 7 · 9:30 AM CST): https://www.meetup.com/silicon-valley-generative-ai/events/296395124/
RWKV sf meet (Saturday, Nov 11 1:00pm PT): https://partiful.com/e/bi6lGCvZXCzZQNN5FjXW |
I'm excited to see RWKV's progress; I love this model. |
Is there a way to make RWKV's state stuff fit in with the current concept of sequences and KV cache manipulation? Can you do parallel generation with multiple independent sequences? |
If it's helpful, I asked some questions in the RWKV discord:

[2:06 AM] Kerfuffle: This might be a pretty dumb question, but just thinking about how RWKV could fit into llama.cpp. Probably the biggest thing is figuring out how it can work with llama.cpp's idea of batches and sequences and parallel generation.
[3:12 AM] Tomeno: you can run rwkv in parallel, but you can't edit the state like that - what you can do though is save and roll back to previous versions of the state cheaply
[3:20 AM] Kerfuffle: Thanks for the answer. Is there a way to save/roll back the state just for specific sequences when doing parallel generation?
[3:30 AM] Tomeno: well, i should say, save and load the state - the state is a "compressed" version of the entire context/sequence up to that point
[3:45 AM] Tomeno: so no, once it's processed, you can't separate the tokens that went into it
[3:46 AM] Tomeno: what you could do is something like save the state after every reply of a chatbot, and then you could load any point in that conversation back up and continue from there
[3:47 AM] Tomeno: or save a number of states to disk and load them back up at any time, no matter how long the input sequence was, the state is about the same size
[3:52 AM] Kerfuffle: Thanks again. I guess the main issue is keeping the state of sequences separate which I guess actually isn't possible.
[3:53 AM] Kerfuffle: Seems like it would be really hard to fit RWKV into llama.cpp as an alternative model architecture.
[4:17 AM] Kerfuffle: I feel like there's got to be a way to do separate sequences in general otherwise it's a HUGE strike against RWKV. Just for example, suppose I have an RWKV model that works as well as ChatGPT. I want to set up a website where people can query it. A service like that requires submitting queries in huge batches, doing a completely separate decode for each individual user just wouldn't work.
[4:20 AM] Tomeno: oh wait, i misunderstood what you meant
[4:20 AM] Tomeno: when you process multiple sequences in parallel, each of them has its own associated state
[4:21 AM] Tomeno: put very simply, the input to rwkv is state + next token
[4:23 AM] Kerfuffle: Ah, okay, good. Yeah, I have a vague idea of how it probably works then.
[4:23 AM] Tomeno: i thought when you wrote "roll back the state for specific sequences" you meant, like, take out a set of tokens from the context
[4:23 AM] Kerfuffle: You could just let each sequence have its own state and somehow do the calculation so the correct state is involved for each sequence.
[4:23 AM] Kerfuffle: You were correct. :) I was actually asking about both things.
[4:24 AM] Kerfuffle: I'm just generally trying to figure out how practical it is (or practical within my capabilities) to try to add RWKV support to llama.cpp
[4:24 AM] Tomeno: there were some demos of parallel inference posted recently though i have no idea how to find it
[4:25 AM] Kerfuffle: Well, the first step is knowing it's even possible, so that definitely helps.
[4:26 AM] Mathmagician: I think web-rwkv lets you inference multiple sequences in parallel

This is the […]. From that conversation, it seems like parallel generation wouldn't be too much of a problem. However, KV editing operations like rewinding seem like they would be extremely difficult. Tomeno mentioned saving the RWKV sequence state per token, which may be possible, but I'm guessing the per-token state is going to be too large to really make that practical.
So I think the only way it could really work with how […]. On an unrelated note, a WebGPU backend seems like an interesting idea... |
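(Sketching the bookkeeping implied by that conversation, using the same hypothetical `model.forward(token, state)` interface as the earlier sketch rather than any existing llama.cpp structure: each sequence owns its own fixed-size state, and "rewinding" means restoring a saved checkpoint, not removing individual tokens.)

```python
import copy

class SequenceStates:
    """One recurrent state per sequence id, plus named checkpoints."""

    def __init__(self, make_empty_state):
        self.make_empty_state = make_empty_state
        self.states = {}        # seq_id -> current state
        self.checkpoints = {}   # (seq_id, tag) -> saved state

    def step(self, model, seq_id, token):
        if seq_id not in self.states:
            self.states[seq_id] = self.make_empty_state()
        logits, new_state = model.forward(token, self.states[seq_id])
        self.states[seq_id] = new_state
        return logits

    def checkpoint(self, seq_id, tag):
        self.checkpoints[(seq_id, tag)] = copy.deepcopy(self.states[seq_id])

    def rollback(self, seq_id, tag):
        # Rewinding is only possible by restoring a whole checkpoint;
        # individual tokens cannot be removed from a recurrent state.
        self.states[seq_id] = copy.deepcopy(self.checkpoints[(seq_id, tag)])
```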
You can save the RWKV state every n tokens, and you can save those states to RAM / disk. |
I'm looking at it from the perspective of how it can be integrated into llama.cpp. |
(2+64)*2560 numbers for each block; 32*(2+64)*2560 numbers for the full model |
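(For scale, assuming those figures refer to a 32-layer model with hidden size 2560 and head size 64, the per-sequence state is on the order of tens of megabytes:)

```python
n_layer, n_embd, head_size = 32, 2560, 64   # assumed model shape

per_block = (2 + head_size) * n_embd        # 168,960 numbers per block
full_state = n_layer * per_block            # 5,406,720 numbers in total

print(f"fp32: {full_state * 4 / 1e6:.1f} MB, fp16: {full_state * 2 / 1e6:.1f} MB")
# roughly 21.6 MB in fp32 (10.8 MB in fp16) per independent sequence
```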
There's been renewed progress in the RWKV space with Eagle-7b: https://blog.rwkv.com/p/eagle-7b-soaring-past-transformers. |
RWKV support should be reconsidered for llama.cpp, given the recent merge of Mamba SSM support. |
If nobody else does it, I'll have time to work on RWKV in llama.cpp […]. Mamba took me a bit more than a month to implement in llama.cpp. If anyone reading this is interested in working on this before I have more time, feel free to go ahead. |
I've been taking up the task of implementing support for the RWKV 5 architecture. I've had some issues getting the included python conversion code adapted for RWKV, however. Of course, this is the first step to getting RWKV working. |
Great to know that 🥰🥰🥰 |
please try the much stronger v6.0-world 2.1 model :) design similar to v5. 1b6 done, 3b & 7b soon
https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-1
https://twitter.com/BlinkDL_AI/status/1773503808221712722
The difference between v6 and v5: […] |
Over Easter we've got a long weekend here, but I figured I'd give a few updates on my work on this: […]
On RWKV v6, I hadn't seen that demo yet! It looks straightforward to add both once one of the two is working. |
I got the tokenizer in and functional so far, working with the "tokenize" example. I'm considering submitting the tokenizer by itself as a small PR to reduce review load; any thoughts on this? |
Either way would be fine - the tokenizer alone might not be useful for anything other than RWKV, so there is no point in merging it alone |
I'm hitting some issues with the KV cache initialization, so I'm taking this moment to update on the work done so far. WIP code available here: https://github.com/RWKV/llama.cpp
This can be tested using a partially generated GGUF (generated using gguf-swiss): […]
Currently I'm having some issues tracking down an initialization issue: […]
|
The KV cache for recurrent models is sized from the GGUF metadata keys […]. The following are used to size the […] (llama.cpp, lines 1865 to 1875 in 0d56246): […]
If RWKV uses 2 different recurrent states (e.g. one for time mix and the other for channel mix, though I'm not yet sure how they are used), it might be useful to add a new metadata key for the stride of the convolution and make it 0 for RWKV (possibly called […]). Re-using […] |
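(A rough sketch of the sizing logic being discussed, written out in Python for readability; the `ssm.*` key names and the two formulas mirror how the Mamba support sizes its per-sequence states, and whether RWKV re-uses them or gets its own keys is exactly the open question above, so treat this as an assumption rather than settled llama.cpp behavior.)

```python
def recurrent_state_sizes(hparams: dict) -> tuple[int, int]:
    """Per-sequence recurrent state sizes, in numbers of elements.

    hparams is assumed to expose the Mamba-style GGUF metadata
    (arch prefix omitted): ssm.conv_kernel, ssm.inner_size, ssm.state_size.
    """
    d_conv = hparams.get("ssm.conv_kernel", 0)
    d_inner = hparams.get("ssm.inner_size", 0)
    d_state = hparams.get("ssm.state_size", 0)

    # "K"-like slot: rolling convolution window (zero if there is no conv step).
    n_embd_k_s = (d_conv - 1) * d_inner if d_conv > 0 else 0
    # "V"-like slot: the recurrent state proper.
    n_embd_v_s = d_state * d_inner
    return n_embd_k_s, n_embd_v_s

# With the convolution kernel/stride set to 0, as suggested above for an
# architecture without the conv step, the first slot simply collapses to zero.
```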
Another update; thanks for the notes! I've resolved the initial crash issues on initialization, though mostly with hacky temporary placeholders (like re-using the ssm scope keys). I'll put up a new version of the temporary GGUF file on Monday. The remainder of the work is now to fill in the rest of the network graph, link it up with the KV cache hack for tracking state, and then start handling all the individual hacks one by one. |
please check the unit tests in https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py (vs. the reference tokenizer) |
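(For context, that reference tokenizer is, at its core, a greedy longest-match over a byte-level vocabulary, usually implemented with a trie. A naive sketch of the idea, with a made-up `vocab` dict standing in for the real vocabulary file:)

```python
def tokenize_greedy(text: bytes, vocab: dict[bytes, int]) -> list[int]:
    """Greedy longest-match byte tokenizer (naive O(n * max_len) version).

    vocab maps token byte strings to ids; a real implementation walks a
    trie so each step costs only the length of the match.
    """
    max_len = max(len(t) for t in vocab)
    ids, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError("no match; the vocab must cover every single byte")
    return ids

# Example with a toy vocab:
# tokenize_greedy(b"hello", {b"h": 1, b"e": 2, b"l": 3, b"o": 4, b"ll": 5}) == [1, 2, 5, 4]
```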
https://github.com/RWKV/rwkv.cpp supports v6 now |
Conversion and quantization using b3651 worked fine (src HF model: https://huggingface.co/RWKV/v6-Finch-7B-HF). Conversation (using llama-server) initially produced some output, but over 6 attempts it crashed 3 times after the 1st or 2nd message, ending up with […]
It doesn't look fully supported / working yet. |
Thanks for your testing. |
@MoonRide303 @MollySophia This should be fixed in #9249 |
I briefly tested Q6_K quant of Finch 7B using llama-server b3658 - seems to be okay (no longer crashing). |
What tps speeds are you getting on a GPU? |
Would rwkv-7 support be possible in the future given that we now have a model release? https://huggingface.co/BlinkDL/rwkv-7-world/tree/main |
Absolutely! We will work on that soon. |
There is also a 32B model converted from Qwen-2.5-32B, based on RWKV-6, which just released: https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1/blob/main/modeling_rwkv6qwen2.py. The modelling code just takes Qwen's code and replaces its attention with RWKV-6's attention. Edit: It appears that the code for the RWKV attention used in QRWKV6 is slightly different from the one used in RWKV-6. In the modeling code on Hugging Face, it ends up calling a different kernel than standard RWKV-6 does, which means QRWKV6 doesn't have the […] |
RWKV is a 100% RNN language model, and the only RNN (as of now) that can match Transformers in quality and scaling, while being faster and saving memory.
Info: https://github.com/BlinkDL/ChatRWKV
RWKV is a novel large language model architecture, with the largest model in the family having 14B parameters. In contrast to Transformers with O(n^2) attention, RWKV requires only the state from the previous step to calculate logits. This makes RWKV very CPU-friendly at large context lengths.
Experimental GGML port: https://github.com/saharNooby/rwkv.cpp
Edit by @ggerganov:
Adding @BlinkDL's comment below to OP for visibility:

The latest "Raven"-series Alpaca-style-tuned RWKV 14B & 7B models are very good.
Online demo: https://huggingface.co/spaces/BlinkDL/Raven-RWKV-7B
Download: https://huggingface.co/BlinkDL/rwkv-4-raven