Support StableLM From StabilityAI #1063
Comments
are they just new …
Related: ggerganov/ggml#10
This was quick! 😅 They've included a note in the README indicating that compatibility with llama.cpp is actively desired. :) EDIT: related HN thread: https://news.ycombinator.com/item?id=35629127
Will these models be compatible with llama.cpp?
Definitely interested in this. Interesting that they specifically highlight wanting llama.cpp/ggml support.
If it really is GPT-NeoX, this repo already has conversion, quantization, and basic inference support for GPT-NeoX and other model formats: https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/converters/convert_gptneox_to_ggml.py and https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/quantize_gptneox.cpp
Here is a very quick and dirty implementation using … Also, found a bug in multi-threaded …
Is it?
Yes, it's using the GPT-NeoX architecture. The model details can be seen here: https://github.com/Stability-AI/StableLM/blob/main/configs/stablelm-base-alpha-7b.yaml

# model settings
"num-layers": 16,
"hidden-size": 6144,
"num-attention-heads": 48,
"seq-length": 4096,
"max-position-embeddings": 4096,

# architecture design
"norm": "layernorm",
"pos-emb": "rotary",
"rotary_pct": 0.25,
"activation": "gelu",
"no-weight-tying": true,
"gpt_j_residual": true,
"output_layer_parallelism": "column",
So the q4_x files output from ggml are not compatible with llama.cpp?
It seems so currently. |
I've converted/quantized stablelm-tuned-alpha-7b to Q4_3 and it works great with ggml, but llama.cpp throws an error when loading it.
I am getting the same error.
Are you using the dedicated stablelm binary? From the looks of it, it's a separate example: https://github.com/ggerganov/ggml/tree/master/examples/stablelm
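One way to see why llama.cpp rejects these files is to look at the header: the GPT-NeoX/StableLM example in the ggml repo writes its own model format, so the magic value llama.cpp expects won't match. Below is a hedged sketch; the file path is a placeholder and the interpretation of the constants is an assumption about the formats in use at the time, not a definitive list.

```python
# Sketch: print the 4-byte magic at the start of a converted model file to see
# which format it is. File path is a placeholder; the constants listed in the
# comments are assumptions about the ggml/llama.cpp file formats.
import struct

path = "models/stablelm-tuned-alpha-7b/ggml-model-q4_3.bin"  # placeholder path
with open(path, "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))

print(hex(magic))
# 0x67676d6c ("ggml") -> plain ggml example format
# 0x67676d66 ("ggmf") / 0x67676a74 ("ggjt") -> llama.cpp formats
```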
Are there plans to integrate ggml/examples/stablelm into llama.cpp?
There seems to be a bug in the existing StableLM implementation in https://github.com/ggerganov/ggml/tree/master/examples/stablelm#warning. The best way to fix this is to compare outputs with the reference implementation.
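As a starting point for that comparison, here is a minimal sketch of dumping per-token logits from the Hugging Face reference model for a fixed prompt, so they can be diffed against what the ggml example produces; the repo id and prompt are placeholders, not taken from the thread.

```python
# Sketch: dump reference logits from the HF implementation for a fixed prompt,
# so another implementation's output can be compared against them.
# Assumptions: repo id and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "stabilityai/stablelm-base-alpha-7b"  # placeholder repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float32)
model.eval()

prompt = "Hello, my name is"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0]  # (seq_len, vocab_size)

# Save the logits at the last position; the other implementation should match
# these to within a small numerical tolerance for the same prompt.
torch.save(logits[-1], "reference_last_logits.pt")
print(logits[-1][:10])
```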
So, I ran the HF transformers implementation and I observe the same "increasing magnitude" behaviour as in the ggml implementation. To do this, I changed the following line:

attn_weights = nn.functional.softmax(attn_scores, dim=-1)

to:

print(attn_scores)
attn_weights = nn.functional.softmax(attn_scores, dim=-1)

Here is the output log from a sample run: … For comparison, here is running GPT-2 using HF transformers with the same change: … Notice how the GPT-2 values are all well below the magnitudes seen with StableLM. So is my understanding incorrect, or is there something wrong with the StableLM model?
I believe this behavior is correct and is a result of how the models were trained. The text output seems to be coherent and the values only rarely converge to -inf. I may be out of line, but is it possible this is normal? I will continue to look into this, but I doubt softmax would work at all if this were a major issue. If you have any further insight, I would love to dive deeper.
Absolutely. It's just my intuitive understanding that the scaling before the softmax layer has the purpose of preventing exactly this kind of magnitude increase. But I could be wrong, and this may be fine.
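To make that intuition concrete, here is a small sketch (the score values are made up purely for illustration) showing how softmax behaves when pre-softmax scores have large magnitude compared with the usual 1/sqrt(d_head)-scaled range: the distribution collapses toward one-hot, which is the saturation being discussed.

```python
# Sketch: softmax saturation for large-magnitude attention scores.
# The numbers below are made up for illustration only.
import torch

small = torch.tensor([0.3, -0.1, 0.8, 0.2])      # typical scaled-score magnitudes
large = torch.tensor([30.0, -10.0, 80.0, 20.0])  # "increasing magnitude" regime

for name, scores in [("small", small), ("large", large)]:
    probs = torch.softmax(scores, dim=-1)
    print(name, probs)
# The large-magnitude case is (almost) one-hot: every non-max entry gets a
# probability on the order of exp(-50) or smaller, i.e. effectively -inf in
# log space.
```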
Blog Post Announcement
(It may be using the same architecture as GPT-NeoX)
GitHub Repo
In case these links 404 due to being posted early by accident:
https://archive.is/ZQszO
https://archive.ph/U0Pr8
(Checkpoint links are Hugging Face repos with model weights)
*3T Planned