Support StableLM From StabilityAI #1063

MarkSchmidty · 2023-04-19T15:48:06Z

Blog Post Announcement
(It may be using the same architecture as GPT-NeoX)

In case these links 404 due to being posted early by accident:
https://archive.is/ZQszO
https://archive.ph/U0Pr8

(Checkpoint links are Hugging Face repos with model weights)

Size	StableLM-Base-Alpha	StableLM-Tuned-Alpha	Training Tokens [in progress]	Context Window	Web Demo
3B	checkpoint	checkpoint	800B [1.5T]*	4096
7B	checkpoint	checkpoint	800B [1.5T]*	4096	HuggingFace
15B	(in progress)	(pending)	1.5T*
30B	(in progress)	(pending)	1.5T*
65B	(in progress)	(pending)	1.5T*
175B	(planned)

*3T Planned

Green-Sky · 2023-04-19T16:19:47Z

are they just new GPT-NeoX models? or did they forget to update the model cards on hf ? 😄

Green-Sky · 2023-04-19T16:22:46Z

related ggerganov/ggml#10

jessejohnson · 2023-04-19T16:23:37Z

This was quick! 😅

They've included a bit in the ReadMe indicating that compatibility with llama.cpp is actively desired. :)

EDIT: related HN thread https://news.ycombinator.com/item?id=35629127

NoNamedCat · 2023-04-19T16:42:56Z

This models will be compatible with llama.cpp?

rabidcopy · 2023-04-19T19:32:42Z

Definitely interested in this. Interesting that they specifically highlight wanting llama.cpp/ggml support.

rabidcopy · 2023-04-19T20:42:17Z

If it really is GPT NeoX, this repo has conversion, quantization, and support for basic inference for GPT NeoX and other model formats. https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/converters/convert_gptneox_to_ggml.py https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/quantize_gptneox.cpp

ggerganov · 2023-04-19T21:51:13Z

Here is a very quick and dirty implementation using ggml:

ggerganov/ggml#96

Also, found a bug in multi-threaded ggml_cpy():

https://github.com/ggerganov/ggml/pull/96/files#diff-b4a500ab2765c31526c5541f3e51e21e46990b87d9774cac6f3089db315bdc5bR5655-R5660

acheong08 · 2023-04-20T00:12:45Z

are they just new GPT-NeoX models? or did they forget to update the model cards on hf ? smile

Is it?

MarkSchmidty · 2023-04-20T00:56:06Z

Yes it's using GPT-NeoX architecture. The model details can be seen here: https://github.com/Stability-AI/StableLM/blob/main/configs/stablelm-base-alpha-7b.yaml

  # model settings
  "num-layers": 16,
  "hidden-size": 6144,
  "num-attention-heads": 48,
  "seq-length": 4096,
  "max-position-embeddings": 4096,

  # architecture design
  "norm": "layernorm",
  "pos-emb": "rotary",
  "rotary_pct": 0.25,
  "activation": "gelu",
  "no-weight-tying": true,
  "gpt_j_residual": true,
  "output_layer_parallelism": "column",

ggerganov · 2023-04-20T20:41:57Z

Merged in ggml: https://github.com/ggerganov/ggml/tree/master/examples/stablelm

mhkhung · 2023-04-21T22:15:35Z

The q4_x files output from ggml are not compatible with llama.cpp?

fgdfgfthgr-fox · 2023-04-22T23:57:34Z

The q4_x files output from ggml are not compatible with llama.cpp?

It seems so currently.

magicrobotmonkey · 2023-04-24T14:38:03Z

I've converted/quantized stablelm-tuned-alpha-7b to Q4_3 and it works great with ggml, but llama.cpp throws error loading model: missing tok_embeddings.weight, seems like some support is missing.

AndreiSva · 2023-04-24T19:56:36Z

I am getting the same error

mikeggh · 2023-04-24T20:33:48Z

Are you using the specific binary for stablelm? It seems separated from the looks of it in https://github.com/ggerganov/ggml/tree/master/examples/stablelm

wkkautas · 2023-04-25T05:11:10Z

Are there plan to integrate ggml/examples/stablelm into llama.cpp?
Also it would be great if a single llama.cpp binary is able to use also gpt-2 and gpt-j.

ggerganov · 2023-04-27T16:13:02Z

There seems to be a bug in the existing StableLM implementation in ggml.
See the updated README for more details:

https://github.com/ggerganov/ggml/tree/master/examples/stablelm#warning

Best way to fix this is to compare outputs with the reference implementation.
Any help will be appreciated.

ggerganov · 2023-04-28T15:23:49Z

So, I ran the HF transformers implementation and I observe the same "increasing magnitude" behaviour as in the ggml implementation.

To do this, I changed the following line:

https://github.com/huggingface/transformers/blob/c2c99dc7ef5edab8f7674a1eb00cf6ac6996fd0f/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L234

to:

        print(attn_scores);
        attn_weights = nn.functional.softmax(attn_scores, dim=-1)

Here is the output log from a sample run:

softmax-stablelm.txt

For comparison, here is running GPT-2 using HF transformers with the same change:

softmax-gpt-2.txt

Notice how the GPT-2 values are all well below 1e1 for each layer, while the StableLM numbers jump all the way up to 1e3.
The GPT-2 behaviour is also observed for GPT-J and LLaMA models (these are the models that I currently play with the most). To me, it kind of makes sense to be this way and it seems to be correct, while the StableLM numbers are weird.

So is my understanding incorrect or is there something wrong with the StableLM model?
In any case, I no longer think there is a bug in the ggml implementation.

byroneverson · 2023-04-29T09:00:03Z

I believe this behavior is correct and is a result of how the models were trained. The text output seems to be coherent and the values only rarely converge to -inf. I may be out of line, but is it possible this is normal? I will continue to look further into this but I doubt softmax would work at all if this was a major issue. If you have any further insight I would love to dive deeper.

ggerganov · 2023-04-29T18:58:54Z

is it possible this is normal?

Absolutely. It's just my intuitive understanding that the scaling before the soft max layer has the purpose of preventing exactly this kind of magnitude increase. But I could be wrong and this is fine.

Green-Sky added enhancement New feature or request model Model specific labels Apr 19, 2023

philpax mentioned this issue Apr 20, 2023

Support for StableLM rustformers/llm#146

Closed

prusnak changed the title ~~⭐ Support StableLM From StabilityAI~~ Support StableLM From StabilityAI Apr 20, 2023

ggerganov added the help wanted Extra attention is needed label Apr 27, 2023

ggerganov mentioned this issue May 2, 2023

bug in stablelm implementation ggerganov/ggml#125

Closed

digiwombat mentioned this issue May 7, 2023

Set n_ctx for llama.cpp models when loading/reloading oobabooga/text-generation-webui#1872

Closed

ggerganov closed this as completed Jul 28, 2023

azulika mentioned this issue Sep 28, 2023

Add JapaneseStableLM Support #3373

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support StableLM From StabilityAI #1063

Support StableLM From StabilityAI #1063

MarkSchmidty commented Apr 19, 2023 •

edited

Loading

Green-Sky commented Apr 19, 2023

Green-Sky commented Apr 19, 2023

jessejohnson commented Apr 19, 2023 •

edited

Loading

NoNamedCat commented Apr 19, 2023

rabidcopy commented Apr 19, 2023

rabidcopy commented Apr 19, 2023

ggerganov commented Apr 19, 2023

acheong08 commented Apr 20, 2023

MarkSchmidty commented Apr 20, 2023 •

edited

Loading

ggerganov commented Apr 20, 2023

mhkhung commented Apr 21, 2023

fgdfgfthgr-fox commented Apr 22, 2023

magicrobotmonkey commented Apr 24, 2023

AndreiSva commented Apr 24, 2023

mikeggh commented Apr 24, 2023

wkkautas commented Apr 25, 2023

ggerganov commented Apr 27, 2023 •

edited

Loading

ggerganov commented Apr 28, 2023 •

edited

Loading

byroneverson commented Apr 29, 2023

ggerganov commented Apr 29, 2023

Support StableLM From StabilityAI #1063

Support StableLM From StabilityAI #1063

Comments

MarkSchmidty commented Apr 19, 2023 • edited Loading

Green-Sky commented Apr 19, 2023

Green-Sky commented Apr 19, 2023

jessejohnson commented Apr 19, 2023 • edited Loading

NoNamedCat commented Apr 19, 2023

rabidcopy commented Apr 19, 2023

rabidcopy commented Apr 19, 2023

ggerganov commented Apr 19, 2023

acheong08 commented Apr 20, 2023

MarkSchmidty commented Apr 20, 2023 • edited Loading

ggerganov commented Apr 20, 2023

mhkhung commented Apr 21, 2023

fgdfgfthgr-fox commented Apr 22, 2023

magicrobotmonkey commented Apr 24, 2023

AndreiSva commented Apr 24, 2023

mikeggh commented Apr 24, 2023

wkkautas commented Apr 25, 2023

ggerganov commented Apr 27, 2023 • edited Loading

ggerganov commented Apr 28, 2023 • edited Loading

byroneverson commented Apr 29, 2023

ggerganov commented Apr 29, 2023

MarkSchmidty commented Apr 19, 2023 •

edited

Loading

jessejohnson commented Apr 19, 2023 •

edited

Loading

MarkSchmidty commented Apr 20, 2023 •

edited

Loading

ggerganov commented Apr 27, 2023 •

edited

Loading

ggerganov commented Apr 28, 2023 •

edited

Loading