Support StableLM From StabilityAI #1063

Closed

MarkSchmidty opened this issue Apr 19, 2023 · 20 comments
Labels: enhancement, help wanted, model

Comments

@MarkSchmidty commented Apr 19, 2023

Blog Post Announcement
(It may be using the same architecture as GPT-NeoX)

Launch In Colab

GitHub Repo

In case these links 404 due to being posted early by accident:
https://archive.is/ZQszO
https://archive.ph/U0Pr8

(Checkpoint links are Hugging Face repos with model weights)

| Size | StableLM-Base-Alpha | StableLM-Tuned-Alpha | Training Tokens [in progress] | Context Window | Web Demo    |
|------|---------------------|----------------------|-------------------------------|----------------|-------------|
| 3B   | checkpoint          | checkpoint           | 800B [1.5T]*                  | 4096           |             |
| 7B   | checkpoint          | checkpoint           | 800B [1.5T]*                  | 4096           | HuggingFace |
| 15B  | (in progress)       | (pending)            | 1.5T*                         |                |             |
| 30B  | (in progress)       | (pending)            | 1.5T*                         |                |             |
| 65B  | (in progress)       | (pending)            | 1.5T*                         |                |             |
| 175B | (planned)           |                      |                               |                |             |

*3T planned

Green-Sky added the enhancement and model labels on Apr 19, 2023
@Green-Sky (Collaborator)
Are they just new GPT-NeoX models, or did they forget to update the model cards on HF? 😄

@Green-Sky (Collaborator)
Related: ggerganov/ggml#10

@jessejohnson (Contributor) commented Apr 19, 2023

This was quick! 😅

They've included a bit in the README indicating that compatibility with llama.cpp is actively desired. :)

EDIT: related HN thread https://news.ycombinator.com/item?id=35629127

@NoNamedCat
Will these models be compatible with llama.cpp?

@rabidcopy (Contributor)
Definitely interested in this. Interesting that they specifically highlight wanting llama.cpp/ggml support.

@rabidcopy (Contributor)
If it really is GPT-NeoX, this repo has conversion, quantization, and basic-inference support for GPT-NeoX and other model formats:
https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/converters/convert_gptneox_to_ggml.py
https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/quantize_gptneox.cpp
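For a sense of what those converters do, here is a minimal sketch of the HF-to-ggml-style dump step (the repo id is real, but the output path, magic header, and per-tensor record layout below are illustrative assumptions, not the actual cformers format):

    import struct
    import torch
    from transformers import AutoModelForCausalLM

    # Load the HF checkpoint on CPU in float32
    model = AutoModelForCausalLM.from_pretrained(
        "stabilityai/stablelm-base-alpha-3b", torch_dtype=torch.float32)

    with open("stablelm-3b-f32.bin", "wb") as fout:
        fout.write(struct.pack("i", 0x67676D6C))  # "ggml" magic; header is illustrative
        for name, tensor in model.state_dict().items():
            data = tensor.float().numpy()
            name_bytes = name.encode("utf-8")
            # per-tensor record: n_dims, name length, dims (reversed, as ggml
            # stores them), then the name and the raw float32 data
            fout.write(struct.pack("ii", data.ndim, len(name_bytes)))
            for dim in reversed(data.shape):
                fout.write(struct.pack("i", dim))
            fout.write(name_bytes)
            data.tofile(fout)

The real scripts also remap tensor names and optionally quantize, but the overall shape of the job is the same.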

@ggerganov (Owner)
Here is a very quick and dirty implementation using ggml:

ggerganov/ggml#96

Also, I found a bug in the multi-threaded ggml_cpy():

https://github.com/ggerganov/ggml/pull/96/files#diff-b4a500ab2765c31526c5541f3e51e21e46990b87d9774cac6f3089db315bdc5bR5655-R5660

@acheong08
> Are they just new GPT-NeoX models, or did they forget to update the model cards on HF? 😄

Is it?

@MarkSchmidty (Author) commented Apr 20, 2023

Yes, it's using the GPT-NeoX architecture. The model details can be seen here: https://github.com/Stability-AI/StableLM/blob/main/configs/stablelm-base-alpha-7b.yaml

  # model settings
  "num-layers": 16,
  "hidden-size": 6144,
  "num-attention-heads": 48,
  "seq-length": 4096,
  "max-position-embeddings": 4096,

  # architecture design
  "norm": "layernorm",
  "pos-emb": "rotary",
  "rotary_pct": 0.25,
  "activation": "gelu",
  "no-weight-tying": true,
  "gpt_j_residual": true,
  "output_layer_parallelism": "column",

prusnak changed the title from "⭐ Support StableLM From StabilityAI" to "Support StableLM From StabilityAI" on Apr 20, 2023
@ggerganov (Owner)
Merged in ggml: https://github.com/ggerganov/ggml/tree/master/examples/stablelm

@mhkhung commented Apr 21, 2023

Are the q4_x files output by ggml not compatible with llama.cpp?

@fgdfgfthgr-fox
> Are the q4_x files output by ggml not compatible with llama.cpp?

It seems so currently.

@magicrobotmonkey
I've converted/quantized stablelm-tuned-alpha-7b to Q4_3 and it works great with ggml, but llama.cpp throws "error loading model: missing tok_embeddings.weight", so it seems like some support is still missing.

@AndreiSva
I am getting the same error

@mikeggh commented Apr 24, 2023

Are you using the dedicated stablelm binary? From the looks of it, it is built as a separate example: https://github.com/ggerganov/ggml/tree/master/examples/stablelm

@wkkautas
Are there plans to integrate ggml/examples/stablelm into llama.cpp?
It would also be great if a single llama.cpp binary could run GPT-2 and GPT-J as well.

@ggerganov (Owner) commented Apr 27, 2023

There seems to be a bug in the existing StableLM implementation in ggml.
See the updated README for more details:

https://github.com/ggerganov/ggml/tree/master/examples/stablelm#warning

The best way to fix this is to compare outputs with the reference implementation.
Any help would be appreciated.
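For anyone picking this up, one way to get reference numbers out of the HF implementation to diff against ggml is something like the following (a sketch; the prompt and the use of per-layer hidden-state statistics are just one reasonable comparison strategy):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "stabilityai/stablelm-base-alpha-7b"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
    model.eval()

    inputs = tok("Hello, my name is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # Per-layer statistics to compare against the ggml implementation's tensors
    for i, h in enumerate(out.hidden_states):
        print(f"layer {i:2d}: shape={tuple(h.shape)} "
              f"mean={h.mean().item():+.6f} std={h.std().item():.6f}")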

ggerganov added the help wanted label on Apr 27, 2023
@ggerganov (Owner) commented Apr 28, 2023

So, I ran the HF transformers implementation and observed the same "increasing magnitude" behaviour as in the ggml implementation.

To do this, I changed the following line:

https://github.com/huggingface/transformers/blob/c2c99dc7ef5edab8f7674a1eb00cf6ac6996fd0f/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L234

to:

        print(attn_scores)
        attn_weights = nn.functional.softmax(attn_scores, dim=-1)

Here is the output log from a sample run:

softmax-stablelm.txt

For comparison, here is running GPT-2 using HF transformers with the same change:

softmax-gpt-2.txt

Notice how the GPT-2 values are all well below 1e1 for each layer, while the StableLM numbers jump all the way up to 1e3.
The same behaviour as GPT-2 is also observed for GPT-J and LLaMA models (these are the models I currently play with the most). To me, this behaviour makes sense and seems correct, while the StableLM numbers are weird.


So, is my understanding incorrect, or is there something wrong with the StableLM model?
In any case, I no longer think there is a bug in the ggml implementation.

@byroneverson
I believe this behavior is correct and is a result of how the models were trained. The text output seems coherent, and the values only rarely converge to -inf. I may be out of line, but is it possible this is normal? I will keep looking into this, but I doubt softmax would work at all if this were a major issue. If you have any further insight, I would love to dive deeper.

@ggerganov (Owner)
> is it possible this is normal?

Absolutely. It's just my intuitive understanding that the scaling before the softmax has the purpose of preventing exactly this kind of magnitude increase. But I could be wrong, and this could be fine.
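For intuition, here is a small self-contained numpy illustration (not tied to any StableLM weights) of what that pre-softmax scaling is guarding against: as the score magnitude grows toward the 1e3 range seen above, softmax saturates to a near one-hot distribution.

    import numpy as np

    def softmax(x):
        x = x - x.max()              # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum()

    rng = np.random.default_rng(0)
    scores = rng.standard_normal(8)  # stand-in for one row of attention scores

    for scale in (1e0, 1e1, 1e3):
        p = softmax(scale * scores)
        entropy = -(p * np.log(p + 1e-12)).sum()
        print(f"scale={scale:6.0f}  max prob={p.max():.4f}  entropy={entropy:.4f}")

At scale 1e3 the max probability is essentially 1.0 and the entropy collapses to ~0; this is the same reason attention divides the raw dot products by sqrt(d_head), keeping softmax inputs in a range where the distribution stays soft.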
