
Model files are big? #6

Closed
python273 opened this issue Apr 19, 2023 · 9 comments

Comments

@python273

python273 commented Apr 19, 2023

https://huggingface.co/stabilityai/stablelm-base-alpha-3b/tree/main

Looks like 3B is 14.7GB, and if I understand correctly, it's supposed to be f16. Even with f32, it should be about 11.2G. With f16, 5.6G. Am I missing something?

For reference LLaMA 7B (f16) is 12.6G.

upd: I guess it's actually f32. But still seems a little bigger than should be?
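
For reference, the arithmetic behind those estimates, as a quick sketch assuming a nominal 3.0B parameters for the 3B model and ~6.7B for LLaMA 7B (the actual counts differ, as noted below):

```python
# Rough checkpoint-size estimate: parameter count * bytes per parameter, in GiB.
# The parameter counts here are nominal guesses, not official figures.
def size_gib(n_params: int, bytes_per_param: int) -> float:
    return n_params * bytes_per_param / 1024**3

print(size_gib(3_000_000_000, 4))  # "3B" in f32 -> ~11.2 GiB
print(size_gib(3_000_000_000, 2))  # "3B" in f16 -> ~5.6 GiB
print(size_gib(6_740_000_000, 2))  # LLaMA 7B in f16 -> ~12.6 GiB
```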

@jon-tow
Collaborator

jon-tow commented Apr 19, 2023

The actual model sizes are:
3B: 3,638,525,952
7B: 7,869,358,080

The fp32 weights are provided to allow users to reduce precision to suit their needs. We will consider providing the weights in f16 since this is a common complaint :)

Thank you for pointing it out!
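
For anyone who wants to do the precision reduction themselves in the meantime, here is a minimal sketch, assuming PyTorch and Hugging Face Transformers are installed. It still downloads the fp32 files once, but the saved local copy is roughly half the size:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "stabilityai/stablelm-base-alpha-3b"

# Download the fp32 checkpoint, cast the weights to fp16 while loading,
# and save a local half-precision copy.
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(name)

model.save_pretrained("stablelm-base-alpha-3b-fp16")
tokenizer.save_pretrained("stablelm-base-alpha-3b-fp16")
```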

@python273
Author

Ok, the size seems about right then.

# file sizes taken from disk; Hugging Face displays sizes as bytes / 1000**3
>>> (10_161_140_290+4_656_666_941) / 1024 / 1024 / 1024
13.800158380530775
>>> (3_638_525_952 * 4) / 1024 / 1024 / 1024
13.5545654296875

f16 weights would be nice, to download less stuff

@amrrs amrrs mentioned this issue Apr 19, 2023
@andysalerno

@jon-tow on this topic, do you expect these models to quantize well down to 4bits (or lower) via GPTQ and/or other quantizing strategies?

I don't see why not, since GPTQ seems to be a general technique that works well for different transformer models. But I'm asking because part of the reason behind Stable Diffusion's success is how well it runs on consumer hardware. So I'm wondering if these models will follow a similar goal of running very well on consumer hardware, and therefore consider quantization from the very beginning?

@jon-tow
Collaborator

jon-tow commented Apr 20, 2023

Hi, @andysalerno! I do expect these models to quantize quite well. They're pretty wide, which should help reduce memory-bandwidth boundedness compared to models of similar size when quantized.
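
For intuition, here's a toy round-to-nearest sketch of 4-bit weight quantization with per-group scales. Real GPTQ goes further and corrects for quantization error layer by layer, but the memory math is the same; all names below are made up for illustration:

```python
import torch

def quantize_4bit_rtn(w: torch.Tensor, group_size: int = 64):
    """Naive symmetric round-to-nearest 4-bit quantization with per-group scales."""
    w = w.reshape(-1, group_size)
    # One scale per group, mapping the group's max magnitude to the int4 range.
    scale = (w.abs().max(dim=1, keepdim=True).values / 7).clamp_min(1e-8)
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096 * 4096)        # pretend this is one weight matrix, flattened
q, scale = quantize_4bit_rtn(w)
err = (dequantize(q, scale).flatten() - w).abs().mean()
print(f"mean abs error: {err:.4f}")  # small relative to the weights' scale
```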

@MarkSchmidty
Contributor

MarkSchmidty commented Apr 20, 2023

There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.

(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)

@vvsotnikov

For the sake of convenience (half the download size/RAM/VRAM), I've uploaded 16-bit versions of the tuned models to the HF Hub:
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-7b-16bit
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-3b-16bit
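
In case it's useful, a minimal sketch of loading one of these 16-bit checkpoints and generating, assuming PyTorch/Transformers are installed and a GPU with enough VRAM; the prompt and sampling settings are just placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "vvsotnikov/stablelm-tuned-alpha-3b-16bit"

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the tuned models also expect the chat-style prompt format described in this repo's README; the plain prompt above is just to sanity-check loading.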

@iboyles

iboyles commented Apr 20, 2023

Yeah, we need a Colab for this stuff that doesn't crash from running out of RAM, lol

@jon-tow jon-tow pinned this issue Apr 24, 2023
@twmmason twmmason unpinned this issue Apr 25, 2023
@twmmason twmmason pinned this issue Apr 25, 2023
@jrincayc

jrincayc commented Apr 26, 2023

There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.

(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)

Hm, how do you actually run this?
I tried https://github.com/ggerganov/llama.cpp (4afcc378698e057fcde64e23eb664e5af8dd6956 and also 5addcb120cf2682c7ede0b1c520592700d74c87c)

and got:

./main -m ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin -p "this is a test"
main: seed = 1682468827
llama.cpp: loading model from ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin
error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model
main: error: failed to load model '../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'

@pratikchhapolika

Hi @jon-tow @python273, why do we have multiple .bin files inside stabilityai/stablelm-base-alpha-7b? When we load the model, which .bin file is loaded?
