I've been able to convert files from HF format to f16 and 4-bit, but I haven't been able to figure out what config.json (or what changes to the config.json) to use when evaluating 4-bit quantized models. (This is needed for running evaluations on them.)
I've tried simply changing `torch_dtype` to `int4`, but that doesn't seem to work.
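For what it's worth, `torch_dtype` in a Hugging Face `config.json` is parsed as the name of a PyTorch dtype, and PyTorch has no `int4` dtype, which is likely why that change fails. Only float dtype names are accepted there; an illustrative fragment (not the full llama 13b config) would look like:

```json
{
  "torch_dtype": "float16"
}
```

4-bit weights are generally handled by the loading/quantization machinery rather than by `torch_dtype`, so the config field alone can't express them.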
For context, this is what the current config.json for llama 13b looks like:
Update: it turns out lm-evaluation-harness does not support 4-bit quantization yet, so that will need to be added first (or, if anyone wants to work on it, please see EleutherAI/lm-evaluation-harness#417).
I would really appreciate any help. Thank you!