The table below lists the released checkpoints and their evaluation results for each model:
| Model | Quantization | CKPT | WikiText (PPL, lower is better) | ARC-C (%) | HellaSwag (%) | MMLU (%) |
|---|---|---|---|---|---|---|
| TinyLlaMA-1.1B-v1.0-Chat | W8A8 | ckpt | 15.5 | 31.9 | 59.2 | 25.0 |
| TinyLlaMA-1.1B-v1.0-Chat | W4A8 | ckpt | 17.1 | 32.3 | 57.0 | 25.5 |
| StableLM-2-1.6B | W8A8 | ckpt | 29.7 | 37.1 | 63.6 | 30.0 |
| StableLM-2-1.6B | W4A8 | ckpt | 33.6 | 35.6 | 60.5 | 24.1 |
| Gemma-2B | W8A8 | ckpt | 20.3 | 21.8 | 40.9 | 25.8 |
| Gemma-2B | W4A8 | ckpt | 21.4 | 23.0 | 38.9 | 25.6 |
- Download the checkpoint from the `CKPT` link in the table above, then run the evaluation harness:
```bash
CUDA_VISIBLE_DEVICES=0 python eval/harness_eval.py \
    --tasks "wikitext,arc_challenge,hellaswag,hendrycksTest*" \
    --mode custom --hf_path ${CKPT} --output_dir ${OUTPUT_DIR}
```
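For example, a minimal setup before invoking the command above; the local paths here are hypothetical placeholders, not the actual checkpoint locations:

```bash
# Hypothetical paths for illustration only; point CKPT at wherever the
# checkpoint downloaded from the table above was saved.
export CKPT=./checkpoints/stablelm-2-1.6b-w8a8
export OUTPUT_DIR=./results/stablelm-2-1.6b-w8a8
mkdir -p ${OUTPUT_DIR}
```

Note that the `--tasks` argument is quoted so the shell does not expand the `*` in `hendrycksTest*` before the harness receives it.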