Llama-3-8B model dumped by LMQuant in the W4A8 setting raises errors when running the e2e benchmark in QServe. #29

Open
Patrick-Lew opened this issue Aug 12, 2024 · 1 comment

@Patrick-Lew

I dumped the quantized Llama-3-8B model from LMQuant using QoQ, with the following command from lmquant/projects/llm/scripts/qoq.sh:

```bash
# QoQ (W4A8KV4 with per-channel weight quantization) on Llama3-8B
python -m lmquant.llm.run configs/llm.yaml configs/qoq/gchn.yaml \
    --model-name llama3-8b \
    --smooth-xw-alpha 0.05 --smooth-xw-beta 0.95 \
    --smooth-yx-strategy GridSearch --smooth-yx-beta " -2"
```

I appended --save-model and --model-path to that command.
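For completeness, the full dump command looked roughly like the sketch below. The two paths are placeholders for my local directories, and the exact argument form of each flag follows the LMQuant CLI:

```bash
# QoQ command from above with the two save flags appended.
# NOTE: /path/to/... are placeholders; how --model-path and --save-model
# take their values follows the LMQuant CLI, not anything specific here.
python -m lmquant.llm.run configs/llm.yaml configs/qoq/gchn.yaml \
    --model-name llama3-8b \
    --smooth-xw-alpha 0.05 --smooth-xw-beta 0.95 \
    --smooth-yx-strategy GridSearch --smooth-yx-beta " -2" \
    --model-path /path/to/Meta-Llama-3-8B \
    --save-model /path/to/llama3-8b-w4a8kv4-dump
```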
Then I ran the checkpoint_convert.py script in QServe to get the converted checkpoint.

[Screenshot 2024-08-12 at 12:07:22 PM: checkpoint conversion command and output]

Then I ran the e2e benchmark with this command:

[Screenshot 2024-08-12 at 12:07:56 PM: e2e benchmark command]

but the results look like this:

[Screenshot 2024-08-12 at 12:08:26 PM: error output]

I want to know whether this e2e benchmark can only be run on the Llama-3-8B-Instruct model that you kindly provided in the Hugging Face repo. I also tried running the e2e benchmark on other models such as Llama-3-8B (non-instruct), but it raises the same error as above.

Thanks.
Patrick

@ys-2020
Contributor

ys-2020 commented Oct 1, 2024

Hi,

Thanks for your interest in QServe! We suggest using instruction-tuned models for e2e generation to get robust outputs, since the current conversation template is designed for instruct models.
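For context, instruction-tuned Llama-3 checkpoints expect prompts wrapped in a chat layout along the lines of the sketch below (this is the generic Llama-3 instruct format, shown only for illustration; QServe's actual conversation template may differ in its details). A base, non-instruct model was not trained on these special tokens, so driving it through a chat template tends to produce degraded or garbled generations:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```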
