I dumped the quantised Llama-3-8B model from LMQuant using QoQ, with the command from `lmquant/projects/llm/scripts/qoq.sh`:

```
# QoQ (W4A8KV4 with per-channel weight quantization) on Llama3-8B
python -m lmquant.llm.run configs/llm.yaml configs/qoq/gchn.yaml --model-name llama3-8b --smooth-xw-alpha 0.05 --smooth-xw-beta 0.95 --smooth-yx-strategy GridSearch --smooth-yx-beta " -2"
```

to which I appended `--save-model` and `--model-path`.
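For reference, the "W4 ... per-channel weight quantization" part of the QoQ scheme above can be sketched in plain Python. This is a hypothetical illustration of per-channel symmetric INT4 quantization, not LMQuant's actual implementation:

```python
# Hedged sketch of per-channel symmetric INT4 weight quantization
# (the "W4" in W4A8KV4). Each output channel (row) gets its own scale.
# Illustration only; LMQuant's real kernels and rounding may differ.

def quantize_per_channel_int4(weight):
    """weight: list of rows (output channels) of floats.
    Returns (quantized int rows in [-8, 7], per-channel scales)."""
    qweight, scales = [], []
    for row in weight:
        amax = max(abs(w) for w in row)
        scale = amax / 7.0 if amax > 0 else 1.0  # symmetric INT4 range is [-8, 7]
        q = [max(-8, min(7, round(w / scale))) for w in row]
        qweight.append(q)
        scales.append(scale)
    return qweight, scales

def dequantize(qweight, scales):
    """Recover approximate float weights from ints and per-channel scales."""
    return [[q * s for q in row] for row, s in zip(qweight, scales)]

w = [[0.1, -0.7, 0.35], [2.0, -1.0, 0.5]]
qw, sc = quantize_per_channel_int4(w)
```

Per-channel scales keep a single outlier channel from blowing up the quantization error of every other channel, which is why the scheme is preferred over a per-tensor scale at 4-bit precision.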
Then I ran the `checkpoint_convert.py` script in QServe to obtain the converted checkpoint.
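The internals of `checkpoint_convert.py` aren't shown here, but one thing such converters typically do is repack signed INT4 values two-per-byte for the serving engine. A minimal sketch of that repacking, under the assumption of simple low/high-nibble packing (QServe's actual layout may differ):

```python
# Hedged sketch of int4 checkpoint repacking: store two signed 4-bit values
# per byte (low nibble first). Not QServe's actual packing layout.

def pack_int4(values):
    """Pack signed int4 values (each in [-8, 7]) into bytes, two per byte."""
    if len(values) % 2:
        raise ValueError("need an even number of int4 values")
    packed = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed int4 values."""
    out = []
    for b in packed:
        for nib in (b & 0xF, b >> 4):
            out.append(nib - 16 if nib >= 8 else nib)  # sign-extend the nibble
    return out
```

A round trip (`unpack_int4(pack_int4(vals)) == vals`) is a quick sanity check that a converted checkpoint hasn't corrupted the weights.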
I then ran the e2e benchmark with this command:
but the results looked like this:
I want to know whether this e2e benchmark can only be run on the Llama-3-8B-Instruct model that you kindly provided in the Hugging Face repo.
I also tried running the e2e benchmark on other models such as Llama-3-8B (non-instruct), but it raises the same error as above.
Thanks.
Patrick
Thanks for your interest in QServe! We suggest using instruction-tuned models for e2e generation to get robust outputs, since the current conversation template is designed for instruct models.
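To make the mismatch concrete, here is a sketch of the Llama-3 *instruct* conversation format (special-token names as in Meta's Llama 3 release; QServe's exact template may differ). A base, non-instruct checkpoint was never trained on these header and `<|eot_id|>` tokens, which is why feeding it a templated prompt produces unreliable generations:

```python
# Sketch of a Llama-3-Instruct style prompt builder.
# Token names follow Meta's published Llama 3 prompt format;
# this is an illustration, not QServe's actual template code.

def llama3_chat_prompt(user_msg, system_msg=None):
    parts = ["<|begin_of_text|>"]
    if system_msg:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_msg}<|eot_id|>"
        )
    parts.append(
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
    )
    # Open the assistant turn so the model continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = llama3_chat_prompt("What is QoQ quantization?", system_msg="Be concise.")
```

With a base model, a plain free-form prompt (no special tokens) is the safer choice; with an instruct model, the template above is what the serving loop is expected to produce.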