In the past, we tested and confirmed that the outputs of nyuntam's w4a16 quant algo (AWQ) can be used directly as inputs to the q4f16_awq quantisation scheme of mlc-llm. We expect this still holds. Ideally, when someone chooses q4f16_awq as the quantisation, nyuntam's AutoAWQ should run as the intermediary job, and its output(s) should be used to continue mlc-llm's weight conversion and model compilation.
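The handoff described above could be sketched as follows. This is an illustrative sketch, not nyuntam code: the paths and model directory names are placeholders, and the `mlc_llm convert_weight` / `gen_config` / `compile` subcommands and the `--source-format awq` flag are assumptions based on mlc-llm's documented CLI, which may differ by version.

```python
# Sketch of the intended nyuntam -> mlc-llm handoff (hypothetical paths).
# Assumption: mlc_llm convert_weight accepts AWQ checkpoints via
# "--source-format awq"; gen_config and compile then proceed as usual.

def build_pipeline(awq_dir: str, out_dir: str, quant: str = "q4f16_awq"):
    """Return the mlc-llm commands that would consume nyuntam's AWQ output."""
    convert = [
        "mlc_llm", "convert_weight", awq_dir,
        "--quantization", quant,
        "--source-format", "awq",  # the weights are already AWQ-quantised
        "--output", out_dir,
    ]
    gen_config = [
        "mlc_llm", "gen_config", awq_dir,
        "--quantization", quant,
        "--output", out_dir,
    ]
    compile_cmd = [
        "mlc_llm", "compile", f"{out_dir}/mlc-chat-config.json",
        "--output", f"{out_dir}/model.so",
    ]
    return [convert, gen_config, compile_cmd]

# Print the three commands the intermediary job would run in order.
for cmd in build_pipeline("./nyuntam-awq-output", "./dist"):
    print(" ".join(cmd))
```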
For 3-bit quantisation, mlc-llm accepts OmniQuant's outputs as inputs, as per this notebook.
All the platforms supported by mlc-llm should be supported ootb, though testing them is subject to the availability of a suitable test environment.
Feature type?
Algorithm request
A proposal draft (if any)
MLC-LLM is an LLM Deployment Engine with ML Compilation.
https://github.com/mlc-ai/mlc-llm
It supports a very wide range of environments and backends.
Primarily, the different quantisation schemes should be supported ootb.
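For context, the scheme names below follow mlc-llm's "q&lt;W&gt;f&lt;A&gt;" convention (W-bit weight quantisation with float-A compute). The list is an assumption based on mlc-llm's documentation at the time of writing and may lag the project; the parsing helper is purely illustrative.

```python
# Commonly documented mlc-llm quantisation scheme names (may be incomplete).
MLC_QUANT_SCHEMES = {
    "q0f16":     "no quantisation, fp16 weights",
    "q0f32":     "no quantisation, fp32 weights",
    "q3f16_1":   "3-bit weights, fp16 compute",
    "q4f16_1":   "4-bit weights, fp16 compute",
    "q4f32_1":   "4-bit weights, fp32 compute",
    "q4f16_awq": "4-bit AWQ weights, fp16 compute",
}

def weight_bits(scheme: str) -> int:
    """Parse the weight bit-width out of a scheme name like 'q4f16_awq'."""
    # "q4f16_awq" -> split on the first 'f' -> "q4" -> bits = 4
    return int(scheme.split("f")[0][1:])

print(weight_bits("q4f16_awq"))  # -> 4
print(weight_bits("q3f16_1"))   # -> 3
```

A scheme like q4f16_awq would be the one routed through nyuntam's AutoAWQ job, while q3f16_1 is where OmniQuant outputs would apply.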