In the past, we tested and confirmed that the outputs of nyuntam's w4a16 quant algo (AWQ) can be used directly as inputs to the q4f16_awq quantisation scheme of mlc-llm. We expect this still holds. Ideally, when someone chooses q4f16_awq as the quantisation, nyuntam's AutoAWQ should run as the intermediary job, and its output(s) should be used to continue mlc-llm's weight conversion and model compilation.
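The handoff described above could be sketched as follows. This is an illustrative sketch, not nyuntam code: the paths and model directory names are placeholders, and the `mlc_llm convert_weight` / `gen_config` / `compile` subcommands and the `--source-format awq` flag are assumptions based on mlc-llm's documented CLI, which may differ by version.

```python
# Sketch of the intended nyuntam -> mlc-llm handoff (hypothetical paths).
# Assumption: mlc_llm convert_weight accepts AWQ checkpoints via
# "--source-format awq"; gen_config and compile then proceed as usual.

def build_pipeline(awq_dir: str, out_dir: str, quant: str = "q4f16_awq"):
    """Return the mlc-llm commands that would consume nyuntam's AWQ output."""
    convert = [
        "mlc_llm", "convert_weight", awq_dir,
        "--quantization", quant,
        "--source-format", "awq",  # the weights are already AWQ-quantised
        "--output", out_dir,
    ]
    gen_config = [
        "mlc_llm", "gen_config", awq_dir,
        "--quantization", quant,
        "--output", out_dir,
    ]
    compile_cmd = [
        "mlc_llm", "compile", f"{out_dir}/mlc-chat-config.json",
        "--output", f"{out_dir}/model.so",
    ]
    return [convert, gen_config, compile_cmd]

# Print the three commands the intermediary job would run in order.
for cmd in build_pipeline("./nyuntam-awq-output", "./dist"):
    print(" ".join(cmd))
```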
For 3-bit quantisation, mlc-llm accepts OmniQuant's outputs as inputs, as per this notebook.
All the platforms supported by mlc-llm should be supported ootb, though testing them is subject to the availability of a suitable test environment.
Feature type?
Algorithm request
A proposal draft (if any)
MLC-LLM is an LLM Deployment Engine with ML Compilation.
https://github.com/mlc-ai/mlc-llm
It supports a very wide range of environments and backends.
Primarily, the different quantisation schemes should be supported ootb.
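For context, the scheme names below follow mlc-llm's "q&lt;W&gt;f&lt;A&gt;" convention (W-bit weight quantisation with float-A compute). The list is an assumption based on mlc-llm's documentation at the time of writing and may lag the project; the parsing helper is purely illustrative.

```python
# Commonly documented mlc-llm quantisation scheme names (may be incomplete).
MLC_QUANT_SCHEMES = {
    "q0f16":     "no quantisation, fp16 weights",
    "q0f32":     "no quantisation, fp32 weights",
    "q3f16_1":   "3-bit weights, fp16 compute",
    "q4f16_1":   "4-bit weights, fp16 compute",
    "q4f32_1":   "4-bit weights, fp32 compute",
    "q4f16_awq": "4-bit AWQ weights, fp16 compute",
}

def weight_bits(scheme: str) -> int:
    """Parse the weight bit-width out of a scheme name like 'q4f16_awq'."""
    # "q4f16_awq" -> split on the first 'f' -> "q4" -> bits = 4
    return int(scheme.split("f")[0][1:])

print(weight_bits("q4f16_awq"))  # -> 4
print(weight_bits("q3f16_1"))   # -> 3
```

A scheme like q4f16_awq would be the one routed through nyuntam's AutoAWQ job, while q3f16_1 is where OmniQuant outputs would apply.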