
[NOTICE] Transition from ChatModule to MLCEngine #2217

Closed · 2 tasks done
tqchen opened this issue Apr 25, 2024 · 6 comments
Labels
status: tracking Tracking work in progress

Comments

@tqchen
Contributor

tqchen commented Apr 25, 2024

As we start to formalize the MLC LLM Engine, we are moving towards a more comprehensive API that is OpenAI compatible. This brings many new features that let us do more across our backends (a usage sketch follows this list), including:

  • JSON mode and function calls
  • Multimodality
  • Prefix and prompt caching
  • Speculative decoding
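
For illustration, here is a minimal sketch of what the OpenAI-compatible Python API looks like; the model string is just a placeholder, and minor details (argument names, streaming fields) may differ from the final API.

```python
from mlc_llm import MLCEngine

# Placeholder model identifier; any MLC-compiled model works here.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# OpenAI-style chat completion with streaming output.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```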

The project started with ChatModule, which primarily focuses on chat. This is a note to the community that we are planning to phase out ChatModule in favor of MLCEngine. Another advantage is that we will then have a single engine backing all our backends, so a feature enabled in one backend can quickly be enabled in the others.

Transition

As of now, ChatModule is still available, but we try to avoid mentioning it in the docs. The current mlc_llm chat is still backed by ChatModule. We are working on a JSONFFIEngine, with pure JSON-string input/output, to let us expose a broader set of interfaces (as in OpenAI) to a broader set of backends, so the transition will happen once JSONFFIEngine lands. One additional thing we will need is automatic prefix caching to speed up multi-round chat. Backends like iOS and Android will interface with JSONFFIEngine (which carries the full set of OpenAI features); a sketch of this string-based interface follows the task list below.

  • JSONFFIEngine
  • PrefixCache
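
To make the JSONFFIEngine direction concrete, here is a minimal sketch of the string-in/string-out boundary it targets. The engine handle and method names below (`engine.chat_completion`, `stream_chunks`) are assumptions for illustration only, not a final API.

```python
import json

# Sketch only: a JSON-FFI style engine passes OpenAI-schema requests and
# responses as plain JSON strings, so iOS/Android/web wrappers stay thin.
request_str = json.dumps({
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
})

# The calls below are hypothetical placeholders, not the actual binding:
# response_str = engine.chat_completion(request_str)
# for chunk_str in stream_chunks(response_str):
#     chunk = json.loads(chunk_str)
#     print(chunk["choices"][0]["delta"].get("content", ""), end="")
```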

Additionally, we understand that there is a desire to access MLC through a low-level API that directly leverages the TVM runtime. ChatModule and its CLI have been useful for some debugging purposes.

For such low-level debugging, we do not necessarily need the full engine that supports continuous batching and speculative decoding. We introduce a debug chat, https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/testing/debug_chat.py, which offers more inspection and single-round input/output generation. We can consider building a C++ version of it as well.
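
For reference, single-round debug generation would look roughly like the sketch below; the `DebugChat` class, its parameters, and the `generate` method are assumptions inferred from the linked file path, so consult debug_chat.py for the actual signature.

```python
# Hypothetical usage sketch; the names below are assumptions, not the verified
# API of python/mlc_llm/testing/debug_chat.py.
from mlc_llm.testing.debug_chat import DebugChat  # module path from the link above

dc = DebugChat(
    model="./dist/Llama-3-8B-Instruct-q4f16_1-MLC",          # assumed parameter
    model_lib="./dist/libs/Llama-3-8B-Instruct-q4f16_1.so",  # assumed parameter
    debug_dir="./debug-out",                                  # assumed parameter
)
dc.generate("What is the capital of France?")                 # assumed method
```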

We are still working on some of the above items, so this is not an immediate change, but we would like to bring awareness to the community.

@MikeLP

MikeLP commented Apr 29, 2024

@tqchen Does the new engine still support custom or modified chat/conversation templates in the MLC config? For example, sometimes there are differences between the original model and a fine-tuned model.

@tqchen
Contributor Author

tqchen commented Apr 30, 2024

@MikeLP yes, we should keep such customization

@tqchen
Contributor Author

tqchen commented May 6, 2024

#2279 brings an initial iOS version of MLCEngine.

@tqchen
Contributor Author

tqchen commented May 22, 2024

#2380 transitions the iOS ChatApp to MLCEngine.

@tqchen
Contributor Author

tqchen commented May 27, 2024

#2410 transitions the Android app to MLCEngine.

@tqchen
Contributor Author

tqchen commented May 27, 2024

We have completed the transition steps.
