[NOTICE] Transition from ChatModule to MLCEngine #2217
@tqchen Does the new engine still support custom or modified chat/conversation templates in the MLC config? For example, there are sometimes differences between the original model and a fine-tuned model.
@MikeLP Yes, we should keep such customization.
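For reference, a hypothetical sketch of what such a customization could look like via mlc-chat-config.json. The field names below (conv_template, conv_config and its sub-fields) are assumptions for illustration only, not the authoritative schema; check the config generated for your model.

```python
import json

# Hypothetical illustration of overriding the conversation template for a
# fine-tuned model; field names are assumptions, not the authoritative schema.
chat_config_override = {
    "conv_template": "llama-2",           # pick a built-in template by name
    "conv_config": {                      # ...or tweak individual pieces of it
        "system": "You are a helpful fine-tuned assistant.",
        "roles": ["[INST]", "[/INST]"],
        "stop_str": "</s>",
    },
}

# Merge the overrides into an existing mlc-chat-config.json.
with open("mlc-chat-config.json", "r+") as f:
    config = json.load(f)
    config.update(chat_config_override)
    f.seek(0)
    json.dump(config, f, indent=2)
    f.truncate()
```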
#2279 brings an initial iOS version of MLCEngine
#2380 transitions the iOS ChatApp to MLCEngine
#2410 transitions the Android app to MLCEngine
We have completed the transition steps.
As we start to formalize the MLC LLM Engine (MLCEngine), we are moving towards a more comprehensive API that is OpenAI compatible. This brings a lot of new features that allow us to do more across our backends, including continuous batching and speculative decoding.
The project started with ChatModule, which primarily focuses on chat. This is a note to the community that we are planning to phase out ChatModule in favor of MLCEngine. Another advantage is that afterwards we will have a single engine that backs all our backends, so a feature enabled in one backend can quickly be enabled in the others.
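For illustration, the Python side of MLCEngine exposes an OpenAI-style chat completion interface roughly along these lines. This is a sketch assuming the mlc_llm Python package and a prebuilt weights repo; the model string is a placeholder and keyword names may differ slightly across releases.

```python
from mlc_llm import MLCEngine

# Placeholder model id pointing at prebuilt MLC weights.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Streaming chat completion, mirroring openai.chat.completions.create.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```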
Transition
As of now ChatModule is still available, but we try to avoid mentioning it in the docs. The current mlc_llm chat is still backed by ChatModule. We are working on a JSONFFIEngine, with pure JSON-string input/output, to enable us to expose a broader set of interfaces (as in OpenAI) to a broader set of backends. The transition will happen once JSONFFIEngine lands. One additional thing we will need is automatic prefix caching to speed up multi-round chat. Backends like iOS and Android will interface with JSONFFIEngine (which has the full OpenAI features).
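To make the pure JSON-string input/output idea concrete, a request could simply be an OpenAI-style chat completion body serialized to a string and passed across the FFI boundary. This is a hypothetical sketch; the payload follows the OpenAI chat completion schema, and no JSONFFIEngine method names are shown since that interface was not finalized at the time of this note.

```python
import json

# Hypothetical shape of a request string for a JSON-string-in / JSON-string-out
# engine; the payload follows the OpenAI chat completion schema.
request_str = json.dumps({
    "model": "Llama-3-8B-Instruct-q4f16_1-MLC",   # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": True,
})

# A backend (iOS/Android/...) would send request_str across the FFI boundary and
# parse the OpenAI-style response chunks it gets back as JSON strings.
print(request_str)
```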
Additionally, we understand that there is a desire to access MLC through a low-level API that directly leverages the TVM runtime. ChatModule and its CLI have been useful for some debugging purposes.
For such low-level debugging, we do not necessarily need the full engine that supports continuous batching and speculative decoding. We introduce a debug chat, https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/testing/debug_chat.py, which features more inspection and single-round input/output generation. We can consider building possible C++ versions of it as well.
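A hypothetical sketch of driving a single-round debug generation is below. The class name, constructor arguments, and generate signature are assumptions for illustration and may not match the current debug_chat.py exactly.

```python
# Hypothetical usage sketch; names and arguments are assumptions, check
# python/mlc_llm/testing/debug_chat.py for the actual interface.
from mlc_llm.testing.debug_chat import DebugChat

dc = DebugChat(
    model="./dist/Llama-2-7b-chat-hf-q4f16_1-MLC",          # compiled model dir (placeholder)
    model_lib="./dist/Llama-2-7b-chat-hf-q4f16_1-cuda.so",  # compiled model library (placeholder)
    debug_dir="./debug-llama-2",                            # where intermediate dumps are written
)

# Single-round input/output generation: (prompt, number of tokens to generate).
dc.generate("What is the meaning of life?", 64)
```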
We are still working on some of the above items, so this is not an immediate change, but we would like to bring awareness to the community.