
[NOTICE] Transition from ChatModule to MLCEngine #2217

Closed · 2 tasks done
tqchen opened this issue Apr 25, 2024 · 6 comments
Labels
status: tracking Tracking work in progress

Comments

@tqchen
Contributor

tqchen commented Apr 25, 2024

As we start to formalize the MLC LLM Engine, we are moving towards a more comprehensive API that is OpenAI compatible. This brings many new features that let us do more across our backends (a usage sketch follows this list), including:

  • JSON mode and function calls
  • Multimodality
  • Prefix and prompt caching
  • Speculative decoding
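
For illustration, here is a minimal sketch of what the OpenAI-compatible Python API looks like; the model string is just a placeholder, and minor details (argument names, streaming fields) may differ from the final API.

```python
from mlc_llm import MLCEngine

# Placeholder model identifier; any MLC-compiled model works here.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# OpenAI-style chat completion with streaming output.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```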

The project started with ChatModule, which primarily focuses on chat. This is a note to the community that we are planning to phase out ChatModule in favor of MLCEngine. Another advantage is that we will then have a single engine backing all our backends, so a feature enabled in one backend can quickly be enabled in the others.

Transition

As of now, ChatModule is still available, but we try to avoid mentioning it in the docs. The current mlc_llm chat is still backed by ChatModule. We are working on a JSONFFIEngine, with pure JSON-string input/output, to let us expose a broader set of interfaces (as in OpenAI) to a broader set of backends, so the transition will happen once JSONFFIEngine lands. One additional thing we will need is automatic prefix caching to speed up multi-round chat. Backends like iOS and Android will interface with JSONFFIEngine (which carries the full set of OpenAI features); a sketch of this string-based interface follows the task list below.

  • JSONFFIEngine
  • PrefixCache
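
To make the JSONFFIEngine direction concrete, here is a minimal sketch of the string-in/string-out boundary it targets. The engine handle and method names below (`engine.chat_completion`, `stream_chunks`) are assumptions for illustration only, not a final API.

```python
import json

# Sketch only: a JSON-FFI style engine passes OpenAI-schema requests and
# responses as plain JSON strings, so iOS/Android/web wrappers stay thin.
request_str = json.dumps({
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
})

# The calls below are hypothetical placeholders, not the actual binding:
# response_str = engine.chat_completion(request_str)
# for chunk_str in stream_chunks(response_str):
#     chunk = json.loads(chunk_str)
#     print(chunk["choices"][0]["delta"].get("content", ""), end="")
```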

Additionally, we understand that there is a desire to access MLC through a low-level API that directly leverages the TVM runtime. ChatModule and its CLI have been useful for some debugging purposes.

For such low-level debugging, we do not necessarily need the full engine that supports continuous batching and speculative decoding. We introduce a debug chat, https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/testing/debug_chat.py, which offers more inspection and single-round input/output generation. We can consider building a C++ version of it as well.
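
For reference, single-round debug generation would look roughly like the sketch below; the `DebugChat` class, its parameters, and the `generate` method are assumptions inferred from the linked file path, so consult debug_chat.py for the actual signature.

```python
# Hypothetical usage sketch; the names below are assumptions, not the verified
# API of python/mlc_llm/testing/debug_chat.py.
from mlc_llm.testing.debug_chat import DebugChat  # module path from the link above

dc = DebugChat(
    model="./dist/Llama-3-8B-Instruct-q4f16_1-MLC",          # assumed parameter
    model_lib="./dist/libs/Llama-3-8B-Instruct-q4f16_1.so",  # assumed parameter
    debug_dir="./debug-out",                                  # assumed parameter
)
dc.generate("What is the capital of France?")                 # assumed method
```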

We are still working on some of the above items, so this is not an immediate change, but we would like to bring awareness to the community.

@MikeLP

MikeLP commented Apr 29, 2024

@tqchen Does the new engine still support custom or modified chat/conversation templates in the MLC config? For example, sometimes there are differences between the original model and a fine-tuned model.

@tqchen
Contributor Author

tqchen commented Apr 30, 2024

@MikeLP yes, we should keep such customization

@tqchen
Contributor Author

tqchen commented May 6, 2024

#2279 brings an initial iOS version of MLCEngine.

@tqchen
Contributor Author

tqchen commented May 22, 2024

#2380 transitions the iOS ChatApp to MLCEngine.

@tqchen
Contributor Author

tqchen commented May 27, 2024

#2410 transitions the Android app to MLCEngine.

@tqchen
Contributor Author

tqchen commented May 27, 2024

We have completed the transition steps.
