[Model] Add Phi3.5-mini #555

Merged 1 commit into mlc-ai:main on Aug 23, 2024
Conversation

CharlieFRuan (Contributor) commented on Aug 23, 2024

This PR adds the newly released Phi-3.5-mini, adding the following `model_id`s to our prebuilt model list:

  • `Phi-3.5-mini-instruct-q4f16_1-MLC` (4k KVCache)
  • `Phi-3.5-mini-instruct-q4f32_1-MLC` (4k KVCache)
  • `Phi-3.5-mini-instruct-q4f16_1-MLC-1k` (1k KVCache)
  • `Phi-3.5-mini-instruct-q4f32_1-MLC-1k` (1k KVCache)

See mlc-ai/binary-mlc-llm-libs#136 for the commits of TVM and MLC-LLM this is compiled with.

Note that Phi-3.5-mini supports up to 128K context (unlike Phi-3-mini, which only has 4k) thanks to RoPE scaling, which MLC-LLM supports. You can take advantage of this in WebLLM by increasing `ModelRecord.overrides.context_window_size` or specifying it in `ChatOptions` when loading a model, as long as there is enough VRAM.
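For illustration, a minimal sketch of loading one of the new `model_id`s with a larger context window via `ChatOptions` (the 8192 value and the exact `CreateMLCEngine` arguments here are assumptions for the example, not part of this PR):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Sketch only: load the new Phi-3.5-mini model and override the default
// 4k context window. 8192 is an arbitrary example value; pick whatever
// your VRAM allows.
const engine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  { initProgressCallback: (p) => console.log(p.text) }, // engine config
  { context_window_size: 8192 },                        // ChatOptions override
);

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0]?.message.content);
```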

CharlieFRuan merged commit 2639a80 into mlc-ai:main on Aug 23, 2024
1 check passed
CharlieFRuan added a commit that referenced this pull request Aug 23, 2024
### Change
- #555

### TVMjs
- Updated to current head:
apache/tvm@1518008
  - Main change is apache/tvm#17251
- This is needed for WASMs compiled after apache/tvm#17257 is merged (e.g. Phi-3.5). TVM global functions that return bool need this PR to run correctly at runtime (e.g. `AcceptToken()` in BNFGrammar).
- However, these changes are backward compatible with WASMs compiled prior to this PR. Tested with Phi-3 (an old WASM) running grammar.
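
The grammar path mentioned above is what WebLLM's JSON mode exercises. A minimal sketch, assuming an `engine` created as in the earlier snippet (the prompt is illustrative):

```ts
// Constrained (grammar-guided) generation: response_format json_object
// routes token acceptance through BNFGrammar, the path served by the
// bool-returning TVM global functions mentioned above.
const completion = await engine.chat.completions.create({
  messages: [
    { role: "user", content: "Return a JSON object with fields `name` and `age`." },
  ],
  response_format: { type: "json_object" },
});
console.log(completion.choices[0]?.message.content);
```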