[Refactor] Clean-up Management of Model/Artifact/Engine Info #66

sunggg · 2023-11-15T23:09:34Z

Problem

Currently, several factors make the deployment flow tricky:

No separation between compile-time info (e.g., build options, model info) and deployment info (e.g., engine config). The two are often mixed, so the current engine needs compile-time info, such as num_shards, in addition to the model artifact. This is unnecessary.
Unnecessary duplication of info management.
Dependency on the HF model config for model and tokenizer info. This also requires the artifact to keep such info in a specific path structure, so the deployment flow has to copy it separately after every compilation.
No way to check which build options were used for a given artifact.
Implicit name deduction. Since the deduction rule has changed over time, ollm had to be updated each time.
Disco compilation requires two separate build commands.

Changes

To overcome these issues, this PR lands the following changes:

Explicit separation between compile-time info and deployment info. Compile-time info is managed by ModelArtifactConfig, while deployment info is managed by MLCServeEngineConfig.
Remove redundant info management. All necessary info is managed by two structs: ModelArtifactConfig and MLCServeEngineConfig.
Remove the dependency on HF configs. The build script dumps the artifact as follows:

`model_artifact_path` (`asset` in ollm) has the following structure:
|- compiled artifact (.so)
|- `build_config.json`: stores compile-time info, such as `num_shards`, `quantization`, and the full set of build flags used.
|- params/ : stores weights in MLC format and `ndarray-cache.json`.
|            `ndarray-cache.json` is especially important for Disco.
|- model/ : stores info from HF model cards, such as max context length and tokenizer
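
As a rough illustration of how a consumer could rely on this layout, the sketch below checks an artifact directory and reads `build_config.json`. The helper names `check_artifact_layout` and `load_build_config` are hypothetical and are not part of this PR. Only the file and directory names come from the structure above.

```python
import json
from pathlib import Path


def check_artifact_layout(model_artifact_path):
    """Return the expected entries that are missing from an artifact directory."""
    root = Path(model_artifact_path)
    missing = []
    if not any(root.glob("*.so")):
        missing.append("compiled artifact (.so)")
    if not (root / "build_config.json").is_file():
        missing.append("build_config.json")
    if not (root / "params" / "ndarray-cache.json").is_file():
        missing.append("params/ndarray-cache.json")
    if not (root / "model").is_dir():
        missing.append("model/")
    return missing


def load_build_config(model_artifact_path):
    """Read the compile-time info stored next to the compiled artifact."""
    config_file = Path(model_artifact_path) / "build_config.json"
    if not config_file.is_file():
        raise FileNotFoundError(f"no build_config.json under {model_artifact_path}")
    with config_file.open() as f:
        return json.load(f)
```

With helpers like these, a deployment script could call `load_build_config(path)["num_shards"]` instead of asking the user to pass the shard count again.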

Build options used to produce the artifact can be found in build_config.json.
Implicit name deduction is no longer required. The previous deduction rule is still honored, but the model artifact name can also be specified directly.

model artifact name: llama-2-13b-chat-hf-q0f16-presharded-1gpu
before: `--local-id llama-2-13b-chat-hf-q0f16`
after: `--local-id llama-2-13b-chat-hf-q0f16-presharded-1gpu` or `--local-id llama-2-13b-chat-hf-q0f16`
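
The lookup behavior in the example above could be sketched as follows. This is a minimal illustration, not the PR's implementation: the function `resolve_artifact_name` is hypothetical, and the `-presharded-{N}gpu` suffix rule is an assumption inferred from the single example name shown here.

```python
def resolve_artifact_name(local_id, available, num_shards=1):
    """Map a --local-id value to an artifact name.

    An exact artifact name wins. Otherwise fall back to a deduced name,
    using a suffix pattern assumed from the example in the PR text.
    """
    if local_id in available:
        return local_id
    deduced = f"{local_id}-presharded-{num_shards}gpu"
    if deduced in available:
        return deduced
    raise ValueError(f"no artifact matches local id {local_id!r}")
```

Both forms of `--local-id` then resolve to the same artifact, which is the behavior described for the "after" case.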

Add syntactic sugar for the Disco build. No separate steps are needed; --num-shards N should suffice.
Address issues raised in "Changes to support input validation to match OpenAI behavior" (#65) and "Add handling of max_gen_len from mlc-llm chat_config" (#64).
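
To make the compile-time versus deployment split from the first two items concrete, here is a minimal sketch of what two such structs could look like. The class names follow the PR, but the field sets are assumptions: `num_shards` and `quantization` are the only compile-time fields named in this description, `max_batch_size` is a purely hypothetical deployment field, and `artifact_config_from_build_config` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArtifactConfig:
    # Compile-time info, fixed once the artifact is built.
    num_shards: int = 1
    quantization: str = "q0f16"


@dataclass
class MLCServeEngineConfig:
    # Deployment info, chosen when the engine starts.
    # max_batch_size is a hypothetical field used only for illustration.
    max_batch_size: int = 8

    def __post_init__(self):
        # Reject obviously invalid deployment settings early.
        if self.max_batch_size <= 0:
            raise ValueError("max_batch_size must be positive")


def artifact_config_from_build_config(build_config):
    """Build the compile-time struct from a parsed build_config.json dict."""
    return ModelArtifactConfig(
        num_shards=int(build_config["num_shards"]),
        quantization=str(build_config["quantization"]),
    )
```

Keeping the artifact struct frozen reflects the idea that compile-time info is read from the artifact and never overridden at deployment time.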

Todo

High: ollm integration
High: determine default engine config
Low: the dependency of mlc_serve on mlc_llm makes packaging tricky. Better to remove it.

cc. @jroesch @elvin-n @masahi

jroesch

Overall LGTM, couple small follow up nits/questions.

elvin-n

LGTM

sunggg added 3 commits November 15, 2023 16:50

wip

331bd1d

works

1204b21

fix

c96740e

jroesch reviewed Nov 15, 2023

serve/mlc_serve/engine/base.py (outdated, resolved)

jroesch reviewed Nov 15, 2023

serve/mlc_serve/model/base.py (resolved)

jroesch reviewed Nov 15, 2023

serve/mlc_serve/run.py (outdated, resolved)

jroesch approved these changes Nov 15, 2023

masahi approved these changes Nov 16, 2023

reflect feedback and add checks for engine configs

994accc

elvin-n approved these changes Nov 16, 2023

mlc_llm/build.py (outdated, resolved)

fix

52543e2

sunggg merged commit 858a444 into octoml:batch-serving Nov 16, 2023
5 checks passed
