Merge with mlc-ai/main (68cd794d02bbff9842f08b6b2ff37eb582f411c0, 2024-08-01) #277

Merged on Aug 2, 2024 (532 commits).

Commits
3621bf6
[Eagle] Run additional decode for draft model when all proposals are …
vinx13 May 7, 2024
df4e2f3
[iOS] Introducing package CLI for iOS app packaging (#2297)
MasterJH5574 May 8, 2024
8a31986
Increase the timeout in PopenServer (#2298)
yongwww May 8, 2024
65f9716
[LLM-CHAT] Enable gpu softmax for penality softmax (#2288)
krishnaraj36 May 8, 2024
1bd1ab0
[iOS][REFACTOR] Restructure the iOS folders (#2299)
tqchen May 8, 2024
c580140
[KVCACHE][TIR] Improved tir schedule for decode tir page attention (#…
krishnaraj36 May 8, 2024
10f3e4d
[Sampler] Remove unneeded output_prob_dist param (#2300)
vinx13 May 9, 2024
33c15e7
Enable cuda graph for batch_verify (#2304)
vinx13 May 9, 2024
dbd13f4
[Android] Introducing mlc4j and app packaging (#2305)
MasterJH5574 May 10, 2024
b62dd91
[DOCS] Minor cleanup (#2308)
tqchen May 10, 2024
37230db
[DOCS] Update android doc (#2309)
tqchen May 10, 2024
8bb1d6e
[DOCS] Update android doc (#2310)
tqchen May 10, 2024
459ffe3
[SLM] Support BERT architecture. Implement a text embedding module (#…
rickzx May 10, 2024
ea391de
[Serving] Log batch size in NVTX (#2312)
vinx13 May 10, 2024
b01cfab
[Model] Removing unnecessary reshapes in get_logits (#2314)
vinx13 May 10, 2024
347222c
Skip cublas dispatch for single batch (#2315)
vinx13 May 10, 2024
73b733d
Auto updated submodule references
May 10, 2024
3a0b42c
[DOCS] Remove mention of legacy modules (#2318)
tqchen May 10, 2024
2b8aadf
[Android] Add `-j` option to cmake build (#2321)
MasterJH5574 May 10, 2024
98f0424
[DOCS] More clear android instruction (#2327)
tqchen May 11, 2024
21feb70
[Serving] Refactor to consolidate new request prefill (#2329)
vinx13 May 12, 2024
45a0487
[iOS] Make MLCEngine input to take in structured data (#2330)
tqchen May 12, 2024
679d3a8
[REFACTOR] Refactor JSONFFI Conv template (#2331)
tqchen May 13, 2024
821ee5d
[Eagle] Fix the requests for additional decode in eagle verify (#2336)
vinx13 May 13, 2024
bc6e3ed
[Serving][Grammar] Refactor GrammarStateMatcher and support LLaMA-3 (…
Ubospica May 14, 2024
0c03537
[DebugChat] Fix DebugChat softmax function and save logits to debug f…
rickzx May 14, 2024
b247f8d
[Serving] Add Medusa speculative decoding (#2337)
vinx13 May 14, 2024
2bbbd52
Fix cublas offloading (#2343)
vinx13 May 15, 2024
227dbb8
Add false for arg worker0_only in disco.empty (#2344)
yongwww May 15, 2024
9b89e04
Auto updated submodule references
May 15, 2024
56ea156
[JSONFFIEngine] Refactor device argument and request_stream_callback …
anibohara2000 May 15, 2024
152ecc4
[Serving] Add reset_engine in debug_entrypoints (#2347)
yongwww May 16, 2024
ac1cd51
[Bugfix] Make sequence_length dtype int64 in EngineConfig. Fix Mistra…
rickzx May 18, 2024
96fc289
[JSON FFI] Example Android Application using JSON FFI Engine (#2322)
Kartik14 May 18, 2024
0e3d536
[iOS] Update MLCEngine API to latest JSON FFI convention (#2359)
tqchen May 18, 2024
9998076
[JSONFFI] Fix JSONFFI conv template. Add unit tests (#2360)
rickzx May 19, 2024
beb126c
[Fix][Serving] Fix prefill chunk in interactive mode (#2363)
MasterJH5574 May 20, 2024
2146f15
[Fix][Serving] Respect sliding window size in config inference (#2364)
MasterJH5574 May 20, 2024
27dc5c8
[iOS] Add padding to app icon (#2365)
Neet-Nestor May 21, 2024
8aed35e
[Serving] Fix the self-ref in engine (#2367)
tqchen May 21, 2024
5444fd5
[Serving] Prefix Cache (#2295)
cyx-6 May 21, 2024
3c0b15c
[Fix] Use static_cast for `.size()` for safety (#2369)
MasterJH5574 May 21, 2024
ff39925
[Serving] Sliding-window-aware request prefill (#2370)
MasterJH5574 May 22, 2024
db039cf
[iOS] Update MLCSwift to fully follow OAI style. (#2371)
tqchen May 22, 2024
edc434d
Add nvtx in logic update (#2372)
yongwww May 22, 2024
8d3194c
[Test] Use HF model for JIT as much as possible (#2373)
MasterJH5574 May 22, 2024
20c198f
[Fix] Fix prefix cache reset and forking logic (#2374)
cyx-6 May 22, 2024
a5e71b3
[CLI] Migrate CLI to use the new Engine (#2375)
tqchen May 22, 2024
0724983
[TESTING] Introduce testing util to manage models (#2377)
tqchen May 22, 2024
6dd6c89
[REFACTOR][Rename] MLC_LLM_SOURCE_DIR and TVM_SOURCE_DIR source dire…
tqchen May 22, 2024
6de0f55
[REFACTOR][ENV] MLC_CACHE_DIR to MLC_LLM_HOME (#2379)
tqchen May 22, 2024
547060a
[iOS] Switch MLC Chat to use MLCEngine (#2380)
tqchen May 22, 2024
db833aa
[REFACTOR] Cleanup legacy code (#2381)
tqchen May 22, 2024
600a3e5
[Fix] Update prefix cache config (#2382)
cyx-6 May 22, 2024
2e1ff62
[PREFIX-CACHE] Fix some issues with prefix cache (#2384)
tqchen May 23, 2024
7eaeed1
[FIX] Typo on OpenAI Chat class in engine (#2385)
Faolain May 23, 2024
ac4dff7
[Serving][Refactor] Metrics and stats for CLI (#2387)
MasterJH5574 May 23, 2024
fbe3b9e
[REFACTOR] Organize metrics (#2390)
tqchen May 23, 2024
9631cc3
[Fix] Avoid ref capture in prefix cache contruction (#2391)
MasterJH5574 May 23, 2024
370fca5
[REFACTOR] Cleanup Metrics (#2392)
tqchen May 23, 2024
00c2292
[FIX] Fix mlc llm source dir argument (#2394)
tqchen May 23, 2024
ddbec62
[Fix] Fix the serialization of SpecDecodeMetrics (#2395)
MasterJH5574 May 23, 2024
eb546ee
[Fix] Update missing change in engine ffi func name (#2396)
cyx-6 May 23, 2024
040b10e
Auto updated submodule references
May 24, 2024
641b64b
[Fix] Fix no prefix cache (#2397)
cyx-6 May 24, 2024
988e9f0
add hasattr safecheck for MLCEngineBase (#2400)
BodhiHu May 24, 2024
70f2a76
[Refactor] Expose EngineConfig in engine constructor (#2399)
MasterJH5574 May 24, 2024
37da8e4
[REFACTOR] Introduce RequestMetrics and metrics endpoint (#2401)
tqchen May 24, 2024
a6d3cc1
[Fix] Fix format issue of MLCEngineBase (#2402)
MasterJH5574 May 24, 2024
9f96333
[FIX] fix comments in radix_tree.py (#2403)
ita9naiwa May 24, 2024
db78862
[Fix] Fix metric names in tests and static PrefixCacheModes (#2404)
MasterJH5574 May 24, 2024
d12afce
[Op] Tree attention (#2376)
spectrometerHBH May 24, 2024
d39272a
[REFACTOR] Reorganize GenerationConfig DebugConfig and FFI (#2407)
tqchen May 24, 2024
d770270
[Fix] Fix vector OOB when no inputs can be prefilled in spec decode (…
MasterJH5574 May 24, 2024
97df697
[Fix] Update number of available pages after prefix cache free (#2409)
MasterJH5574 May 24, 2024
7eba612
[REFACTOR] Enable validation logic in GenerationConfig (#2411)
tqchen May 24, 2024
905620c
[Chat] Support chat completion config override (#2412)
MasterJH5574 May 24, 2024
cd79b96
Change name RedixPage -> RadixPage in RadixTree.cc (#2413)
ita9naiwa May 24, 2024
cfc0597
[Fix] Fix ignore_eos support (#2414)
MasterJH5574 May 24, 2024
135419e
[Test][Refactor] Update tests to use require_test_model (#2415)
MasterJH5574 May 25, 2024
b18284b
[Serving] Enable GPU Sampling (#2368)
Hzfengsy May 25, 2024
0b2cbb2
[REFACTOR] Support latest include_usage and DebugOptions (#2417)
tqchen May 26, 2024
3b272eb
[DOWNLOAD] MLC_DOWNLOAD_POLICY and MLC_LLM_READONLY_WEIGHT_CACHES (#2…
tqchen May 26, 2024
c62e143
[REFACTOR] Rename MLC_LLM_READONLY_WEIGHT_CACHES (#2423)
tqchen May 26, 2024
13c0661
[Tokenizer] Auto-detect TokenizerInfo from tokenizer.json (#2416)
Ubospica May 26, 2024
8b38a4b
[REFACTOR] Remove dependencies on legacy chat_module (#2424)
tqchen May 26, 2024
ff91749
[REFACTOR] Terminology download=>download_cache (#2425)
tqchen May 26, 2024
14bec5a
[REFACTOR] Move GenerationConfig to protocol (#2427)
tqchen May 26, 2024
ae88612
Update README.md
Neet-Nestor May 27, 2024
0df00bf
[site] Add hero section to website (#2430)
Neet-Nestor May 27, 2024
1025926
[Compile] Skip CUDA graph rewrite when target is not CUDA (#2433)
MasterJH5574 May 27, 2024
00e79d1
[DOCS] Simplify read me (#2435)
tqchen May 27, 2024
21ac3a2
[DOCS] Update title to focus on engine feature
tqchen May 27, 2024
4538cc7
[Metadata] Remove stale KV cache size (#2434)
MasterJH5574 May 27, 2024
526114e
[iOS] Update the MLCSwift APIs to async (#2436)
tqchen May 27, 2024
c87d369
[Android] Switch MLC Chat to use MLCEngine (#2410)
mengshyu May 27, 2024
5b73ec3
[iOS] Remove Legacy ChatModule (#2437)
tqchen May 27, 2024
16fb729
[Delivery] Update model delivery script to support specifying the out…
rickzx May 27, 2024
ba8e20a
[Android] Remove Legacy ChatModule (#2438)
mengshyu May 27, 2024
be15b22
[Refactor] Remove ChatModule (#2440)
MasterJH5574 May 27, 2024
50adede
[Fix][REST] Fix usage-related server tests (#2441)
MasterJH5574 May 27, 2024
dc40656
[Site] Enlarge hero image in small screens
Neet-Nestor May 27, 2024
f2db8e4
Fix lint
tqchen May 27, 2024
d93e5a6
[ANDROID] Patches to enable windows usescase (#2443)
tqchen May 28, 2024
709644f
[DOCS] Guides for android on windows (#2444)
tqchen May 28, 2024
4df3abf
[DOCS] mention git-lfs (#2445)
tqchen May 28, 2024
2fc9c63
Fix Llama-3 conversation template. Add unit test (#2442)
rickzx May 27, 2024
cd4a853
[Grammar][Wasm] Update new grammar to wasm runtime (#2446)
CharlieFRuan May 28, 2024
de61926
[Model] Use float32 for RoPE calculation (#2449)
MasterJH5574 May 28, 2024
cf4bffe
[LogitProcessor] Use min float value as the mask value (#2451)
MasterJH5574 May 28, 2024
570380c
[Protocol] Use `by_alias=True` when dumping pydantic classes (#2450)
MasterJH5574 May 28, 2024
30e46b4
[Protocol] Use `by_alias=True` when dumping pydantic classes (#2452)
MasterJH5574 May 28, 2024
e9a63ed
[DOCS] Updates the URL of the Android APK (#2453)
mengshyu May 28, 2024
d1f5f51
Auto updated submodule references
May 28, 2024
6c31701
[Fix][Phi3] Add `</s>` as stop token for phi3 (#2455)
CharlieFRuan May 28, 2024
d7c159e
[Site] Add GitHub link to hero section
Neet-Nestor May 29, 2024
477da69
Update README.md
Neet-Nestor May 29, 2024
dc091e7
[Hermes2] Add conv template for Hermes2-Pro-Llama3 (#2457)
CharlieFRuan May 29, 2024
27d1f6f
[Compile] Add max_batch_size to metadata (#2463)
MasterJH5574 May 29, 2024
f2c1582
[REFACTOR] Re-organize the modules after transition to MLCEngine (#2464)
tqchen May 29, 2024
e90f2e7
[Serving] Add ICHECK for running batch size (#2465)
MasterJH5574 May 29, 2024
5df26b6
Auto updated submodule references
May 29, 2024
a8e85d0
[TEST] Start to categorize tests (#2466)
tqchen May 29, 2024
249b945
Implemented FP8 calibration (#2454)
vinx13 May 29, 2024
9efb1ba
[CI] Update CUDA build script with FlashInfer options (#2469)
MasterJH5574 May 30, 2024
e0e779a
[Serving] Use preferred host memory for host NDArrays (#2468)
MasterJH5574 May 30, 2024
515823c
[TEST] Temp disable UT stage
tqchen May 30, 2024
c4d337d
[CUDA] Turn on cuda graph at O2 (#2467)
vinx13 May 30, 2024
96d752c
[CI] Enable GPU env in CI (#2476)
tqchen May 30, 2024
cf0278f
[CMake] Update config.cmake generation script (#2478)
MasterJH5574 May 30, 2024
16f0af4
[TEST] MockEchoEngine (#2479)
tqchen May 31, 2024
33dbfd1
Auto updated submodule references
May 31, 2024
ab52b72
[Fix] Fix JSONFFI MemoryBufferStream after dmlc bump (#2480)
MasterJH5574 May 31, 2024
61889fe
[JSON-FFI] Enable n generation and pass in json schema (#2481)
tqchen May 31, 2024
8fc5efa
Refactor model delivery script to use pydantic (#2482)
rickzx May 31, 2024
589c76f
Fix tokenizers encode batch (#2484)
vinx13 Jun 1, 2024
c1628dd
[Bugfix] Fix delivered log issue in delivery cli (#2489)
rickzx Jun 2, 2024
abd7d51
Support Qwen2-MoE Architecture (#2089)
Hzfengsy Jun 2, 2024
46ee63a
[3rdparty] Bump tokenizers-cpp to include HF tokenizers bump (#2490)
MasterJH5574 Jun 2, 2024
71828b0
[Bench] Add mlc bench (#2474)
yongwww Jun 3, 2024
5b4fc07
Auto updated submodule references
Jun 3, 2024
91cc194
Enable n-sampling for Medusa spec decoding (#2495)
vinx13 Jun 3, 2024
94de2a4
[CONFIG] Remove mean_gen_len from the config (#2493)
tqchen Jun 3, 2024
c8bfb50
Update ios android docs (#2497)
tqchen Jun 3, 2024
5a8a728
[Bench] Add seed to __init__ and some minor change (#2496)
yongwww Jun 4, 2024
90170e6
[Fix][Config] Max total sequence length overflow with sliding window …
MasterJH5574 Jun 4, 2024
c0c33a5
[Serving] PagedKVCache tree-attention integration (#2487)
MasterJH5574 Jun 4, 2024
d6f7a58
[Sampler] Enhance checks for whether FlashInfer is enabled (#2502)
MasterJH5574 Jun 4, 2024
70b3102
[Android] Updates the default mode list and the APK link in the docum…
mengshyu Jun 4, 2024
e63aab4
[Fix] Fix the global func name of TokenizerDecode (#2514)
MasterJH5574 Jun 5, 2024
8e56d95
[Fix] Use the correct model to validate stream_options (#2508)
zifeitong Jun 5, 2024
4179922
[Fix] Typo in docs/install/tvm.rst (#2507)
zifeitong Jun 5, 2024
64e33c5
[FP8] Use f32 scale to enable better fusion (#2505)
vinx13 Jun 5, 2024
3bdc8f6
[Metrics] Add ttft and itl to server metrics (#2510)
yongwww Jun 5, 2024
3184294
[Model] Fix config detection for Mistral (#2504)
MasterJH5574 Jun 5, 2024
78e59ab
[Fix] Provide a GetTokenId API for SampleResult (#2516)
Ubospica Jun 5, 2024
3f36236
[Reapply][BUGFIX] Fix rare deadlock in threaded engine (#2429) (#2518)
MasterJH5574 Jun 6, 2024
fbc75c0
[Fix] Fix metrics division by 0 (#2519)
MasterJH5574 Jun 6, 2024
80789f4
Corrected the folder path for Android Studio Project (#2520)
Ramees025 Jun 6, 2024
fd51f97
Update tvm.rst
tqchen Jun 6, 2024
9de380c
[iOS] Update model list (#2524)
spectrometerHBH Jun 6, 2024
1881992
[Android] Updates the order of mode list and the APK link in the docu…
mengshyu Jun 6, 2024
61f5623
[Sampler] Skip top-p renormalization if top-p is 1 in CPUSampler (#2528)
MasterJH5574 Jun 6, 2024
9d16fec
[Docs] Rename javascript.rst to webllm.rst (#2531)
CharlieFRuan Jun 6, 2024
69c600c
[Conv] Add tinyLlama v1.0 conv template (#2530)
CharlieFRuan Jun 6, 2024
868334d
[iOS] correct mistral q3 url and handle screen switch off (#2529)
tqchen Jun 6, 2024
206db55
[Grammar] Fix include protection and paths in docstring (#2515)
Ubospica Jun 7, 2024
50a1a7c
[Tokenizer][Fix] Fix SegFault when analyzing tokenizers without token…
Ubospica Jun 7, 2024
5f71aa9
[Serving] Use stop strs and token ids for completions (#2534)
MasterJH5574 Jun 7, 2024
a096c91
[Serving] Support tensor parallel shards override in command line (#2…
MasterJH5574 Jun 7, 2024
9be4b92
Add tie_word_embedding option for Qwen2 model (#2535)
rickzx Jun 7, 2024
b5b40ee
[Bench] Defaults to aiohttp client, add ServerMetrics (#2527)
yongwww Jun 7, 2024
e601409
[Android] Remove var capture in TVM_SOURCE_DIR (#2538)
MasterJH5574 Jun 7, 2024
d5fbde2
[Fix] Fix inconsistent system prompt handling (#2539)
MasterJH5574 Jun 7, 2024
208642d
[Attention] Fix attn kernel for general GQA group size (#2543)
MasterJH5574 Jun 7, 2024
fcb50a2
fix: typo error (#2544)
michaelhenry Jun 7, 2024
6bd049e
[Fix] Fix attn kernel build issue (#2545)
MasterJH5574 Jun 7, 2024
961d5f1
[iOS] Add Qwen2 support (#2547)
tqchen Jun 7, 2024
78b6e1f
[Android] Add Qwen2 support (#2548)
mengshyu Jun 7, 2024
26a9cf0
[Android] Escape backslashes and quotation marks (#2546)
MasterJH5574 Jun 7, 2024
6bbd49c
[EngineConfig] Add override options (#2550)
MasterJH5574 Jun 7, 2024
f489d8d
[Site] Update link to webllm
Neet-Nestor Jun 8, 2024
db896d1
[Site] Update heading
Neet-Nestor Jun 8, 2024
203cda6
[Preset] Add model preset for model delivery (#2553)
CharlieFRuan Jun 8, 2024
9633c9f
Update docs to remove mention of older models (#2557)
tqchen Jun 8, 2024
c25834d
[Docs] Fix typo in mlc_llm chat command (#2560)
Neet-Nestor Jun 9, 2024
931587b
Fix compilation for gcc 13.2 (#2561)
elvin-n Jun 10, 2024
4234262
[Tokenizer] Priorize HuggingFace/SentencePiece over ByteLevelBPE (#2559)
MasterJH5574 Jun 10, 2024
42f146d
[Serving][Grammar] Jump-forward decoding (#2551)
Ubospica Jun 11, 2024
a231ae1
[Delivery] Update model delivery script (#2565)
rickzx Jun 11, 2024
873827c
[Model] Enhance error reporting for invalid tensor-parallel settings …
MasterJH5574 Jun 12, 2024
dcece51
[Serving] Apply tree structure in draft token verification (#2563)
vinx13 Jun 12, 2024
07c92b0
[Bench] Json mode bench (#2552)
cyx-6 Jun 12, 2024
94a0295
[Model] Support Multi-GPU for Qwen-MoE model (#2573)
MasterJH5574 Jun 13, 2024
ceba951
[Metrics] Add missing fields in `Reset` (#2574)
MasterJH5574 Jun 13, 2024
75b970b
[Doc] Update WebLLM doc (#2578)
CharlieFRuan Jun 14, 2024
e9340c3
[Op] Top-4 implementation for MoE model (#2586)
MasterJH5574 Jun 17, 2024
437166a
[Model] Gemma 1.1 compatibility (#2594)
MasterJH5574 Jun 19, 2024
6a48a02
[Serving] Hybrid prefill (#2604)
cyx-6 Jun 25, 2024
cbf0b02
Update quick_start.rst to fix broken links (#2607)
GunjanDhanuka Jun 27, 2024
d911c60
[Fix] Set the missed prefill finish time (#2613)
MasterJH5574 Jul 1, 2024
fbb6a48
[Android] Reduce binary size (#2606)
MasterJH5574 Jul 1, 2024
0575b92
[Fix] Gemma hidden_activation compatibility (#2614)
MasterJH5574 Jul 1, 2024
c09b108
Update debug_compare (#2612)
Hzfengsy Jul 2, 2024
2d32094
[SLM] Add support for InternLM2 architecture (#2608)
tlopex Jul 2, 2024
0fb5609
[Fix] Prefix cache only enables sliding window on leaf sequence (#2615)
cyx-6 Jul 2, 2024
adc6ee6
[Android] Update include path for tvm runtime src (#2616)
MasterJH5574 Jul 2, 2024
5b63980
[Fix] Mark the decode requests in hybrid prefill (#2621)
MasterJH5574 Jul 4, 2024
ebf5617
[Fix] Fix the chunked prefill condition (#2628)
MasterJH5574 Jul 5, 2024
5165a58
[SLM] Internlm2 Multi-GPU support (#2626)
tlopex Jul 8, 2024
c6122d7
[Serving] Merge multiple token embedding lookup into one (#2629)
MasterJH5574 Jul 8, 2024
c7756f9
[Model] Support Internlm2.5 (#2630)
tlopex Jul 8, 2024
7d73cfa
Fix for RWKV new config and new format vocab (#2632)
Hzfengsy Jul 8, 2024
16a79ab
[Fix] Fix KV cache single-page copy kernel (#2644)
MasterJH5574 Jul 11, 2024
64d8dc6
[Fix][Tokenizer] Fix failure in decoding tokens for ByteLevel BPE (#2…
Ubospica Jul 11, 2024
cbf6ae0
[Fix][Bitmask] Mask dummy padded tokens for grammar (#2651)
CharlieFRuan Jul 12, 2024
2345900
[Engine] Reduce action post-process overhead (#2653)
MasterJH5574 Jul 13, 2024
17ad72c
[PrefixCache] Defer sequence extension (#2654)
MasterJH5574 Jul 14, 2024
5bedaec
[Model] Support Starcoder2 (#2657)
tlopex Jul 15, 2024
baeb195
[Engine] Lazy recompute in GetRunningRequestStateEntries (#2655)
MasterJH5574 Jul 15, 2024
8290a97
[Fix] Fix prefix cache reuse with eagle mode (#2664)
cyx-6 Jul 16, 2024
52c0638
[Model] Support SmolLM (#2667)
CharlieFRuan Jul 17, 2024
c06bb39
[SLM] Starcoder2 Multi-GPU support (#2662)
tlopex Jul 17, 2024
4c4f060
[Engine] Defer the collection of decode inputs in prefill (#2668)
MasterJH5574 Jul 18, 2024
b1834f8
support mistral-nemo (#2676)
yyjhao Jul 22, 2024
a49abcc
[Model] Fix annotation typos (#2672)
tlopex Jul 22, 2024
ecae55c
[Model] Support Llama3.1 (#2682)
MasterJH5574 Jul 23, 2024
cdbd3ed
[SLM] Introduce microsoft/Phi-3 vision (#2658)
mengshyu Jul 24, 2024
9e23e37
[Preset] Add llama3.1 to preset, comment out llama3 (#2683)
CharlieFRuan Jul 24, 2024
fd20c56
[Pass] Rewrite FuseAddRMSNorm to avoid binding rewrite recursion (#2689)
MasterJH5574 Jul 25, 2024
a6aabd6
Initialize all `local_top_k` values in `gating_softmax_topk` (#2694)
Lunderberg Jul 26, 2024
803becc
[Serving] Fix spec decoding call packed with rvalue (#2699)
vinx13 Jul 26, 2024
1364830
[ASYNC] Properly abort cleanup in async handling (#2698)
tqchen Jul 26, 2024
6156dc3
[Serve] Expose prefill mode option (#2701)
cyx-6 Jul 28, 2024
da06a06
[Fix] Fix hybrid prefill disabled (#2705)
cyx-6 Jul 29, 2024
3c7a6d5
Turn on custom allreduce by default in O3 (#2706)
vinx13 Jul 30, 2024
551f3fe
[Fix] Fix hybrid prefill index error (#2707)
cyx-6 Jul 30, 2024
95f8797
[Bench] Revamp benchmark submodule (#2702)
MasterJH5574 Jul 30, 2024
d54007b
[Serving] Fix handling of num_tokens_for_next_decode in spec decoding…
vinx13 Jul 30, 2024
31efb35
Update worker.py for compatibility with upstream TVM (#2712)
Lunderberg Jul 31, 2024
0561a9b
Add support for Gemma2 (#2674)
yyjhao Jul 31, 2024
39069f7
[Preset] Add gemma2 preset (#2715)
CharlieFRuan Aug 1, 2024
7296565
[Android] Update model for Andorid APK (#2718)
mengshyu Aug 1, 2024
59cf662
[iOS] Add Gemma2 for iOS app (#2717)
MasterJH5574 Aug 1, 2024
97bbf52
Default bundle gemma2 (#2721)
tqchen Aug 1, 2024
b0f2731
[Bench] LLMPerf dataset (#2713)
cyx-6 Aug 1, 2024
709f484
[ConvTemplate] Update Gemma template with <bos> (#2722)
MasterJH5574 Aug 1, 2024
68cd794
[C++] Handle system_prefix_token_ids in C++ Conv template (#2723)
MasterJH5574 Aug 1, 2024
a0702ca
Merge remote-tracking branch 'upstream/main' into spark/merge-upstrea…
sunggg Aug 1, 2024
e413b3c
Delete .gitmodules
sunggg Aug 1, 2024
[CI] Enable GPU env in CI (mlc-ai#2476)
* [CI] Enable GPU env in CI

This PR enables GPU env in ci docker/bash.sh

* remove dep on tvm testing plugin
tqchen authored May 30, 2024
commit 96d752ca13f75cddbf33c4723a10eace0b512b30
34 changes: 32 additions & 2 deletions ci/bash.sh
@@ -47,12 +47,42 @@ else
     COMMAND=("$@")
 fi

+if [[ -n ${MLC_CI_SETUP_DEPS:-} ]]; then
+    DOCKER_ENV="${DOCKER_ENV} -e MLC_CI_SETUP_DEPS=${MLC_CI_SETUP_DEPS}"
+fi
+
 # Use nvidia-docker if the container is GPU.
-if [[ ! -z $CUDA_VISIBLE_DEVICES ]]; then
-    DOCKER_ENV="${DOCKER_ENV} -e CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
+if [[ -n ${CUDA_VISIBLE_DEVICES:-} ]]; then
+    DOCKER_ENV="${DOCKER_ENV} -e CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
+    if type nvidia-docker 1> /dev/null 2> /dev/null; then
+        DOCKER_BINARY=nvidia-docker
+    else
+        DOCKER_BINARY=docker
+        DOCKER_ENV="${DOCKER_ENV} --gpus all"
+    fi
+
+    # nvidia-docker treats Vulkan as a graphics API, so we need to
+    # request passthrough of graphics APIs. This could also be set in
+    # the Dockerfile.
+    DOCKER_ENV="${DOCKER_ENV} -e NVIDIA_DRIVER_CAPABILITIES=compute,graphics,utility"
+
+    # Vulkan compatibility
+    ICD_SEARCH_LOCATIONS=(
+        # https://github.com/KhronosGroup/Vulkan-Loader/blob/master/loader/LoaderAndLayerInterface.md#icd-discovery-on-linux
+        /usr/local/etc/vulkan/icd.d
+        /usr/local/share/vulkan/icd.d
+        /etc/vulkan/icd.d
+        /usr/share/vulkan/icd.d
+        /etc/glvnd/egl_vendor.d
+        /usr/share/glvnd/egl_vendor.d
+    )
+    for filename in $(find "${ICD_SEARCH_LOCATIONS[@]}" -name "*nvidia*.json" 2> /dev/null); do
+        DOCKER_VOLUMNS="${DOCKER_VOLUMNS} -v ${filename}:${filename}:ro"
+    done
 fi

 # Print arguments.
 echo "DOCKER_BINARY ${DOCKER_BINARY}"
 echo "WORKSPACE: ${WORKSPACE}"
 echo "IMAGE NAME: ${DOCKER_IMAGE_NAME}"
 echo "ENV VARIABLES: ${DOCKER_ENV}"
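The runtime-selection pattern this hunk adds (prefer nvidia-docker when present, otherwise fall back to plain docker with `--gpus all`) can be sketched as a small standalone function. This is an illustrative sketch, not code from the commit: the function name `choose_docker` and the parameterized probe command are assumptions made so the fallback logic can be exercised without nvidia-docker installed.

```shell
#!/bin/sh
# Sketch of the container-runtime selection pattern from ci/bash.sh:
# prefer nvidia-docker when it exists, otherwise fall back to plain
# docker with "--gpus all". The command to probe is a parameter so
# both branches can be demonstrated on any machine.
choose_docker() {
    if command -v "$1" > /dev/null 2>&1; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "docker --gpus all"
    fi
}

choose_docker sh                    # present on any POSIX system: chosen as-is
choose_docker no-such-binary-xyz    # missing: falls back to docker --gpus all
```

The real script additionally appends `-e NVIDIA_DRIVER_CAPABILITIES=...` and the Vulkan ICD volume mounts when it takes the GPU branch; the sketch above isolates only the binary-selection decision.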
29 changes: 23 additions & 6 deletions ci/jenkinsfile.groovy
@@ -17,13 +17,14 @@

 import org.jenkinsci.plugins.pipeline.modeldefinition.Utils

-run_cpu = "bash ci/bash.sh mlcaidev/ci-cpu:4d61e5d -e GPU cpu"
-run_cuda = "bash ci/bash.sh mlcaidev/ci-cu121:4d61e5d -e GPU cuda-12.1"
-run_rocm = "bash ci/bash.sh mlcaidev/ci-rocm57:4d61e5d -e GPU rocm-5.7"
+run_cpu = "bash ci/bash.sh mlcaidev/ci-cpu:4d61e5d -e GPU cpu -e MLC_CI_SETUP_DEPS 1"
+run_cuda = "bash ci/bash.sh mlcaidev/ci-cu121:4d61e5d -e GPU cuda-12.1 -e MLC_CI_SETUP_DEPS 1"
+run_rocm = "bash ci/bash.sh mlcaidev/ci-rocm57:4d61e5d -e GPU rocm-5.7 -e MLC_CI_SETUP_DEPS 1"

-pkg_cpu = "bash ci/bash.sh mlcaidev/package-rocm57:561ceee -e GPU cpu"
-pkg_cuda = "bash ci/bash.sh mlcaidev/package-cu121:561ceee -e GPU cuda-12.1"
-pkg_rocm = "bash ci/bash.sh mlcaidev/package-rocm57:561ceee -e GPU rocm-5.7"
+pkg_cpu = "bash ci/bash.sh mlcaidev/package-rocm57:561ceee -e GPU cpu -e MLC_CI_SETUP_DEPS 1"
+pkg_cuda = "bash ci/bash.sh mlcaidev/package-cu121:561ceee -e GPU cuda-12.1 -e MLC_CI_SETUP_DEPS 1"
+pkg_rocm = "bash ci/bash.sh mlcaidev/package-rocm57:561ceee -e GPU rocm-5.7 -e MLC_CI_SETUP_DEPS 1"

 def per_exec_ws(folder) {
   return "workspace/exec_${env.EXECUTOR_NUMBER}/" + folder

@@ -176,6 +177,22 @@ stage('Build') {
   )
 }

+stage('Unittest') {
+  parallel(
+    'CUDA': {
+      node('GPU') {
+        ws(per_exec_ws('mlc-llm-unittest')) {
+          init_git(false)
+          sh(script: "ls -alh", label: 'Show work directory')
+          unpack_lib('mlc_wheel_cuda', 'wheels/*.whl')
+          sh(script: "${run_cuda} conda env export --name ci-unittest", label: 'Checkout version')
+          sh(script: "${run_cuda} conda run -n ci-unittest ./ci/task/test_unittest.sh", label: 'Testing')
+        }
+      }
+    }
+  )
+}
+
 stage('Model Compilation') {
   parallel(
     'CUDA': {
9 changes: 6 additions & 3 deletions ci/task/pylint.sh
@@ -6,9 +6,12 @@ set -x
 : ${GPU:="cpu"}
 export PYTHONPATH="./python":${PYTHONPATH:-""}

-# TVM Unity is a dependency to this testing
-pip install --quiet --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
-pip install --quiet --pre -U cuda-python
+if [[ -n ${MLC_CI_SETUP_DEPS:-} ]]; then
+    echo "MLC_CI_SETUP_DEPS=1 start setup deps"
+    # TVM Unity is a dependency to this testing
+    pip install --quiet --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
+    pip install --quiet --pre -U cuda-python
+fi

 pylint --jobs $NUM_THREADS ./python/
 pylint --jobs $NUM_THREADS --recursive=y ./tests/python/
10 changes: 10 additions & 0 deletions ci/task/test_unittest.sh
@@ -2,6 +2,16 @@
 set -eo pipefail
 set -x

+# this script only runs setup in CI environments where these variables are passed
+if [[ -n ${MLC_CI_SETUP_DEPS:-} ]]; then
+    echo "MLC_CI_SETUP_DEPS=1 start setup deps.."
+    # Install dependencies
+    pip install --force-reinstall wheels/*.whl
+    pip install --quiet pytest
+    pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly-cu121
+    export LD_LIBRARY_PATH=/usr/local/cuda/compat/:$LD_LIBRARY_PATH
+fi
+
 # run all tests that are categorized as "unittest"
 # add pytestmark = [pytest.mark.unittest] in the test file
 # so they will be run here
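The marker convention the comments above describe can be shown with a tiny test module. This is an illustrative sketch, not a file from the repository: the test name is made up, and custom marks would normally also be registered in the project's pytest configuration to avoid warnings.

```python
# Sketch of the test-categorization convention from test_unittest.sh:
# a module-level pytestmark tags every test in this module as "unittest",
# so a marker-filtered run can select exactly these tests.
import pytest

pytestmark = [pytest.mark.unittest]  # applies to every test in this module


def test_example():
    assert 1 + 1 == 2
```

An invocation such as `pytest -m unittest` would then collect only tests carrying this marker, which is presumably how the CI script restricts itself to the "unittest" category.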
3 changes: 0 additions & 3 deletions tests/python/conftest.py
@@ -16,9 +16,6 @@
 # under the License.
 # pylint: disable=missing-module-docstring,unused-import
 import pytest
-import tvm.testing
-
-pytest_plugins = ["tvm.testing.plugin"]


 def pytest_configure(config):