Releases · xorbitsai/inference
v0.5.1
What's new in 0.5.1 (2023-09-26)
These are the changes in inference v0.5.1.
Enhancements
- ENH: Safely iterate the output stream of ggml models by @codingl2k1 in #449 (see the sketch below)
- ENH: Skip download if model exists by @aresnow1 in #495
Documentation
- DOC: vLLM by @UranusSeven in #491
Full Changelog: v0.5.0...v0.5.1
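As a usage note for #449 and #495 above: the client can consume generation output as a stream, and relaunching an already-cached model no longer re-downloads its files. A minimal sketch, assuming the RESTful client API of this release line; the endpoint, model name, and `generate_config` keys are illustrative assumptions:

```python
# Minimal sketch for the v0.5.1 changes; endpoint, model name, and
# generate_config keys are assumptions based on this era's client API.
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")

# Launching a model whose files are already cached skips the
# download step (#495).
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="ggmlv3",
    model_size_in_billions=7,
    quantization="q4_0",
)
model = client.get_model(model_uid)

# With stream enabled, generate() yields incremental chunks; iterate
# defensively so one malformed chunk does not abort the stream (#449).
for chunk in model.generate(
    "Hello, ", generate_config={"stream": True, "max_tokens": 64}
):
    choices = chunk.get("choices") or []
    if choices:
        print(choices[0].get("text", ""), end="", flush=True)
```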
v0.5.0
What's new in 0.5.0 (2023-09-22)
These are the changes in inference v0.5.0.
New features
- FEAT: incorporate vLLM by @UranusSeven in #445 (see the usage sketch below)
- FEAT: add register model page for dashboard by @Bojun-Feng in #420
- FEAT: internlm 20b by @UranusSeven in #486
- FEAT: support glaive coder by @UranusSeven in #490
- FEAT: Support download models from modelscope by @aresnow1 in #475
Enhancements
- ENH: shorten OpenBuddy's description by @UranusSeven in #471
- ENH: enable vLLM on Linux with CUDA by @UranusSeven in #472
- ENH: vLLM engine supports more models by @UranusSeven in #477
- ENH: remove subpool on failure by @UranusSeven in #478
- ENH: support trust_remote_code when launching a model by @UranusSeven in #479
- ENH: vLLM auto tensor parallel by @UranusSeven in #480
Bug fixes
- BUG: llama-cpp version mismatch by @Bojun-Feng in #473
- BUG: incorrect endpoint on host 0.0.0.0 by @UranusSeven in #474
- BUG: prompt style not set as expected on web UI by @UranusSeven in #489
Full Changelog: v0.4.4...v0.5.0
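On the usage side of this release: vLLM is selected automatically for compatible models on Linux with CUDA (#472, #480), ModelScope can serve as the download source (#475), and `trust_remote_code` can be passed at launch time (#479). A hedged sketch; the environment variable and launch parameters below are assumptions drawn from this release line:

```python
# Hedged sketch of the v0.5.0 additions; names are assumptions.
from xinference.client import RESTfulClient

# Assumption: XINFERENCE_MODEL_SRC is read by the *server* process, so
# set it before starting the server, e.g.:
#   XINFERENCE_MODEL_SRC=modelscope xinference   # download via ModelScope (#475)

client = RESTfulClient("http://localhost:9997")

# On Linux with CUDA, a vLLM-compatible model is served by the vLLM
# engine automatically (#445, #472), with the tensor parallel degree
# inferred from the visible GPUs (#480).
model_uid = client.launch_model(
    model_name="internlm-20b",      # illustrative; added in #486
    model_format="pytorch",
    model_size_in_billions=20,
    quantization="none",
    trust_remote_code=True,         # forwarded to model loading (#479)
)
```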
v0.4.4
What's new in 0.4.4 (2023-09-19)
These are the changes in inference v0.4.4.
Bug fixes
- BUG: stop auto download from self-hosted storage for locale zh_CN by @UranusSeven in #465
Full Changelog: v0.4.3...v0.4.4
v0.4.3
v0.4.2
What's new in 0.4.2 (2023-09-15)
These are the changes in inference v0.4.2.
New features
- FEAT: concurrent generation by @codingl2k1 in #417
- FEAT: Support GGUF by @aresnow1 in #446 (see the sketch below)
- FEAT: Support OpenBuddy by @codingl2k1 in #444
Enhancements
- ENH: client supports describing models by @UranusSeven in #442
- ENH: caching from self-hosted storage by @UranusSeven in #419
- ENH: Assign worker sub pool at runtime instead of pre-allocating by @ChengjieLi28 in #437
- ENH: add benchmark script by @UranusSeven in #451
Bug fixes
- BUG: Fix restful client for embedding models by @aresnow1 in #439
- BUG: cmdline emits a double line break by @UranusSeven in #441
- BUG: no error raised on unsupported format by @UranusSeven in #443
- BUG: Xinference list failed if embedding models are launched by @aresnow1 in #452
Tests
- TST: skip self-hosted storage tests by @UranusSeven in #453
Documentation
- DOC: fix baichuan-2 and make naming consistent by @UranusSeven in #432
- DOC: update hot topics by @UranusSeven in #456
Others
- CI: Fix Windows CI by @codingl2k1 in #440
New Contributors
- @ChengjieLi28 made their first contribution in #437
Full Changelog: v0.4.1...v0.4.2
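For the GGUF support (#446) and the client-side model description (#442) above, a minimal sketch; the model name and quantization are illustrative assumptions:

```python
# Minimal sketch for v0.4.2; model name and quantization are assumptions.
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")

# GGUF supersedes the older ggml formats (#446).
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="ggufv2",
    model_size_in_billions=7,
    quantization="Q4_K_M",
)

# Describe a running model (#442): metadata such as format, size in
# billions, and quantization.
print(client.describe_model(model_uid))
```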
v0.4.1
What's new in 0.4.1 (2023-09-07)
These are the changes in inference v0.4.1.
Bug fixes
- BUG: Searching in UI results in white screen by @Bojun-Feng in #431
- BUG: Include json in MANIFEST.in by @aresnow1 in #435
Full Changelog: v0.4.0...v0.4.1
v0.4.0
What's new in 0.4.0 (2023-09-06)
These are the changes in inference v0.4.0.
New features
- FEAT: Support CodeLlama-Instruct by @jiayini1119 in #414
- FEAT: Add support for embedding models by @aresnow1 in #418 (see the sketch below)
- FEAT: Support replica by @codingl2k1 in #410
- FEAT: support baichuan2 by @UranusSeven in #425
Bug fixes
- BUG: cmdline chat duplicates the user message by @UranusSeven in #428
- BUG: llama_cpp model context length by @UranusSeven in #429
Documentation
- DOC: update readme by @UranusSeven in #423
New Contributors
- @codingl2k1 made their first contribution in #410
Full Changelog: v0.3.0...v0.4.0
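To illustrate the embedding support (#418) and replicas (#410) above, a sketch assuming this release's launch parameters; the model names are illustrative:

```python
# Sketch of the v0.4.0 features; model names are assumptions.
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")

# Launch an embedding model (#418); model_type distinguishes it from LLMs.
embed_uid = client.launch_model(
    model_name="bge-base-en",
    model_type="embedding",
)
print(client.get_model(embed_uid).create_embedding("Hello, world"))

# Launch an LLM with two replicas (#410); requests to the single
# model_uid are spread across the replicas.
llm_uid = client.launch_model(
    model_name="baichuan-2-chat",   # added in #425
    model_format="pytorch",
    model_size_in_billions=7,
    quantization="none",
    replica=2,
)
```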
v0.3.0
What's new in 0.3.0 (2023-09-04)
These are the changes in inference v0.3.0.
New features
- FEAT: Add Model Dashboard by @Bojun-Feng in #366
Enhancements
- ENH: help message for CLI by @Bojun-Feng in #367
- ENH: auto retry download on network errors by @jiayini1119 in #405
Bug fixes
- BUG: Asking to pad but the tokenizer does not have a padding token by @jiayini1119 in #407
- BUG: empty results for non-stream inference by @UranusSeven in #415
- BUG: Make context_length optional in model family by @Bojun-Feng in #394
Full Changelog: v0.2.3...v0.3.0
v0.2.3
What's new in 0.2.3 (2023-08-30)
These are the changes in inference v0.2.3.
Bug fixes
- BUG: fix subprocess log on Linux by @UranusSeven in #357
Others
- CHORE: lock llama-cpp-python version by @UranusSeven in #406
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's new in 0.2.2 (2023-08-25)
These are the changes in inference v0.2.2.
New features
- FEAT: Support Llama-2 PyTorch model by @jiayini1119 in #387
- FEAT: code-llama by @UranusSeven in #402
Enhancements
- ENH: Update max_tokens to 32k by @Bojun-Feng in #386
Bug fixes
- BUG: last token is duplicated by @UranusSeven in #398
Others
- Fix chatglm params by @Bojun-Feng in #400
Full Changelog: v0.2.1...v0.2.2