Releases: xorbitsai/inference
v0.2.1
What's new in 0.2.1 (2023-08-23)
These are the changes in inference v0.2.1.
Enhancements
- ENH: including default context length into model family by @Bojun-Feng in #374
Bug fixes
- BUG: PyTorch generate config max_new_tokens not compatible with RESTful API by @Bojun-Feng in #373 (see the sketch after this list)
- BUG: llm class match by @UranusSeven in #383
- BUG: return chat model handle by @UranusSeven in #382
- BUG: xinference cache dir doesn't exist by @UranusSeven in #380
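To illustrate the max_new_tokens fix above, here is a minimal sketch of passing a PyTorch-style generate config through the RESTful client. The endpoint address, model UID, and exact config field are assumptions based on the public client, not details taken from the PR:

```python
# Sketch only: the endpoint and model UID below are placeholders.
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")
model = client.get_model("my-model-uid")

# max_new_tokens is the PyTorch-style field this release makes
# usable through the RESTful API.
completion = model.generate(
    "Summarize the Transformer architecture in one sentence.",
    generate_config={"max_new_tokens": 128},
)
print(completion)
```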
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's new in 0.2.0 (2023-08-19)
These are the changes in inference v0.2.0.
New features
- FEAT: Support StarChat-Beta and StarCoderPlus with PyTorch by @RayJi01 in #333
- FEAT: Support Ctransformers by @RayJi01 in #289
- FEAT: internlm by @UranusSeven in #352
- FEAT: Support Vicuna-v1.5 and Vicuna-v1.5-16k by @RayJi01 in #343
- FEAT: wizardmath by @UranusSeven in #351
- FEAT: support generate/chat/create_embedding/register/unregister/registrations methods in cmdline by @pangyoki in #363
Enhancements
- ENH: Use Llama 2 chat for inference in LangChain QA demo by @jiayini1119 in #324
- ENH: cache from URI by @UranusSeven in #350
- ENH: Update System Prompt for llama-2-chat by @Bojun-Feng in #359
- ENH: RESTful client supports custom model APIs by @jiayini1119 in #360 (see the sketch after this list)
- BLD: fix readthedocs by @UranusSeven in #340
- BLD: fix readthedocs by @UranusSeven in #342
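For the custom model client APIs above, a hedged sketch of registering, listing, and unregistering a user-defined model. The method names and argument shapes follow the current public client and are assumptions rather than excerpts from #360; the definition file format is sketched under v0.1.2 below:

```python
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")  # placeholder endpoint

# Register a custom LLM from a JSON definition (see the v0.1.2
# sketch below for the assumed shape of this file).
with open("my_model.json") as f:
    client.register_model(model_type="LLM", model=f.read(), persist=False)

# Enumerate what is currently registered, then remove the entry again.
print(client.list_model_registrations(model_type="LLM"))
client.unregister_model(model_type="LLM", model_name="my-custom-llama")
```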
Bug fixes
- BUG: Chatglm max_length doesn't work by @Bojun-Feng in #349
- BUG: builtin stop_token_ids changes by @UranusSeven in #353
- BUG: custom model related bugs by @UranusSeven in #364
Documentation
- DOC: framework by @UranusSeven in #332
- DOC: models by @UranusSeven in #338
- DOC: fix README.md by @UranusSeven in #354
- DOC: update builtin models by @UranusSeven in #365
Others
- FEAT: Add Model Dashboard by @Bojun-Feng in #334
- Revert "FEAT : Add Model Dashboard" by @UranusSeven in #362
Full Changelog: v0.1.3...v0.2.0
v0.1.3
What's new in 0.1.3 (2023-08-09)
These are the changes in inference v0.1.3.
Enhancements
- ENH: accelerate 4-bit quantization for pytorch model by @pangyoki in #284
- ENH: remove chatglmcpp from deps by @UranusSeven in #329
- ENH: auto detect device in pytorch model by @pangyoki in #322
- ENH: Include model revision by @RayJi01 in #320
Bug fixes
- BUG: fix mps and cuda device detection for pytorch model by @pangyoki in #331
- BUG: Fix grammar mistake in examples by @Bojun-Feng in #336
- BUG: Fix log level on subprocess by @RayJi01 in #335
Documentation
- DOC: fix doc warnings by @UranusSeven in #314
- DOC: add ja_JP and update po files by @UranusSeven in #315
- DOC: custom models by @UranusSeven in #325
Full Changelog: v0.1.2...v0.1.3
v0.1.2
What's new in 0.1.2 (2023-08-04)
These are the changes in inference v0.1.2.
New features
- FEAT: custom model by @UranusSeven in #290 (see the sketch below)
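To give a feel for the custom model feature, a sketch of a definition written out as a Python dict. Every field name here is an assumption modeled on the format the project later documents for custom models, not a copy from #290, and the model_id is a placeholder:

```python
import json

# Hypothetical custom LLM definition; field names are assumptions.
custom_llm = {
    "version": 1,
    "model_name": "my-custom-llama",
    "model_lang": ["en"],
    "model_ability": ["generate"],
    "model_specs": [
        {
            "model_format": "ggmlv3",
            "model_size_in_billions": 7,
            "quantizations": ["q4_0"],
            "model_id": "my-org/my-custom-llama-7b",  # placeholder repo
        }
    ],
}

# Write it out so it can be passed to the register APIs sketched above.
with open("my_model.json", "w") as f:
    json.dump(custom_llm, f, indent=2)
```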
Enhancements
- ENH: select q4_0 as default quantization method for ggmlv3 model in benchmark by @pangyoki in #293
- ENH: disable gradio telemetry by @UranusSeven in #299
Bug fixes
- BUG: llm_family.json encoding by @UranusSeven in #297
- BUG: handle ChatGLM ggml specific case for RESTful API by @jiayini1119 in #309
- BUG: handle Qwen update by @UranusSeven in #307
Others
- DEMO: LangChain QA System with Xinference LLMs and Milvus Vector DB by @jiayini1119 in #304
- Chore: update issue template by @UranusSeven in #300
- Chore: remove codecov by @UranusSeven in #308
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's new in 0.1.1 (2023-08-03)
These are the changes in inference v0.1.1.
New features
- FEAT: add opt-125m pytorch model and add unit tests by @pangyoki in #263
- FEAT: support falcon 40b pytorch model by @pangyoki in #278
- FEAT: pytorch model embeddings by @jiayini1119 in #282 (see the sketch after this list)
- FEAT: support falcon-instruct 7b and 40b pytorch model by @jiayini1119 in #287
- FEAT: support chatglm/chatglm2/chatglm2-32k pytorch model by @pangyoki in #283
- FEAT: support qwen 7b by @UranusSeven in #294
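For the embeddings feature in this list, a minimal client-side sketch; the create_embedding method name and the OpenAI-style response shape are assumptions from the current public API:

```python
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")  # placeholder endpoint
model = client.get_model("my-model-uid")         # placeholder UID

# Assumed OpenAI-style response: {"data": [{"embedding": [...]}, ...]}
result = model.create_embedding("Xinference serves open-source models.")
vector = result["data"][0]["embedding"]
print(len(vector))
```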
Enhancements
- ENH: Support Environment Variable by @RayJi01 in #285
- REF: split supervisor and worker by @UranusSeven in #279
Bug fixes
- BUG: fix torch import error even when the user doesn't want to launch a torch model by @pangyoki in #274
- BUG: empty legacy model dir by @UranusSeven in #276
Documentation
- DOC: Update README_ja_JP.md by @eltociear in #269
- DOC: add docstring to client methods by @RayJi01 in #247
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's new in 0.1.0 (2023-07-28)
These are the changes in inference v0.1.0.
New features
- FEAT: support fp4 and int8 quantization for pytorch model by @pangyoki in #238 (see the sketch after this list)
- FEAT: support llama-2-chat-70b ggml by @UranusSeven in #257
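A hedged sketch of launching a PyTorch model with one of the new quantizations. The launch_model signature and especially the literal quantization value are assumptions based on the public client, not on #238:

```python
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")  # placeholder endpoint

# The exact accepted quantization literal ("int8" here) is an assumption.
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="pytorch",
    quantization="int8",
)
print(model_uid)
```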
Enhancements
- ENH: skip 4-bit quantization for non-linux or non-cuda local deployment by @UranusSeven in #264
- ENH: handle legacy cache by @UranusSeven in #266
- REF: model family by @UranusSeven in #251
Bug fixes
- BUG: fix restful stop parameters by @RayJi01 in #241
- BUG: download integrity hot fix by @RayJi01 in #242
- BUG: disable baichuan-chat and baichuan-base on macOS by @pangyoki in #250
- BUG: delete tqdm_class in snapshot_download by @pangyoki in #258
- BUG: ChatGLM Parameter Switch by @Bojun-Feng in #262
- BUG: refresh related fields when format changes by @UranusSeven in #265
- BUG: Show downloading progress in gradio by @aresnow1 in #267
- BUG: LLM json not included by @UranusSeven in #268
Tests
- TST: Update ChatGLM Tests by @Bojun-Feng in #259
Documentation
- DOC: Update installation part in readme by @aresnow1 in #253
- DOC: update readme for pytorch model by @pangyoki in #207
Full Changelog: v0.0.6...v0.1.0
v0.0.6
What's new in 0.0.6 (2023-07-24)
These are the changes in inference v0.0.6.
Bug fixes
- BUG: baichuan-chat and baichuan-base don't support macOS by @pangyoki in #202
- BUG: fix pytorch model generate bug when stream is True by @pangyoki in #210
- BUG: fix pytorch model still occupying memory after the model is terminated by @pangyoki in #219
- BUG: fix baichuan-chat configure by @pangyoki in #217
- BUG: Update requirements of gradio by @aresnow1 in #216
- BUG: chat stopwords by @UranusSeven in #222
- BUG: disable vicuna pytorch model by @pangyoki in #225
- BUG: Set default embedding to be True by @jiayini1119 in #236
Documentation
- DOC: Add notes for metal GPU acceleration by @aresnow1 in #213
- DOC: Add Japanese README by @eltociear in #228
- DOC: Adding Examples to documentation by @RayJi01 in #196
New Contributors
- @eltociear made their first contribution in #228
Full Changelog: v0.0.5...v0.0.6
v0.0.5
What's new in 0.0.5 (2023-07-19)
These are the changes in inference v0.0.5.
New features
- FEAT: support pytorch models by @pangyoki in #157
- FEAT: support vicuna-v1.3 33B by @Bojun-Feng in #192
- FEAT: support baichuan-chat pytorch model by @pangyoki in #190
- FEAT: pytorch models support MPS backend by @pangyoki in #198
- FEAT: Embedding by @jiayini1119 in #194
- FEAT: LLaMA-2 by @UranusSeven in #203
Enhancements
- ENH: Implement RESTful API stream generate by @jiayini1119 in #171 (see the sketch after this list)
- ENH: set default device to mps on macOS by @pangyoki in #205
- ENH: Set default mlock to true and mmap to false by @RayJi01 in #206
- ENH: add Gradio ChatInterface chatbot to example by @Bojun-Feng in #208
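To illustrate the streaming generate enhancement above, a sketch of consuming the stream over raw HTTP. The endpoint path, payload fields, and server-sent-events framing are assumptions modeled on OpenAI-style APIs, not excerpts from #171:

```python
import json
import requests

# Placeholder endpoint and model UID; "stream": True asks the server
# for incremental chunks instead of a single completion.
resp = requests.post(
    "http://localhost:9997/v1/completions",
    json={"model": "my-model-uid", "prompt": "Hello,", "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if line.startswith(b"data:"):
        payload = line[len(b"data:"):].strip()
        if payload and payload != b"[DONE]":
            print(json.loads(payload))
```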
Bug fixes
- BUG: fix pytorch int8 by @pangyoki in #197
- BUG: RuntimeError when launching model using kwargs whose value is of type int by @jiayini1119 in #209
- BUG: Fix some gradio issues by @aresnow1 in #200
Documentation
- DOC: sphinx init by @UranusSeven in #189
- DOC: chinese readme by @UranusSeven in #191
Full Changelog: v0.0.4...v0.0.5
v0.0.4
What's new in 0.0.4 (2023-07-14)
These are the changes in inference v0.0.4.
New features
- FEAT: implement chat and generate in RESTful client by @jiayini1119 in #161 (see the sketch after this list)
- FEAT: support wizard-v1.1 by @UranusSeven in #183
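A minimal sketch of the new client-side chat call; the signature below (prompt plus chat_history and generate_config) and the response shape are assumptions based on the present-day public client rather than the exact #161 interface:

```python
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")  # placeholder endpoint
model = client.get_model("my-model-uid")         # placeholder UID

reply = model.chat(
    "What does Xinference do?",
    chat_history=[],                             # earlier turns, if any
    generate_config={"max_tokens": 64},
)
print(reply["choices"][0]["message"]["content"])  # assumed OpenAI-style shape
```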
Bug fixes
- BUG: fix example chat by @UranusSeven in #165
Full Changelog: v0.0.3...v0.0.4