Skip to content

Latest commit

 

History

History
60 lines (41 loc) · 2.2 KB

lmdeploy.md

File metadata and controls

60 lines (41 loc) · 2.2 KB

Inference by LMDeploy

English | 简体中文

LMDeploy is an efficient, user-friendly toolkit designed for compressing, deploying, and serving LLM models.

This article primarily highlights the basic usage of LMDeploy. For a comprehensive understanding of the toolkit, we invite you to refer to the tutorials.

Installation

Install lmdeploy with pip (python 3.8+)

pip install lmdeploy

Offline batch inference

With just 4 lines of codes, you can execute batch inference using a list of prompts:

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

With dynamic ntk, LMDeploy can handle a context length of 200K for InternLM2:

from lmdeploy import pipeline, TurbomindEngineConfig
engine_config = TurbomindEngineConfig(session_len=200000,
                                      rope_scaling_factor=2.0)
pipe = pipeline("internlm/internlm2-chat-7b", backend_engine=engine_config)
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)
response = pipe(prompt, gen_config=gen_config)
print(response)

For more information about LMDeploy pipeline usage, please refer to here.

Serving

LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:

lmdeploy serve api_server internlm/internlm2-chat-7b

The default port of api_server is 23333. After the server is launched, you can communicate with server on terminal through api_client:

lmdeploy serve api_client http://0.0.0.0:23333

Alternatively, you can test the server's APIs oneline through the Swagger UI at http://0.0.0.0:23333. A detailed overview of the API specification is available here.