
[OpenAI] Support text completion via engine.completions.create() #534

Merged: 4 commits into mlc-ai:main on Aug 10, 2024

Conversation

@CharlieFRuan (Contributor) commented on Aug 10, 2024

completions.create(), as opposed to chat.completions.create(), was not supported prior to this PR. Compared to chat.completions, completions is pure text completion with no notion of a conversation: given the user's input prompt, the model generates autoregressively, ignoring any chat template. For more, see examples/text-completion and https://platform.openai.com/docs/api-reference/completions/object

Example usage:

  import * as webllm from "@mlc-ai/web-llm";

  const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
    "Llama-3.1-8B-Instruct-q4f32_1-MLC",
  );
  const reply0 = await engine.completions.create({  // as opposed to `chat.completions.create`
    prompt: "List 3 US states: ",  // as opposed to `messages`
  });
  console.log(reply0);
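
Streaming goes through the same endpoint. A minimal sketch, assuming the streamed chunks follow the OpenAI completions object (generated text under choices[0].text, as in openai-node):

  const chunks = await engine.completions.create({
    prompt: "List 3 US states: ",
    stream: true,  // returns an AsyncIterable of completion chunks
  });
  for await (const chunk of chunks) {
    console.log(chunk.choices[0]?.text);
  }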

To introduce this, we needed various changes to the internal code structure for code reuse. We split the change into three parts.

User-facing changes

  • The only change is that the user now has access to engine.completions.create() in addition to engine.chat.completions.create() (whose behavior is unchanged)

WebWorker changes

These are the changes to the interaction between WebWorker and MLCEngine.

  • Implement completion() in addition to chatCompletion(); these are two parallel entry points
  • Add messages completionNonStreaming and completionStreamInit, parallel to their chatCompletion counterparts
  • Rename message chatCompletionStreamNextChunk to completionStreamNextChunk, which is shared by the completion() and chatCompletion() workflows (see the sketch below)
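
To make the message-level changes concrete, here is a rough sketch of the request kinds after this PR. The kind strings come from the bullets above; the chatCompletion counterpart names and the payload shape are assumptions for illustration:

  // Hypothetical sketch of the worker request kinds after this PR.
  type RequestKind =
    | "chatCompletionNonStreaming"  // assumed counterpart name
    | "chatCompletionStreamInit"    // assumed counterpart name
    | "completionNonStreaming"      // new: text completion, non-streaming
    | "completionStreamInit"        // new: text completion, streaming
    | "completionStreamNextChunk";  // renamed; shared by both streaming workflows

  interface WorkerRequest {
    kind: RequestKind;
    uuid: string;      // assumed: correlates a request with its reply
    content: unknown;  // assumed: the (Chat)CompletionRequest payload, if any
  }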

Low-level changes

These are the changes revolving around engine.ts and llm_chat.ts.

  • Implement src/openai_api_protocols/completion.ts for the completions API, following openai-node
  • Implement completion() in engine.ts, which shares the same asyncGenerate() (renamed from chatCompletionAsyncChunkGenerator()) for streaming, and _generate() for non-streaming
  • Overload and modify asyncGenerate() so that it handles both completion and chat completion
  • Add fields isTextCompletion and prompt to Conversation, so that the input does not get formatted like a conversation in llm_chat.ts (see the sketch after this list)
  • Move getFunctionCallUsage() and getConversationFromChatCompletionRequest() from engine.ts to conversation.ts, since they do not depend on the engine
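
As an illustration of the isTextCompletion bullet, a simplified and hypothetical sketch of how the new Conversation fields might gate prompt formatting (the field names come from the bullet; the method and its body are assumptions):

  class Conversation {
    isTextCompletion = false;  // new: true for engine.completions.create()
    prompt?: string;           // new: raw prompt used when isTextCompletion

    getPromptArray(): string[] {
      if (this.isTextCompletion) {
        // Text completion: pass the raw prompt through, skipping the chat template.
        return [this.prompt ?? ""];
      }
      // Chat completion: apply the model's conversation template as before.
      return this.formatWithChatTemplate();
    }

    private formatWithChatTemplate(): string[] {
      /* existing template logic elided */
      return [];
    }
  }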

Tested

  • WebLLMChat
  • examples/get-started, with both chat completion and text completion
  • examples/streaming, with both chat completion and text completion
  • examples/get-started-web-worker, with both chat completion and text completion
  • examples/service-worker, with both chat completion and text completion

@CharlieFRuan merged commit 41e786b into mlc-ai:main on Aug 10, 2024
1 check passed
CharlieFRuan added a commit that referenced this pull request on Aug 10, 2024
### Changes
- Add API `engine.completions()` in addition to `engine.chat.completions()`:
  - #534
- Reload model when web worker terminated (in addition to service worker):
  - #533

### TVMjs
- Still compiled at apache/tvm@1fcb620, no change