
[OpenAI] Support text completion via engine.completions.create() #534

Merged: 4 commits into mlc-ai:main on Aug 10, 2024

Conversation

@CharlieFRuan (Contributor) commented on Aug 10, 2024

completions.create(), as opposed to chat.completions.create(), was not supported prior to this PR. Compared to chat.completions, completions is pure text completion with no notion of a conversation: given the user's input prompt, the model generates autoregressively, ignoring any chat template. For more, see examples/text-completion and https://platform.openai.com/docs/api-reference/completions/object

Example usage:

  import * as webllm from "@mlc-ai/web-llm";

  const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
    "Llama-3.1-8B-Instruct-q4f32_1-MLC",
  );
  const reply0 = await engine.completions.create({  // as opposed to `chat.completions.create`
    prompt: "List 3 US states: ",  // as opposed to `messages`
  });
  console.log(reply0);
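
Streaming goes through the same endpoint. A minimal sketch, assuming the streamed chunks follow the OpenAI completions object (generated text under choices[0].text, as in openai-node):

  const chunks = await engine.completions.create({
    prompt: "List 3 US states: ",
    stream: true,  // returns an AsyncIterable of completion chunks
  });
  for await (const chunk of chunks) {
    console.log(chunk.choices[0]?.text);
  }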

To introduce this, we needed various changes to the internal code structure for code reuse. We split the change into three parts.

User-facing changes

  • The only change is that the user now has access to engine.completions.create() in addition to engine.chat.completions.create() (whose behavior is unchanged)

WebWorker changes

These are the changes to the interaction between WebWorker and MLCEngine.

  • Implement completion() in addition to chatCompletion(); these are two parallel entry points
  • Add messages completionNonStreaming and completionStreamInit, parallel to their chatCompletion counterparts
  • Rename message chatCompletionStreamNextChunk to completionStreamNextChunk, which is shared by the completion() and chatCompletion() workflows (see the sketch below)
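
To make the message-level changes concrete, here is a rough sketch of the request kinds after this PR. The kind strings come from the bullets above; the chatCompletion counterpart names and the payload shape are assumptions for illustration:

  // Hypothetical sketch of the worker request kinds after this PR.
  type RequestKind =
    | "chatCompletionNonStreaming"  // assumed counterpart name
    | "chatCompletionStreamInit"    // assumed counterpart name
    | "completionNonStreaming"      // new: text completion, non-streaming
    | "completionStreamInit"        // new: text completion, streaming
    | "completionStreamNextChunk";  // renamed; shared by both streaming workflows

  interface WorkerRequest {
    kind: RequestKind;
    uuid: string;      // assumed: correlates a request with its reply
    content: unknown;  // assumed: the (Chat)CompletionRequest payload, if any
  }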

Low-level changes

These are the changes revolving around engine.ts and llm_chat.ts.

  • Implement src/openai_api_protocols/completion.ts for the completions API, following openai-node
  • Implement completion() in engine.ts, which shares the same asyncGenerate() (renamed from chatCompletionAsyncChunkGenerator()) for streaming, and _generate() for non-streaming
  • Overload and modify asyncGenerate() so that it handles both completion and chat completion
  • Add fields isTextCompletion and prompt to Conversation, so that the input does not get formatted like a conversation in llm_chat.ts (see the sketch after this list)
  • Move getFunctionCallUsage() and getConversationFromChatCompletionRequest() from engine.ts to conversation.ts, since they do not depend on the engine
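
As an illustration of the isTextCompletion bullet, a simplified and hypothetical sketch of how the new Conversation fields might gate prompt formatting (the field names come from the bullet; the method and its body are assumptions):

  class Conversation {
    isTextCompletion = false;  // new: true for engine.completions.create()
    prompt?: string;           // new: raw prompt used when isTextCompletion

    getPromptArray(): string[] {
      if (this.isTextCompletion) {
        // Text completion: pass the raw prompt through, skipping the chat template.
        return [this.prompt ?? ""];
      }
      // Chat completion: apply the model's conversation template as before.
      return this.formatWithChatTemplate();
    }

    private formatWithChatTemplate(): string[] {
      /* existing template logic elided */
      return [];
    }
  }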

Tested

  • WebLLMChat
  • examples/get-started, with both chat completion and text completion
  • examples/streaming, with both chat completion and text completion
  • examples/get-started-web-worker, with both chat completion and text completion
  • examples/service-worker, with both chat completion and text completion

@CharlieFRuan merged commit 41e786b into mlc-ai:main on Aug 10, 2024
1 check passed
CharlieFRuan added a commit that referenced this pull request on Aug 10, 2024
### Changes
- Add API `engine.completions()` in addition to `engine.chat.completions()`:
  - #534
- Reload model when web worker terminated (in addition to service worker):
  - #533

### TVMjs
- Still compiled at apache/tvm@1fcb620, no change