[OpenAI] Support text completion via engine.completions.create() #534
`completions.create()`, as opposed to `chat.completions.create()`, is something we have not supported prior to this PR. Compared to `chat.completions`, `completions` is pure text completion with no conversation. That is, given the user's input prompt, the model autoregressively generates, ignoring any chat template. For more, see `examples/text-completion` and https://platform.openai.com/docs/api-reference/completions/object

Example usage:
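A minimal sketch of the new API (the model id and sampling parameters are illustrative, and the `CreateMLCEngine` factory name follows the current WebLLM API, which may differ slightly from this PR's revision):

```typescript
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  // Model id is illustrative; pick any model from the prebuilt model list.
  const engine = await webllm.CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC");

  // Pure text completion: the prompt is fed to the model as-is,
  // with no chat template applied.
  const reply = await engine.completions.create({
    prompt: "The capital of France is",
    max_tokens: 16,
  });
  console.log(reply.choices[0].text);
}

main();
```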
To introduce this, we needed various changes to the internal code structure for code reuse. We split the change into three parts.

### User-facing changes

- Support `engine.completions.create()` in addition to `engine.chat.completions.create()` (the behavior of which should not change).

### WebWorker changes
These are the changes to the interaction between WebWorker and MLCEngine.
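As a sketch of that interaction, the worker hosts the engine while the main thread forwards both entry points to it (class and factory names follow the current WebLLM API and may differ slightly from this PR's revision; the model id is illustrative):

```typescript
// worker.ts — runs in the Web Worker and hosts the actual MLCEngine.
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => {
  handler.onmessage(msg);
};
```

```typescript
// main.ts — runs on the main thread and forwards calls to the worker.
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateWebWorkerMLCEngine(
  new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
  "Llama-3-8B-Instruct-q4f32_1-MLC", // illustrative model id
);

// Both entry points now go through the same worker:
await engine.completions.create({ prompt: "Once upon a time", max_tokens: 32 });
await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
```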
- Support `completion()` in addition to `chatCompletion()`, which are two parallel entry points.
- Add `completionNonStreaming` and `completionStreamInit`, parallel to their `chatCompletion` counterparts.
- Rename `chatCompletionStreamNextChunk` to `completionStreamNextChunk`, which is shared by the `completion()` and `chatCompletion()` workflows.

### Low-level changes
These are the changes revolving around `engine.ts` and `llm_chat.ts`.
- Add `src/openai_api_protocols/completion.ts` for the completions API, following `openai-node`.
- Implement `completion()` in `engine.ts`, which shares the same `asyncGenerate()` (renamed from `chatCompletionAsyncChunkGenerator()`) for streaming, and `_generate()` for non-streaming.
- Generalize `asyncGenerate()` so that both completion and chat completion can be taken care of.
- Add `isTextCompletion` and `prompt` to `Conversation`, such that a text-completion request does not get formatted like a conversation in `llm_chat.ts`.
- Move `getFunctionCallUsage()` and `getConversationFromChatCompletionRequest()` from `engine.ts` to `conversation.ts`, since they do not depend on the engine.

### Tested
- `examples/get-started`, with both chat completion and text completion
- `examples/streaming`, with both chat completion and text completion
- `examples/get-started-web-worker`, with both chat completion and text completion
- `examples/service-worker`, with both chat completion and text completion
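As with chat completion, the new text-completion entry point supports streaming; a minimal sketch (model id illustrative, API names follow the current WebLLM release):

```typescript
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  const engine = await webllm.CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC");

  // With stream: true, create() returns an async iterable of completion chunks.
  const chunks = await engine.completions.create({
    prompt: "Once upon a time",
    max_tokens: 64,
    stream: true,
  });

  let text = "";
  for await (const chunk of chunks) {
    text += chunk.choices[0]?.text ?? "";
  }
  console.log(text);
}

main();
```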