Currently, all choice-less chunks (such as chunks which report `usage` and `statistics` fields) aren't sent to the response stream immediately; instead, their sending is delayed until a chunk with a choice is encountered.
These choice-less chunks are then merged together and sent along with this choice-carrying carrier chunk.
Since there is a chance that choice-less chunks will arrive from the upstream as the very last chunks, the carrier chunk is chosen to be the chunk that closes the last choice (see `aidial_sdk/chat_completion/response.py`, lines 108 to 127 at `8abe579`).
Thus, such choice-less chunks are reported at the very end, which precludes their streaming.
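The delay-and-merge behavior described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the SDK's actual implementation; chunks are simplified dicts, and the function and field names are made up for the sketch:

```python
from typing import Any, Dict, Iterator, List


def merge_chunks(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    # Copy the choice-less fields (e.g. "usage", "statistics") into the carrier.
    for key, value in source.items():
        if key != "choices":
            target[key] = value


def delay_choiceless(chunks: Iterator[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    buffered: List[Dict[str, Any]] = []
    for chunk in chunks:
        if chunk.get("choices"):
            # A carrier chunk: flush the buffered choice-less chunks into it.
            for pending in buffered:
                merge_chunks(chunk, pending)
            buffered.clear()
            yield chunk
        else:
            # A choice-less chunk: hold it back until a carrier arrives.
            buffered.append(chunk)


stream = [
    {"usage": {"total_tokens": 7}},               # choice-less: delayed
    {"choices": [{"delta": {"content": "Hi"}}]},  # carrier: emitted with usage
]
out = list(delay_choiceless(stream))
print(out)
```

The key consequence is visible even in this toy version: a choice-less chunk is only ever emitted piggybacked on a later carrier chunk, never on its own.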
It's desirable to enable streaming of the statistics fields: #15
Imagine an application which calls model A, reports per-model usage for A, then fails to call model B.
Currently, the downstream won't see any per-model usage, since it is reported only in the very last chunk.
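This failure mode can be demonstrated with a self-contained sketch of the current delay-until-carrier behavior (the names and chunk shapes are hypothetical; it assumes the stream aborts before any choice-carrying chunk is produced):

```python
# Hypothetical scenario: model A succeeds and reports its usage, but the
# subsequent call to model B fails before any choice-carrying chunk appears.
def upstream_stream():
    yield {"usage": {"model": "A", "total_tokens": 5}}  # choice-less chunk
    raise RuntimeError("call to model B failed")


buffered, delivered = [], []
try:
    for chunk in upstream_stream():
        if chunk.get("choices"):
            delivered.append(chunk)  # a carrier chunk would flush the buffer
        else:
            buffered.append(chunk)   # usage is held back, awaiting a carrier
except RuntimeError:
    pass

# The usage for model A never left the buffer: the downstream saw nothing.
print(delivered)  # []
print(buffered)   # [{'usage': {'model': 'A', 'total_tokens': 5}}]
```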
A similar delaying technique is used in adapter-openai to eliminate choice-less chunks.
Q: Why do we delay sending choice-less chunks in the first place? A: Because it wasn't clear how best to introduce the missing `choices` field.
In particular, in the case of an empty list of choices, it wasn't clear whether it could be parsed correctly downstream.
However, since then OpenAI has introduced the `stream_options.include_usage` feature. It enables the generation of a chunk with an empty list of choices and a non-empty `usage` field.
This is correctly handled by popular OpenAI clients (the `openai` and `langchain` libraries).
The proposal is to follow this empty list of choices convention.
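For illustration, a usage-only chunk under this convention would look roughly like the following (the field values are made up; the overall shape mirrors what `stream_options.include_usage` produces):

```python
import json

# A chunk carrying only statistics: "choices" is present but empty,
# while "usage" is populated. Values are illustrative.
usage_chunk = {
    "object": "chat.completion.chunk",
    "choices": [],  # empty list of choices instead of omitting the field
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 4,
        "total_tokens": 15,
    },
}

print(json.dumps(usage_chunk, indent=2))
```

Because `choices` is always present, downstream consumers that iterate over it simply see zero choices in such a chunk, rather than a missing field.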
TODO: ascertain that `openai` and `langchain` are able to parse chunks with an empty list of choices and a `statistics` field.