Support streaming of choice-less chunks #199

Open · adubovik opened this issue Nov 29, 2024 · 0 comments
Labels: enhancement (New feature or request)

adubovik (Collaborator) commented Nov 29, 2024

Currently, choice-less chunks (such as chunks that report the usage and statistics fields) aren't sent into the response stream immediately; their delivery is delayed until a chunk with a choice is encountered.
These choice-less chunks are then merged together and sent along with that choice-carrying chunk.

Since choice-less chunks may arrive from the upstream as the very last chunks, the carrier chunk is chosen to be the chunk that closes the last choice:

if isinstance(chunk, BaseChunk):
    # The chunk that closes the last of the `n` choices acts as the carrier.
    is_last_end_choice_chunk = (
        isinstance(chunk, EndChoiceChunk)
        and chunk.choice_index == self.n - 1
    )
    # Choice-less (top-level) chunks carry the usage and statistics fields.
    is_top_level_chunk = isinstance(
        chunk,
        (
            UsageChunk,
            UsagePerModelChunk,
            DiscardedMessagesChunk,
        ),
    )
    if is_last_end_choice_chunk or is_top_level_chunk:
        # Hold the chunk back until the carrier chunk is emitted.
        delayed_chunks.append(chunk)
    else:
        yield _create_chunk(chunk)

Thus, such choice-less chunks are reported at the very end of the stream, which precludes streaming them as they arrive.

It's desirable to enable streaming of the statistics fields: #15
Imagine an application that calls model A, reports per-model usage for A, and then fails to call model B.
Currently, the downstream won't see any per-model usage at all, since it is reported only in the very last chunk.

A similar delaying technique is used in adapter-openai to eliminate choice-less chunks.

Q: Why do we delay sending choice-less chunks in the first place?
A: Because it wasn't clear how best to introduce the missing choices field. The possible solutions are:

  1. Add a fake message with empty string content:
{
  "choices": [{"index": 0, "delta": {"content": ""}}],
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 2,
    "total_tokens": 3
  }
}
  2. Add an empty list of choices:
{
  "choices": [],
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 2,
    "total_tokens": 3
  }
}

In the case of an empty list of choices, it wasn't clear whether it could be parsed correctly downstream.
However, OpenAI has since introduced the stream_options.include_usage feature, which enables the generation of a chunk with an empty list of choices and a non-empty usage field.

[Screenshot: OpenAI API documentation for stream_options.include_usage]

This is correctly handled by popular OpenAI clients (the openai and langchain libraries).
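
For instance, here is a minimal sketch of consuming such a stream (assuming the openai Python client, v1.26 or later, with stream_options support; the model name is illustrative):

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    # Ask the server to append a final chunk with usage and no choices.
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        # Regular content chunk: choices is non-empty.
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage:
        # Final chunk: choices == [] and usage is populated.
        print(f"\ntotal_tokens={chunk.usage.total_tokens}")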

The proposal is to follow this empty-list-of-choices convention.
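
Under this convention, a per-model usage chunk could be streamed mid-response on its own, e.g. (the statistics field names below are illustrative, not a confirmed schema):

{
  "choices": [],
  "statistics": {
    "usage_per_model": [
      {
        "model": "model-a",
        "prompt_tokens": 1,
        "completion_tokens": 2,
        "total_tokens": 3
      }
    ]
  }
}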

TODO: ascertain that openai and langchain will be able to parse chunks with an empty list of choices and a statistics field.
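
One way to check the openai side (a sketch assuming the SDK's pydantic chunk model; a custom statistics field would additionally rely on the model tolerating unknown keys):

from openai.types.chat import ChatCompletionChunk

# A choice-less chunk carrying only usage, as produced by include_usage.
raw = {
    "id": "chatcmpl-123",
    "object": "chat.completion.chunk",
    "created": 1732888952,
    "model": "some-model",
    "choices": [],
    "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
}

chunk = ChatCompletionChunk.model_validate(raw)
assert chunk.choices == []
assert chunk.usage is not None and chunk.usage.total_tokens == 3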

adubovik added the enhancement (New feature or request) label on Jan 16, 2025