Streaming async handler.
Implements the LangChain AsyncCallbackHandler, so it can be notified of new tokens. It also implements the AsyncIterator interface, so it can be used directly to stream back the response.
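For illustration, here is a minimal sketch of the pattern described above: a LangChain AsyncCallbackHandler that feeds tokens into an asyncio.Queue and exposes them through the AsyncIterator protocol. This is a simplified assumption of how such a handler can be structured, not the actual implementation; all names except the LangChain base class are made up.

```python
# Minimal sketch of the pattern described above (illustrative, not the actual
# implementation): tokens received via callbacks are queued and re-emitted
# through the async iterator protocol.
import asyncio
from typing import Any, Optional

from langchain.callbacks.base import AsyncCallbackHandler


class MinimalStreamingHandler(AsyncCallbackHandler):
    """Receives LLM tokens via callbacks and exposes them as an async iterator."""

    def __init__(self) -> None:
        self.queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Called by LangChain for every new token when streaming is enabled.
        await self.queue.put(token)

    async def on_llm_end(self, response: Any, **kwargs: Any) -> None:
        # A None sentinel marks the end of the stream.
        await self.queue.put(None)

    def __aiter__(self) -> "MinimalStreamingHandler":
        return self

    async def __anext__(self) -> str:
        token = await self.queue.get()
        if token is None:
            raise StopAsyncIteration
        return token
```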
__init__(enable_print: bool = False, enable_buffer: bool = False)
ignore_agent
Whether to ignore agent callbacks.
ignore_chain
Whether to ignore chain callbacks.
ignore_chat_model
Whether to ignore chat model callbacks.
ignore_llm
Whether to ignore LLM callbacks.
ignore_retriever
Whether to ignore retriever callbacks.
ignore_retry
Whether to ignore retry callbacks.
disable_buffering()
Disables buffering. Whatever has accumulated in the buffer is processed as a single chunk.
enable_buffering()
Enables buffering: new tokens are accumulated in the buffer instead of being streamed immediately.
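A hedged sketch of how the buffering switch could work, extending the illustrative handler above; the method names follow this reference, while the internals are assumptions rather than the actual implementation.

```python
class BufferingStreamingHandler(MinimalStreamingHandler):
    """Illustrative buffering variant of the sketch above."""

    def __init__(self) -> None:
        super().__init__()
        self.enable_buffer = False
        self.buffer = ""

    def enable_buffering(self) -> None:
        # Start accumulating tokens instead of emitting them immediately.
        self.enable_buffer = True
        self.buffer = ""

    def disable_buffering(self) -> None:
        # Stop buffering and process whatever accumulated as a single chunk.
        self.enable_buffer = False
        if self.buffer:
            self.queue.put_nowait(self.buffer)
            self.buffer = ""

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        if self.enable_buffer:
            self.buffer += token
        else:
            await self.queue.put(token)
```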
on_chat_model_start(
serialized: Dict[str, Any],
messages: List[List[langchain.schema.messages.BaseMessage]],
run_id: uuid.UUID,
parent_run_id: Optional[uuid.UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any
) → Any
Run when a chat model starts running.
on_llm_end(
response: langchain.schema.output.LLMResult,
run_id: uuid.UUID,
parent_run_id: Optional[uuid.UUID] = None,
tags: Optional[List[str]] = None,
**kwargs: Any
) → None
Run when LLM ends running.
on_llm_new_token(
token: str,
chunk: Optional[Union[langchain.schema.output.GenerationChunk, langchain.schema.output.ChatGenerationChunk]] = None,
run_id: uuid.UUID,
parent_run_id: Optional[uuid.UUID] = None,
tags: Optional[List[str]] = None,
**kwargs: Any
) → None
Run on new LLM token. Only available when streaming is enabled.
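For example, streaming must be enabled on the model and the handler registered as a callback; the handler can then be consumed as an async iterator while generation runs in the background. The model class, prompt, and variable names below are illustrative, and MinimalStreamingHandler is the sketch from above (an instance of the documented handler would be used the same way).

```python
import asyncio

from langchain.chat_models import ChatOpenAI
from langchain.schema.messages import HumanMessage


async def main() -> None:
    handler = MinimalStreamingHandler()
    # streaming=True is required for on_llm_new_token to be called.
    llm = ChatOpenAI(streaming=True, callbacks=[handler])

    # Start the generation in the background; tokens flow into the handler
    # through on_llm_new_token as they are produced.
    task = asyncio.create_task(llm.agenerate([[HumanMessage(content="Hello!")]]))

    async for token in handler:
        print(token, end="", flush=True)

    await task


asyncio.run(main())
```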
push_chunk(
chunk: Optional[Union[str, langchain.schema.output.GenerationChunk, langchain.schema.messages.AIMessageChunk]]
)
Push a new chunk to the stream.
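Illustrative calls matching the signature above; `handler` is assumed to be an instance of this class, and the chunk values are made up.

```python
from langchain.schema.messages import AIMessageChunk
from langchain.schema.output import GenerationChunk

handler.push_chunk("plain text")                                  # str
handler.push_chunk(GenerationChunk(text="from an LLM"))           # GenerationChunk
handler.push_chunk(AIMessageChunk(content="from a chat model"))   # AIMessageChunk
```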
set_pattern(prefix: Optional[str] = None, suffix: Optional[str] = None)
Sets the pattern that the output is expected to follow.
If a prefix or a suffix is specified, it is removed from the output.
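For example, if the completion is expected to be wrapped in a known prefix and suffix, those markers can be stripped from the streamed output; the pattern below is illustrative.

```python
# Expect completions of the form 'Answer: <text>\n'; only <text> is streamed.
handler.set_pattern(prefix="Answer: ", suffix="\n")
```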
set_pipe_to(another_handler)
wait()
Waits until the stream finishes and returns the full completion.
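Illustrative usage: instead of iterating token by token, a caller can block until generation finishes and receive the full text (assumes generation was started in the background, as in the example further above).

```python
async def consume_all(handler) -> str:
    # Blocks until the LLM finishes streaming, then returns the full completion.
    return await handler.wait()
```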
wait_top_k_nonempty_lines(k: int)
Waits for the top k non-empty lines from the LLM.
When k lines have been received (and the k+1-th line has started), they are returned and removed from the buffer.
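A hedged sketch of how this could be combined with buffering, e.g. to inspect the first lines of a structured completion before streaming the rest; the pairing with enable_buffering/disable_buffering, the return type, and the choice of k=2 are assumptions.

```python
async def peek_first_lines(handler) -> str:
    handler.enable_buffering()
    # Generation is assumed to be running in the background (see the example above).
    first_two_lines = await handler.wait_top_k_nonempty_lines(2)
    # Flush whatever remains in the buffer back into the stream as a single chunk.
    handler.disable_buffering()
    return first_two_lines
```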