
Enhance LLM Streaming Response Handling and Event System #2266

Merged
merged 30 commits into main on Mar 7, 2025

Conversation

bhancockio
Collaborator

@bhancockio bhancockio commented Mar 3, 2025

Proof of work: https://www.loom.com/share/bdf457728ea04336a32867525fbdb818?sid=a3c71c25-cafa-479b-b963-0c6cf73da45c

Description

This PR improves the handling of streaming responses from LLMs in the CrewAI framework, addressing issues with empty responses and enhancing error handling. The changes include:

Core Improvements

  • Enhanced the _handle_streaming_response method to properly extract content from various chunk formats (see the sketch after this list)
  • Added robust error handling for streaming responses with fallback to non-streaming
  • Implemented proper handling of empty responses with appropriate default messages
  • Added support for tool calling in streaming mode
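
As a rough illustration of the chunk handling above, here is a minimal sketch of a content-extraction helper; the name _extract_chunk_content and the chunk shapes it probes are assumptions for illustration, not the exact code in this PR:

def _extract_chunk_content(chunk) -> str:
    # OpenAI-style chunk objects expose choices[0].delta.content
    choices = getattr(chunk, "choices", None)
    if choices:
        delta = getattr(choices[0], "delta", None)
        content = getattr(delta, "content", None)
        if content:
            return content
    # Some providers hand back plain dicts instead of objects
    if isinstance(chunk, dict):
        delta = (chunk.get("choices") or [{}])[0].get("delta") or {}
        return delta.get("content") or ""
    # Empty or unrecognized chunks contribute nothing
    return ""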

Event System Enhancements

  • Added LLMStreamChunkEvent to the event system to track streaming chunks (a listener sketch follows this list)
  • Updated the event listener to handle streaming events
  • Updated documentation to include information about the new event type
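
A minimal sketch of consuming the new event is shown below; it follows CrewAI's event-listener pattern, but the exact import paths and the event's attribute names are assumptions here rather than something confirmed by this PR:

# Sketch: print streamed text as it arrives (import paths assumed)
from crewai.utilities.events import LLMStreamChunkEvent
from crewai.utilities.events.base_event_listener import BaseEventListener

class StreamPrinter(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMStreamChunkEvent)
        def on_llm_stream_chunk(source, event):
            # Assumes event.chunk carries the text of a single streamed chunk
            print(event.chunk, end="", flush=True)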

Testing

  • Added comprehensive tests for streaming functionality (a sample test sketch follows this list):
    • Basic streaming response handling
    • Tool calling with streaming enabled
    • Fallback mechanism when streaming fails
    • Empty response handling
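
For flavor, here is a sketch of what a basic streaming test might look like, with litellm.completion patched to yield a fixed chunk sequence; the fake chunk shape, the LLM constructor arguments, and the patch target are assumptions, not the tests actually added in this PR:

# Sketch: patch litellm.completion with fake chunks and check the result
from types import SimpleNamespace
from unittest.mock import patch

def _fake_chunk(text):
    # Mimic an OpenAI-style streaming chunk: choices[0].delta.content
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

def test_basic_streaming_response():
    from crewai import LLM

    llm = LLM(model="gpt-4o-mini", stream=True)
    chunks = iter([_fake_chunk("Hello "), _fake_chunk("world")])
    with patch("litellm.completion", return_value=chunks):
        result = llm.call("Say hello")
    assert "Hello" in result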

These changes make the LLM streaming functionality more robust and reliable, especially when dealing with different LLM providers that may have varying response formats.

@joaomdmoura
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review for PR #2266: Initial Stream Working

Overview

This pull request introduces streaming functionality for LLM responses, aiming to enhance the system’s capability to handle real-time data from language models. The changes are significant and span several files, centered primarily on the LLM class, which now includes new methods and event handling for streamed responses.

1. Structural Improvements

LLM Class Organization

To enhance maintainability, consider creating a dedicated StreamingHandler class that encapsulates the streaming logic, making it easier to read and manage:

class StreamingHandler:
    def __init__(self, llm_instance):
        self.llm = llm_instance

    def handle_streaming_response(self, params, available_functions=None):
        # Move streaming logic here: iterate over chunks, accumulate content,
        # and fall back to a non-streaming call on failure
        raise NotImplementedError

    def process_chunk(self, chunk):
        # Move chunk processing logic here: extract the delta content from a
        # chunk and return it (or None for empty chunks)
        raise NotImplementedError

Type Definitions

Consolidating type definitions into a separate types.py file would improve code organization:

# types.py
from typing import TypedDict, Optional

class Delta(TypedDict):
    content: Optional[str]
    role: Optional[str]

class StreamingChoices(TypedDict):
    delta: Delta
    index: int
    finish_reason: Optional[str]

2. Error Handling Improvements

Robust Error Handling

Implementing custom exceptions for streaming errors enhances the resilience of the streaming feature:

import logging

import litellm

class StreamingError(Exception):
    pass

def _handle_streaming_response(self, params, available_functions=None):
    try:
        for chunk in litellm.completion(**params):
            if not self._is_valid_chunk(chunk):
                raise StreamingError("Invalid chunk format")
            # Process chunk (extract content, emit events, accumulate text)
    except StreamingError as e:
        logging.error(f"Streaming error: {str(e)}")
        return self._handle_fallback(params)

3. Code Duplication Issues

Redundant Response Handling

There's an opportunity to reduce duplicated code between streaming and non-streaming responses. A common processing method can streamline this:

def _process_llm_response(self, content, call_type):
    """Common response processing logic"""
    self._handle_emit_call_events(content, call_type)
    return content

4. Testing Improvements

Enhanced Test Coverage

Testing should cover more edge cases to ensure reliability in various scenarios:

import pytest

@pytest.mark.parametrize("error_scenario", [
    "empty_chunk",
    "invalid_chunk_format",
    "network_error",
])
def test_streaming_error_scenarios(error_scenario):
    # Implement tests for the specified error scenarios
    ...

5. Documentation Suggestions

Improved Inline Documentation

Adding detailed docstrings to new methods will help with future maintenance:

from typing import Any, Dict, Optional

def _handle_streaming_response(
    self, params: Dict[str, Any], available_functions: Optional[Dict[str, Any]] = None
) -> str:
    """
    Handle streaming responses from the LLM.

    Args:
        params: Configuration parameters for the LLM call
        available_functions: Dictionary of available tool functions

    Returns:
        str: Concatenated response from streaming chunks.
    """

6. Performance Considerations

Memory Efficiency

Using a generator for streaming responses reduces memory overhead:

def stream_response(self, params):
    """Stream response chunks without storing them all in memory"""
    yield from (
        chunk.content
        for chunk in litellm.completion(**params)
        if self._is_valid_chunk(chunk)
    )

7. Security Recommendations

Input Validation

To prevent misuse, add robust validation for streaming-related input parameters:

def _validate_streaming_params(self, params):
    """Validate streaming-specific parameters"""
    if not isinstance(params.get("stream"), bool):
        raise ValueError("Stream parameter must be boolean")

Overall Recommendations

  1. Create dedicated classes for streaming and tool handling.
  2. Enhance error handling with custom exceptions specific to streaming.
  3. Reduce complexity and redundancy through better abstraction.
  4. Ensure comprehensive input validation for streaming parameters.
  5. Improve memory management when processing streamed data.
  6. Expand testing to include more edge cases for streaming scenarios.
  7. Enhance documentation for clarity and usability.

The overall quality of the code is commendable, and implementing these suggestions will significantly improve maintainability and functionality, enhancing the LLM's streaming capabilities.

@bhancockio bhancockio merged commit a1f35e7 into main Mar 7, 2025
4 checks passed