Backends don't handle exceptions properly during generation #432

@planetf1

Description

When generation fails with an exception (e.g., OOM, CUDA error, model failure), backends crash with confusing errors instead of propagating the original exception cleanly.

Root cause: The core framework (base.py:323-331) correctly detects and stores exceptions in the chunk stream during astream() processing, and it is designed to re-raise the stored exception after cleanup (base.py:360-361). However, before it can do that, it passes ALL chunks, including exceptions, to the backend's processing() function. None of the backend implementations check for exception chunks: they all assume every chunk is a valid response object and try to access attributes that don't exist on Exception objects.

Affected backends: HuggingFace, Ollama, OpenAI, vLLM, Watsonx, LiteLLM (all 6 backends)

Example Error

When a generation fails with Exception("Oops!"), users see:

  File "mellea/backends/huggingface.py", line 896, in processing
    chunk.sequences[0, input_ids.shape[1] :], skip_special_tokens=True
AttributeError: 'Exception' object has no attribute 'sequences'

Instead of the actual error:

Exception: Oops!

This makes debugging real failures (OOM, CUDA errors, model issues) much harder since the root cause is hidden behind the AttributeError.

Technical Details

The framework's astream() loop processes chunks like this (a toy sketch follows the list):

  1. Detects exception in chunk stream (base.py:323-327)
  2. Stores it to re-raise later
  3. Calls processing() on ALL chunks including the exception (base.py:329-331) ← crashes here
  4. Never reaches the re-raise logic (base.py:360-361)
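
A self-contained toy sketch of this sequence (all names here, such as fake_stream, processing, and consume, are illustrative stand-ins rather than the real identifiers in base.py):

import asyncio

class Chunk:
    """Stand-in for a valid response chunk."""
    def __init__(self, text: str):
        self.text = text

async def fake_stream():
    yield Chunk("hello")
    yield Exception("Oops!")  # failure injected into the chunk stream

async def processing(chunk):
    # Mirrors the buggy backend behavior: every chunk is assumed to be a
    # response object, so touching .text on the Exception chunk raises
    # AttributeError.
    return chunk.text

async def consume():
    stored_exception = None
    async for chunk in fake_stream():
        if isinstance(chunk, Exception):
            stored_exception = chunk   # steps 1-2: detected and stored
        await processing(chunk)        # step 3: crashes here
    if stored_exception is not None:   # step 4: never reached
        raise stored_exception

asyncio.run(consume())  # raises AttributeError, not Exception("Oops!")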

Steps to Reproduce

Run test_error_during_generate_with_lock in test/backends/test_huggingface.py. The test expects to see "Oops!" but instead gets "'Exception' object has no attribute 'sequences'".
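
The test's intent can be condensed roughly as follows (a hypothetical sketch, not the actual test body; run_generation stands in for the real generate call):

import pytest

def check_error_propagation(run_generation):
    # The original message should surface, not the backend's AttributeError.
    with pytest.raises(Exception, match="Oops!"):
        run_generation()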

Expected Behavior

The original exception should propagate cleanly to the caller with its original message intact.

Proposed Fix

Add exception handling to processing() in all backend files to skip exception chunks (they'll be re-raised by the framework). Example for HuggingFace:

async def processing(
    self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput | Exception, input_ids
):
    """Process the returned chunks or the complete response."""
    # Skip exception chunks - they'll be re-raised later by the framework
    if isinstance(chunk, Exception):
        return
        
    if mot._underlying_value is None:
        mot._underlying_value = ""
    # ... rest of function
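
Returning early is safe here because the framework keeps its own reference to the exception and re-raises it after cleanup (base.py:360-361); the backend only needs to stop treating the exception as a response chunk.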

Similar changes needed in:

  • mellea/backends/huggingface.py (line 881)
  • mellea/backends/ollama.py (line 546)
  • mellea/backends/openai.py (line 688)
  • mellea/backends/vllm.py (line 387)
  • mellea/backends/watsonx.py (line 403)
  • mellea/backends/litellm.py (line 360)
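
Since the same guard is needed in all six files, an alternative worth considering (a hypothetical sketch; skip_exception_chunks is not an existing mellea API) is a small shared decorator that each backend's processing() could wear instead of repeating the isinstance check:

import functools

def skip_exception_chunks(func):
    """Skip exception chunks; the framework re-raises them after cleanup."""
    @functools.wraps(func)
    async def wrapper(self, mot, chunk, *args, **kwargs):
        if isinstance(chunk, Exception):
            return None
        return await func(self, mot, chunk, *args, **kwargs)
    return wrapper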
