Backends don't handle exceptions properly during generation
Description
When generation fails with an exception (e.g., OOM, CUDA error, model failure), backends crash with confusing errors instead of propagating the original exception cleanly.
Root cause: The core framework (base.py:323-331) correctly detects and stores exceptions in the chunk stream during astream() processing. It's designed to re-raise the exception after cleanup (base.py:360-361). However, before it can do that, it passes ALL chunks (including exceptions) to the backend's processing() function. None of the backend implementations check for exception chunks - they all assume chunks are valid response objects and try to access attributes that don't exist on Exception objects.
Affected backends: HuggingFace, Ollama, OpenAI, vLLM, Watsonx, LiteLLM (all 6 backends)
Example Error
When a generation fails with `Exception("Oops!")`, users see:

```
AttributeError: 'Exception' object has no attribute 'sequences'
  File "mellea/backends/huggingface.py", line 896, in processing
    chunk.sequences[0, input_ids.shape[1] :], skip_special_tokens=True
```

Instead of the actual error:

```
Exception: Oops!
```
This makes debugging real failures (OOM, CUDA errors, model issues) much harder since the root cause is hidden behind the AttributeError.
Technical Details
The framework's `astream()` loop processes chunks like this:
- Detects an exception in the chunk stream (base.py:323-327) and stores it to re-raise later
- Calls `processing()` on ALL chunks, including the exception (base.py:329-331) ← crashes here
- Never reaches the re-raise logic (base.py:360-361)
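The sequence is easiest to see in a minimal, self-contained sketch. This is illustrative only: `chunk_stream`, `processing`, and `astream` below are stand-ins, not mellea's actual internals.

```python
# Stand-in for the real astream()/processing() interaction; these names
# are hypothetical and only mirror the control flow described above.
import asyncio


async def chunk_stream():
    # A failed generation surfaces as an exception object in the stream.
    yield Exception("Oops!")


async def processing(chunk):
    # Backend code assumes a valid response object (cf. huggingface.py:896).
    return chunk.sequences  # AttributeError when chunk is an Exception


async def astream():
    stored = None
    async for chunk in chunk_stream():
        if isinstance(chunk, Exception):
            stored = chunk       # stored, meant to be re-raised after cleanup
        await processing(chunk)  # <- crashes here first
    if stored is not None:
        raise stored             # never reached


asyncio.run(astream())
# AttributeError: 'Exception' object has no attribute 'sequences'
```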
Steps to Reproduce
Run `test_error_during_generate_with_lock` in `test/backends/test_huggingface.py` (e.g. `pytest test/backends/test_huggingface.py::test_error_during_generate_with_lock`). The test expects to see "Oops!" but instead gets "'Exception' object has no attribute 'sequences'".
Expected Behavior
The original exception should propagate cleanly to the caller with its original message intact.
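As a pytest assertion, that contract looks roughly like this. This is only a sketch: `run_generation` is a hypothetical stand-in for however the test drives the backend; the real check lives in `test/backends/test_huggingface.py`.

```python
import pytest


def run_generation():
    # Hypothetical stand-in for the failing generation path; with the fix,
    # the backend lets the framework re-raise the original exception.
    raise Exception("Oops!")


def test_original_error_propagates():
    # The caller should see the original message, not an AttributeError.
    with pytest.raises(Exception, match="Oops!"):
        run_generation()
```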
Proposed Fix
Add an exception check to `processing()` in all six backend files so exception chunks are skipped (the framework will re-raise them after cleanup). Example for HuggingFace:
```python
async def processing(
    self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput, input_ids
):
    """Process the returned chunks or the complete response."""
    # Skip exception chunks - they'll be re-raised later by the framework
    if isinstance(chunk, Exception):
        return

    if mot._underlying_value is None:
        mot._underlying_value = ""
    # ... rest of function
```

Similar changes are needed in:
- `mellea/backends/huggingface.py` (line 881)
- `mellea/backends/ollama.py` (line 546)
- `mellea/backends/openai.py` (line 688)
- `mellea/backends/vllm.py` (line 387)
- `mellea/backends/watsonx.py` (line 403)
- `mellea/backends/litellm.py` (line 360)
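Applied to the illustrative sketch under Technical Details (same hypothetical names, not mellea's real code), this guard is enough to let the stored exception reach the framework's re-raise:

```python
async def processing(chunk):
    # With the guard, exception chunks are skipped, astream() falls through
    # to `raise stored`, and the caller sees Exception("Oops!") as intended.
    if isinstance(chunk, Exception):
        return
    return chunk.sequences
```

Skipping inside `processing()` rather than raising keeps all six backends uniform and leaves cleanup and re-raising where they already live: the framework's `astream()` loop.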
Additional Context
- Bug introduced in PR #237, "fix: add simple lock to hf generation to prevent using incorrect weights" (Dec 2025), when the lock mechanism was added to the HuggingFace backend
- Affects all real generation failures across all backends, not just test scenarios
- One test currently fails: 35/36 HuggingFace tests pass
- Discovered while validating https://github.com/planetf1/mellea/tree/feat/issue-344 (Granite 4 migration)