feat(rlsapi): implement LLM integration for v1 /infer endpoint #916
Conversation
Walkthrough

Added default-model resolution and a stateless retrieval flow for the /infer endpoint: new helpers fetch the configured provider/model, create a temporary Llama Stack agent, send a UserMessage, extract the response (with safety fallbacks), and map API connection/configuration failures to HTTP 503.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant Endpoint as /infer Endpoint
    participant Config as Configuration
    participant Agent as Temp Agent (Llama Stack)
    participant LLM as LLM Service

    Client->>Endpoint: POST /infer (question)
    Endpoint->>Endpoint: Validate & strip question
    Endpoint->>Config: Resolve default model/provider
    alt Config failure
        Config-->>Endpoint: Error
        Endpoint-->>Client: HTTP 503 (ServiceUnavailable)
    else Config success
        Config-->>Endpoint: provider:model
        Endpoint->>Agent: Create temporary agent
        Endpoint->>Agent: Send UserMessage(question)
        Agent->>LLM: Forward user message
        LLM-->>Agent: Return Turn/response
        Agent-->>Endpoint: Response content
        alt Content present
            Endpoint-->>Client: HTTP 200 + LLM response
        else Empty content
            Endpoint-->>Client: HTTP 200 + UNABLE_TO_PROCESS_RESPONSE
        end
    end
```
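Based on the walkthrough and the file references later in this review, a minimal sketch of what the stateless retrieval flow could look like. The helper names (`get_temp_agent`, `content_to_str`, `AsyncLlamaStackClientHolder`) appear in the review's code graph, but their exact signatures and the fallback text are assumptions, not the PR's actual code:

```python
# Hedged sketch of the stateless retrieval flow; not the PR's actual code.
# get_temp_agent/content_to_str signatures and the fallback text are assumed.
from client import AsyncLlamaStackClientHolder
from utils.endpoints import get_temp_agent
from utils.types import content_to_str

UNABLE_TO_PROCESS_RESPONSE = "Unable to process this request"  # assumed constant


async def retrieve_simple_response(model_id: str, question: str) -> str:
    """Send a single user message through a temporary agent and return text."""
    client = AsyncLlamaStackClientHolder().get_client()
    agent, session_id = await get_temp_agent(client, model_id)  # assumed signature

    turn = await agent.create_turn(
        messages=[{"role": "user", "content": question}],
        session_id=session_id,
        stream=False,
    )

    # Defensive extraction: tolerate an incomplete Turn shape.
    content = getattr(getattr(turn, "output_message", None), "content", None)
    return content_to_str(content) if content else UNABLE_TO_PROCESS_RESPONSE
```

Because the agent is created per request and never persisted, the endpoint stays stateless; any conversation context lives only for the duration of the call.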
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Hi @major. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Wire up the /infer endpoint to Llama Stack for actual inference:
- Handle empty LLM responses with fallback message
- Include request_id in all log statements for tracing

Signed-off-by: Major Hayden <major@redhat.com>
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)
46-75: Consider using `AsyncMock` for async method mocks.

`mock_agent.create_turn` is awaited in the implementation, but the mock uses a regular `Mock`. While Python 3.8+ handles this, using `AsyncMock` is more explicit and aligns with the coding guidelines.

```diff
 def mock_llm_response_fixture(
     mocker: MockerFixture, prepare_agent_mocks: AgentFixtures
 ) -> None:
     """Mock the LLM integration for successful responses."""
     mock_client, mock_agent = prepare_agent_mocks
+
+    # Use AsyncMock for async methods
+    mock_agent.create_turn = mocker.AsyncMock()

     # Create mock output message with content
     mock_output_message = mocker.Mock()
     mock_output_message.content = "This is a test LLM response."

     # Create mock turn response
     mock_turn = mocker.Mock()
     mock_turn.output_message = mock_output_message
     mock_turn.steps = []
     mock_agent.create_turn.return_value = mock_turn
```

As per coding guidelines, async methods should use `AsyncMock` for mocking.
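As a standalone illustration of why the explicit `AsyncMock` matters (not taken from the PR's test file):

```python
# A plain Mock's return value is not awaitable; an AsyncMock's call returns
# a coroutine, so `await` behaves as it does against the real async method.
import asyncio
from unittest.mock import AsyncMock

mock_agent = AsyncMock()
mock_agent.create_turn.return_value = "turn-result"


async def demo() -> None:
    result = await mock_agent.create_turn(messages=[])
    assert result == "turn-result"


asyncio.run(demo())
```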
src/app/endpoints/rlsapi_v1.py (1)

82-119: Consider documenting `HTTPException` in the Raises section.

The function can also raise `HTTPException` (503) via `_get_default_model_id()` if configuration is missing. Adding this to the docstring would provide complete documentation of possible exceptions.

```diff
     Raises:
         APIConnectionError: If the Llama Stack service is unreachable.
+        HTTPException: 503 if no model is configured.
     """
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/app/endpoints/rlsapi_v1.py (4 hunks)
- tests/unit/app/endpoints/test_rlsapi_v1.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_rlsapi_v1.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_rlsapi_v1.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/rlsapi_v1.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/rlsapi_v1.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/rlsapi_v1.py
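A hedged sketch of what that guideline looks like in practice; the `detail` payload shape mirrors the `{"response": ..., "cause": ...}` pattern quoted later in this review, and the import is the standard llama-stack-client exception:

```python
# Sketch: translate a Llama Stack connection failure into an HTTP 503.
# The detail dict shape is borrowed from elsewhere in this review; it is
# not copied from the PR diff.
from fastapi import HTTPException
from llama_stack_client import APIConnectionError


async def query_with_503_mapping(agent, question: str) -> str:
    try:
        turn = await agent.create_turn(
            messages=[{"role": "user", "content": question}],
            stream=False,
        )
    except APIConnectionError as exc:
        raise HTTPException(
            status_code=503,
            detail={"response": "Unable to connect to Llama Stack", "cause": str(exc)},
        ) from exc
    return turn.output_message.content
```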
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (4)
src/app/endpoints/rlsapi_v1.py (3)
- `_get_default_model_id` (49-79)
- `infer_endpoint` (124-182)
- `retrieve_simple_response` (82-119)

src/configuration.py (3)
- `configuration` (73-77)
- `AppConfig` (39-181)
- `inference` (134-138)

src/utils/suid.py (1)
- `check_suid` (19-54)

src/models/rlsapi/requests.py (1)
- `RlsapiV1InferRequest` (123-200)
src/app/endpoints/rlsapi_v1.py (5)
src/client.py (2)
- `AsyncLlamaStackClientHolder` (18-55)
- `get_client` (49-55)

src/models/config.py (1)
- `Action` (755-808)

src/models/rlsapi/requests.py (1)
- `RlsapiV1InferRequest` (123-200)

src/utils/endpoints.py (1)
- `get_temp_agent` (387-423)

src/utils/types.py (1)
- `content_to_str` (20-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (12)
tests/unit/app/endpoints/test_rlsapi_v1.py (8)
1-32: LGTM! Imports are well-organized, and the pylint disables are appropriate for testing private functions and fixture-based side effects.
35-43: LGTM! The configuration fixture correctly extends the minimal config and patches it at the point of use.
78-103: LGTM! The fixture correctly simulates an empty LLM response for testing the fallback path. The same `AsyncMock` suggestion from the previous fixture applies here as well.
106-116: LGTM! The fixture properly simulates `APIConnectionError` for testing error handling. While the real exception would originate from network operations rather than `get_client()`, this approach is valid for unit testing the exception handling flow.
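A minimal sketch of such a fixture; the patch target module path is an assumption, while the `APIConnectionError(request=...)` form matches the usage quoted later in this review:

```python
# Hedged sketch: force get_client() to raise APIConnectionError so the
# endpoint's 503 path can be unit-tested. The patch target path is assumed.
import pytest
from llama_stack_client import APIConnectionError
from pytest_mock import MockerFixture


@pytest.fixture(name="mock_llm_connection_error")
def mock_llm_connection_error_fixture(mocker: MockerFixture) -> None:
    """Simulate an unreachable Llama Stack service."""
    mocker.patch(
        "app.endpoints.rlsapi_v1.AsyncLlamaStackClientHolder.get_client",
        side_effect=APIConnectionError(request=mocker.Mock()),
    )
```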
119-163: LGTM! Good test coverage for `_get_default_model_id()` with parameterized error cases. The tests properly verify both success and failure paths with appropriate assertions.
166-193: LGTM! Comprehensive async tests for `retrieve_simple_response()` covering success, empty output, and error propagation scenarios.
196-279: LGTM! Excellent test coverage for `infer_endpoint()` including minimal/full requests, unique request ID generation, 503 error handling, and fallback behavior for empty LLM responses.
282-295: LGTM! Good validation tests ensuring the request model properly rejects empty/whitespace questions and strips whitespace from valid questions.
src/app/endpoints/rlsapi_v1.py (4)
1-35: LGTM! Module setup follows coding guidelines with proper docstring, logger pattern, and organized imports.
38-46: LGTM! The 503 response correctly documents the service unavailability scenario in the OpenAPI schema.
49-79: LGTM! Well-structured helper function with proper error handling, descriptive docstring, and appropriate HTTP 503 responses for configuration issues.
122-182: LGTM! Solid endpoint implementation with:
- Proper async handling and type annotations
- Unique request_id generation for tracing
- Comprehensive error handling (503 for `APIConnectionError`)
- Fallback for empty LLM responses
- Good logging with request_id context throughout
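The request_id tracing pattern praised above is broadly the following (a generic sketch using `uuid4`; the PR may instead use a project-specific suid helper alongside the `check_suid` referenced in the code graph):

```python
# Generic per-request tracing sketch, assuming uuid4-based request IDs.
import logging
import uuid

logger = logging.getLogger(__name__)


async def traced_infer(question: str) -> dict:
    request_id = str(uuid.uuid4())  # unique ID carried through every log line
    logger.info("Received infer request [request_id=%s]", request_id)
    # ... call the LLM here ...
    logger.info("Returning response [request_id=%s]", request_id)
    return {"request_id": request_id, "response": "..."}
```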
Signed-off-by: Major Hayden <major@redhat.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)
49-80: Use ServiceUnavailableResponse for consistency.

The function manually constructs `HTTPException` with a custom detail dict, while `infer_endpoint` uses `ServiceUnavailableResponse` for similar 503 errors. This inconsistency makes error handling harder to maintain and may result in different response formats.

Apply this diff to use `ServiceUnavailableResponse` consistently:

```diff
 def _get_default_model_id() -> str:
     """Get the default model ID from configuration.

     Returns the model identifier in Llama Stack format (provider/model).

     Returns:
         The model identifier string.

     Raises:
         HTTPException: If no model can be determined from configuration.
     """
     if configuration.inference is None:
         msg = "No inference configuration available"
         logger.error(msg)
-        raise HTTPException(
-            status_code=503,
-            detail={"response": "Service configuration error", "cause": msg},
-        )
+        response = ServiceUnavailableResponse(
+            backend_name="Configuration",
+            cause=msg,
+        )
+        raise HTTPException(**response.model_dump())

     model_id = configuration.inference.default_model
     provider_id = configuration.inference.default_provider
     if model_id and provider_id:
         return f"{provider_id}/{model_id}"

     msg = "No default model configured for rlsapi v1 inference"
     logger.error(msg)
-    raise HTTPException(
-        status_code=503,
-        detail={"response": "Service configuration error", "cause": msg},
-    )
+    response = ServiceUnavailableResponse(
+        backend_name="Configuration",
+        cause=msg,
+    )
+    raise HTTPException(**response.model_dump())
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/app/endpoints/rlsapi_v1.py (4 hunks)
- tests/unit/app/endpoints/test_rlsapi_v1.py (4 hunks)
🧬 Code graph analysis (1)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)
src/app/endpoints/rlsapi_v1.py (3)
- `_get_default_model_id` (49-79)
- `infer_endpoint` (125-183)
- `retrieve_simple_response` (82-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (8)
tests/unit/app/endpoints/test_rlsapi_v1.py (5)
3-4: LGTM: Appropriate pylint disables for unit tests. The disables for `protected-access` and `unused-argument` are standard for unit tests that need to test private functions and use fixtures.
35-43: LGTM: Well-structured configuration fixture. The fixture properly extends the minimal config with inference defaults and patches it into the module under test.
126-198: LGTM: Comprehensive test coverage for helper functions. The tests cover success paths, error conditions, and edge cases for both `_get_default_model_id` and `retrieve_simple_response`. Good use of parametrization and proper async test markers.
203-299: LGTM: Thorough endpoint and validation testing. The test suite comprehensively covers the endpoint behavior including:
- Minimal and full-context requests
- Unique request ID generation
- API connection error handling (503 response)
- Empty LLM response fallback
- Input validation and whitespace stripping
110-121: Add a `# type: ignore` comment to the `APIConnectionError` instantiation for type checking consistency.

This fixture instantiates `APIConnectionError(request=mocker.Mock())` without the `# type: ignore` comment that all other similar usages in the codebase include. Add the comment on line 114-115 to align with the pattern used throughout the test suite (e.g., test_info.py, test_models.py, test_rags.py).

⛔ Skipped due to learnings
Learnt from: CR. Repo: lightspeed-core/lightspeed-stack, PR: 0, File: CLAUDE.md:0-0, Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Learnt from: CR. Repo: lightspeed-core/lightspeed-stack, PR: 0, File: CLAUDE.md:0-0, Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests

src/app/endpoints/rlsapi_v1.py (3)
8-46: LGTM: Imports and response documentation updated appropriately. The new imports support LLM integration, and the `infer_responses` mapping now includes the 503 ServiceUnavailableResponse for API unavailability scenarios.
82-121: LGTM: Well-structured stateless retrieval function. The function properly handles LLM communication with defensive checks for missing attributes. The use of `getattr` prevents `AttributeError` when the response structure is incomplete.
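In isolation, the defensive pattern the comment describes looks like the following (field names taken from the mocks earlier in this review; the fallback text is assumed):

```python
# getattr with a None default never raises AttributeError, so a Turn with a
# missing output_message or empty content degrades to the fallback string.
UNABLE_TO_PROCESS_RESPONSE = "Unable to process this request"  # assumed text


def extract_response_text(turn: object) -> str:
    """Pull output message content out of a turn, tolerating missing fields."""
    output_message = getattr(turn, "output_message", None)
    content = getattr(output_message, "content", None)
    return str(content) if content else UNABLE_TO_PROCESS_RESPONSE
```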
123-183: LGTM: Robust endpoint implementation with proper error handling. The endpoint correctly:
- Generates unique request IDs for tracing
- Handles `APIConnectionError` by returning HTTP 503 with `ServiceUnavailableResponse`
- Provides fallback text for empty LLM responses
- Includes comprehensive logging at all stages

As per coding guidelines, the endpoint properly handles `APIConnectionError` from Llama Stack.
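End to end, exercising such an endpoint might look like the following; the application import path, route, and payload field are assumptions based on this PR's description rather than its code:

```python
# Hedged usage sketch: POST a question and handle both the success and the
# 503 (service unavailable) outcomes. App entry point and route are assumed.
from fastapi.testclient import TestClient

from app.main import app  # assumed application entry point

client = TestClient(app)

resp = client.post("/v1/infer", json={"question": "How do I list open ports?"})
if resp.status_code == 503:
    # Llama Stack unreachable or no default model configured
    print("Service unavailable:", resp.json())
else:
    assert resp.status_code == 200
    print("LLM response:", resp.json())
```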
tisnik
left a comment
Seems legit, thank you
/ok-to-test
Description
Wire up the /infer endpoint to Llama Stack for actual inference of RHEL Lightspeed requests:
- Handle empty LLM responses with a fallback message
- Include request_id in all log statements for tracing
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
No changes to how tests are run.
Summary by CodeRabbit
New Features
Bug Fixes
Tests