Add multi-model support with configuration system #3087
base: main
Conversation
Pipeline controller notification: for optional jobs, comment.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stbenjam

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Note

| Cohort / File(s) | Summary |
|---|---|
| **Env / Examples**<br>`chat/.env.example`, `chat/.gitignore`, `chat/models.yaml.example` | Consolidates LLM config into a unified Sippy AI Agent section with new globals (TEMPERATURE, SIPPY_API_URL, MAX_ITERATIONS, MAX_EXECUTION_TIME, PERSONA), de-emphasizes SIPPY_READ_ONLY_DATABASE_DSN, adds models.yaml ignore and provides chat/models.yaml.example schema and entries. |
| **Docs**<br>`chat/README.md` | Adds Claude/Vertex AI setup, authentication options, "Claude Extended Thinking" guidance, and a "Multiple Model Configuration (Optional)" section with examples and server/CLI usage for models.yaml. |
| **CLI / Bootstrap**<br>`chat/main.py` | Adds CLI options --google-project, --google-location, --thinking-budget, --models-config; applies overrides to Config (google_project_id, google_location, extended_thinking_budget) and passes models_config_path to the web server. |
| **Config & Loader**<br>`chat/sippy_agent/config.py` | Adds ModelConfig (per-model overrides + to_config()), extends Config with google_project_id, google_location, extended_thinking_budget, is_claude_model(), and adds load_models_config() to parse/validate YAML registry and determine default model. |
| **Agent & Manager**<br>`chat/sippy_agent/agent.py` | Defers heavy setup via async _initialize(), extends _create_llm() to support ChatAnthropicVertex (Claude) with extended-thinking handling and streaming thinking, emits thinking callbacks, improves tool-loading warnings, and introduces AgentManager to load/cache per-model SippyAgent instances and expose listing/getter APIs. |
| **API Models**<br>`chat/sippy_agent/api_models.py` | Adds model_id to ChatRequest and ChatResponse; introduces ModelInfo and ModelsResponse to expose available models and default selection. |
| **Web Server / Routes**<br>`chat/sippy_agent/web_server.py` | Replaces single-agent wiring with AgentManager, adds GET /chat/models, resolves per-request agent via model_id (fallback to default), propagates model_id in HTTP/WebSocket/thinking callbacks, and accepts models_config_path in constructor. |
| **Requirements**<br>`chat/requirements.txt` | Adds dependencies: langchain-google-vertexai>=3.0.0, anthropic>=0.20.0, and pyyaml>=6.0.0 (and bumps several langchain-related packages). |
| **Frontend — Store**<br>`sippy-ng/src/chat/store/modelsSlice.js`, `sippy-ng/src/chat/store/settingsSlice.js`, `sippy-ng/src/chat/store/useChatStore.js` | Adds modelsSlice (models, defaultModel, loading/error + loadModels() fetching /api/chat/models), exposes useModels selector, and adds modelId to settings (initial null). |
| **Frontend — WebSocket**<br>`sippy-ng/src/chat/store/webSocketSlice.js` | Includes model_id in outgoing user payload and assistant final message; moves adding the user message to history after WS send; preserves per-message error recording. |
| **Frontend — UI attribution**<br>`sippy-ng/src/chat/ChatMessage.js` | Resolves modelName via useModels, shows tooltip "AI-generated by {modelName}" when available, replaces static "AI" chip with model name fallback, and adds model_id PropType. |
| **Frontend — Settings UI**<br>`sippy-ng/src/chat/ChatSettings.js` | Adds "AI Model" section using useModels, lazy-loads models, renders Select dropdown with model descriptions and icons, updates settings.modelId, and handles loading/error states. |
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant ModelsStore as Frontend:Models Store
    participant WebServer as Backend:Web Server
    participant AgentManager as Backend:Agent Manager
    participant Agent as Backend:SippyAgent
    participant LLM as Backend:LLM (Vertex AI / Claude)
    User->>Frontend: Open chat UI
    Frontend->>ModelsStore: loadModels()
    ModelsStore->>WebServer: GET /chat/models
    WebServer->>AgentManager: list_models()
    AgentManager-->>WebServer: {models, default_model}
    ModelsStore->>Frontend: update models state
    User->>Frontend: Select model + send message
    Frontend->>WebServer: POST /chat (model_id, message)
    WebServer->>AgentManager: get_agent(model_id)
    AgentManager->>Agent: return cached or create + Agent._initialize()
    Agent->>Agent: _create_llm() (Claude -> ChatAnthropicVertex)
    Agent->>LLM: send message (project/location, thinking budget)
    LLM->>Agent: stream thinking & content
    Agent->>WebServer: thinking_callback(model_id, partial_thought)
    Agent->>WebServer: final response(model_id, content)
    WebServer->>Frontend: ChatResponse(model_id, content)
    Frontend->>User: render message with model attribution
```
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Areas to focus on:
- Claude/Vertex AI initialization, credential handling, project/location validation, and langchain integration.
- Extended thinking logic (budgeting, token limits, temperature adjustments) and streaming correctness.
- Async initialization and AgentManager caching for race conditions under concurrent requests.
- models.yaml parsing/validation (duplicate/default handling) and fallback to env-based config.
- End-to-end propagation of model_id across HTTP, WebSocket, backend responses, and frontend UI.
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Single Responsibility And Clear Naming | ⚠️ Warning | AgentManager violates naming guidelines by using generic 'Manager' term. Config class accumulates multiple distinct concerns exceeding single responsibility principle. | Rename AgentManager to specific name like MultiModelAgentRegistry or AgentFactory. Refactor Config class into focused sub-types: GoogleVertexConfig and ExtendedThinkingConfig. |
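The suggested Config split can be sketched with dataclasses. The field names below are illustrative guesses based on the settings this PR adds (project, location, thinking budget), not the project's actual definitions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GoogleVertexConfig:
    """Vertex AI connection settings (illustrative field names)."""
    project_id: str = ""
    location: str = "us-east5"

@dataclass
class ExtendedThinkingConfig:
    """Claude extended-thinking settings (illustrative)."""
    budget_tokens: Optional[int] = None

    @property
    def enabled(self) -> bool:
        # Thinking is on only when a budget has been set
        return self.budget_tokens is not None

@dataclass
class AppConfig:
    """Top-level config composing the focused sub-configs."""
    model_name: str = "gemini-pro"
    vertex: GoogleVertexConfig = field(default_factory=GoogleVertexConfig)
    thinking: ExtendedThinkingConfig = field(default_factory=ExtendedThinkingConfig)

cfg = AppConfig(vertex=GoogleVertexConfig(project_id="my-project"))
print(cfg.vertex.project_id, cfg.thinking.enabled)  # -> my-project False
```

Each sub-config then owns its own validation, and callers that only need Vertex settings can take a `GoogleVertexConfig` rather than the whole object.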
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding multi-model support with a configuration system, which is the primary focus across backend (agents, server, config) and frontend (chat settings, model store) changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%. |
| Go Error Handling | ✅ Passed | The custom check for Go Error Handling is not applicable to this pull request as no Go source files are present in the changes. |
| Sql Injection Prevention | ✅ Passed | PR adds multi-model LLM support with configuration management. No SQL operations, database queries, or SQL injection vulnerabilities detected in modified files. |
| Excessive Css In React Should Use Styles | ✅ Passed | React components use Material-UI and utility classes rather than large inline style objects, adhering to styling guidelines. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
✨ Finishing touches
- 📝 Generate docstrings
- 🧪 Generate unit tests (beta)
  - Create PR with unit tests
  - Post copyable unit tests in a comment
Comment @coderabbitai help to get the list of available commands and usage tips.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
chat/sippy_agent/web_server.py (2)
226-261: Avoid mutating shared agent config across concurrent requests
`AgentManager` caches a single `SippyAgent` per model, but both the HTTP and WebSocket paths mutate `agent.config.show_thinking`, `agent.config.persona`, and even rebuild `agent.graph` on that shared instance. With concurrent requests targeting the same model, those mutations race: persona or thinking overrides from request A can leak into request B before either `finally` block restores the original settings. A third request arriving in that window will run with the wrong persona/model behavior. Please isolate per-request overrides (e.g., clone the agent/config, add an agent-level async lock around these mutations, or extend `SippyAgent.achat` to accept override args so we never touch shared state).

Also applies to: 330-475
153-163: Return status from the actual default agent config
`/status` now resolves the default agent via `AgentManager`, but the response still reports `model_name`, `endpoint`, `show_thinking`, and `persona` from the base `Config`, which will be wrong whenever the default model comes from `models.yaml`. Surface the values from `default_agent.config` instead so the endpoint reflects the real model in use.

```diff
-        return AgentStatus(
-            available_tools=default_agent.list_tools(),
-            model_name=self.config.model_name,
-            endpoint=self.config.llm_endpoint,
-            thinking_enabled=self.config.show_thinking,
-            current_persona=self.config.persona,
+        agent_config = default_agent.config
+        return AgentStatus(
+            available_tools=default_agent.list_tools(),
+            model_name=agent_config.model_name,
+            endpoint=agent_config.llm_endpoint,
+            thinking_enabled=agent_config.show_thinking,
+            current_persona=agent_config.persona,
```
🧹 Nitpick comments (6)
chat/.gitignore (1)
142-145: LGTM! Correctly ignores the instance-specific `models.yaml` configuration file while keeping the example template in version control.

Minor: Consider using a single blank line instead of two (lines 142-143) for consistency with the rest of the file, though this is purely stylistic.
chat/requirements.txt (1)
5-13: Dependencies align with multi-model support. The three new dependencies correctly support Claude via Vertex AI and YAML-based configuration.

Consider tightening the `anthropic>=0.20.0` constraint to `anthropic>=0.20.0,<1.0.0` to avoid potential breaking changes in major version updates, as the Anthropic SDK has had breaking API changes in the past.

chat/.env.example (1)
1-92: Consider documenting the models.yaml approach in comments. The restructured configuration is well-organized and clearly documents different model provider options. However, since this PR's main feature is the multi-model `models.yaml` configuration system, consider adding a comment at the top mentioning that:

- These environment variables provide defaults/fallbacks when `models.yaml` isn't present
- For multi-model support, users should create a `models.yaml` file (see `models.yaml.example`)

Additionally, line 66's `SIPPY_API_URL` defaults to production. Consider whether this should be commented out or point to localhost to avoid accidental production API calls during development.

sippy-ng/src/chat/store/webSocketSlice.js (1)
225-246: Review the UX implications of delayed message display.The reordering makes sense for chat history management—the current message shouldn't be included in its own history context. However, this creates a delay where the user's message only appears in the UI after the network send completes.
Traditional chat UX typically shows the user's message immediately (optimistic UI) and displays an error if the send fails. The current approach may feel less responsive since users won't see their message until after the network round-trip.
Consider:
- Testing the perceived responsiveness with this change
- If the delay is noticeable, consider adding the message optimistically before send, then removing/marking it as failed if send fails
- The current approach does guarantee consistency between what the user sees and what was actually sent
sippy-ng/src/chat/store/modelsSlice.js (1)
12-49: Consider adding concurrent call protection. The `loadModels` function correctly fetches models and sets the default, but doesn't protect against concurrent calls. If `loadModels` is called multiple times in quick succession (e.g., on reconnect or user navigation), multiple fetches could race and cause state inconsistencies.

Consider one of these approaches:

- Track in-flight requests and return early if already loading
- Abort previous requests using `AbortController` when a new request starts
- Ensure the UI only calls `loadModels` once on mount/initialization

Example with early return:

```diff
 loadModels: () => {
+  // Skip if already loading
+  if (get().modelsLoading) {
+    return
+  }
+
   const apiUrl =
     process.env.REACT_APP_CHAT_API_URL || window.location.origin + '/api/chat'
```
708-716: Drop the redundant f-string
`description=f"Model from environment configuration"` has no interpolation. Please remove the `f` prefix for clarity.

```diff
-    description=f"Model from environment configuration",
+    description="Model from environment configuration",
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (16)
- `chat/.env.example` (1 hunks)
- `chat/.gitignore` (1 hunks)
- `chat/README.md` (2 hunks)
- `chat/main.py` (3 hunks)
- `chat/models.yaml.example` (1 hunks)
- `chat/requirements.txt` (1 hunks)
- `chat/sippy_agent/agent.py` (8 hunks)
- `chat/sippy_agent/api_models.py` (3 hunks)
- `chat/sippy_agent/config.py` (5 hunks)
- `chat/sippy_agent/web_server.py` (10 hunks)
- `sippy-ng/src/chat/ChatMessage.js` (5 hunks)
- `sippy-ng/src/chat/ChatSettings.js` (6 hunks)
- `sippy-ng/src/chat/store/modelsSlice.js` (1 hunks)
- `sippy-ng/src/chat/store/settingsSlice.js` (1 hunks)
- `sippy-ng/src/chat/store/useChatStore.js` (3 hunks)
- `sippy-ng/src/chat/store/webSocketSlice.js` (4 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
chat/README.md
46-46: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.3)
chat/sippy_agent/agent.py
710-710: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (7)
chat/README.md (1)
46-162: Excellent documentation of new multi-model features. The documentation comprehensively covers:

- Claude/Vertex AI authentication options
- Extended thinking feature with important caveats (temperature=1.0 requirement, regional availability)
- Multi-model configuration via `models.yaml`
- Clear usage examples
The explanations are clear and include important warnings about limitations and requirements.
chat/models.yaml.example (1)
16-41: Well-structured model configuration examples. The configuration examples are clear and demonstrate key features:

- Multiple models from different providers
- Default model selection
- Thinking-enabled variant with required `temperature: 1.0`
- Optional fields (description, temperature, extended_thinking_budget)

Ensure that the configuration loading code (likely in `chat/sippy_agent/config.py`) validates that only one model has `default: true`. The YAML format itself doesn't enforce this constraint.
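Since YAML itself can't express "exactly one default," the loader has to enforce it. A minimal sketch of that check, operating on the list that `yaml.safe_load` of a `models.yaml` would produce — the `id`/`default` field names follow the example schema, and the function name is hypothetical, not the project's actual `load_models_config`:

```python
def pick_default_model(models: list[dict]) -> str:
    """Validate model entries and return the effective default model id.

    Rejects duplicate ids and multiple `default: true` flags; falls back
    to the first listed model when no default is flagged.
    """
    if not models:
        raise ValueError("models.yaml must define at least one model")
    seen: set[str] = set()
    defaults: list[str] = []
    for entry in models:
        model_id = entry["id"]
        if model_id in seen:
            raise ValueError(f"duplicate model id: {model_id}")
        seen.add(model_id)
        if entry.get("default", False):
            defaults.append(model_id)
    if len(defaults) > 1:
        raise ValueError(f"multiple models marked default: {defaults}")
    return defaults[0] if defaults else models[0]["id"]

# Example: the second entry is flagged as the default
print(pick_default_model([
    {"id": "gemini-pro"},
    {"id": "claude-sonnet", "default": True},
]))  # -> claude-sonnet
```

Raising on a second `default: true` (rather than silently picking one) surfaces the misconfiguration at startup instead of at request time.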
10-10: LGTM! Clean addition of `modelId` to the settings state. The comment clearly indicates the initialization strategy, and this integrates well with the `modelsSlice`, which loads and sets the default model.
55-55: LGTM! Adding `model_id` to the assistant message metadata enables proper model attribution in the UI, aligning with the backend's multi-model support.

234-234: LGTM! Correctly includes `model_id` in the outgoing payload, enabling the backend to use the user-selected model.
13-15: Base URL construction handles edge cases correctly. The chained `.replace()` calls properly handle both trailing slashes and the `/stream` suffix.

33-37: Good UX: auto-setting default model. Automatically setting the default model when the user hasn't selected one provides a smooth initial experience.
Actionable comments posted: 0
♻️ Duplicate comments (1)
sippy-ng/src/chat/ChatSettings.js (1)
339-417: Extract duplicated model resolution logic. The model ID resolution logic is duplicated between the Select value (lines 362-376) and the description rendering (lines 389-414). This violates the DRY principle and creates a maintenance risk if one copy is updated but not the other.

Extract the resolution logic into a helper function as suggested in the past review:

```diff
+  const resolveSelectedModelId = () => {
+    if (models.length === 0) {
+      return ''
+    }
+
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+
+    return models[0].id
+  }
+
+  const selectedModelId = resolveSelectedModelId()
+  const selectedModel = models.find((m) => m.id === selectedModelId) || null
```

Then update the render to use these computed values:

```diff
 <Select
   labelId="model-select-label"
-  value={
-    (() => {
-      // Resolve the effective model ID before rendering
-      if (
-        settings.modelId &&
-        models.find((m) => m.id === settings.modelId)
-      ) {
-        return settings.modelId
-      }
-      if (defaultModel && models.find((m) => m.id === defaultModel)) {
-        return defaultModel
-      }
-      return models.length > 0 ? models[0].id : ''
-    })()
-  }
+  value={selectedModelId}
   onChange={handleModelChange}
 ...
-  {(() => {
-    // Resolve model ID and look up the selected model once
-    const resolvedModelId =
-      settings.modelId && models.find((m) => m.id === settings.modelId)
-        ? settings.modelId
-        : defaultModel && models.find((m) => m.id === defaultModel)
-        ? defaultModel
-        : models.length > 0
-        ? models[0].id
-        : null
-
-    const selectedModel = resolvedModelId
-      ? models.find((m) => m.id === resolvedModelId)
-      : null
-
-    return (
-      selectedModel &&
-      selectedModel.description && (
-        <Box className={classes.personaDescription}>
-          <Typography variant="body2" color="textPrimary">
-            {selectedModel.description}
-          </Typography>
-        </Box>
-      )
-    )
-  })()}
+  {selectedModel?.description && (
+    <Box className={classes.personaDescription}>
+      <Typography variant="body2" color="textPrimary">
+        {selectedModel.description}
+      </Typography>
+    </Box>
+  )}
```
🧹 Nitpick comments (2)
sippy-ng/src/chat/ChatSettings.js (2)
187-189: Remove unused helper or refactor to use it. The `getSelectedModel()` function is defined but never used in the render. The Model Selection UI (lines 362-376 and 389-414) uses inline resolution logic instead.

Either remove this unused function or refactor the duplicated resolution logic to use a proper helper (see next comment).

```diff
-  const getSelectedModel = () => {
-    return models.find((m) => m.id === settings.modelId) || models[0]
-  }
```
358-358: Consider renaming the reused style class. The model FormControl reuses `classes.personaSelect`, which works functionally but is slightly misleading. Consider renaming it to a more generic name like `selectControl` or adding a separate `modelSelect` class for clarity.

This is a minor naming issue and can be deferred, but it would improve code clarity.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
- `chat/README.md` (3 hunks)
- `chat/sippy_agent/config.py` (5 hunks)
- `sippy-ng/src/chat/ChatSettings.js` (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- chat/sippy_agent/config.py
🔇 Additional comments (8)
chat/README.md (4)
46-46: Previous markdown issue resolved. The MD036 violation flagged in the prior review (bold emphasis instead of heading) has been corrected. Line 46 now uses a proper level-4 heading (`#### Optional: Claude Models via Google Vertex AI`), which is appropriate as a sub-section within the level-3 "Configuration" section.

46-92: Comprehensive Claude/Vertex AI integration documentation. The new section clearly documents setup requirements, authentication options (gcloud vs service account), extended thinking configuration, budget controls, and important caveats around temperature and regional availability. The examples with `--thinking-budget` and error-handling guidance are particularly helpful.

94-128: Well-structured models.yaml configuration documentation. The section clearly explains the YAML configuration format, lists all relevant options (id, name, model_name, endpoint, temperature, extended_thinking_budget, default), and documents the fallback behavior when models.yaml is absent. The auto-loading behavior (line 128) is valuable context for users.

159-162: Clear example coverage for Claude/Vertex AI usage. The new example at line 161 (`python main.py serve --model claude-3-5-sonnet@20240620`) complements the existing OpenAI and Gemini examples, making multi-model support discoverable to users.

sippy-ng/src/chat/ChatSettings.js (4)
27-27: LGTM - Imports follow existing patterns. The new imports for `ModelIcon` and `useModels` are consistent with the existing persona implementation.

Also applies to: 40-40

114-115: LGTM - Consistent with persona loading pattern. The `useModels` hook usage follows the same pattern as `usePersonas`, including loading state, error state, and load action.

135-139: LGTM - Correct on-demand loading. The effect correctly loads models on demand using the same pattern as persona loading.

177-181: LGTM - Handler matches persona pattern. The model change handler correctly updates `settings.modelId`, consistent with the persona handler.
Actionable comments posted: 0
🧹 Nitpick comments (3)
sippy-ng/src/chat/ChatSettings.js (1)
298-349: LGTM: AI Model UI is well-structured with proper state handling. The model selection section correctly handles loading, error, and active states. The Select is properly bound to the resolved model ID, ensuring synchronization between the dropdown and description.

Optional: Consider simplifying the description rendering to match the persona pattern. The IIFE pattern (lines 334-346) works correctly but is more verbose than necessary. For consistency with the persona section (line 389), consider this refactor:

```diff
-  {(() => {
-    const selectedModel = getSelectedModel()
-    return (
-      selectedModel &&
-      selectedModel.description && (
-        <Box className={classes.personaDescription}>
-          <Typography variant="body2" color="textPrimary">
-            {selectedModel.description}
-          </Typography>
-        </Box>
-      )
-    )
-  })()}
+  {getSelectedModel()?.description && (
+    <Box className={classes.personaDescription}>
+      <Typography variant="body2" color="textPrimary">
+        {getSelectedModel().description}
+      </Typography>
+    </Box>
+  )}
```

This matches the persona rendering style and eliminates the IIFE wrapper.
chat/sippy_agent/config.py (2)
28-42: Clarify the endpoint handling logic. Line 34 uses `if self.endpoint` to check whether to override the endpoint. Since `endpoint` defaults to `""` (empty string for Vertex AI), this condition is falsy for Vertex AI models, causing them to fall back to `base_config.llm_endpoint`. While this may be intentional (Vertex AI doesn't use OpenAI-compatible endpoints), the logic would be clearer with an explicit empty-string check.

Consider making the intent more explicit:

```diff
-        config_dict["llm_endpoint"] = self.endpoint if self.endpoint else base_config.llm_endpoint
+        config_dict["llm_endpoint"] = self.endpoint if self.endpoint != "" else base_config.llm_endpoint
```

Alternatively, document that empty string means "use base endpoint" or make `endpoint` Optional with `None` as the sentinel value.
174-225: Consider preserving exception context in error handling. The function has excellent validation logic and structure. However, the broad `except Exception as e` catch on line 224 wraps all errors in `ValueError`, which can hide useful debugging information from different exception types (e.g., `yaml.YAMLError`, `ValidationError`).

Consider preserving the exception chain:

```diff
     except Exception as e:
-        raise ValueError(f"Error loading models configuration: {e}")
+        raise ValueError(f"Error loading models configuration: {e}") from e
```

This maintains the original traceback while still providing a clear high-level error message.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
- `chat/README.md` (3 hunks)
- `chat/sippy_agent/config.py` (5 hunks)
- `sippy-ng/src/chat/ChatSettings.js` (8 hunks)
🔇 Additional comments (14)
chat/README.md (3)
36-93: Documentation update is well-structured and comprehensive. The additions for Claude/Vertex AI setup, extended thinking configuration, and multi-model support are clearly organized, with options for both local development (`gcloud auth`) and production (service account). The past markdown linting issue has been properly resolved with heading-level formatting.

94-129: Multi-model configuration section clearly documents new feature. The `models.yaml` schema documentation, backward-compatibility note (line 120), and auto-loading behavior (line 128) provide good context for users. The distinction between per-model configuration and shared environment variables is clear.

145-162: Usage examples cover key scenarios effectively. Examples progress logically from OpenAI, Gemini, to Claude/Vertex AI models, maintaining consistent command-line syntax. The final example (lines 160-161) demonstrates multi-model support integration.
sippy-ng/src/chat/ChatSettings.js (6)
27-27: LGTM: Icon and hook imports are correct. The `ModelIcon` and `useModels` additions properly support the new AI model selection feature.

Also applies to: 40-40

91-93: LGTM: Consistent styling applied. The `fullWidthSelect` style properly ensures consistent width for both model and persona selectors.

Also applies to: 317-317, 372-372

114-115: LGTM: Model loading logic mirrors personas pattern. The `useModels` hook integration and lazy-loading useEffect are implemented correctly and consistently with the existing personas approach.

Also applies to: 135-139

177-181: LGTM: Model change handler is correct. The `handleModelChange` implementation properly updates settings when the user selects a different model.

187-202: LGTM: Fallback logic correctly resolves model selection. The helper functions properly implement a three-tier fallback (user setting → default → first available) and handle all edge cases including empty model lists and stale settings. This addresses the previous review concern about Select/description synchronization.

351-351: LGTM: Section divider maintains visual consistency. The divider properly separates the AI Model and AI Persona sections, consistent with the rest of the settings drawer.
chat/sippy_agent/config.py (5)
6-8: LGTM! The new imports are appropriate for the multi-model configuration functionality.

16-27: LGTM! The `ModelConfig` fields are well-defined. The `temperature` and `extended_thinking_budget` are correctly declared as `Optional` with `default=None`, which properly addresses the past review concern about clobbering base configuration defaults.

66-74: LGTM! The new Config fields for Google/Vertex AI integration are well-defined with appropriate defaults and environment variable bindings.

Also applies to: 119-122

138-140: LGTM! The `is_claude_model()` method follows the established pattern and correctly identifies Claude models.

157-164: LGTM! The validation for Claude/Vertex AI configuration requirements is appropriate and provides clear error messaging.
Scheduling required tests:
- Add support for multiple LLM models including ChatVertexAnthropic
- Implement models.yaml configuration system for model management
- Add model selection UI in chat settings
- Update environment configuration and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- ChatSettings.js: Resolve effective model ID before rendering to prevent unbound Select value warnings and keep picker/description in sync
- config.py: Make ModelConfig temperature and extended_thinking_budget optional to avoid clobbering environment/CLI defaults in to_config
- README.md: Replace bold emphasis with proper Markdown headings for better document structure and accessibility
- Extract duplicated model resolution into getResolvedModelId() helper
- Update getSelectedModel() to use the resolution helper
- Rename personaSelect to fullWidthSelect for semantic clarity
- Move AI Model section above AI Persona (more important)

Addresses additional CodeRabbitAI feedback for DRY principle and code clarity.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
chat/sippy_agent/web_server.py (1)
218-265: Shared agent config mutation is not request‑safe (pre‑existing, now per‑model)
`chat()` mutates `agent.config.show_thinking` and `agent.config.persona` per request, then restores them in `finally`. Because the same `SippyAgent` instance is reused across concurrent HTTP requests for a model, overlapping requests can race on these fields, leading to cross-request leakage of persona or thinking settings (this pattern already existed with the single global agent; `AgentManager` now applies it per model). To make overrides request-local, avoid mutating shared config and instead pass persona/show-thinking as parameters into the graph/LLM layer, or gate access to a given agent behind an `asyncio.Lock` so only one request at a time can use it. The former keeps concurrency, the latter preserves current behavior but serializes usage.
🧹 Nitpick comments (4)
sippy-ng/src/chat/ChatSettings.js (2)
187-202: Consider memoizing model resolution helpers. The helper functions `getResolvedModelId()` and `getSelectedModel()` are called inline during render (lines 321, 335), causing them to re-execute on every render. For better performance, consider computing the resolved model ID once outside the render path.

Example approach using useMemo:

```diff
+  const resolvedModelId = React.useMemo(() => {
+    if (settings.modelId && models.find((m) => m.id === settings.modelId)) {
+      return settings.modelId
+    }
+    if (defaultModel && models.find((m) => m.id === defaultModel)) {
+      return defaultModel
+    }
+    return models.length > 0 ? models[0].id : ''
+  }, [settings.modelId, defaultModel, models])
+
+  const selectedModel = React.useMemo(() => {
+    return resolvedModelId ? models.find((m) => m.id === resolvedModelId) : null
+  }, [resolvedModelId, models])
```
334-346: Simplify description rendering. The IIFE pattern here is unconventional and adds cognitive overhead. Since `getSelectedModel()` already handles the null case, you can simplify this block.

```diff
-  {(() => {
-    const selectedModel = getSelectedModel()
-    return (
-      selectedModel &&
-      selectedModel.description && (
-        <Box className={classes.personaDescription}>
-          <Typography variant="body2" color="textPrimary">
-            {selectedModel.description}
-          </Typography>
-        </Box>
-      )
-    )
-  })()}
+  {getSelectedModel()?.description && (
+    <Box className={classes.personaDescription}>
+      <Typography variant="body2" color="textPrimary">
+        {getSelectedModel().description}
+      </Typography>
+    </Box>
+  )}
```

Note: If you implement the memoization suggestion from the previous comment, you would use the memoized `selectedModel` variable instead of calling `getSelectedModel()` multiple times.

chat/sippy_agent/web_server.py (1)
17-31: Multi-model wiring and `/chat/models` endpoint look coherent

The switch to `AgentManager` plus the `/chat/models` endpoint is wired cleanly: models are listed via `AgentManager.list_models()`, wrapped into `ModelInfo`, and the default model ID is exposed via `ModelsResponse`. Using `agent_manager.get_agent()` in `/status` ensures tools are initialized before calling `list_tools()`. One minor design question: `metrics.agent_info` still uses `config.model_name`/`config.llm_endpoint`, which may not reflect the default model once `models.yaml` is in use. Consider either documenting that these metrics refer to the base config only, or updating them to reflect `get_default_model_id()`/its config if that's what you care about operationally.

Also applies to: 88-112, 139-147, 153-163
chat/sippy_agent/agent.py (1)
40-47: Async initialization pattern looks good; `list_tools` guard prevents misuse

Deferring tools/graph creation into `_initialize()` and calling it from both `AgentManager.get_agent()` and `SippyAgent.achat()` gives you lazy startup while keeping callers safe. The new `_initialized` check plus the `list_tools()` guard (returning `[]` when uninitialized) avoids attribute errors on status calls. If you ever expect heavy parallel startup, you may want an `asyncio.Lock` around the `_initialize` body to avoid duplicate tool/graph construction, but that's an optimization rather than a correctness issue.

Also applies to: 48-54, 670-675
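The lock suggested above is a standard double-checked pattern: take the lock only on the slow path, and re-check the flag inside it so the expensive construction runs once even when many requests arrive together. A sketch with assumed names (`LazyAgent`, `achat`), not the actual Sippy implementation:

```python
import asyncio


class LazyAgent:
    """Lazily initializes heavy resources exactly once, even under concurrency."""

    def __init__(self) -> None:
        self._initialized = False
        self._init_lock = asyncio.Lock()
        self.init_calls = 0  # counts how many times the expensive path actually ran

    async def _initialize(self) -> None:
        if self._initialized:  # fast path once ready: no lock contention
            return
        async with self._init_lock:
            if self._initialized:  # re-check: another task may have finished first
                return
            await asyncio.sleep(0)  # stand-in for tool/graph construction
            self.init_calls += 1
            self._initialized = True

    async def achat(self, message: str) -> str:
        await self._initialize()
        return f"echo: {message}"


async def main() -> int:
    agent = LazyAgent()
    await asyncio.gather(*(agent.achat(f"m{i}") for i in range(10)))
    return agent.init_calls


calls = asyncio.run(main())
print(calls)  # the expensive initialization ran exactly once
```

Ten concurrent `achat()` calls still trigger a single initialization; once `_initialized` is set, later calls never touch the lock at all.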
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (16)
- `chat/.env.example` (1 hunks)
- `chat/.gitignore` (1 hunks)
- `chat/README.md` (3 hunks)
- `chat/main.py` (3 hunks)
- `chat/models.yaml.example` (1 hunks)
- `chat/requirements.txt` (1 hunks)
- `chat/sippy_agent/agent.py` (8 hunks)
- `chat/sippy_agent/api_models.py` (3 hunks)
- `chat/sippy_agent/config.py` (5 hunks)
- `chat/sippy_agent/web_server.py` (10 hunks)
- `sippy-ng/src/chat/ChatMessage.js` (5 hunks)
- `sippy-ng/src/chat/ChatSettings.js` (8 hunks)
- `sippy-ng/src/chat/store/modelsSlice.js` (1 hunks)
- `sippy-ng/src/chat/store/settingsSlice.js` (1 hunks)
- `sippy-ng/src/chat/store/useChatStore.js` (3 hunks)
- `sippy-ng/src/chat/store/webSocketSlice.js` (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- sippy-ng/src/chat/store/modelsSlice.js
- sippy-ng/src/chat/store/settingsSlice.js
- sippy-ng/src/chat/store/webSocketSlice.js
- sippy-ng/src/chat/ChatMessage.js
- chat/.gitignore
- chat/requirements.txt
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py
711-711: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (4)
chat/models.yaml.example (1)
34-46: No changes needed. Model version is current and temperature constraint is properly enforced.

The `claude-sonnet-4-5@20250929` version is correct and current as of November 2025. The temperature constraint concern is mitigated by the agent initialization logic in `chat/sippy_agent/agent.py` (lines 80-86): extended thinking is only enabled when `show_thinking=True` AND `extended_thinking_budget > 0`, and when enabled, temperature is automatically set to 1.0. This prevents runtime errors and ensures the API requirement is satisfied.

chat/sippy_agent/agent.py (3)
176-204: Tool setup change (logging when DB DSN missing) is reasonable

Emitting a warning when `SIPPY_READ_ONLY_DATABASE_DSN` is not configured makes the absence of the DB tool explicit without changing behavior when it is present. This is a nice observability improvement and should help diagnose missing configuration in lower envs.
485-555: Review comment contains incorrect assumptions about thinking content stream format

The original review assumes thinking content arrives as `chunk.content` parts with `{"type": "thinking", "thinking": "..."}` for both Gemini and Claude. This is not accurate per current LangChain documentation:
- Gemini: Streams thinking via parts marked with a `part.thought` boolean flag, not a `type: "thinking"` key with a separate `thinking` field
- Claude-on-Vertex (extended thinking): Uses `reasoning_content` blocks, not `type: "thinking"` with a `thinking` field

The code at lines 485–555 checks `part.get("type") == "thinking" and "thinking" in part`, which won't match either model's actual format. This logic requires correction to extract thinking content successfully for these models.

Likely an incorrect or invalid review comment.
55-116: Concern unfounded: ChatAnthropicVertex supports per-call thinking overrides

The review comment's concern about extended thinking being "fixed for the lifetime of the agent" is not accurate. `ChatAnthropicVertex` exposes thinking options (`thinking_budget`, `include_thoughts`, etc.) that can be passed per-invocation via kwargs to `invoke`/`predict`/`predict_messages`. This means per-request control over extended thinking is available without needing separate agents or LLMs. The thinking configuration set at construction time serves as a default, but individual calls can override these settings, making the suggested workarounds unnecessary.
Likely an incorrect or invalid review comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
chat/sippy_agent/agent.py (1)
725-737: Fix Ruff F541 on synthetic model description; optional concurrency hardening in `get_agent`

The synthetic `ModelConfig` uses an f-string without interpolation, which Ruff flags as F541. You can resolve this by dropping the `f` prefix as previously suggested in the earlier review. This also keeps CI green. As this repeats an earlier comment, tagging as duplicate.

```diff
- synthetic_model = ModelConfig(
-     id="default",
-     name=base_config.model_name,
-     description=f"Model from environment configuration",
-     model_name=base_config.model_name,
-     endpoint=base_config.llm_endpoint,
-     temperature=base_config.temperature,
-     extended_thinking_budget=base_config.extended_thinking_budget,
-     default=True,
- )
+ synthetic_model = ModelConfig(
+     id="default",
+     name=base_config.model_name,
+     description="Model from environment configuration",
+     model_name=base_config.model_name,
+     endpoint=base_config.llm_endpoint,
+     temperature=base_config.temperature,
+     extended_thinking_budget=base_config.extended_thinking_budget,
+     default=True,
+ )
```

Separately, `AgentManager.get_agent()` is straightforward and readable, but if you expect high concurrency it might be worth guarding agent creation with a simple async lock or similar so two concurrent callers for the same `model_id` don't both pay the initialization cost before the cache is populated. Not required for correctness, just an efficiency/robustness improvement. (As per static analysis hints and past review comments.)

Also applies to: 756-793
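A per-model lock implements the guard suggested above: concurrent first requests for the same model share one initialization, while distinct models can still initialize in parallel. A sketch under assumed names (the manager here caches plain strings as stand-ins for agents):

```python
import asyncio
from collections import defaultdict


class AgentManager:
    """Caches one agent per model id; creation is guarded by a per-model lock."""

    def __init__(self) -> None:
        self._agents = {}
        self._locks = defaultdict(asyncio.Lock)  # one lock per model id, created lazily
        self.created = 0  # how many agents were actually constructed

    async def _create_agent(self, model_id: str) -> str:
        await asyncio.sleep(0)  # stand-in for expensive tool/graph setup
        self.created += 1
        return f"agent-for-{model_id}"

    async def get_agent(self, model_id: str) -> str:
        agent = self._agents.get(model_id)
        if agent is not None:  # fast path: cache hit, no lock
            return agent
        async with self._locks[model_id]:
            if model_id not in self._agents:  # re-check inside the lock
                self._agents[model_id] = await self._create_agent(model_id)
        return self._agents[model_id]


async def main() -> int:
    mgr = AgentManager()
    await asyncio.gather(
        *(mgr.get_agent("claude") for _ in range(5)),
        *(mgr.get_agent("gemini") for _ in range(5)),
    )
    return mgr.created


created = asyncio.run(main())
print(created)  # one agent per distinct model id
```

Ten concurrent calls across two model ids construct exactly two agents; without the lock, several of the first callers could each pay the setup cost before the cache fills.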
🧹 Nitpick comments (3)
chat/sippy_agent/agent.py (2)
40-55: Async lazy initialization looks good; consider guarding `add_tool` usage

The `_initialized` flag and `_initialize()` give you a clean lazy-init path for tools/graph, and `list_tools()` now safely returns `[]` when uninitialized. The only remaining sharp edge is `add_tool()`, which assumes `self.tools` is non-None; if someone calls it before the first `achat()`/`get_agent()`, it will hit a `NoneType` append. Consider either documenting that `add_tool` requires a fully initialized agent or adding a defensive check there (e.g., raising a clear error if `_initialized` is `False`), so misuse fails loudly rather than with an attribute error.

Also applies to: 678-691
494-571: Thought streaming logic cleanly separates Gemini vs Claude; small optional guard

The updated `_achat_streaming()` correctly distinguishes Gemini's complete-thought chunks from Claude's token stream, buffers thoughts, and then injects each as a separate `thinking_steps` entry at the front, preserving chronological order. As a micro-optimization, you could early-exit the "thinking" parsing paths when both `thinking_callback` is `None` and `show_thinking` is `False`, since in that case the accumulated `thought_buffer` is never surfaced. Not a blocker, just a small potential perf/clarity win.

Also applies to: 661-668
chat/requirements.txt (1)
1-8: I need to verify the LangChain ecosystem's versioning practices and compatibility recommendations to determine if the upper-bound constraint concern is justified.

Upper-bound version constraints are optional best practice, not required.
LangChain and LangGraph follow Semantic Versioning principles, with the first stable releases (1.0.0) providing production-ready APIs. These 1.0 releases mark a commitment to stability and no breaking changes until 2.0. The pip dependency resolution test confirmed no conflicts among the specified versions.
While adding upper-bound constraints (e.g., `langchain>=1.0.0,<2.0.0`) would provide ultra-conservative stability and prevent unexpected major-version jumps, they are not required for safe operation. The current lower-bound-only approach is defensible given that:
- With LangChain 1.0, minor releases do not include breaking changes
- All core packages (langchain, langchain-core, langchain-openai, langgraph) are at 1.0+ with semantic versioning guarantees
- Legacy versions (LangChain 0.3 and LangGraph 0.4) are in MAINTENANCE mode until December 2026
Adding upper bounds is a valid optimization for reproducibility and risk-averse deployments but remains optional given the ecosystem's versioning commitments. Consider upper bounds only if: (a) stricter repeatability is required, or (b) you want to enforce explicit major-version control across your team's deployments.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
- `chat/requirements.txt` (1 hunks)
- `chat/sippy_agent/agent.py` (8 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
chat/sippy_agent/agent.py
728-728: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (3)
chat/sippy_agent/agent.py (2)
197-203: DB tool warning improves observability

Adding an explicit warning when `SIPPY_READ_ONLY_DATABASE_DSN` is not configured is a nice touch; it makes it much clearer why the DB query tool is missing instead of silently omitting it.
457-459: Ensuring `_initialize()` runs before handling chats is the right safeguard

Awaiting `self._initialize()` at the top of `achat()` guarantees tools and graph are ready before streaming, and because `_initialize()` is idempotent, repeated calls are cheap. This aligns well with the new lazy-init pattern.

chat/requirements.txt (1)
5-5: Let me check the pyyaml security status to complete the verification:

Dependency versions are current, stable, and secure.

PyYAML 6.0.3 is the latest non-vulnerable version, fixing vulnerabilities that existed before version 5.4. Anthropic 0.20.0 has no known vulnerabilities, and the latest stable anthropic release is 0.73.0. PyYAML demonstrates a positive version release cadence with recent releases. All three dependencies use permissive `>=` version constraints, allowing automatic updates to the latest secure releases. The specified versions are current as of the PR's timeline and free from known security vulnerabilities.
```python
def _create_llm(self) -> Union[ChatOpenAI, ChatGoogleGenerativeAI, ChatAnthropicVertex]:
    """Create the language model instance."""
    if self.config.verbose:
        logger.info(f"Creating LLM with endpoint: {self.config.llm_endpoint}")
        logger.info(f"Using model: {self.config.model_name}")

    # Use ChatAnthropicVertex for Claude models via Vertex AI
    if self.config.is_claude_model():
        if not self.config.google_project_id:
            raise ValueError(
                "Google Cloud project ID is required for Claude models via Vertex AI"
            )

        # Set credentials file if provided, otherwise use Application Default Credentials (gcloud auth)
        if self.config.google_credentials_file:
            import os
            os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self.config.google_credentials_file
            if self.config.verbose:
                logger.info(f"Using explicit credentials: {self.config.google_credentials_file}")
        else:
            if self.config.verbose:
                logger.info("Using Application Default Credentials (gcloud auth)")

        # Enable extended thinking for Claude if show_thinking is enabled
        # Note: Extended thinking requires temperature=1 and max_tokens > budget_tokens
        enable_extended_thinking = self.config.show_thinking and self.config.extended_thinking_budget > 0

        llm_kwargs = {
            "model_name": self.config.model_name,
            "project": self.config.google_project_id,
            "location": self.config.google_location,
            "temperature": 1.0 if enable_extended_thinking else self.config.temperature,
        }

        if enable_extended_thinking:
            # max_tokens must be greater than thinking budget
            # Claude's max output is 64K tokens
            max_tokens = 64000

            llm_kwargs["max_tokens"] = max_tokens
            llm_kwargs["model_kwargs"] = {
                "thinking": {
                    "type": "enabled",
                    "budget_tokens": self.config.extended_thinking_budget
                }
            }
            if self.config.verbose:
                logger.info(f"Extended thinking enabled with budget: {self.config.extended_thinking_budget} tokens")
                logger.info(f"Max tokens set to {max_tokens}")
                logger.info("Temperature automatically set to 1.0 (required for extended thinking)")
        elif self.config.show_thinking:
            if self.config.verbose:
                logger.info("Extended thinking disabled (budget=0)")

        if self.config.verbose:
            logger.info(
                f"Using ChatAnthropicVertex for Claude model: {self.config.model_name} "
                f"(project: {self.config.google_project_id}, location: {self.config.google_location})"
            )

        return ChatAnthropicVertex(**llm_kwargs)
```
🧩 Analysis chain
Claude Vertex AI integration and extended-thinking wiring look coherent; please verify constructor API
The new Claude/Vertex branch correctly enforces google_project_id, handles ADC vs explicit credentials, and toggles temperature/max tokens when extended thinking is enabled. The only thing I’d double‑check is that ChatAnthropicVertex indeed expects model_name, project, location, temperature, max_tokens, and model_kwargs["thinking"] in this shape and that max_tokens=64000 is within the current limits for the chosen Claude model variant.
You might also consider clamping or validating extended_thinking_budget against the chosen max_tokens to fail fast if someone misconfigures the budget.
🌐 Web query:
Verify the current constructor signature and supported keyword arguments for `langchain_google_vertexai.model_garden.ChatAnthropicVertex` (including `model_name`, `project`, `location`, `temperature`, `max_tokens`, and `model_kwargs`) and how to configure Claude “extended thinking”/thinking budgets correctly.
💡 Result:
Short answer:
- Current constructor/kwargs for `langchain_google_vertexai.model_garden.ChatAnthropicVertex` include (explicit highlights): `model_name` (alias `model`), `project`, `location` (default `"us-central1"`), `max_output_tokens` (alias `max_tokens`), `temperature`, `top_k`, `top_p`, `max_retries`, `tuned_model_name`, `thinking_budget`, `full_model_name`, `additional_headers`, `access_token`, `api_endpoint`, `api_transport`, `disable_streaming`, `safety_settings`, etc. The class also accepts arbitrary provider-specific options via `**kwargs` (passed to the invoke call). See the API reference for the full parameter list. [1]
- To enable Claude "extended thinking" you must pass Anthropic's thinking block in the model request: set `thinking.type = "enabled"` and provide `thinking.budget_tokens` (minimum ~1,024 tokens; recommended to start there and increase as needed). Budget tokens control the model's internal reasoning budget (billed) and should be set with `max_tokens`/`max_output_tokens` larger than or equal to the budget; follow Anthropic guidance on limits (practical suggestions around 16k+, and caution above ~32k). Example form (JSON): `{ "thinking": { "type": "enabled", "budget_tokens": 10000 } }`. Anthropic docs also describe interleaved-thinking headers for tool use and other considerations (billing, latency, min budget, recommendations). [2][3]
Notes / pointers:
- Use ChatAnthropicVertex's model_name (or model) to select a Claude model (e.g., the Sonnet/Claude model id) and use project/location to target Vertex AI. The LangChain class documents the aliases: max_output_tokens == max_tokens and model_name == model. [1]
- Extended thinking parameters are part of Anthropic’s request payload (not a LangChain-specific named keyword). When using ChatAnthropicVertex you supply them via model kwargs / extra request fields that LangChain forwards to the provider (i.e., include the thinking object in the **kwargs passed to invoke/generate). Confirm exact forwarding in your LangChain version if you need programmatic assurance. [1][2]
Sources:
- LangChain ChatAnthropicVertex API reference (parameter list, aliases). [LangChain docs]. (api.python.langchain.com)
- Anthropic “Building with extended thinking” (how to enable thinking, budget_tokens, min/recommendations, interleaved thinking notes). (anthropic.mintlify.app)
If you want, I can show a concrete code example for LangChain (Python) that sets model_name/project/location/max_tokens/temperature and passes the thinking block.
Citations:
- 1: https://api.python.langchain.com/en/latest/google_vertexai/model_garden/langchain_google_vertexai.model_garden.ChatAnthropicVertex.html
- 2: https://anthropic.mintlify.app/en/docs/build-with-claude/extended-thinking?utm_source=openai
Based on the web search results, I can now verify the review comment:
ChatAnthropicVertex constructor API is compatible; however, hardcoded max_tokens and missing budget validation need attention
The web search confirms that ChatAnthropicVertex accepts model_name, project, location, temperature, and max_tokens (aliased max_output_tokens), and the code's usage is correct.
However, two concerns remain:
- Hardcoded `max_tokens=64000` may exceed limits. Anthropic recommends caution above ~32k tokens for thinking budgets. The hardcoded 64000 should be validated against the specific Claude model variant's limits and potentially made configurable.
- Missing `extended_thinking_budget` validation. The code does not validate that:
  - `extended_thinking_budget` meets the minimum (~1,024 tokens per Anthropic guidance)
  - `extended_thinking_budget` does not exceed `max_tokens`
  - The budget aligns with the chosen model's constraints
🤖 Prompt for AI Agents
In chat/sippy_agent/agent.py around lines 55-116, replace the hardcoded
max_tokens and add validation for extended_thinking_budget: derive max_tokens
from a configurable value (e.g., self.config.max_output_tokens or model-specific
limit) instead of 64000, ensure extended_thinking_budget is >=1024 and <=
max_tokens (and <= model limit if available), log the resolved max_tokens and
budget, and raise a clear ValueError if validation fails so misconfiguration
fails fast before constructing ChatAnthropicVertex.
|
Scheduling required tests: |
|
/hold I think it's reviewable/testable but looking at one of the coderabbit comments about a race |
|
@stbenjam: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Chores