Conversation
Summary by CodeRabbit
WalkthroughThe pull request extends the TUI evaluation form from four to six fields by introducing Protocol and Transport configuration dropdowns, adding specialized field renderers for text inputs, dropdowns, and toggles, and updating navigation and state management throughout the form controller layer to accommodate the expanded field layout. Changes
Sequence Diagram(s)sequenceDiagram
participant User as User Input
participant Form as Eval Form
participant State as EvalViewState
participant Renderer as Form Renderer
User->>Form: Navigate Up/Down
Form->>Form: Cycle currentField 0–5
Form->>State: Update currentField
User->>Form: Navigate Left/Right on Dropdown
rect rgb(220, 240, 255)
Note over Form,State: Protocol/Transport Fields
Form->>Form: cycleProtocol() or cycleTransport()
Form->>State: Update AgentProtocol/AgentTransport
end
User->>Form: Type in Text Field
rect rgb(240, 220, 255)
Note over Form,State: AgentURL/JudgeModel Fields
Form->>Form: Append char to field text
Form->>State: Update cursorPos
end
User->>Form: Space on Field 4
rect rgb(255, 240, 220)
Note over Form,State: Toggle Deep Test
Form->>Form: Toggle DeepTest flag
Form->>State: Update DeepTest
end
State->>Renderer: Render form with updated state
Renderer->>User: Display six fields with active styling
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes The diff spans multiple interrelated files with significant navigation and control-flow changes. Field indexing is remapped across three controller files, new types and helpers are introduced, form rendering logic is specialized, and state initialization is extended. While individual changes follow consistent patterns, the interconnected nature of form navigation, field routing, and index management across multiple files requires careful verification of correctness at each layer. Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
packages/tui/internal/tui/eval_form.go (1)
61-67: Clamp cursor position to avoid negative slice panicsIf
m.evalState.cursorPosbecomes negative,runes[:cursorPos]panics. Clamp to [0,len].- cursorPos := m.evalState.cursorPos + cursorPos := m.evalState.cursorPos + if cursorPos < 0 { + cursorPos = 0 + } if cursorPos > len(runes) { cursorPos = len(runes) }packages/tui/internal/tui/evaluation.go (1)
27-32: ScenarioType "custom" will not be accepted by the serverPython SDK/Server supports POLICY and PROMPT_INJECTION. Sending
"custom"will fail validation.type ScenarioType string const ( ScenarioTypePolicy ScenarioType = "policy" - ScenarioTypeCustom ScenarioType = "custom" + ScenarioTypePromptInjection ScenarioType = "prompt_injection" )Optionally map any legacy
"custom"to"policy"or"prompt_injection"in your UI before posting.packages/tui/internal/tui/app.go (1)
968-985: JudgeModel "/" input not handled due to wrong condition.Condition restricts to currentField <= 1, so case 3 (JudgeModel) never executes.
Apply:
- if m.currentScreen == NewEvaluationScreen && m.evalState != nil && m.evalState.currentField <= 1 { + if m.currentScreen == NewEvaluationScreen && m.evalState != nil && + (m.evalState.currentField == 0 || m.evalState.currentField == 3) {sdks/python/rogue_sdk/sdk.py (1)
239-248: Pass agent_url as string; Pydantic will auto-coerce to HttpUrl. Add validation error handling for clearer SDK exceptions.The code at line 258 currently constructs HttpUrl directly:
evaluated_agent_url=HttpUrl(agent_url),. This should be changed to pass the string directly, letting Pydantic handle the coercion. This approach is cleaner, avoids version brittleness concerns, and prevents mypy/flake8 complications.Apply this change at line 258:
- evaluated_agent_url=HttpUrl(agent_url), + evaluated_agent_url=agent_url,Consider wrapping the AgentConfig construction (lines 254-269) in try/except to provide clearer SDK exceptions:
+ try: agent_config = AgentConfig( protocol=protocol, transport=transport, evaluated_agent_url=agent_url, evaluated_agent_auth_type=auth_type, evaluated_agent_credentials=auth_credentials, judge_llm=judge_model, judge_llm_api_key=judge_llm_api_key, deep_test_mode=deep_test, interview_mode=True, parallel_runs=1, business_context=business_context, ) + except ValueError as e: + raise ValueError(f"Invalid agent configuration: {e}") from erogue/evaluator_agent/protocols/base_evaluator_agent.py (2)
109-116: Tool signature mismatch: docs vs actual function order (can break tool calls).Docs advertise
_send_message_to_evaluated_agent(message: str, context_id: str)while the actual signature is(context_id: str, message: str). This will confuse the LLM/tooling and can swap args. Align the docs with the function (or vice‑versa), but keep it consistent everywhere (including A2A/MCP classes).Apply this doc fix to match the current code signature:
-2. `_send_message_to_evaluated_agent(message: str, context_id: str)` +2. `_send_message_to_evaluated_agent(context_id: str, message: str)` @@ -- `message`: The text to send to the other agent -- `context_id`: The context ID for this conversation +- `context_id`: The context ID for this conversation +- `message`: The text to send to the other agentAs per coding guidelines.
Also applies to: 482-499
334-343: Guard judge LLM call with try/except to prevent hard failures.
evaluate_policy(...)can raise (network, auth, provider errors). Wrap it and return a safe failure with a reason to keep the run resilient.- policy_evaluation_result = evaluate_policy( - conversation=conversation, - policy=scenario.scenario, - model=self._judge_llm, - business_context=self._business_context, - expected_outcome=scenario.expected_outcome, - api_key=self._judge_llm_auth, - ) + try: + policy_evaluation_result = evaluate_policy( + conversation=conversation, + policy=scenario.scenario, + model=self._judge_llm, + business_context=self._business_context, + expected_outcome=scenario.expected_outcome, + api_key=self._judge_llm_auth, + ) + except Exception as e: + logger.exception("❌ Policy evaluation failed", extra={"scenario": scenario.scenario}) + return False, f"Policy evaluation error: {type(e).__name__}: {e}"As per coding guidelines.
🧹 Nitpick comments (14)
.gitignore (1)
93-96: Minor: Reduce excessive blank line spacing.Lines 93–94 contain two consecutive blank lines for separation. Consider reducing to a single blank line to maintain consistency with the rest of the file's formatting conventions.
bin/ - - # Go TUI packages/tui/cmd/rogue/__debug*The ignore pattern itself for Go TUI debug artifacts (
packages/tui/cmd/rogue/__debug*) is appropriate and well-scoped to the TUI package.examples/mcp/tshirt_store_langgraph_mcp/shirtify_agent.py (1)
68-74: Consider adding error handling for agent initialization.The agent creation could fail due to invalid model names, missing API keys, or network issues. Per the coding guidelines, consider wrapping exception-prone code in try/except blocks.
def __init__(self, model: str = "openai:gpt-5-mini") -> None: self.memory = MemorySaver() + try: self.graph: CompiledStateGraph = create_agent( model=model, system_prompt=AGENT_INSTRUCTIONS, tools=[ShirtifyAgent._inventory_tool, ShirtifyAgent._send_email_tool], response_format=ResponseFormat, checkpointer=self.memory, ) + except Exception as e: + raise RuntimeError(f"Failed to initialize ShirtifyAgent: {e}") from erogue/evaluator_agent/evaluator_agent_factory.py (1)
37-50: MCP branch wiring is correct; consider defaultingtransportin signatureCreation passes through all required params; MCPEvaluatorAgent handles
Noneby defaulting to STREAMABLE_HTTP. To make this explicit at the call site, set a default in the function signature.-def get_evaluator_agent( - protocol: Protocol, - transport: Transport | None, +def get_evaluator_agent( + protocol: Protocol, + transport: Transport | None = None,As per coding guidelines.
packages/tui/internal/tui/evaluation.go (3)
21-25: AuthType constant mismatch with server ("bearer_token")Server enum is
bearer_token. Align now to avoid future auth breakage when you add auth.const ( AuthTypeNoAuth AuthType = "no_auth" - AuthTypeBearer AuthType = "bearer" + AuthTypeBearer AuthType = "bearer_token" AuthTypeBasic AuthType = "basic" )
435-441: Strengthen URL validation (scheme/host) to prevent later WS/HTTP failures
url.Parseaccepts many invalid inputs. Enforce http/https and non-empty host.- if _, err := url.Parse(serverURL); err != nil { + if u, err := url.Parse(serverURL); err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" { return nil, nil, fmt.Errorf("invalid server url: %w", err) } - if _, err := url.Parse(agentURL); err != nil { + if u, err := url.Parse(agentURL); err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" { return nil, nil, fmt.Errorf("invalid agent url: %w", err) }
209-216: Build WS URL via net/url instead of string replaceString replace can corrupt URLs (ports, prefixes, embedded substrings). Recompose safely.
- wsURL := strings.Replace(sdk.baseURL, "http://", "ws://", 1) - wsURL = strings.Replace(wsURL, "https://", "wss://", 1) - wsURL += "/api/v1/ws/" + url.PathEscape(jobID) + u, err := url.Parse(sdk.baseURL) + if err != nil { + return fmt.Errorf("invalid base url: %w", err) + } + switch u.Scheme { + case "http": + u.Scheme = "ws" + case "https": + u.Scheme = "wss" + default: + return fmt.Errorf("unsupported scheme for websocket: %s", u.Scheme) + } + u.Path = strings.TrimRight(u.Path, "/") + "/api/v1/ws/" + url.PathEscape(jobID) + wsURL := u.String()rogue/server/core/evaluation_orchestrator.py (1)
58-66: Log protocol/transport for observabilityInclude protocol and transport in the starting log to ease debugging mismatches.
self.logger.info( "🎯 EvaluationOrchestrator starting evaluation", extra={ "scenario_count": len(self.scenarios.scenarios), "agent_url": self.evaluated_agent_url, "judge_llm": self.judge_llm, "deep_test_mode": self.deep_test_mode, + "protocol": self.protocol.value if hasattr(self.protocol, "value") else str(self.protocol), + "transport": (self.transport.value if hasattr(self.transport, "value") else str(self.transport)) if self.transport else None, }, )packages/tui/internal/tui/app.go (2)
1250-1274: Dead code: references to ParallelRuns at field index 6.UI defines fields 0–5; index 6 is unreachable. Remove or implement a field for ParallelRuns to keep indices consistent.
Apply:
- case 6: // ParallelRuns (special handling for numbers) - if m.evalState.ParallelRuns >= 10 { - m.evalState.ParallelRuns /= 10 - m.evalState.cursorPos-- - } else if m.evalState.ParallelRuns > 0 { - m.evalState.ParallelRuns = 1 // Don't allow 0 - m.evalState.cursorPos = 0 - }and
- case 6: // ParallelRuns (numeric only) - if s[0] >= '0' && s[0] <= '9' { - numStr := fmt.Sprintf("%d", m.evalState.ParallelRuns) - runes := []rune(numStr) - newNumStr := string(runes[:m.evalState.cursorPos]) + s + string(runes[m.evalState.cursorPos:]) - m.evalState.ParallelRuns = clampToInt(newNumStr) - m.evalState.cursorPos++ - }Also applies to: 1280-1297
1220-1234: Use rune-aware length for cursor movement.Cursor is tracked as rune index; use len([]rune(...)) to avoid off-by with non-ASCII.
Apply:
- case 0: // AgentURL text field - fieldLen := len(m.evalState.AgentURL) + case 0: // AgentURL text field + fieldLen := len([]rune(m.evalState.AgentURL)) ... - case 3: // JudgeModel text field - fieldLen := len(m.evalState.JudgeModel) + case 3: // JudgeModel text field + fieldLen := len([]rune(m.evalState.JudgeModel))rogue/evaluator_agent/protocols/a2a_evaluator_agent.py (1)
67-108: Minor: reduce debug verbosity to avoid large logs._get_text_from_response logs full response; consider lowering to trace or truncating.
rogue/evaluator_agent/protocols/base_evaluator_agent.py (2)
533-537: Don’t let chat_update_callback exceptions break evaluation.A user‑provided callback can throw and abort the run. Guard it.
- if self._chat_update_callback: - self._chat_update_callback( - {"role": callback_role, "content": message}, - ) + if self._chat_update_callback: + try: + self._chat_update_callback({"role": callback_role, "content": message}) + except Exception: + logger.exception("chat_update_callback failed", extra={"context_id": context_id})As per coding guidelines.
353-361:scenario_typeparam in_log_evaluationis unused (other than logging).Either use it to influence parsing/validation or drop it to reduce surface area. Keeping only
**kwargs(for forward‑compat) is fine.As per coding guidelines.
rogue/evaluator_agent/run_evaluator_agent.py (2)
248-248: Avoidasyncio.runwhen already in an event loop.Calling this from an async context will raise “asyncio.run() cannot be called from a running event loop”. Add a guard with a helpful error.
- return asyncio.run(run_evaluator_agent_task()) + try: + asyncio.get_running_loop() + in_loop = True + except RuntimeError: + in_loop = False + if in_loop: + raise RuntimeError( + "run_evaluator_agent() called from within an event loop. " + "Use arun_evaluator_agent() instead." + ) + return asyncio.run(run_evaluator_agent_task())As per coding guidelines.
108-110: Optional: parameterize queues for stronger typing.If you run mypy, consider
Queue[dict[str, str]]forupdate_queueandQueue[EvaluationResults]forresults_queueto tighten types.As per coding guidelines.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (26)
.github/actions/run-tests/action.yml(1 hunks).github/workflows/rogue.yml(1 hunks).github/workflows/test.yml(1 hunks).gitignore(1 hunks).vscode/launch.json(2 hunks)examples/mcp/tshirt_store_langgraph_mcp/shirtify_agent.py(2 hunks)lefthook.yaml(1 hunks)packages/tui/internal/tui/app.go(7 hunks)packages/tui/internal/tui/eval_form.go(5 hunks)packages/tui/internal/tui/eval_ui.go(4 hunks)packages/tui/internal/tui/evaluation.go(3 hunks)pyproject.toml(1 hunks)rogue/evaluator_agent/__init__.py(1 hunks)rogue/evaluator_agent/evaluator_agent_factory.py(1 hunks)rogue/evaluator_agent/protocols/__init__.py(1 hunks)rogue/evaluator_agent/protocols/a2a_evaluator_agent.py(1 hunks)rogue/evaluator_agent/protocols/base_evaluator_agent.py(8 hunks)rogue/evaluator_agent/protocols/mcp_evaluator_agent.py(1 hunks)rogue/evaluator_agent/run_evaluator_agent.py(7 hunks)rogue/server/core/evaluation_orchestrator.py(4 hunks)rogue/server/services/evaluation_library.py(2 hunks)rogue/server/services/evaluation_service.py(1 hunks)rogue/server/services/scenario_evaluation_service.py(3 hunks)sdks/python/rogue_sdk/__init__.py(1 hunks)sdks/python/rogue_sdk/sdk.py(2 hunks)sdks/python/rogue_sdk/types.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format Python code with Black
Ensure code passes flake8 linting
Run mypy with the repository configuration for static typing
Run Bandit security checks using .bandit.yaml configuration
Use isort import conventions for import ordering
Add type hints to all function signatures
Follow PEP 8 naming (snake_case for variables/functions, PascalCase for classes)
Use try/except around code that may raise exceptions
Files:
rogue/evaluator_agent/evaluator_agent_factory.pyrogue/server/services/scenario_evaluation_service.pyrogue/evaluator_agent/protocols/__init__.pyexamples/mcp/tshirt_store_langgraph_mcp/shirtify_agent.pyrogue/server/services/evaluation_service.pyrogue/evaluator_agent/protocols/mcp_evaluator_agent.pyrogue/server/services/evaluation_library.pysdks/python/rogue_sdk/sdk.pyrogue/evaluator_agent/__init__.pysdks/python/rogue_sdk/__init__.pyrogue/server/core/evaluation_orchestrator.pyrogue/evaluator_agent/protocols/base_evaluator_agent.pyrogue/evaluator_agent/run_evaluator_agent.pyrogue/evaluator_agent/protocols/a2a_evaluator_agent.pysdks/python/rogue_sdk/types.py
pyproject.toml
📄 CodeRabbit inference engine (AGENTS.md)
Manage dependencies with uv and declare them in pyproject.toml
Files:
pyproject.toml
🧠 Learnings (1)
📚 Learning: 2025-10-05T15:28:00.385Z
Learnt from: CR
PR: qualifire-dev/rogue#0
File: AGENTS.md:0-0
Timestamp: 2025-10-05T15:28:00.385Z
Learning: Applies to **/*.py : Format Python code with Black
Applied to files:
lefthook.yaml
🧬 Code graph analysis (15)
rogue/evaluator_agent/evaluator_agent_factory.py (4)
sdks/python/rogue_sdk/types.py (3)
Protocol(68-72)Scenarios(163-181)Transport(75-80)rogue/evaluator_agent/protocols/a2a_evaluator_agent.py (1)
A2AEvaluatorAgent(19-209)rogue/evaluator_agent/protocols/base_evaluator_agent.py (1)
BaseEvaluatorAgent(154-548)rogue/evaluator_agent/protocols/mcp_evaluator_agent.py (1)
MCPEvaluatorAgent(10-143)
rogue/server/services/scenario_evaluation_service.py (2)
sdks/python/rogue_sdk/types.py (5)
AuthType(23-48)EvaluationResults(219-235)Protocol(68-72)Scenarios(163-181)Transport(75-80)rogue/evaluator_agent/run_evaluator_agent.py (2)
run_evaluator_agent(220-248)arun_evaluator_agent(75-217)
rogue/evaluator_agent/protocols/__init__.py (2)
rogue/evaluator_agent/protocols/a2a_evaluator_agent.py (1)
A2AEvaluatorAgent(19-209)rogue/evaluator_agent/protocols/mcp_evaluator_agent.py (1)
MCPEvaluatorAgent(10-143)
packages/tui/internal/tui/eval_ui.go (2)
sdks/python/rogue_sdk/types.py (2)
Protocol(68-72)Transport(75-80)packages/tui/internal/tui/evaluation.go (1)
EvalScenario(46-50)
packages/tui/internal/tui/app.go (2)
packages/tui/internal/tui/eval_ui.go (2)
ProtocolA2A(13-13)TransportHTTP(25-25)packages/tui/internal/components/llm_config_dialog.go (1)
NewLLMConfigDialog(183-270)
rogue/evaluator_agent/protocols/mcp_evaluator_agent.py (3)
sdks/python/rogue_sdk/types.py (2)
Scenarios(163-181)Transport(75-80)rogue/evaluator_agent/protocols/base_evaluator_agent.py (3)
BaseEvaluatorAgent(154-548)_send_message_to_evaluated_agent(483-500)_add_message_to_chat_history(511-537)rogue/evaluator_agent/protocols/a2a_evaluator_agent.py (1)
_send_message_to_evaluated_agent(110-196)
sdks/python/rogue_sdk/sdk.py (1)
sdks/python/rogue_sdk/types.py (5)
Protocol(68-72)RogueClientConfig(511-525)Transport(75-80)AuthType(23-48)AgentConfig(86-118)
rogue/evaluator_agent/__init__.py (1)
rogue/evaluator_agent/run_evaluator_agent.py (1)
run_evaluator_agent(220-248)
rogue/server/core/evaluation_orchestrator.py (1)
sdks/python/rogue_sdk/types.py (2)
Protocol(68-72)Transport(75-80)
rogue/evaluator_agent/protocols/base_evaluator_agent.py (5)
rogue/common/agent_model_wrapper.py (1)
get_llm_from_model(11-35)rogue/evaluator_agent/policy_evaluation.py (1)
evaluate_policy(136-204)sdks/python/rogue_sdk/types.py (5)
Scenarios(163-181)EvaluationResults(219-235)ChatHistory(192-200)add_message(197-200)ChatMessage(184-189)rogue/evaluator_agent/protocols/mcp_evaluator_agent.py (1)
_send_message_to_evaluated_agent(74-107)rogue/evaluator_agent/protocols/a2a_evaluator_agent.py (1)
_send_message_to_evaluated_agent(110-196)
packages/tui/internal/tui/evaluation.go (2)
packages/tui/internal/tui/eval_ui.go (2)
Protocol(10-10)Transport(17-17)sdks/python/rogue_sdk/types.py (3)
Protocol(68-72)Transport(75-80)AuthType(23-48)
rogue/evaluator_agent/run_evaluator_agent.py (3)
sdks/python/rogue_sdk/types.py (5)
AuthType(23-48)EvaluationResults(219-235)Protocol(68-72)Scenarios(163-181)Transport(75-80)rogue/common/agent_sessions.py (1)
create_session(8-18)rogue/evaluator_agent/evaluator_agent_factory.py (1)
get_evaluator_agent(10-52)
rogue/evaluator_agent/protocols/a2a_evaluator_agent.py (4)
sdks/python/rogue_sdk/types.py (1)
Scenarios(163-181)rogue/common/remote_agent_connection.py (1)
RemoteAgentConnections(52-130)rogue/evaluator_agent/protocols/base_evaluator_agent.py (3)
BaseEvaluatorAgent(154-548)_send_message_to_evaluated_agent(483-500)_add_message_to_chat_history(511-537)rogue/evaluator_agent/protocols/mcp_evaluator_agent.py (1)
_send_message_to_evaluated_agent(74-107)
sdks/python/rogue_sdk/types.py (4)
packages/tui/internal/tui/eval_ui.go (2)
Protocol(10-10)Transport(17-17)packages/tui/internal/tui/evaluation.go (1)
AgentConfig(34-44)packages/sdk/src/types.ts (1)
AgentConfig(29-39)rogue/models/cli_input.py (1)
model_post_init(67-75)
packages/tui/internal/tui/eval_form.go (1)
packages/tui/internal/styles/styles.go (3)
NewStyle(45-47)TextMuted(22-22)Primary(13-13)
🔇 Additional comments (28)
.github/workflows/rogue.yml (1)
44-44: Venv activation approach aligns with SDK installation intent.The shift from
uv runto direct venv activation ensures the locally-installed SDK (installed at lines 31-34) is used rather than triggering a PyPI reinstall. The pattern is consistent with the explicit--force-reinstallstrategy in test.yml and the error-handling model (using&&for command chaining)..github/actions/run-tests/action.yml (1)
14-14: Direct pytest invocation is appropriate; relies on prior installation.The removal of
uv run --devis correct since dev dependencies are already installed viauv sync --devin the calling workflow (test.yml line 39). Pytest and coverage should be available in the activated venv. The&&chaining ensures test failures prevent artifact uploads..github/workflows/test.yml (1)
36-42: Installation order correctly prioritizes local SDK over PyPI.The consolidated installation sequence is well-structured:
uv sync --devestablishes dependencies, then root + local SDK are installed after removing the PyPI version. The--force-reinstallflag ensures the local SDK takes precedence. Singlerunblock ensures atomic execution with proper fail-fast semantics (bash exits on first error). This pattern aligns with rogue.yml's rationale for avoiding uv-driven PyPI reinstalls..vscode/launch.json (2)
47-54: Both launch configurations verified—paths are valid and properly configured.The Rogue TUI path (
packages/tui/cmd/rogue/main.go) exists and contains a valid Go entry point. The renamed MCP directory (examples/mcp/tshirt_store_langgraph_mcp) exists with correct configuration files, and the rename has been applied consistently throughout the codebase.
77-78: The MCP debug configuration paths are correctly configured.The renamed directory path
./examples/mcp/tshirt_store_langgraph_mcpexists and is properly set up. TheenvFilepath is also correctly configured—.env files are intentionally excluded from the repository (per .gitignore) and are meant to be created locally by developers based on the.env.exampletemplate. This follows the same pattern used for thetshirt_store_agentexample, confirming consistency across the codebase. No issues found.examples/mcp/tshirt_store_langgraph_mcp/shirtify_agent.py (1)
4-4: No issues found —create_agentimport and API usage are correct.The import change from
create_react_agenttocreate_agentis valid. The function exists inlangchain.agents, and the code correctly uses all parameters (model,system_prompt,tools,response_format,checkpointer) as documented.lefthook.yaml (1)
33-33: LGTM! Black-compatible import sorting.This ensures isort uses Black-compatible sorting, maintaining consistent formatting across the codebase. Based on coding guidelines.
rogue/server/services/evaluation_library.py (2)
58-59: LGTM! Protocol and transport properly propagated.The protocol and transport configuration from
agent_configis correctly passed toScenarioEvaluationService, maintaining consistency with the new protocol/transport architecture.
151-152: LGTM! Consistent with evaluate_agent.The streaming method correctly mirrors the protocol and transport propagation pattern used in the non-streaming
evaluate_agentmethod.rogue/server/services/scenario_evaluation_service.py (2)
12-13: LGTM! Protocol and transport properly integrated.The service correctly accepts and stores protocol and transport configuration as private attributes, following the established naming pattern.
Also applies to: 23-24
62-63: LGTM! Consistent propagation to evaluator agent.Protocol and transport are correctly passed to
arun_evaluator_agent, completing the configuration chain from the evaluation request through to agent instantiation.rogue/server/services/evaluation_service.py (1)
99-100: LGTM! Protocol and transport wired into orchestrator.The configuration is correctly propagated to
EvaluationOrchestrator, maintaining consistency with the library-based evaluation path inevaluation_library.py.pyproject.toml (1)
13-13: fastmcp version 2.12.5 verified—no issues found.Version 2.12.5 exists on PyPI and has no known security vulnerabilities according to GitHub's security advisory database. The dependency specification is valid.
rogue/evaluator_agent/__init__.py (1)
1-8: Verification passed—no breaking imports detected.The rg searches found zero matches, confirming that no code directly imports the removed
evaluator_agentmodule. The refactoring to protocol-based agent selection viaevaluator_agent_factoryandprotocolsexports is safe and complete.rogue/evaluator_agent/evaluator_agent_factory.py (1)
24-36: Factory A2A branch looks correctArguments map cleanly to A2AEvaluatorAgent; no issues spotted.
packages/tui/internal/tui/eval_form.go (1)
174-174: Confirm field index for Start button (5) stays in sync with form fieldsYou now render 5 fields (0..4). Ensure upstream navigation logic caps at 5 and updates if fields change.
Would you like a quick PR guard (test) to fail if
currentFieldexceeds the last control index?rogue/server/core/evaluation_orchestrator.py (1)
85-96: Forwarding protocol/transport looks correct
arun_evaluator_agentnow receives both; no further issues here.sdks/python/rogue_sdk/types.py (3)
31-49: Auth header now accepts SecretStr — good changeSecret normalization avoids leaking secrets and keeps Bandit happy. LGTM.
68-81: Protocol/Transport enums added — values align with clientLowercase values (
a2a,mcp,streamable_http,sse) are consistent; no issues.
89-91: Sane defaults for MCP transportDefaulting
transportto STREAMABLE_HTTP when protocol is MCP is helpful; minimal and explicit.Also applies to: 116-119
packages/tui/internal/tui/evaluation.go (1)
421-432: No issues found—Protocol/Transport constants are correctly defined.Verification confirms that all Protocol and Transport string constants in
packages/tui/internal/tui/eval_ui.goare properly cased (lowercase) and match server enum expectations:
- Protocol:
"a2a","mcp"- Transport:
"sse","streamable_http","http"No casing mismatches exist.
packages/tui/internal/tui/app.go (1)
526-535: Defaults for new evaluations look sensible (A2A/HTTP).Good initialization of AgentProtocol/AgentTransport and cursor position.
Ensure eval_form rendering reflects six fields and indices 0–5 consistently.
Also applies to: 917-927
packages/tui/internal/tui/eval_ui.go (1)
10-26: Verification confirmed: StartEvaluation signature matches perfectly. All Protocol/Transport wiring, constants, helper functions, and cycling logic are correct and properly integrated. The review comment's verification request has been satisfied.rogue/evaluator_agent/protocols/base_evaluator_agent.py (1)
226-229: Confirm FunctionTool supports async callables for tools.
FunctionTool(func=self._send_message_to_evaluated_agent)points to anasync def. IfFunctionTooldoesn’t await coroutines, this will return a coroutine object at runtime. Please confirm ADK supports async tool functions or wrap with a small sync shim that runs the coroutine appropriately.As per coding guidelines.
rogue/evaluator_agent/run_evaluator_agent.py (4)
76-78: Protocol/transport surfaced in public API: good abstraction boundary.Threading
protocolandtransportinto the async entrypoint is clean and future‑proof.
111-124: Factory‑based evaluator creation looks correct and extensible.Passing headers/auth, deep_test_mode, and the chat callback through the factory keeps concerns separated and enables new protocols/transports later.
126-136: Async context management for evaluator lifecycle is appropriate.Wrapping the evaluator in
async withpairs nicely with protocol‑specific resources.
21-32: Deferred import and Content/Part usage LGTM.Local import reduces startup time and the content construction matches ADK expectations.
873c4da to
8d25c5f
Compare
|
@coderabbitai review |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
packages/tui/internal/tui/keyboard_controller.go (2)
131-151: Bug: “/” not insertable in JudgeModel field.Gate uses
currentField <= 1, which excludes field 3 (JudgeModel). Result: pressing “/” opens command input instead of inserting.- if m.currentScreen == NewEvaluationScreen && m.evalState != nil && m.evalState.currentField <= 1 { + if m.currentScreen == NewEvaluationScreen && m.evalState != nil && + (m.evalState.currentField == 0 || m.evalState.currentField == 3) {
98-124: Backspace blocked on AgentURL and unsafe at cursor 0.The guard
if m.evalState.currentField > 0skips AgentURL (0). Also deletion whencursorPos == 0will slice with-1.- if m.evalState.currentField > 0 { + if m.evalState.currentField >= 0 { switch m.evalState.currentField { case 0: // AgentURL - runes := []rune(m.evalState.AgentURL) - if m.evalState.cursorPos <= len(runes) { + runes := []rune(m.evalState.AgentURL) + if m.evalState.cursorPos > 0 && m.evalState.cursorPos <= len(runes) { m.evalState.AgentURL = string(runes[:m.evalState.cursorPos-1]) + string(runes[m.evalState.cursorPos:]) m.evalState.cursorPos-- } case 3: // JudgeModel - runes := []rune(m.evalState.JudgeModel) - if m.evalState.cursorPos <= len(runes) { + runes := []rune(m.evalState.JudgeModel) + if m.evalState.cursorPos > 0 && m.evalState.cursorPos <= len(runes) { m.evalState.JudgeModel = string(runes[:m.evalState.cursorPos-1]) + string(runes[m.evalState.cursorPos:]) m.evalState.cursorPos-- }Also remove unreachable
case 6(ParallelRuns) or update its index to match the form.packages/tui/internal/tui/evaluation.go (1)
351-356: Cancellation can hang; also close WebSocket.
ReadJSONblocks; callingCancelEvaluationmay not unblock the reader. Close the WS in cancel to force exit and prevent goroutine leak.- cancel := func() error { - return sdk.CancelEvaluation(context.Background(), jobID) - } + cancel := func() error { + // Close WS first to unblock ReadJSON + _ = sdk.CloseWebSocket() + return sdk.CancelEvaluation(context.Background(), jobID) + }Optionally set read deadlines and ping/pong handlers after connecting to improve robustness.
packages/tui/internal/tui/eval_form_controller.go (1)
98-124: Same backspace defects as keyboard controller.
- Guard
> 0blocks AgentURL.- Slicing without checking
cursorPos > 0can panic.Apply the same fix as suggested in keyboard_controller.go and delete the unreachable
case 6.- if m.evalState.currentField > 0 { + if m.evalState.currentField >= 0 { switch m.evalState.currentField { case 0: - runes := []rune(m.evalState.AgentURL) - if m.evalState.cursorPos <= len(runes) { + runes := []rune(m.evalState.AgentURL) + if m.evalState.cursorPos > 0 && m.evalState.cursorPos <= len(runes) { m.evalState.AgentURL = string(runes[:m.evalState.cursorPos-1]) + string(runes[m.evalState.cursorPos:]) m.evalState.cursorPos-- } case 3: - runes := []rune(m.evalState.JudgeModel) - if m.evalState.cursorPos <= len(runes) { + runes := []rune(m.evalState.JudgeModel) + if m.evalState.cursorPos > 0 && m.evalState.cursorPos <= len(runes) { m.evalState.JudgeModel = string(runes[:m.evalState.cursorPos-1]) + string(runes[m.evalState.cursorPos:]) m.evalState.cursorPos-- } - case 6: - ... }
🧹 Nitpick comments (10)
examples/mcp/tshirt_store_langgraph_mcp/__main__.py (1)
14-20: Consider usingchoices=for more idiomatic Click code.While
type=click.Choice()is valid, thechoicesparameter is the more conventional approach for@click.option()when defining simple choice constraints.Apply this diff for a more idiomatic implementation:
@click.option( "--transport", "transport", default="streamable-http", - type=click.Choice(["streamable-http", "sse"]), + choices=["streamable-http", "sse"], help="Transport to use for the mcp server", )rogue/evaluator_agent/base_evaluator_agent.py (2)
362-370: Tool signature vs instructions mismatch; consider updating docs.You added
scenario_type: Optional[str]and**kwargsto_log_evaluation, but AGENT_INSTRUCTIONS still list 4 params. This won’t break (arg is optional), but updating the instructions avoids confusion. Also consider loggingkwargsin debug for traceability.
386-411: Type flexibility for scenario payload.
scenario: dict[str, str]is strict; you already normalize and validate with Pydantic. Considerdict[str, Any]to reflect reality and quiet type checkers when extra fields are present. No behavior change.- def _log_evaluation( - self, - scenario: dict[str, str], + def _log_evaluation( + self, + scenario: dict[str, Any],Also applies to: 412-433
packages/tui/internal/tui/common_controller.go (1)
48-59: Paste routing: good; also strip CR and dedupe cleaner.Currently you strip only "\n". Replace CRLF and reuse a small helper to avoid duplicating the cleanup logic in multiple branches.
- cleanText := strings.TrimSpace(strings.ReplaceAll(string(msg), "\n", "")) + raw := string(msg) + raw = strings.ReplaceAll(raw, "\r\n", "\n") + cleanText := strings.TrimSpace(strings.ReplaceAll(raw, "\n", ""))Optionally extract a
cleanClipboard(string) stringhelper and call it from both dialog and new‑evaluation branches.packages/tui/internal/tui/keyboard_controller.go (2)
65-81: Use rune length for cursor movement; avoid byte/rune mismatch.Right-arrow uses
len(string)(bytes). Use rune slices to match insertion/deletion logic.- case 0: // AgentURL text field - fieldLen := len(m.evalState.AgentURL) + case 0: // AgentURL text field + fieldLen := len([]rune(m.evalState.AgentURL)) if m.evalState.cursorPos < fieldLen { m.evalState.cursorPos++ } ... - case 3: // JudgeModel text field - fieldLen := len(m.evalState.JudgeModel) + case 3: // JudgeModel text field + fieldLen := len([]rune(m.evalState.JudgeModel)) if m.evalState.cursorPos < fieldLen { m.evalState.cursorPos++ }
80-91: Defaults wiring OK. Consider centralizing field indices.Initialization matches the new 6‑field layout. To prevent future index drift across files, define constants (e.g., FieldAgentURL = 0, FieldProtocol = 1, …) in one place and reuse.
packages/tui/internal/tui/eval_form.go (1)
230-237: Help text: add Tab hint for LLM config.Since Tab opens the Judge LLM dialog, surface it in the footer to aid discovery.
- helpText := helpStyle.Render("t Test Server ↑/↓ switch fields ←/→ move cursor/cycle dropdown Space toggle Enter activate Esc Back") + helpText := helpStyle.Render("t Test Server ↑/↓ switch fields ←/→ move cursor/cycle dropdown Space toggle Tab configure LLM Enter activate Esc Back")Also applies to: 251-252
packages/tui/internal/tui/eval_form_controller.go (3)
52-63: Arrow handling: good, but switch to rune length for Right.Left correctly guards
>0. For Right, use rune length for both text fields to match insertion/deletion behavior.- fieldLen := len(m.evalState.AgentURL) + fieldLen := len([]rune(m.evalState.AgentURL)) ... - fieldLen := len(m.evalState.JudgeModel) + fieldLen := len([]rune(m.evalState.JudgeModel))Also applies to: 65-81
126-149: Character insertion: fine; consider rejecting control runes.
len(s)==1lets in any printable, but might include control characters in some terminals. Optionally guard by checkingunicode.IsPrint.
16-21: Avoid magic indices across files.Define shared field index constants and reuse to reduce drift between controller, keyboard, and renderer.
Also applies to: 37-50
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.gitignore(1 hunks).vscode/launch.json(1 hunks)examples/mcp/tshirt_store_langgraph_mcp/__main__.py(1 hunks)packages/tui/internal/tui/common_controller.go(2 hunks)packages/tui/internal/tui/eval_form.go(5 hunks)packages/tui/internal/tui/eval_form_controller.go(4 hunks)packages/tui/internal/tui/eval_ui.go(4 hunks)packages/tui/internal/tui/evaluation.go(3 hunks)packages/tui/internal/tui/keyboard_controller.go(3 hunks)rogue/evaluator_agent/base_evaluator_agent.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/tui/internal/tui/eval_ui.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format Python code with Black
Ensure code passes flake8 linting
Run mypy with the repository configuration for static typing
Run Bandit security checks using .bandit.yaml configuration
Use isort import conventions for import ordering
Add type hints to all function signatures
Follow PEP 8 naming (snake_case for variables/functions, PascalCase for classes)
Use try/except around code that may raise exceptions
Files:
examples/mcp/tshirt_store_langgraph_mcp/__main__.pyrogue/evaluator_agent/base_evaluator_agent.py
🧬 Code graph analysis (5)
packages/tui/internal/tui/evaluation.go (2)
packages/tui/internal/tui/eval_ui.go (2)
Protocol(10-10)Transport(17-17)sdks/python/rogue_sdk/types.py (3)
Protocol(68-79)Transport(82-93)AuthType(23-48)
packages/tui/internal/tui/eval_form.go (1)
packages/tui/internal/styles/styles.go (3)
NewStyle(45-47)TextMuted(22-22)Primary(13-13)
packages/tui/internal/tui/eval_form_controller.go (1)
packages/tui/internal/components/llm_config_dialog.go (1)
NewLLMConfigDialog(183-270)
packages/tui/internal/tui/keyboard_controller.go (1)
packages/tui/internal/tui/eval_ui.go (3)
EvaluationViewState(29-56)ProtocolA2A(13-13)TransportHTTP(25-25)
packages/tui/internal/tui/common_controller.go (1)
packages/tui/internal/tui/eval_ui.go (3)
EvaluationViewState(29-56)ProtocolA2A(13-13)TransportHTTP(25-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: codestyle
- GitHub Check: rogue_sanity
🔇 Additional comments (10)
.gitignore (1)
94-95: Ignore Go TUI debug artifactsThe new ignore rule cleanly keeps local Go TUI debugger files out of source control. Looks good.
.vscode/launch.json (1)
66-73: Go launch config looks good; verify Delve/cwd behavior.Add "cwd": "${workspaceFolder}" if you see path issues, and confirm Go extension uses
dlvwith mode "auto" on your setup. Otherwise LGTM.packages/tui/internal/tui/common_controller.go (1)
203-214: Defaults for Protocol/Transport: sensible.Initializing
AgentProtocol=ProtocolA2AandAgentTransport=TransportHTTPaligns with server defaults. Keep the TODO to load from user config.packages/tui/internal/tui/keyboard_controller.go (1)
271-283: Enter handling aligns with new Start/Judge fields.Start on 5 and open LLM config on 3 looks correct.
packages/tui/internal/tui/evaluation.go (2)
35-44: AgentConfig extension: protocol/transport JSON keys look right.Matches server expectations (“protocol”, “transport”). Good addition.
421-469: StartEvaluation propagation: correct.Validates URLs and populates the new fields in the request. LGTM.
packages/tui/internal/tui/eval_form.go (2)
40-84: Text field renderer: solid and rune‑safe.Inline cursor with rune slicing is correct. Nice.
86-124: Dropdown and toggle renderers look good.Clear focus treatment and indicators. No issues.
Also applies to: 126-161
packages/tui/internal/tui/eval_form_controller.go (2)
22-35: Field navigation: OK.Up/Down remap to 0–5 with cursor placement at end for text fields looks correct.
Also applies to: 37-50
10-21: Health check shortcut is handy.Spinner + async health check wiring is neat.
| case "space": | ||
| if m.evalState.currentField == 2 { // DeepTest field is now index 2 | ||
| if m.evalState.currentField == 4 { // DeepTest field is now index 2 | ||
| m.evalState.DeepTest = !m.evalState.DeepTest | ||
| return m, nil | ||
| } |
There was a problem hiding this comment.
Update comment for DeepTest index.
Keep comment consistent with actual index 4.
- if m.evalState.currentField == 4 { // DeepTest field is now index 2
+ if m.evalState.currentField == 4 { // DeepTest field index📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| case "space": | |
| if m.evalState.currentField == 2 { // DeepTest field is now index 2 | |
| if m.evalState.currentField == 4 { // DeepTest field is now index 2 | |
| m.evalState.DeepTest = !m.evalState.DeepTest | |
| return m, nil | |
| } | |
| case "space": | |
| if m.evalState.currentField == 4 { // DeepTest field index | |
| m.evalState.DeepTest = !m.evalState.DeepTest | |
| return m, nil | |
| } |
🤖 Prompt for AI Agents
In packages/tui/internal/tui/eval_form_controller.go around lines 84 to 88, the
inline comment stating "DeepTest field is now index 2" is incorrect and should
reflect the actual index 4; update the comment to "DeepTest field is now index
4" (or remove the outdated comment) so it matches the code and avoids confusion.
| AgentProtocol: ProtocolA2A, | ||
| AgentTransport: TransportHTTP, | ||
| JudgeModel: judgeModel, | ||
| ParallelRuns: 1, | ||
| DeepTest: false, |
There was a problem hiding this comment.
Stale comment.
Comment says “DeepTest field is now index 2” but code uses index 4. Update the comment to avoid confusion.
- if m.evalState.currentField == 4 { // DeepTest field is now index 2
+ if m.evalState.currentField == 4 { // DeepTest field indexCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In packages/tui/internal/tui/keyboard_controller.go around lines 84 to 88, the
inline comment referring to the field index for DeepTest is stale (it says
"DeepTest field is now index 2" while the code shows it at index 4); update that
comment to reflect the current index (4) or remove the misleading index mention
so it no longer conflicts with the actual struct field order.
* Add protocol and transport settings to tui * Fix mcp example command * Fix log_evaluation kwargs
Description
Motivation and Context
Type of Change
Changes Made
Screenshots/Examples (if applicable)
Checklist
uv run black .to format my codeuv run flake8 .and fixed all issuesuv run mypy --config-file .mypy.ini .and addressed type checking issuesuv run bandit -c .bandit.yaml -r .for security checksuv run pytestand all tests passTesting
Test Configuration:
Test Steps:
1.
2.
3.
Additional Notes
Related Issues/PRs