Skip to content

Conversation

@drorIvry
Copy link
Contributor

@drorIvry drorIvry commented Jan 4, 2026

  • added python entrypoint that works with policy
  • added python entrypoint that works with red teaming
  • fixed some bugz

Description

Motivation and Context

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🎨 Code style/refactoring (no functional changes)
  • 🧪 Test updates
  • 🔧 Configuration/build changes

Changes Made

Screenshots/Examples (if applicable)

Checklist

  • I have read the CONTRIBUTING.md guide
  • My code follows the code style of this project (PEP 8, type hints, docstrings)
  • I have run uv run black . to format my code
  • I have run uv run flake8 . and fixed all issues
  • I have run uv run mypy --config-file .mypy.ini . and addressed type checking issues
  • I have run uv run bandit -c .bandit.yaml -r . for security checks
  • I have added tests that prove my fix is effective or that my feature works
  • I have run uv run pytest and all tests pass
  • I have manually tested my changes
  • I have updated the documentation accordingly
  • I have added/updated type hints for new/modified functions
  • My changes generate no new warnings
  • I have checked my code for security issues
  • Any dependent changes have been merged and published

Testing

Test Configuration:

  • Python version:
  • OS:
  • Other relevant details:

Test Steps:
1.
2.
3.

Additional Notes

Related Issues/PRs

  • Fixes #
  • Related to #
  • Depends on #

@drorIvry drorIvry requested a review from yuval-qf as a code owner January 4, 2026 21:35
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Summary by CodeRabbit

  • New Features

    • Evaluate locally-run Python agents via a new Python protocol (SDK, server, orchestrators, red-team flows).
    • CLI/TUI accept a Python entrypoint file instead of a network address for Python evaluations; SDK supports passing a Python entrypoint.
  • Improvements

    • TUI/CLI forms, navigation, validation, and persistence now remember and adapt for Python entrypoints.
    • Examples and README updated with Python entrypoint usage and templates.
  • Other

    • WebSocket manager adds a connection-check helper.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds a Python protocol that runs user-supplied Python entrypoint modules (call_agent), with CLI/TUI/SDK/server wiring, new Python evaluator and red-team agents, config persistence, UI form changes, and protocol-aware request/evaluation flow.

Changes

Cohort / File(s) Summary
Docs & Examples
README.md, examples/python_entrypoint_stub.py, examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py
Add Python protocol docs and examples: entrypoint stub (sync/async), Shirtify integration, README protocol table and usage guidance.
TUI: forms, types, controllers
packages/tui/internal/screens/evaluations/form_view.go, packages/tui/internal/screens/evaluations/types.go, packages/tui/internal/tui/eval_form_view.go, packages/tui/internal/tui/eval_types.go, packages/tui/internal/tui/common_controller.go, packages/tui/internal/tui/form_controller.go, packages/tui/internal/tui/keyboard_controller.go, packages/tui/internal/tui/utils.go
Introduce PythonEntrypointFile, move Protocol field first, omit Transport for Python, route paste/typing to entrypoint, adjust navigation/cursor logic, persist user_config.json, and build protocol-aware eval requests.
Evaluator agent core & factory
rogue/evaluator_agent/base_evaluator_agent.py, rogue/evaluator_agent/evaluator_agent_factory.py, rogue/evaluator_agent/run_evaluator_agent.py
Allow missing transport for Python, thread python_entrypoint_file through APIs, add PYTHON handling in factory and factory signatures.
Python evaluator & red-team agents
rogue/evaluator_agent/python/__init__.py, rogue/evaluator_agent/python/python_evaluator_agent.py, rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py, rogue/evaluator_agent/red_team/factory.py, rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py
Add PythonEvaluatorAgent and PythonRedTeamAttackerAgent that dynamically load entrypoint modules, validate call_agent, manage per-session histories, support sync/async call_agent, and update red-team factory/dispatch and logging/streaming.
CLI & models
rogue/models/cli_input.py, rogue/run_cli.py
Add --python-entrypoint-file flag and ping/validate entrypoint, make URL/transport optional for PYTHON, and propagate python_entrypoint_file through CLI flows.
Server orchestration & services
rogue/server/core/evaluation_orchestrator.py, rogue/server/core/red_team_orchestrator.py, rogue/server/services/evaluation_service.py, rogue/server/services/red_team_service.py, rogue/server/websocket/manager.py
Thread python_entrypoint_file into orchestrators/services, add WebSocket has_connections accessor, small logging and connection-wait changes.
SDK & types
sdks/python/rogue_sdk/sdk.py, sdks/python/rogue_sdk/types.py
Add PYTHON protocol, make agent_url/scenarios optional, add python_entrypoint_file parameter, and enforce protocol-aware validation/request shaping.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant CLI_TUI as CLI / TUI
    participant Validator as ping_python_entrypoint
    participant SDK as RogueSDK
    participant Orchestrator as EvaluationOrchestrator
    participant Agent as PythonEvaluatorAgent
    participant Module as User Python Module

    User->>CLI_TUI: provide python_entrypoint_file
    CLI_TUI->>Validator: validate file & check call_agent
    Validator-->>CLI_TUI: validation result
    CLI_TUI->>SDK: run_evaluation(python_entrypoint_file)
    SDK->>Orchestrator: create orchestrator (python_entrypoint_file)
    Orchestrator->>Agent: arun_evaluator_agent(..., python_entrypoint_file)
    Agent->>Agent: __aenter__ → _load_python_module()
    Agent->>Module: import module, bind call_agent
    Module-->>Agent: call_agent available
    loop interactions
      Agent->>Agent: _send_message_to_evaluated_agent(messages)
      Agent->>Module: invoke call_agent(messages[, context_id])
      Module-->>Agent: string response (sync/async)
      Agent-->>Orchestrator: evaluation event / result
    end
    Agent->>Agent: __aexit__ cleanup
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • yuval-qf
  • osher-qualifire

Poem

🐰 I nibble on code beneath moonlight’s gleam,
A tiny call_agent that hops through a dream.
Sessions kept warm, messages loop and play,
I load your module and echo away.
Hooray for Python agents — let’s run and beam!

Pre-merge checks

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning Description is incomplete; required sections (Motivation/Context, Changes Made, Testing) are empty, and checklist items are unchecked. Complete all template sections with specific details: explain the motivation, list changes made, provide testing steps, link related issues, and verify checklist items are addressed.
Docstring Coverage ⚠️ Warning Docstring coverage is 71.93% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive Title is vague and unclear; 'python ts entrypoint' does not clearly convey what the feature does or its scope. Use a more descriptive title like 'Add Python entrypoint protocol support for policy and red-team evaluations' to clearly communicate the main change.

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0648511 and 90c2970.

📒 Files selected for processing (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: codestyle

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
rogue/server/websocket/manager.py (1)

46-91: Replace WebSocket ping implementation or clarify it as application-level heartbeat.

FastAPI/Starlette WebSocket API does not provide a method to send protocol-level ping/pong frames. The send_bytes(b"ping") at line 69 sends binary application data, not a WebSocket control frame. This means the keepalive mechanism won't prevent timeout errors at the protocol level as intended. Either:

  1. Document this as an application-level heartbeat and verify the gorilla/websocket client expects this pattern
  2. Switch to a WebSocket library that supports real ping/pong frames (e.g., websockets library)
  3. Update the docstring and comments to remove references to "WebSocket ping frames" and "protocol-level" ping/pong

Also remove or correct the misleading comment at line 66: "FastAPI/Starlette WebSocket uses send() with bytes for ping" — this is not accurate.

rogue/evaluator_agent/run_evaluator_agent.py (1)

270-300: Add python_entrypoint_file parameter to the synchronous wrapper.

The synchronous wrapper run_evaluator_agent doesn't accept the python_entrypoint_file parameter, making it unusable for Python protocol evaluations. This creates an inconsistency in the API surface where the async version supports Python protocol but the sync version does not.

🔎 Proposed fix
 def run_evaluator_agent(
     protocol: Protocol,
     transport: Optional[Transport],
     evaluated_agent_url: str,
     auth_type: AuthType,
     auth_credentials: Optional[str],
     judge_llm: str,
     judge_llm_api_key: Optional[str],
     scenarios: Scenarios,
     business_context: str,
     deep_test_mode: bool,
+    python_entrypoint_file: Optional[str] = None,
 ) -> EvaluationResults:
     """Synchronous wrapper for arun_evaluator_agent."""

     async def run_evaluator_agent_task():
         async for update_type, data in arun_evaluator_agent(
             protocol=protocol,
             transport=transport,
             evaluated_agent_url=evaluated_agent_url,
             auth_type=auth_type,
             auth_credentials=auth_credentials,
             judge_llm=judge_llm,
             judge_llm_api_key=judge_llm_api_key,
             scenarios=scenarios,
             business_context=business_context,
             deep_test_mode=deep_test_mode,
+            python_entrypoint_file=python_entrypoint_file,
         ):
             if update_type == "results":
                 return data
rogue/run_cli.py (1)

597-600: Error message may be confusing for Python protocol.

When using Python protocol, cli_input.evaluated_agent_url could be None, producing a confusing error message. Consider using evaluated_agent_identifier instead.

🔎 Suggested fix
     if not results:
         raise ValueError(
-            f"No scenarios were evaluated for {cli_input.evaluated_agent_url}",
+            f"No scenarios were evaluated for {evaluated_agent_identifier}",
         )
🧹 Nitpick comments (12)
README.md (1)

84-98: Document error handling and invocation contract for call_agent.

The function signature and basic example are clear, but the documentation should clarify:

  • Error handling: What happens if call_agent raises an exception?
  • Invocation contract: Is call_agent called once per user message or once per evaluation session?
  • State management: Should the function maintain state between calls, or is each call independent?
  • Timeout/performance: Are there timeout constraints or expected response times?

Consider adding these details to prevent misunderstandings when users implement their agents.

rogue/server/websocket/manager.py (1)

111-116: Simplify the connection check.

The length check is redundant because disconnect() (line 104) removes the job_id key when the list becomes empty. If the key exists in the dictionary, it will always have at least one connection.

🔎 Simplified implementation
 def has_connections(self, job_id: str) -> bool:
     """Check if there are any active WebSocket connections for a job."""
-    return (
-        job_id in self.active_connections
-        and len(self.active_connections[job_id]) > 0
-    )
+    return job_id in self.active_connections
examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py (1)

51-52: Consider defensive access for message content.

If a malformed message dict without a "content" key is passed, this will raise a KeyError. While the contract specifies messages should have 'role' and 'content' keys, defensive access could provide a clearer error:

🔎 Optional defensive fix
     # Get the latest user message
-    latest_message = messages[-1]["content"]
+    latest_message = messages[-1].get("content", "")
rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py (1)

202-218: Consider log level for message content.

The logger.info calls at lines 202 and 213 log message content previews. While helpful for debugging, INFO-level logging of attack messages and responses could be verbose in production and might inadvertently log sensitive content.

🔎 Optional: Use DEBUG level for message content
             # Queue outgoing message (attack) - use "Rogue" role for TUI display
-            logger.info(f"📤 Queueing outgoing attack message: {message[:50]}...")
+            logger.debug(f"📤 Queueing outgoing attack message: {message[:50]}...")
             await update_queue.put(
                 ChatUpdate(role="Rogue", content=message),
             )
             # Yield control to allow queue consumer to process
             await asyncio.sleep(0)

             # Send the actual message
             response = await self._send_message_to_evaluated_agent(message, session_id)

             # Queue response - use "Agent Under Test" role for TUI display
-            logger.info(f"📥 Queueing agent response: {response[:50]}...")
+            logger.debug(f"📥 Queueing agent response: {response[:50]}...")

The asyncio.sleep(0) calls are a good pattern for yielding control to allow the queue consumer to process updates in a timely manner.

packages/tui/internal/tui/keyboard_controller.go (1)

162-187: Consider refactoring to reduce code duplication.

The cursor clamping and string insertion logic is duplicated between the Python and non-Python branches. Extracting this into a helper function would improve maintainability.

🔎 Suggested refactor
// Helper function to insert character at cursor position
func insertCharAtCursor(s string, cursorPos *int, char string) string {
	runes := []rune(s)
	// Clamp cursor position to valid range
	if *cursorPos > len(runes) {
		*cursorPos = len(runes)
	}
	result := string(runes[:*cursorPos]) + char + string(runes[*cursorPos:])
	(*cursorPos)++
	return result
}

// Then in the switch case:
case EvalFieldAgentURL:
	// Use Python file or Agent URL based on protocol
	if m.evalState.AgentProtocol == ProtocolPython {
		m.evalState.PythonEntrypointFile = insertCharAtCursor(
			m.evalState.PythonEntrypointFile, 
			&m.evalState.cursorPos, 
			s,
		)
	} else {
		m.evalState.AgentURL = insertCharAtCursor(
			m.evalState.AgentURL, 
			&m.evalState.cursorPos, 
			s,
		)
	}
	// Save config after text change
	go saveUserConfig(...)
packages/tui/internal/screens/evaluations/form_view.go (1)

283-299: Consider using constants for field indices.

The hardcoded field indices (0, 1, 2, 3, 4, 5) make the code fragile and harder to maintain. If field ordering changes or new fields are added, these magic numbers would need to be updated in multiple places.

Consider defining constants or using the FormField enum values referenced elsewhere in the codebase.

Example approach
// Define constants at package level or reuse existing FormField enum
const (
    FieldIndexProtocol = 0
    FieldIndexAgentURLOrPython = 1
    FieldIndexTransport = 2
    FieldIndexJudgeLLM = 3
    FieldIndexDeepTest = 4
    FieldIndexMode = 5
)

// Then use them in rendering:
renderDropdownField(FieldIndexProtocol, "Protocol:", protocol),
renderTextField(FieldIndexAgentURLOrPython, "Python File:", pythonFile),
// ...
rogue/server/services/red_team_service.py (1)

107-124: Consider exponential backoff for WebSocket connection wait.

The WebSocket wait logic uses a fixed 0.1s interval in a busy-wait loop. While functional, exponential backoff would be more efficient and reduce CPU usage during the wait period.

🔎 Proposed refactor with exponential backoff
-            # Wait for WebSocket client to connect before sending updates
-            # This ensures chat messages aren't lost
-            max_wait = 5.0  # seconds
-            wait_interval = 0.1
-            waited = 0.0
-            while waited < max_wait:
-                if self.websocket_manager.has_connections(job_id):
-                    logger.info(
-                        f"WebSocket client connected for job {job_id}, starting scan",
-                    )
-                    break
-                await asyncio.sleep(wait_interval)
-                waited += wait_interval
+            # Wait for WebSocket client with exponential backoff
+            max_wait = 5.0  # seconds
+            wait_interval = 0.05
+            max_interval = 0.5
+            waited = 0.0
+            while waited < max_wait:
+                if self.websocket_manager.has_connections(job_id):
+                    logger.info(
+                        f"WebSocket client connected for job {job_id}, starting scan",
+                    )
+                    break
+                await asyncio.sleep(wait_interval)
+                waited += wait_interval
+                wait_interval = min(wait_interval * 1.5, max_interval)
             else:
                 logger.warning(
                     f"No WebSocket client connected after {max_wait}s, "
                     f"proceeding anyway for job {job_id}",
                 )
packages/tui/internal/tui/form_controller.go (1)

220-258: Consider extracting text field handling into a helper.

The backspace handling has duplicated logic for PythonEntrypointFile and AgentURL with identical patterns (cursor clamping, string manipulation, cursor decrement, config save). The same pattern appears in the default case for character insertion (lines 279-303).

🔎 Potential helper function extraction

A helper function could reduce duplication:

// Helper to handle text field editing
func (m *Model) editTextField(field *string, cursorPos *int, op func(runes []rune, pos int) ([]rune, int)) {
    runes := []rune(*field)
    if *cursorPos > len(runes) {
        *cursorPos = len(runes)
    }
    newRunes, newPos := op(runes, *cursorPos)
    *field = string(newRunes)
    *cursorPos = newPos
}

This is optional and can be deferred to a later refactor.

rogue/evaluator_agent/evaluator_agent_factory.py (1)

76-101: LGTM! Python protocol handling in factory.

The factory correctly:

  • Validates python_entrypoint_file is provided for PYTHON protocol
  • Creates PythonEvaluatorAgent with the correct parameters
  • Uses debug logging for troubleshooting

Minor suggestion: The from loguru import logger import at line 78 could be moved to module level (line 11) for consistency with other factory files, but this is a stylistic preference.

packages/tui/internal/tui/utils.go (1)

594-607: Variadic parameter for optional value is unconventional.

Using pythonEntrypointFile ...string for a single optional parameter works but is unusual in Go. More idiomatic approaches would be:

  • Using a pointer: pythonEntrypointFile *string
  • Using an options struct
  • Using empty string as "not provided" indicator

However, the current approach is functional and doesn't cause issues.

packages/tui/internal/tui/eval_types.go (1)

142-144: Consider logging errors for debuggability.

The os.Getwd() error is silently ignored. While returning an empty config is acceptable graceful degradation, logging this failure would help users debug config loading issues.

🔎 Suggested improvement
 func loadUserConfigFromWorkdir() UserConfigFromFile {
-	wd, _ := os.Getwd()
+	wd, err := os.Getwd()
+	if err != nil {
+		return UserConfigFromFile{}
+	}
 	dir := wd
rogue/run_cli.py (1)

481-486: Consider moving ast import to module level.

Per PEP 8 and isort conventions, imports should typically be at module level. Since ast is a standard library module with no overhead, it can be moved to the top.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19f577a and 9df9e2c.

📒 Files selected for processing (28)
  • README.md
  • examples/python_entrypoint_stub.py
  • examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py
  • packages/tui/internal/screens/evaluations/form_view.go
  • packages/tui/internal/screens/evaluations/types.go
  • packages/tui/internal/tui/common_controller.go
  • packages/tui/internal/tui/eval_form_view.go
  • packages/tui/internal/tui/eval_types.go
  • packages/tui/internal/tui/form_controller.go
  • packages/tui/internal/tui/keyboard_controller.go
  • packages/tui/internal/tui/utils.go
  • rogue/evaluator_agent/base_evaluator_agent.py
  • rogue/evaluator_agent/evaluator_agent_factory.py
  • rogue/evaluator_agent/python/__init__.py
  • rogue/evaluator_agent/python/python_evaluator_agent.py
  • rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py
  • rogue/evaluator_agent/red_team/factory.py
  • rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py
  • rogue/evaluator_agent/run_evaluator_agent.py
  • rogue/models/cli_input.py
  • rogue/run_cli.py
  • rogue/server/core/evaluation_orchestrator.py
  • rogue/server/core/red_team_orchestrator.py
  • rogue/server/services/evaluation_service.py
  • rogue/server/services/red_team_service.py
  • rogue/server/websocket/manager.py
  • sdks/python/rogue_sdk/sdk.py
  • sdks/python/rogue_sdk/types.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use black for code formatting
Use type hints for all function signatures
Follow PEP 8 naming conventions (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Use isort conventions for import organization

Files:

  • rogue/evaluator_agent/python/__init__.py
  • examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py
  • rogue/evaluator_agent/run_evaluator_agent.py
  • sdks/python/rogue_sdk/sdk.py
  • rogue/server/services/evaluation_service.py
  • rogue/evaluator_agent/base_evaluator_agent.py
  • rogue/server/core/evaluation_orchestrator.py
  • sdks/python/rogue_sdk/types.py
  • rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py
  • rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py
  • rogue/run_cli.py
  • examples/python_entrypoint_stub.py
  • rogue/models/cli_input.py
  • rogue/server/websocket/manager.py
  • rogue/server/core/red_team_orchestrator.py
  • rogue/evaluator_agent/evaluator_agent_factory.py
  • rogue/evaluator_agent/python/python_evaluator_agent.py
  • rogue/evaluator_agent/red_team/factory.py
  • rogue/server/services/red_team_service.py
🧬 Code graph analysis (16)
rogue/evaluator_agent/python/__init__.py (1)
rogue/evaluator_agent/python/python_evaluator_agent.py (1)
  • PythonEvaluatorAgent (20-283)
packages/tui/internal/tui/form_controller.go (2)
packages/tui/internal/tui/eval_types.go (9)
  • ProtocolPython (17-17)
  • EvalFieldTransport (48-48)
  • EvalFieldAgentURL (47-47)
  • EvaluationMode (31-31)
  • EvalFieldEvaluationMode (51-51)
  • EvalFieldScanType (59-59)
  • EvaluationModeRedTeam (35-35)
  • EvalFieldProtocol (46-46)
  • EvalFieldJudgeModel (49-49)
sdks/python/rogue_sdk/types.py (1)
  • EvaluationMode (58-62)
packages/tui/internal/tui/keyboard_controller.go (2)
packages/tui/internal/tui/eval_types.go (2)
  • ProtocolPython (17-17)
  • EvaluationMode (31-31)
sdks/python/rogue_sdk/types.py (1)
  • EvaluationMode (58-62)
examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py (1)
examples/python_entrypoint_stub.py (1)
  • call_agent (22-85)
sdks/python/rogue_sdk/sdk.py (1)
sdks/python/rogue_sdk/types.py (4)
  • Scenarios (393-411)
  • AgentConfig (284-348)
  • get_default_transport (82-90)
  • EvaluationRequest (743-770)
rogue/evaluator_agent/base_evaluator_agent.py (1)
sdks/python/rogue_sdk/types.py (1)
  • is_valid_for_protocol (103-104)
sdks/python/rogue_sdk/types.py (1)
rogue/models/cli_input.py (1)
  • check_protocol_requirements (68-92)
rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py (2)
rogue/evaluator_agent/red_team/models.py (1)
  • ChatUpdate (33-40)
rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py (1)
  • _send_message_to_evaluated_agent (160-218)
packages/tui/internal/tui/utils.go (2)
packages/tui/internal/tui/eval_types.go (5)
  • Protocol (12-12)
  • Transport (20-20)
  • ProtocolPython (17-17)
  • EvaluationMode (31-31)
  • RedTeamConfig (71-78)
sdks/python/rogue_sdk/types.py (6)
  • Protocol (75-90)
  • Transport (93-104)
  • AuthType (23-48)
  • AgentConfig (284-348)
  • EvaluationMode (58-62)
  • RedTeamConfig (151-179)
rogue/run_cli.py (1)
sdks/python/rogue_sdk/types.py (3)
  • Transport (93-104)
  • Protocol (75-90)
  • AuthType (23-48)
examples/python_entrypoint_stub.py (1)
examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py (1)
  • call_agent (24-57)
packages/tui/internal/tui/eval_form_view.go (2)
packages/tui/internal/tui/eval_types.go (4)
  • Protocol (12-12)
  • Transport (20-20)
  • EvaluationMode (31-31)
  • ScanType (62-62)
sdks/python/rogue_sdk/types.py (5)
  • Protocol (75-90)
  • Transport (93-104)
  • Scenarios (393-411)
  • EvaluationMode (58-62)
  • ScanType (143-148)
rogue/evaluator_agent/evaluator_agent_factory.py (2)
rogue/evaluator_agent/python/python_evaluator_agent.py (1)
  • PythonEvaluatorAgent (20-283)
sdks/python/rogue_sdk/types.py (3)
  • Protocol (75-90)
  • Transport (93-104)
  • Scenarios (393-411)
packages/tui/internal/tui/eval_types.go (2)
sdks/python/rogue_sdk/types.py (2)
  • Protocol (75-90)
  • Transport (93-104)
packages/tui/internal/screens/redteam/catalog.go (4)
  • GetBasicScanVulnerabilities (387-402)
  • GetBasicScanAttacks (406-414)
  • GetFreeVulnerabilities (417-425)
  • GetFreeAttacks (428-436)
rogue/server/services/red_team_service.py (2)
rogue/server/websocket/manager.py (1)
  • has_connections (111-116)
rogue/server/core/red_team_orchestrator.py (1)
  • RedTeamOrchestrator (29-164)
packages/tui/internal/screens/evaluations/types.go (2)
packages/tui/internal/tui/eval_types.go (4)
  • Protocol (12-12)
  • Transport (20-20)
  • EvaluationMode (31-31)
  • ScanType (62-62)
sdks/python/rogue_sdk/types.py (4)
  • Protocol (75-90)
  • Transport (93-104)
  • EvaluationMode (58-62)
  • ScanType (143-148)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: rogue_sanity
🔇 Additional comments (63)
README.md (1)

74-74: All documented Python entrypoint features are correctly implemented.

Verification confirms:

  • Example file exists at ./examples/python_entrypoint_stub.py with correct usage documentation
  • CLI flag --python-entrypoint-file is properly implemented in rogue/run_cli.py and integrated throughout CLI, SDK, and service layers
  • Function signature call_agent(messages: list[dict[str, Any]]) -> str matches implementation in all example files
  • PYTHON protocol is properly integrated with validation and support across evaluation and red team agents

No issues found. Documentation is accurate and complete.

rogue/evaluator_agent/python/__init__.py (1)

1-5: LGTM!

Clean package initialization following Python conventions. The __all__ declaration properly controls the public API.

rogue/evaluator_agent/base_evaluator_agent.py (1)

180-184: LGTM!

The guard correctly handles the Python protocol case where transport is None. This prevents an AttributeError from calling is_valid_for_protocol() on None while maintaining validation for protocols that do use transports.

examples/python_entrypoint_stub.py (1)

1-117: LGTM!

Well-documented example file with clear usage instructions and integration examples. The code handles the empty messages case correctly with the ternary operator and follows Python type hint conventions.

rogue/evaluator_agent/python/python_evaluator_agent.py (5)

31-75: LGTM!

Constructor properly initializes the agent with Python-specific settings, passing transport=None and Protocol.PYTHON to the base class. The initialization of internal state (_module, _call_agent_fn, _session_histories, _added_sys_path) is well-organized.


104-112: Thread-safety consideration for sys.path modification.

Modifying sys.path is not thread-safe. If multiple PythonEvaluatorAgent instances run concurrently with different entrypoint directories, they could interfere with each other's import resolution.

For this use case (evaluation tooling, likely single-threaded), this is acceptable. If concurrent evaluation becomes a requirement, consider using importlib.util.spec_from_file_location with explicit submodule handling or process isolation.


147-171: LGTM!

Proper async context management with cleanup of sys.path and session histories. The try/except ValueError for sys.path.remove handles the edge case where the path was already removed.


173-258: LGTM!

The dual history tracking serves distinct purposes: _context_id_to_chat_history (inherited) for evaluation logging and _session_histories for the call_agent function. The error handling properly catches exceptions and returns a structured error response while logging details.


260-283: LGTM!

Clean handling of both synchronous and asynchronous call_agent functions using asyncio.iscoroutine. The string coercion at line 281 ensures a consistent return type regardless of what the user's function returns.

packages/tui/internal/screens/evaluations/types.go (2)

9-10: LGTM!

The dual-purpose comment for FormFieldAgentURL clearly documents its use for both the agent URL and Python entrypoint file path depending on the protocol. This is a pragmatic design that avoids proliferating form field indices.


31-45: LGTM!

The FormState struct cleanly separates AgentURL and PythonEntrypointFile as distinct fields while the editing state comments (lines 43-45) properly document how field navigation changes based on protocol and evaluation mode.

rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py (1)

172-173: LGTM!

Debug-level logging for yielded updates helps trace the message flow during red team execution without being too verbose.

packages/tui/internal/tui/eval_form_view.go (1)

28-51: LGTM!

The addition of PythonEntrypointFile to the FormState is consistent with the Python protocol support being introduced. The field mapping is straightforward and correct.

rogue/server/services/evaluation_service.py (2)

119-136: LGTM! Python protocol correctly handles missing URL.

The change to return an empty string when evaluated_agent_url is None aligns with the Python protocol workflow, where no network URL is required (using the entrypoint file instead). The threading of python_entrypoint_file through to the orchestrator is implemented correctly.


145-149: Remove unnecessary None check on protocol field.

The agent_config.protocol field is not optional and always has a value (defaults to Protocol.A2A). The AgentConfig class validates protocol requirements via @model_validator, ensuring protocol is never None. The ternary expression agent_config.protocol.value if agent_config.protocol else None is unnecessary—simplify to agent_config.protocol.value.

Likely an incorrect or invalid review comment.

rogue/server/core/evaluation_orchestrator.py (2)

31-61: LGTM! Clean parameter threading.

The addition of python_entrypoint_file parameter is well-typed and consistently threaded through the orchestrator. The type hint str | None = None correctly indicates this is an optional parameter for Python protocol workflows.


82-83: Good defensive coding for protocol logging.

Using self.protocol.value if self.protocol else None prevents potential AttributeError if protocol is somehow None. This is consistent with the defensive coding pattern seen in the evaluation service.

packages/tui/internal/screens/evaluations/form_view.go (1)

275-300: No action needed—form navigation correctly handles the field index gap.

The navigation logic in form_controller.go explicitly detects and skips the Transport field (index 2) when Python protocol is selected, with matching skip conditions in both up/down navigation (lines 23–28 and 49–55). The implementation uses enum-based field identification (EvalFormField) rather than relying on contiguous indices, making the skip logic type-safe and maintainable. The form rendering in form_view.go also correctly omits Transport for Python protocol. This design is intentional and working as documented.

Likely an incorrect or invalid review comment.

rogue/server/core/red_team_orchestrator.py (2)

59-59: LGTM! Python entrypoint support properly threaded through.

The python_entrypoint_file parameter is correctly added to the constructor, stored as an instance variable, and passed to the attacker agent factory. This enables Python protocol support for red team scans.

Also applies to: 79-79, 102-102


144-147: LGTM! Improved chat update visibility.

Elevating ChatUpdate logging from debug to info level with structured messages improves observability during red team scans.

rogue/server/services/red_team_service.py (1)

150-150: LGTM! Python entrypoint support and enhanced logging.

The python_entrypoint_file parameter is correctly passed to the orchestrator, and the chat update logging enhancement improves debugging capabilities.

Also applies to: 162-165

rogue/models/cli_input.py (3)

15-19: LGTM! Field types correctly updated for Python protocol support.

The field type changes appropriately make transport and evaluated_agent_url optional (None for PYTHON protocol) and introduce the new python_entrypoint_file field. This aligns with the protocol-specific requirements enforced by the validator below.


67-92: LGTM! Comprehensive protocol-specific validation.

The check_protocol_requirements validator properly enforces protocol-specific constraints:

  • For PYTHON protocol: requires python_entrypoint_file and validates file existence
  • For other protocols: requires evaluated_agent_url and transport

The validation runs after auth and red-team validators, which is the correct order. File existence checks prevent runtime errors downstream.


107-107: LGTM! PartialCLIInput updated consistently.

The python_entrypoint_file field is correctly added to PartialCLIInput with proper typing and default value.

sdks/python/rogue_sdk/sdk.py (3)

242-244: LGTM! API signature correctly updated for Python protocol support.

The parameter changes appropriately make agent_url and scenarios optional while adding python_entrypoint_file. This enables the SDK to support Python protocol evaluations where no agent URL is needed.

Also applies to: 257-257


259-282: LGTM! Comprehensive docstring documentation.

The docstring thoroughly documents all parameters including the new python_entrypoint_file, clearly explaining when each parameter is required based on the protocol type. This provides excellent guidance for SDK users.


286-286: LGTM! Proper conditional handling for Python protocol.

The conditional logic correctly handles:

  • Line 286: Creates HttpUrl only when agent_url is provided, otherwise None
  • Line 296: Passes python_entrypoint_file to AgentConfig
  • Line 301: Safely accesses scenarios.scenarios with fallback to empty list

The AgentConfig validation will catch any protocol requirement violations, making this implementation safe.

Also applies to: 296-296, 301-301

packages/tui/internal/tui/common_controller.go (2)

41-77: LGTM! Protocol-aware paste handling.

The paste handling correctly distinguishes between Python protocol (routing to PythonEntrypointFile) and other protocols (routing to AgentURL). The cursor advancement and config persistence after paste are properly implemented.


250-365: LGTM! Comprehensive NewEvaluationScreen initialization with config loading.

The initialization flow properly:

  • Loads user config and applies defaults
  • Handles Python entrypoint file persistence
  • Loads red team config state and applies presets when no selections exist
  • Caches the redTeamConfigState for later use

The preset application logic for scan types (basic/full) is well-structured.

sdks/python/rogue_sdk/types.py (3)

80-90: LGTM! Python protocol enum and default transport handling.

The Protocol.PYTHON enum value and the get_default_transport() returning None for Python protocol are correctly implemented. The # type: ignore comment is appropriate since None is not a Transport type but is semantically correct for this protocol.


331-344: Validator doesn't check file existence (by design).

Unlike the CLIInput.check_protocol_requirements validator in rogue/models/cli_input.py (lines 67-91), this validator only checks for the presence of python_entrypoint_file but doesn't validate that the file exists or is a valid file. This is acceptable since:

  1. The SDK types are for API serialization/deserialization
  2. File existence validation happens at runtime in the evaluator agent classes

This design is consistent with separation of concerns.


836-849: LGTM! RedTeamRequest protocol validation.

The validator correctly enforces that python_entrypoint_file is required for PYTHON protocol and evaluated_agent_url is required for non-PYTHON protocols, mirroring the AgentConfig validation logic.

packages/tui/internal/tui/form_controller.go (2)

25-28: LGTM! Transport field skipping for Python protocol.

The navigation logic correctly skips the Transport field when Python protocol is selected, since Python protocol doesn't use network transport.

Also applies to: 52-55


82-127: LGTM! Config persistence on field changes.

The asynchronous saveUserConfig calls after protocol, transport, evaluation mode, and scan type changes ensure user preferences are persisted without blocking the UI.

rogue/evaluator_agent/evaluator_agent_factory.py (1)

103-113: LGTM! A2A/MCP protocol validation.

The validation ensures evaluated_agent_address is provided for non-Python protocols, and scenarios defaults to an empty Scenarios() if not provided.

rogue/evaluator_agent/red_team/factory.py (2)

15-36: LGTM! TypedDict for type safety.

The _CommonKwargs TypedDict provides clear documentation of the expected keyword arguments and improves type checking for the factory function.


135-151: LGTM! Protocol-specific agent creation.

The factory correctly:

  • Validates URL requirement for A2A and MCP protocols
  • Validates python_entrypoint_file requirement for PYTHON protocol
  • Passes python_entrypoint_file as additional kwarg to PythonRedTeamAttackerAgent

Note: common_kwargs includes evaluated_agent_address which will be None for PYTHON protocol, but PythonRedTeamAttackerAgent handles this correctly by using python_entrypoint_file as the file path (see lines 52-57 in python_red_team_attacker_agent.py).

rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py (5)

17-84: LGTM! Well-structured constructor with validation.

The constructor correctly:

  • Accepts either python_entrypoint_file or falls back to evaluated_agent_address
  • Validates that at least one path is provided
  • Initializes instance state for module loading and session history tracking

86-142: Dynamic module loading with proper validation.

The module loading implementation:

  • Validates file existence and type
  • Adds parent directory to sys.path for local imports (tracks for cleanup)
  • Uses importlib.util for safe dynamic loading
  • Validates call_agent function exists and is callable

Security note: Dynamic module loading executes arbitrary code from the user-provided file. This is by design for the Python entrypoint feature, but ensure documentation warns users to only use trusted entrypoint files.


144-158: LGTM! Proper async context management.

The __aenter__ loads the module and __aexit__ properly cleans up:

  • Clears session histories
  • Removes the added path from sys.path
  • Calls parent's __aexit__ for base class cleanup

160-218: Error handling returns string instead of raising exception.

The _send_message_to_evaluated_agent method catches all exceptions and returns f"Error: {str(e)}" instead of propagating the exception. This is intentional for the red team flow where attack attempts should not crash the evaluation, but be aware that:

  • Error details are visible in the response
  • The red team orchestrator needs to handle these error responses appropriately

220-243: LGTM! Handles both sync and async call_agent.

The _invoke_call_agent method correctly handles both synchronous and asynchronous call_agent implementations by checking asyncio.iscoroutine() and awaiting if necessary. Result is coerced to string for consistent return type.

packages/tui/internal/tui/utils.go (3)

34-40: LGTM! AgentConfig struct updates for Python protocol.

The struct correctly adds:

  • omitempty to EvaluatedAgentURL and EvaluatedAgentTransport (optional for Python protocol)
  • New PythonEntrypointFile field with omitempty

This aligns with the Python SDK's AgentConfig where these fields are optional based on protocol.


161-170: LGTM! Protocol-aware red team request construction.

The CreateEvaluation correctly populates either:

  • python_entrypoint_file for Python protocol
  • evaluated_agent_url, evaluated_agent_transport, and credentials for other protocols

This ensures the API request matches the server's expectations based on protocol.


644-670: LGTM! Protocol-specific AgentConfig population.

The code correctly:

  • Sets PythonEntrypointFile for Python protocol (lines 663-665)
  • Omits transport for Python protocol (line 666 comment)
  • Sets EvaluatedAgentURL and EvaluatedAgentTransport for other protocols (lines 668-669)

This ensures the request payload matches what the server expects for each protocol type.

packages/tui/internal/tui/eval_types.go (10)

14-18: LGTM!

Protocol constants are well-aligned with the Python SDK's Protocol enum values (a2a, mcp, python).


41-48: LGTM!

Comments clearly document the dual-use of EvalFieldAgentURL for both agent URL and Python entrypoint file based on protocol context.


80-123: LGTM!

The EvaluationViewState struct is well-documented with clear comments explaining the form field layout for both Policy and Red Team modes.


131-139: LGTM!

The UserConfigFromFile struct provides clear JSON serialization tags for config persistence.


282-296: LGTM!

The PythonEntrypointFile is correctly threaded through to the StartEvaluation call.


346-349: LGTM!

ProtocolPython is correctly added to the available protocols list.


358-359: LGTM!

Correctly returns an empty transport list for Python protocol, aligning with the SDK's behavior where Python doesn't use network transport.


388-404: LGTM!

Good state management when cycling protocols - correctly resets transport and cursor position based on the target protocol.


524-530: LGTM!

Safe nil checks with a sensible default to ScanTypeBasic.


532-581: LGTM!

The preset application logic correctly handles all scan types:

  • Basic and Full apply their respective presets
  • Custom preserves user selections
  • State is properly synchronized between config objects
rogue/run_cli.py (8)

144-150: LGTM!

The CLI argument is well-documented with clear help text explaining when it's required.


164-181: LGTM!

Function signature correctly updated with optional types for transport and python_entrypoint_file.


214-231: LGTM!

Signature changes are consistent with the outer run_scenarios function.


246-263: LGTM!

Correctly passes None for agent_url when using Python protocol and converts the entrypoint path to string for SDK consumption.


506-527: LGTM!

Clean separation of Python protocol handling from network-based protocols with clear validation error messages.


546-561: LGTM!

Correctly handles agent_url as optional for Python protocol and passes through the entrypoint file to validation.


571-577: LGTM!

Good abstraction to create a consistent identifier across protocols for tracking/logging.


579-596: LGTM!

Parameters are correctly threaded through to run_scenarios.

Comment on lines 44 to 49
# Use hash of conversation history length as session key for consistency
session_key = len(messages)
if session_key not in _session_cache:
_session_cache[session_key] = uuid4().hex

session_id = _session_cache[session_key]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Session key collision risk with parallel conversations.

Using len(messages) as the session key can cause different conversations to share the same session_id if they happen to have the same message count. In parallel evaluation scenarios, this could lead to cross-conversation state pollution.

Consider using a hash of the actual message content or passing a conversation identifier:

🔎 Proposed fix using content-based session key
-    # Use hash of conversation history length as session key for consistency
-    session_key = len(messages)
+    # Use hash of first message content + length for better session isolation
+    first_msg = messages[0]["content"] if messages else ""
+    session_key = hash((first_msg, len(messages)))
     if session_key not in _session_cache:
         _session_cache[session_key] = uuid4().hex

Alternatively, if the ShirtifyAgent's invoke method doesn't actually need session persistence across calls (since full message history is passed), you could simplify to generate a new session_id per call or derive it deterministically from the full conversation.

🤖 Prompt for AI Agents
In examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py around
lines 44 to 49, using len(messages) as the session key risks collisions across
parallel conversations; replace it with a robust, content-derived or per-call
ID: either compute a deterministic hash (e.g., SHA-256) of the serialized
message contents/ids to derive session_key and store that in _session_cache, or
skip caching entirely and generate a fresh session_id per invocation using
uuid4().hex if persistence across calls is unnecessary; ensure you serialize
messages consistently before hashing (order and fields) and use the hash hex
string as the dictionary key.

Comment on lines +162 to +163
// findUserConfigPath finds the .rogue/user_config.json path, creating it if needed
func findUserConfigPath() string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Misleading comment: function doesn't create the directory.

The comment says "creating it if needed" but the function only finds an existing .rogue directory or returns a default path. It doesn't actually create anything.

🔎 Suggested fix
-// findUserConfigPath finds the .rogue/user_config.json path, creating it if needed
+// findUserConfigPath finds the .rogue/user_config.json path, defaulting to CWD if not found
 func findUserConfigPath() string {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// findUserConfigPath finds the .rogue/user_config.json path, creating it if needed
func findUserConfigPath() string {
// findUserConfigPath finds the .rogue/user_config.json path, defaulting to CWD if not found
func findUserConfigPath() string {
🤖 Prompt for AI Agents
In packages/tui/internal/tui/eval_types.go around lines 162-163, the comment
"finds the .rogue/user_config.json path, creating it if needed" is misleading
because the function does not create the directory or file; update the comment
to accurately state that it only locates/returns the expected path (or returns a
default path) without creating anything, or alternatively modify the function to
create the .rogue directory (use os.MkdirAll with appropriate permissions) and
ensure the user_config.json file exists before returning; pick one approach and
make the code and comment consistent.

Comment on lines +208 to +214
// Marshal and save
data, err := json.MarshalIndent(existingData, "", " ")
if err != nil {
return err
}

return os.WriteFile(configPath, data, 0644)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Ensure parent directory exists before writing.

If the .rogue directory doesn't exist (e.g., fresh project), os.WriteFile will fail. Consider creating the parent directory.

🔎 Suggested fix
 	// Marshal and save
 	data, err := json.MarshalIndent(existingData, "", "  ")
 	if err != nil {
 		return err
 	}

+	// Ensure parent directory exists
+	if err := os.MkdirAll(filepath.Dir(configPath), 0755); err != nil {
+		return err
+	}
+
 	return os.WriteFile(configPath, data, 0644)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Marshal and save
data, err := json.MarshalIndent(existingData, "", " ")
if err != nil {
return err
}
return os.WriteFile(configPath, data, 0644)
// Marshal and save
data, err := json.MarshalIndent(existingData, "", " ")
if err != nil {
return err
}
// Ensure parent directory exists
if err := os.MkdirAll(filepath.Dir(configPath), 0755); err != nil {
return err
}
return os.WriteFile(configPath, data, 0644)
🤖 Prompt for AI Agents
In packages/tui/internal/tui/eval_types.go around lines 208 to 214, the code
calls os.WriteFile(configPath, ...) without ensuring the parent directory
exists, causing failures on fresh projects; before writing, create the parent
directory for configPath (use os.MkdirAll on filepath.Dir(configPath) with
appropriate permissions), handle and return any error from MkdirAll, then
proceed to write the file as before.

Comment on lines +180 to +187
go saveUserConfig(
m.evalState.AgentProtocol,
m.evalState.AgentTransport,
m.evalState.AgentURL,
m.evalState.PythonEntrypointFile,
m.evalState.EvaluationMode,
m.getScanType(),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd packages/tui/internal/tui && pwd && ls -la keyboard_controller.go 2>/dev/null | head -20

Repository: qualifire-dev/rogue

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Search for mutex/lock usage around evalState field access
rg -n -C3 'evalState\.(AgentURL|PythonEntrypointFile|cursorPos)' --type=go

Repository: qualifire-dev/rogue

Length of output: 23883


🏁 Script executed:

# First, find the evalState struct definition
rg -n 'type.*evalState|struct.*evalState' --type=go packages/tui/internal/tui/

Repository: qualifire-dev/rogue

Length of output: 45


🏁 Script executed:

# Search for EvalState struct (capitalized)
rg -n 'type EvalState struct' --type=go packages/tui/internal/tui/

Repository: qualifire-dev/rogue

Length of output: 45


🏁 Script executed:

# Look at the model struct and how evalState is defined
rg -n 'evalState' --type=go packages/tui/internal/tui/keyboard_controller.go | head -5

Repository: qualifire-dev/rogue

Length of output: 377


🏁 Script executed:

# Get the first part of keyboard_controller.go to see struct definition
head -100 packages/tui/internal/tui/keyboard_controller.go

Repository: qualifire-dev/rogue

Length of output: 2725


🏁 Script executed:

# Find EvaluationViewState struct definition
rg -n 'type EvaluationViewState struct' --type=go packages/tui/internal/tui/

Repository: qualifire-dev/rogue

Length of output: 139


🏁 Script executed:

# Search for EvaluationViewState in all go files
fd --type f -name "*.go" packages/tui/internal/tui/ -exec grep -l "EvaluationViewState" {} \;

Repository: qualifire-dev/rogue

Length of output: 293


🏁 Script executed:

# Look for saveUserConfig function definition
rg -n -A10 'func saveUserConfig' --type=go packages/tui/internal/tui/

Repository: qualifire-dev/rogue

Length of output: 1008


🏁 Script executed:

# Read the EvaluationViewState struct and saveUserConfig function
sed -n '81,200p' packages/tui/internal/tui/eval_types.go

Repository: qualifire-dev/rogue

Length of output: 4243


🏁 Script executed:

# Also check for any sync.Mutex in the model or related structs
rg -n 'sync\.Mutex|sync\.RWMutex' --type=go packages/tui/internal/tui/

Repository: qualifire-dev/rogue

Length of output: 45


🏁 Script executed:

# Get the rest of saveUserConfig function
sed -n '182,230p' packages/tui/internal/tui/eval_types.go

Repository: qualifire-dev/rogue

Length of output: 1750


🏁 Script executed:

# Check the Model type and its Update method
rg -n 'type Model struct' --type=go packages/tui/internal/tui/ | head -5

Repository: qualifire-dev/rogue

Length of output: 120


🏁 Script executed:

# Check if bubble tea is being used and what the Update pattern looks like
rg -n 'func \(m Model\) Update' --type=go packages/tui/internal/tui/ | head -10

Repository: qualifire-dev/rogue

Length of output: 156


🏁 Script executed:

# Read the Model struct definition and Update method
sed -n '61,150p' packages/tui/internal/tui/types.go

Repository: qualifire-dev/rogue

Length of output: 2568


🏁 Script executed:

# Check the Update method and how it routes events
sed -n '92,120p' packages/tui/internal/tui/app.go

Repository: qualifire-dev/rogue

Length of output: 760


🏁 Script executed:

# Look for where saveUserConfig is called and check the context
rg -n 'go saveUserConfig' --type=go packages/tui/internal/tui/ | wc -l

Repository: qualifire-dev/rogue

Length of output: 65


🏁 Script executed:

# Check how many times saveUserConfig is spawned in keyboard_controller.go
rg -n 'go saveUserConfig' packages/tui/internal/tui/keyboard_controller.go

Repository: qualifire-dev/rogue

Length of output: 88


🏁 Script executed:

# Check the full keyboard_controller.go Update method to understand the flow
sed -n '155,200p' packages/tui/internal/tui/keyboard_controller.go

Repository: qualifire-dev/rogue

Length of output: 2056


🏁 Script executed:

# Check if there's any debouncing or synchronization mechanism elsewhere
rg -n 'debounce|time\.After|Debounce' --type=go packages/tui/internal/tui/

Repository: qualifire-dev/rogue

Length of output: 45


Add file-level synchronization or debounce saveUserConfig calls.

Multiple saveUserConfig goroutines can execute concurrently when keys are pressed rapidly. Since each performs an unprotected read-modify-write on user_config.json, concurrent writes risk configuration loss. Either protect file access with a mutex, use a single-writer pattern, or implement debouncing to ensure only the latest changes are saved.

🤖 Prompt for AI Agents
In packages/tui/internal/tui/keyboard_controller.go around lines 180 to 187,
multiple goroutines call saveUserConfig concurrently which can cause race
conditions and lost writes; protect file access by serializing saves: either add
a package-level sync.Mutex that saveUserConfig locks/unlocks around the
read-modify-write of user_config.json, or implement a single-writer goroutine
pattern (publish config changes to a buffered channel and have one goroutine
perform writes), or add a short debounce (coalesce rapid calls into one write).
Ensure whichever approach you pick actually serializes file I/O and handles
errors, and remove the fire-and-forget goroutine calls in favor of the
serialized/debounced mechanism.

judge_llm_aws_secret_access_key: Optional[str] = None,
judge_llm_aws_region: Optional[str] = None,
evaluation_mode: EvaluationMode = EvaluationMode.POLICY,
python_entrypoint_file: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Document the new python_entrypoint_file parameter in the docstring.

The python_entrypoint_file parameter was added to the function signature but is not documented in the Args section of the docstring (lines 113-128). This makes the API harder to use and understand.

🔎 Proposed fix

Add the parameter documentation in the Args section around line 127:

         judge_llm_aws_region: AWS region for judge LLM
         evaluation_mode: Should be POLICY for this function
+        python_entrypoint_file: Path to Python file with call_agent function (for PYTHON protocol)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In rogue/evaluator_agent/run_evaluator_agent.py around line 104 (docstring spans
~113-128), the new python_entrypoint_file parameter is missing from the Args
section; update the function docstring by adding a concise Args entry for
python_entrypoint_file describing its type (Optional[str]), purpose (path to a
Python entrypoint file to run or None), and default/behavior when None, placing
it near line 127 so the Args list remains ordered and complete.

Comment on lines +488 to +498
# Check for call_agent function
has_call_agent = False
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef) and node.name == "call_agent":
has_call_agent = True
break

if not has_call_agent:
raise ValueError(
f"Python entrypoint file must contain a 'call_agent' function: {python_file}", # noqa: E501
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Async function definitions are not detected.

The AST check only looks for ast.FunctionDef, but if the user defines async def call_agent(...), it won't be found (it would be ast.AsyncFunctionDef).

🔎 Suggested fix
         # Check for call_agent function
         has_call_agent = False
         for node in ast.walk(tree):
-            if isinstance(node, ast.FunctionDef) and node.name == "call_agent":
+            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == "call_agent":
                 has_call_agent = True
                 break
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Check for call_agent function
has_call_agent = False
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef) and node.name == "call_agent":
has_call_agent = True
break
if not has_call_agent:
raise ValueError(
f"Python entrypoint file must contain a 'call_agent' function: {python_file}", # noqa: E501
)
# Check for call_agent function
has_call_agent = False
for node in ast.walk(tree):
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == "call_agent":
has_call_agent = True
break
if not has_call_agent:
raise ValueError(
f"Python entrypoint file must contain a 'call_agent' function: {python_file}", # noqa: E501
)
🤖 Prompt for AI Agents
In rogue/run_cli.py around lines 488 to 498, the AST check only looks for
ast.FunctionDef so async functions like "async def call_agent"
(ast.AsyncFunctionDef) aren't detected; update the check to consider both
ast.FunctionDef and ast.AsyncFunctionDef (e.g., isinstance(node,
(ast.FunctionDef, ast.AsyncFunctionDef))) so synchronous and asynchronous
call_agent definitions are recognized, leaving the rest of the logic (setting
has_call_agent and raising the ValueError) unchanged.

Comment on lines +130 to +134
evaluated_agent_url=(
str(job.request.evaluated_agent_url)
if job.request.evaluated_agent_url
else ""
), # noqa: E501
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for evaluated_agent_url usage across red team components

echo "=== Searching for evaluated_agent_url in red_team directory ==="
rg -nP --type=py 'evaluated_agent_url' rogue/evaluator_agent/red_team/ 2>/dev/null | head -50

echo -e "\n=== Searching for create_red_team_attacker_agent function ==="
rg -nP --type=py 'def create_red_team_attacker_agent' rogue/ 2>/dev/null

echo -e "\n=== Searching for RedTeamOrchestrator class and initialization ==="
rg -nP --type=py 'class RedTeamOrchestrator' rogue/ 2>/dev/null

echo -e "\n=== Looking for evaluated_agent_url in red_team_orchestrator.py ==="
rg -nP --type=py 'evaluated_agent_url' rogue/server/core/red_team_orchestrator.py 2>/dev/null | head -30

Repository: qualifire-dev/rogue

Length of output: 1544


🏁 Script executed:

#!/bin/bash
# Look for Python red team agent implementations
echo "=== Finding python_red_team files ==="
fd -t f 'python.*red.*team.*agent' rogue/ 2>/dev/null

echo -e "\n=== Searching for attacker agent factory ==="
fd -t f '.*attacker.*agent.*factory' rogue/ 2>/dev/null

echo -e "\n=== Checking red team directory structure ==="
ls -la rogue/evaluator_agent/red_team/ 2>/dev/null | head -20

Repository: qualifire-dev/rogue

Length of output: 883


🏁 Script executed:

#!/bin/bash
# Check how evaluated_agent_url is validated or used in orchestrator
echo "=== Checking RedTeamOrchestrator signature and usage ==="
rg -nP --type=py 'RedTeamOrchestrator\(' rogue/server/services/red_team_service.py -A5 -B5

echo -e "\n=== Checking how evaluated_agent_url flows through orchestrator ==="
rg -nP --type=py 'create_red_team_attacker_agent' rogue/server/core/red_team_orchestrator.py -A10 -B2

Repository: qualifire-dev/rogue

Length of output: 1735


🏁 Script executed:

#!/bin/bash
# Read factory.py to see complete validation logic
cat -n rogue/evaluator_agent/red_team/factory.py

Repository: qualifire-dev/rogue

Length of output: 7682


🏁 Script executed:

#!/bin/bash
# Check python_red_team_attacker_agent.py for evaluated_agent_url usage
cat -n rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py | head -100

Repository: qualifire-dev/rogue

Length of output: 4891


🏁 Script executed:

#!/bin/bash
# Check what protocol values are being passed and if Python protocol is used
rg -nP --type=py 'evaluated_agent_protocol' rogue/server/services/red_team_service.py -B5 -A5

Repository: qualifire-dev/rogue

Length of output: 580


Pass None instead of an empty string for evaluated_agent_url when falsy.

The code converts falsy job.request.evaluated_agent_url to an empty string, but the factory's validation logic uses truthiness checks (if not evaluated_agent_url:) for A2A and MCP protocols. An empty string will trigger the same ValueError as None, making the conversion semantically confusing. For clarity and consistency with the API's Optional[str] type signature, pass None directly when the value is falsy rather than converting it to an empty string.

🤖 Prompt for AI Agents
In rogue/server/services/red_team_service.py around lines 130 to 134, the code
coerces a falsy job.request.evaluated_agent_url to an empty string before
passing it to the factory, but the factory expects an Optional[str] and checks
truthiness (treating "" like None). Change the call to pass None when
evaluated_agent_url is falsy (i.e., use job.request.evaluated_agent_url if
truthy, otherwise None) so the value remains an actual None for clarity and
correct semantics.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @rogue/evaluator_agent/python/python_evaluator_agent.py:
- Line 105: Remove the redundant local "import sys" inside
python_evaluator_agent.py (the duplicate import at the location shown) since sys
is already imported at the module top; delete that line and ensure imports
follow isort-style grouping (standard, third-party, local) so the file has a
single top-level import for sys and no in-function/module-body duplicates.
🧹 Nitpick comments (4)
rogue/evaluator_agent/python/python_evaluator_agent.py (4)

128-140: Consider validating the call_agent signature.

While the code checks that call_agent is callable, it doesn't validate the expected signature call_agent(messages: list[dict]) -> str. Using inspect.signature() could provide earlier, clearer error messages if the user's function has an incompatible signature.

🔎 Example signature validation
import inspect

# After line 135
sig = inspect.signature(self._call_agent_fn)
params = list(sig.parameters.values())
if len(params) != 1:
    raise TypeError(
        f"call_agent must accept exactly 1 parameter (messages), "
        f"but has {len(params)} parameters"
    )

146-149: Consider potential event loop blocking during module load.

_load_python_module() performs synchronous file I/O and module loading, which could briefly block the event loop. For typical use cases this is likely acceptable since initialization happens once, but if loading large modules or working with slow file systems, consider whether this warrants investigation.


204-212: Clarify the fallback logic.

The fallback to [{"role": "user", "content": message}] appears unnecessary since _add_message_to_chat_history was just called on line 201, which creates the ChatHistory if it doesn't exist. While this defensive programming doesn't cause issues, it may indicate uncertainty about the state management flow.


246-269: Consider refining the type hint for _call_agent_fn.

The current type hint Optional[Callable[[list[dict[str, Any]]], str]] doesn't capture that the function can also be async. While the runtime handling is correct, a more precise type hint would be:

from typing import Union, Coroutine

_call_agent_fn: Optional[
    Union[
        Callable[[list[dict[str, Any]]], str],
        Callable[[list[dict[str, Any]]], Coroutine[Any, Any, str]]
    ]
]

However, this may be overly complex. The current implementation handles both cases correctly at runtime.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9df9e2c and 57a02a5.

📒 Files selected for processing (1)
  • rogue/evaluator_agent/python/python_evaluator_agent.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use black for code formatting
Use type hints for all function signatures
Follow PEP 8 naming conventions (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Use isort conventions for import organization

Files:

  • rogue/evaluator_agent/python/python_evaluator_agent.py
🧬 Code graph analysis (1)
rogue/evaluator_agent/python/python_evaluator_agent.py (3)
sdks/python/rogue_sdk/types.py (1)
  • Protocol (75-90)
rogue/evaluator_agent/base_evaluator_agent.py (2)
  • _send_message_to_evaluated_agent (555-572)
  • _add_message_to_chat_history (583-617)
rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py (2)
  • _load_python_module (86-142)
  • _send_message_to_evaluated_agent (160-218)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: codestyle
  • GitHub Check: rogue_sanity
🔇 Additional comments (5)
rogue/evaluator_agent/python/python_evaluator_agent.py (5)

1-19: LGTM - Module structure and imports are well-organized.

The module docstring clearly explains the purpose, and imports are properly organized following conventions (stdlib → external → local).


21-30: LGTM - Clear class documentation.

The docstring effectively communicates the expected call_agent function signature and message format.


32-74: LGTM - Initialization is well-structured.

The constructor properly initializes the agent, converts the file path to a Path object for robust path handling, and tracks necessary state variables for module loading and cleanup.


76-144: Verify security implications of arbitrary code execution.

This method loads and executes user-provided Python code without sandboxing, timeouts, or resource limits. The loaded module runs with full privileges of the rogue process and can access files, network, environment variables, etc.

Confirm this security model is acceptable for the intended use case. Consider documenting security expectations in the module docstring or user-facing documentation.


151-167: LGTM - Proper cleanup implementation.

The exit handler correctly cleans up the sys.path modification and handles edge cases gracefully with the try-except block.


# Add the entrypoint file's directory to sys.path so imports work
# This allows the entrypoint to import sibling modules
import sys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove redundant import statement.

sys is already imported at the top of the file (line 9). This local import is unnecessary and violates DRY principle.

🔎 Proposed fix
-        import sys
-
         entrypoint_dir = str(self._python_file_path.parent.resolve())

Based on coding guidelines, use isort conventions for import organization.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import sys
entrypoint_dir = str(self._python_file_path.parent.resolve())
🤖 Prompt for AI Agents
In @rogue/evaluator_agent/python/python_evaluator_agent.py around line 105,
Remove the redundant local "import sys" inside python_evaluator_agent.py (the
duplicate import at the location shown) since sys is already imported at the
module top; delete that line and ensure imports follow isort-style grouping
(standard, third-party, local) so the file has a single top-level import for sys
and no in-function/module-body duplicates.

Comment on lines 169 to 244
async def _send_message_to_evaluated_agent(
self,
context_id: str,
message: str,
) -> dict[str, str]:
"""
Sends a message to the evaluated agent by calling the Python function.

:param message: the text to send to the agent.
:param context_id: The context ID of the conversation.
Each conversation has a unique context_id. All messages in the conversation
have the same context_id.
:return: A dictionary containing the response from the evaluated agent.
- "response": the response string. If there is no response
from the agent, the string is empty.
"""
if self._call_agent_fn is None:
raise RuntimeError(
"Python module not loaded. Call __aenter__ first or use async with.",
)

try:
logger.info(
"🐍 Calling Python entrypoint function",
extra={
"message": message[:100] + "..." if len(message) > 100 else message,
"context_id": context_id,
"python_file": str(self._python_file_path),
},
)

# Add the user message to chat history
self._add_message_to_chat_history(context_id, "user", message)

# Build messages list from chat history (single source of truth)
chat_history = self._context_id_to_chat_history.get(context_id)
messages = (
[
{"role": msg.role, "content": msg.content}
for msg in chat_history.messages
]
if chat_history
else [{"role": "user", "content": message}]
)

# Call the user's function
response = await self._invoke_call_agent(messages)

# Add the assistant response to chat history
self._add_message_to_chat_history(context_id, "assistant", response)

logger.info(
"✅ Python entrypoint call successful",
extra={
"response_length": len(response),
"response_preview": (
response[:100] + "..." if len(response) > 100 else response
),
"context_id": context_id,
},
)

return {"response": response}

except Exception as e:
logger.exception(
"❌ Python entrypoint call failed",
extra={
"message": message[:100] + "..." if len(message) > 100 else message,
"context_id": context_id,
"python_file": str(self._python_file_path),
"error": str(e),
"error_type": type(e).__name__,
},
)
return {"response": "", "error": str(e)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add timeout protection for user code execution.

There's no timeout when calling _invoke_call_agent (line 215), so a runaway or hanging user function could block indefinitely. Consider wrapping the invocation with asyncio.wait_for() to enforce a reasonable timeout.

🔎 Proposed fix with configurable timeout
+            # Call the user's function with timeout protection
+            timeout_seconds = 300  # 5 minutes default, consider making configurable
-            # Call the user's function
-            response = await self._invoke_call_agent(messages)
+            try:
+                response = await asyncio.wait_for(
+                    self._invoke_call_agent(messages),
+                    timeout=timeout_seconds
+                )
+            except asyncio.TimeoutError:
+                logger.error(
+                    f"Python entrypoint call timed out after {timeout_seconds}s",
+                    extra={"context_id": context_id}
+                )
+                return {"response": "", "error": f"Call timed out after {timeout_seconds}s"}

@drorIvry drorIvry requested a review from yuval-qf January 5, 2026 21:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Fix all issues with AI Agents 🤖
In @examples/python_entrypoint_stub.py:
- Line 115: The line setting latest_message assumes messages[-1] has a "content"
key and can raise KeyError; update the assignment for latest_message to
defensively access the key by checking that messages is non-empty and that
"content" exists on messages[-1] (e.g., use dict.get or an "in" check) and fall
back to an empty string if missing so latest_message never raises.
- Around line 54-55: The current extraction of latest_message assumes
messages[-1] is a dict with a "content" key and can raise KeyError; update the
logic around latest_message to defensively access the key (e.g., check messages
is non-empty, ensure messages[-1] is a dict, and use dict.get("content", "") or
fallback to an empty string) so malformed entries won’t raise and the template
demonstrates safe access to the "content" field.
- Around line 1-15: The module docstring is missing the primary usage
instructions before the "Or via TUI:" section and contains an extra blank line;
update the top-level docstring (the usage block describing how to run this
example) to include a primary invocation example (e.g., a CLI command or minimal
Python snippet showing how to import and call call_agent) placed before the "Or
via TUI:" paragraph, remove the stray blank line, and ensure it reiterates that
the call_agent function receives the full conversation history and must return a
string.

In @packages/tui/internal/tui/common_controller.go:
- Around line 63-70: The saveUserConfig call is currently launched in a
goroutine (go saveUserConfig(...)) so any returned error is ignored; change this
to either call saveUserConfig synchronously and check its returned error, or
keep it async but capture errors via a channel and dispatch them to the UI
(e.g., call a controller method to show a dialog/status update). Locate the
invocation in common_controller.go and replace the fire-and-forget goroutine
with a direct call that handles the error or add an error channel and a
goroutine that reads the error and invokes your existing UI error/display method
to notify the user when persistence fails.

In @packages/tui/internal/tui/eval_types.go:
- Around line 189-190: The code currently ignores errors from json.Unmarshal
when reading an existing config (the os.ReadFile block that reads configPath
into b and unmarshals into existingData), which can silently overwrite user
data; update that block to capture and handle the unmarshal error: after
json.Unmarshal(b, &existingData) check the returned error, log or return the
error (or bail out from the save/merge operation) instead of proceeding with an
empty map so corrupted/invalid JSON is not silently overwritten. Ensure you
reference the same variables (b, err, existingData, configPath) and keep
existing behavior for successful unmarshalling.
♻️ Duplicate comments (4)
rogue/evaluator_agent/python/python_evaluator_agent.py (2)

107-107: Remove redundant local import.

The import sys statement at line 107 is redundant since sys is already imported at the top of the file (line 10). This violates the DRY principle.

Based on coding guidelines, use isort conventions for import organization with all imports at the module top.

🔎 Proposed fix
-        import sys
-
         entrypoint_dir = str(self._python_file_path.parent.resolve())

216-217: Consider adding timeout protection for user code execution.

The call to _invoke_call_agent at line 217 has no timeout, so a hanging or runaway user function could block indefinitely. Consider wrapping the invocation with asyncio.wait_for() to enforce a reasonable timeout.

This was previously flagged in past reviews and remains unaddressed.

packages/tui/internal/tui/eval_types.go (2)

164-181: Misleading comment: function doesn't create the directory.

The comment says "creating it if needed" but the function only finds an existing .rogue directory or returns a default path. It doesn't actually create anything.

🔎 Suggested fix
-// findUserConfigPath finds the .rogue/user_config.json path, creating it if needed
+// findUserConfigPath finds the .rogue/user_config.json path, defaulting to CWD if not found
 func findUserConfigPath() string {

210-216: Ensure parent directory exists before writing.

If the .rogue directory doesn't exist (e.g., fresh project), os.WriteFile will fail. Consider creating the parent directory.

🔎 Suggested fix
 	// Marshal and save
 	data, err := json.MarshalIndent(existingData, "", "  ")
 	if err != nil {
 		return err
 	}

+	// Ensure parent directory exists
+	if err := os.MkdirAll(filepath.Dir(configPath), 0755); err != nil {
+		return err
+	}
+
 	return os.WriteFile(configPath, data, 0644)
🧹 Nitpick comments (2)
rogue/evaluator_agent/python/python_evaluator_agent.py (1)

265-265: Move import to the top of the file.

The import inspect statement should be moved to the module-level imports at the top of the file (along with other standard library imports) rather than being imported locally within the function.

Based on coding guidelines, use isort conventions for import organization.

🔎 Proposed fix

At the top of the file, add inspect to the standard library imports:

 import asyncio
 import importlib.util
+import inspect
 import sys
 from pathlib import Path

Then remove the local import:

-        # Try to call with context_id first (new signature),
-        # fall back to messages-only for backward compatibility
-        import inspect
-
         sig = inspect.signature(self._call_agent_fn)
examples/python_entrypoint_stub.py (1)

110-110: Consider moving import to module level.

The asyncio import is placed inside the function. While functional, Python convention recommends placing imports at the module level for better readability and consistency.

🔎 Suggested fix
 from typing import Any, Optional
+import asyncio
 
 
 def call_agent(

Then remove the import from inside the function:

     Args:
         messages: List of message dicts with 'role' and 'content' keys.
         context_id: Optional unique conversation ID for session tracking.
     """
-    import asyncio
-
     # Simulate async processing
     await asyncio.sleep(0.1)
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57a02a5 and 0648511.

📒 Files selected for processing (5)
  • examples/python_entrypoint_stub.py
  • examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py
  • packages/tui/internal/tui/common_controller.go
  • packages/tui/internal/tui/eval_types.go
  • rogue/evaluator_agent/python/python_evaluator_agent.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use black for code formatting
Use type hints for all function signatures
Follow PEP 8 naming conventions (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Use isort conventions for import organization

Files:

  • examples/python_entrypoint_stub.py
  • rogue/evaluator_agent/python/python_evaluator_agent.py
🧠 Learnings (1)
📚 Learning: 2026-01-04T16:10:56.611Z
Learnt from: CR
Repo: qualifire-dev/rogue PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-04T16:10:56.611Z
Learning: Project structure should organize code into `rogue/` main package, `tests/` for tests, `examples/` for example implementations, and `.github/` for CI/CD workflows

Applied to files:

  • packages/tui/internal/tui/eval_types.go
🧬 Code graph analysis (2)
examples/python_entrypoint_stub.py (1)
examples/tshirt_store_langgraph_agent/python_entrypoint_shirtify.py (1)
  • call_agent (20-54)
packages/tui/internal/tui/common_controller.go (2)
packages/tui/internal/tui/eval_types.go (3)
  • ProtocolPython (19-19)
  • EvaluationMode (33-33)
  • LoadEvaluationStateFromConfig (273-401)
sdks/python/rogue_sdk/types.py (1)
  • EvaluationMode (58-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: rogue_sanity
  • GitHub Check: codestyle
🔇 Additional comments (14)
rogue/evaluator_agent/python/python_evaluator_agent.py (6)

1-19: LGTM! Well-organized imports and clear module documentation.

The imports follow isort conventions correctly (standard library, third-party, local), and the module docstring clearly explains the purpose.


21-30: LGTM! Clear class documentation.

The class structure and docstring clearly communicate the purpose and expected function signature.


32-76: LGTM! Well-structured initialization.

The __init__ method follows PEP 8 naming conventions, includes comprehensive type hints, and properly initializes all instance variables. Good use of Path for file path handling and informative debug logging.


148-151: LGTM! Proper async context manager entry.

The __aenter__ method correctly loads the Python module and delegates to the parent class.


153-169: LGTM! Proper cleanup in async context exit.

The __aexit__ method correctly cleans up the sys.path modification with defensive error handling to avoid issues if the path was already removed.


267-291: LGTM! Excellent backward compatibility and type handling.

The signature inspection logic elegantly handles both old (messages-only) and new (messages + context_id) function signatures with appropriate fallback. The async detection and type coercion ensure robust handling of different implementations.

packages/tui/internal/tui/eval_types.go (6)

17-19: LGTM: Clean protocol constant addition.

The addition of ProtocolPython follows the existing pattern and integrates well with the protocol cycling and transport logic throughout the file.


44-58: LGTM: Clear field constant documentation.

The updated comments and field constant structure clearly document the dual use of EvalFieldAgentURL for both Agent URL and Python entrypoint file, along with the protocol-specific field skipping behavior.


483-497: LGTM: Protocol-transport mapping correctly handles Python.

The updated getAllProtocols and getTransportsForProtocol functions correctly include Python protocol and return an empty transport list for it, since Python protocol doesn't use network transports.


533-539: LGTM: Cursor position correctly reset on protocol change.

When the protocol changes, the cursor position is properly reset to match the appropriate text field (PythonEntrypointFile or AgentURL), preventing cursor position bugs.


659-716: LGTM: Preset application logic is sound.

The applyPresetForScanType function correctly handles preset application for Basic and Full scan types while preserving custom selections for Custom scan type.


275-275: No action required—openai/gpt-4.1 is a valid OpenAI model identifier.

OpenAI introduced the GPT-4.1 model series in April 2025, which includes gpt-4.1 as a valid model identifier. The code's fallback default is correct.

Likely an incorrect or invalid review comment.

packages/tui/internal/tui/common_controller.go (2)

41-78: LGTM: Protocol-aware paste handling works correctly.

The paste logic correctly differentiates between Python entrypoint file and Agent URL based on the protocol, with appropriate cursor updates and config persistence.


238-239: LGTM: Clean config consolidation.

Replacing hard-coded defaults with LoadEvaluationStateFromConfig properly consolidates config loading from user_config.json, scenarios.json, and redteam.yaml.

Comment on lines +1 to +15
"""
Example Python entrypoint for Rogue agent evaluation.

This file demonstrates how to create a Python entrypoint for testing
agents without A2A or MCP protocols. Rogue will dynamically import
this file and call the `call_agent` function.


Or via TUI:
1. Select "Python" as the protocol
2. Enter the path to your Python file

The call_agent function receives the full conversation history and
should return the agent's response as a string.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Incomplete usage documentation.

The docstring has a blank line at Line 8 and Line 9 begins with "Or via TUI:", suggesting missing usage instructions for a primary method (e.g., CLI command) before the TUI alternative.

🔎 Suggested fix
 This file demonstrates how to create a Python entrypoint for testing
 agents without A2A or MCP protocols. Rogue will dynamically import
 this file and call the `call_agent` function.
 
+Usage:
+    Via CLI:
+        rogue evaluate --protocol python --python-file path/to/this_file.py
 
 Or via TUI:
     1. Select "Python" as the protocol
     2. Enter the path to your Python file
🤖 Prompt for AI Agents
In @examples/python_entrypoint_stub.py around lines 1-15, The module docstring
is missing the primary usage instructions before the "Or via TUI:" section and
contains an extra blank line; update the top-level docstring (the usage block
describing how to run this example) to include a primary invocation example
(e.g., a CLI command or minimal Python snippet showing how to import and call
call_agent) placed before the "Or via TUI:" paragraph, remove the stray blank
line, and ensure it reiterates that the call_agent function receives the full
conversation history and must return a string.

Comment on lines +54 to +55
# Extract the latest user message
latest_message = messages[-1]["content"] if messages else ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add defensive key access to prevent KeyError.

The code assumes messages[-1] contains a "content" key. If the message structure is malformed, this will raise a KeyError. Since this is a template file, consider demonstrating defensive coding practices.

🔎 Suggested fix
-    # Extract the latest user message
-    latest_message = messages[-1]["content"] if messages else ""
+    # Extract the latest user message
+    latest_message = messages[-1].get("content", "") if messages else ""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Extract the latest user message
latest_message = messages[-1]["content"] if messages else ""
# Extract the latest user message
latest_message = messages[-1].get("content", "") if messages else ""
🤖 Prompt for AI Agents
In @examples/python_entrypoint_stub.py around lines 54-55, The current
extraction of latest_message assumes messages[-1] is a dict with a "content" key
and can raise KeyError; update the logic around latest_message to defensively
access the key (e.g., check messages is non-empty, ensure messages[-1] is a
dict, and use dict.get("content", "") or fallback to an empty string) so
malformed entries won’t raise and the template demonstrates safe access to the
"content" field.

# Simulate async processing
await asyncio.sleep(0.1)

latest_message = messages[-1]["content"] if messages else ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add defensive key access to prevent KeyError.

Same issue as the sync version: the code assumes messages[-1] contains a "content" key, which may raise a KeyError if the structure is unexpected.

🔎 Suggested fix
-    latest_message = messages[-1]["content"] if messages else ""
+    latest_message = messages[-1].get("content", "") if messages else ""
🤖 Prompt for AI Agents
In @examples/python_entrypoint_stub.py around line 115, The line setting
latest_message assumes messages[-1] has a "content" key and can raise KeyError;
update the assignment for latest_message to defensively access the key by
checking that messages is non-empty and that "content" exists on messages[-1]
(e.g., use dict.get or an "in" check) and fall back to an empty string if
missing so latest_message never raises.

Comment on lines +63 to +70
go saveUserConfig(
m.evalState.AgentProtocol,
m.evalState.AgentTransport,
m.evalState.AgentURL,
m.evalState.PythonEntrypointFile,
m.evalState.EvaluationMode,
m.getScanType(),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Silent failure: saveUserConfig errors are not handled.

The saveUserConfig call runs in a goroutine without error handling. If config persistence fails (e.g., permission denied, disk full), the user won't be notified and their settings won't be saved.

Consider either:

  1. Removing the goroutine if the operation is fast enough (file I/O is typically quick)
  2. Adding error handling that notifies the user via a dialog or status message
🔎 Option 1: Synchronous save with error handling
 			m.evalState.cursorPos += len([]rune(cleanText))
-			// Save config after paste
-			go saveUserConfig(
-				m.evalState.AgentProtocol,
-				m.evalState.AgentTransport,
-				m.evalState.AgentURL,
-				m.evalState.PythonEntrypointFile,
-				m.evalState.EvaluationMode,
-				m.getScanType(),
-			)
+			// Save config after paste
+			if err := saveUserConfig(
+				m.evalState.AgentProtocol,
+				m.evalState.AgentTransport,
+				m.evalState.AgentURL,
+				m.evalState.PythonEntrypointFile,
+				m.evalState.EvaluationMode,
+				m.getScanType(),
+			); err != nil {
+				// Optionally notify user of save failure
+				// (may choose to silently ignore if non-critical)
+			}
🤖 Prompt for AI Agents
In @packages/tui/internal/tui/common_controller.go around lines 63-70, The
saveUserConfig call is currently launched in a goroutine (go
saveUserConfig(...)) so any returned error is ignored; change this to either
call saveUserConfig synchronously and check its returned error, or keep it async
but capture errors via a channel and dispatch them to the UI (e.g., call a
controller method to show a dialog/status update). Locate the invocation in
common_controller.go and replace the fire-and-forget goroutine with a direct
call that handles the error or add an error channel and a goroutine that reads
the error and invokes your existing UI error/display method to notify the user
when persistence fails.

Comment on lines +189 to +190
if b, err := os.ReadFile(configPath); err == nil {
json.Unmarshal(b, &existingData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle JSON unmarshal errors to avoid silent data corruption.

When reading existing config, json.Unmarshal errors are ignored. If the existing config file is corrupted or invalid JSON, this could lead to silently overwriting user data with an empty map.

🔎 Proposed fix
 	// Read existing config to preserve other fields
 	existingData := make(map[string]interface{})
 	if b, err := os.ReadFile(configPath); err == nil {
-		json.Unmarshal(b, &existingData)
+		if err := json.Unmarshal(b, &existingData); err != nil {
+			// Log or return error instead of silently ignoring corrupted config
+			return fmt.Errorf("failed to parse existing config: %w", err)
+		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if b, err := os.ReadFile(configPath); err == nil {
json.Unmarshal(b, &existingData)
// Read existing config to preserve other fields
existingData := make(map[string]interface{})
if b, err := os.ReadFile(configPath); err == nil {
if err := json.Unmarshal(b, &existingData); err != nil {
// Log or return error instead of silently ignoring corrupted config
return fmt.Errorf("failed to parse existing config: %w", err)
}
}
🤖 Prompt for AI Agents
In @packages/tui/internal/tui/eval_types.go around lines 189-190, The code
currently ignores errors from json.Unmarshal when reading an existing config
(the os.ReadFile block that reads configPath into b and unmarshals into
existingData), which can silently overwrite user data; update that block to
capture and handle the unmarshal error: after json.Unmarshal(b, &existingData)
check the returned error, log or return the error (or bail out from the
save/merge operation) instead of proceeding with an empty map so
corrupted/invalid JSON is not silently overwritten. Ensure you reference the
same variables (b, err, existingData, configPath) and keep existing behavior for
successful unmarshalling.

@drorIvry drorIvry enabled auto-merge (squash) January 7, 2026 20:58
@drorIvry drorIvry merged commit 66dbf5c into main Jan 7, 2026
9 of 10 checks passed
@drorIvry drorIvry deleted the feature/python-ts-entrypoint branch January 7, 2026 20:59
@drorIvry drorIvry mentioned this pull request Feb 2, 2026
21 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants