Conversation
Caution: Review failed — the pull request is closed.

Walkthrough

Adds a FastAPI server with REST and WebSocket APIs, a full Python and TypeScript SDK, and migrates CLI/UI flows to use the server via the SDK. Introduces structured logging, new evaluation orchestration/services, interview and LLM endpoints, packaging/CI updates, and significant model/type moves to rogue_sdk.types with corresponding test and doc updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant RogueSDK
    participant Server as FastAPI Server
    participant EvalAPI as /api/v1/evaluations
    participant WS as WebSocket /ws/{job_id}
    participant LLM as LLM API
    User->>CLI: rogue run ... --rogue-server-url
    CLI->>RogueSDK: run_evaluation(...)
    RogueSDK->>Server: POST EvalAPI (create)
    Server-->>RogueSDK: job_id (PENDING)
    RogueSDK->>WS: connect(job_id)
    Server->>WS: stream job_update/chat_update
    RogueSDK-->>CLI: on_update/on_chat callbacks
    Server->>LLM: judge/summary calls (as needed)
    RogueSDK->>Server: GET EvalAPI/{job_id} (wait)
    Server-->>RogueSDK: COMPLETED + results
    RogueSDK-->>CLI: results
    CLI->>RogueSDK: generate_summary(results, model)
    RogueSDK->>Server: POST /api/v1/llm/summary
    Server-->>RogueSDK: summary
    RogueSDK-->>CLI: summary
    CLI-->>User: Write JSON + print summary
```
```mermaid
sequenceDiagram
    participant Client
    participant EvalAPI as POST /evaluations
    participant EvalSvc as EvaluationService
    participant Orchestrator
    participant Agent as EvaluatorAgent
    participant WS as WebSocketManager
    Client->>EvalAPI: create_evaluation(request)
    EvalAPI->>EvalSvc: add_job + run_job(job_id)
    EvalSvc->>Orchestrator: run_evaluation()
    Orchestrator->>Agent: arun_evaluator_agent(...)
    Agent-->>Orchestrator: stream updates (chat/status/results)
    Orchestrator-->>EvalSvc: forward updates
    EvalSvc->>WS: broadcast chat/job_update
    EvalSvc-->>Client: GET job status/progress
    EvalSvc-->>Client: final results (COMPLETED/FAILED)
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 5
🧹 Nitpick comments (6)
rogue/server/__main__.py (1)
1-13: LGTM! Well-structured server entry point with minor suggestions. The implementation follows good practices with secure defaults and environment variable configuration.
Consider these improvements for production readiness:
```diff
 import uvicorn
 import os
+import sys

 from .main import app

 if __name__ == "__main__":
     host = os.getenv("HOST", "127.0.0.1")  # Default to localhost for security
-    port = int(os.getenv("PORT", "8000"))
+    try:
+        port = int(os.getenv("PORT", "8000"))
+    except ValueError:
+        print("Error: PORT environment variable must be a valid integer")
+        sys.exit(1)
+
+    # Disable reload in production
+    reload = os.getenv("ENVIRONMENT", "development") == "development"
+
     uvicorn.run(
         app,
         host=host,
         port=port,
-        reload=True,
+        reload=reload,
     )
```

rogue/ui/components/scenario_runner.py (1)
200-205: Clean refactor removing file operations from UI, but remove commented code. The removal of filesystem operations aligns well with the server-client architecture refactor. Results are properly maintained in memory and state for UI consumption.
Remove the commented-out code entirely rather than leaving it as comments:
```diff
-    # workdir = state.get("workdir")
-
-    # final_output_path = (
-    #     workdir / f"evaluation_results_{datetime.now().isoformat()}.json"
-    # )
-    # final_output_path.write_text(all_results.model_dump_json(indent=2))
-    # state["evaluation_results_output_path"] = final_output_path
```

Also applies to: 215-215
rogue/server/api/health.py (1)
23-23: Consider making the version configurable. The hardcoded version "1.0.0" should ideally be read from a configuration file, environment variable, or package metadata to avoid manual updates during releases.
You could replace the hardcoded version with:
```diff
-        version="1.0.0",
+        version=os.getenv("APP_VERSION", "1.0.0"),
```

Or read from package metadata if available.
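If reading from package metadata, a minimal sketch using the standard library (assuming the distribution is named rogue; the fallback string is illustrative):

```python
from importlib.metadata import PackageNotFoundError, version

try:
    # Resolved from the installed distribution's metadata at import time
    APP_VERSION = version("rogue")
except PackageNotFoundError:
    # Fallback for environments where the package isn't installed (e.g. dev checkouts)
    APP_VERSION = "0.0.0-dev"

print(APP_VERSION)
```

This keeps the health endpoint in sync with pyproject.toml automatically on release.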
rogue/server/services/evaluation_service.py (3)
21-32: Consider caching for better performance with large job datasets. The method recreates the jobs list, filters, and sorts on every call, which could become expensive with many jobs.
For better performance with large datasets, consider:
- Maintaining pre-sorted indices
- Using database-backed storage for persistence and efficient querying
- Caching frequently accessed job lists
```diff
     def get_jobs(
         self,
         status: Optional[EvaluationStatus] = None,
         limit: int = 50,
         offset: int = 0,
     ) -> List[EvaluationJob]:
-        jobs = list(self.jobs.values())
+        # Consider maintaining a sorted index or using database queries
+        jobs = sorted(self.jobs.values(), key=lambda x: x.created_at, reverse=True)

         if status:
             jobs = [job for job in jobs if job.status == status]
-
-        jobs.sort(key=lambda x: x.created_at, reverse=True)

         return jobs[offset : offset + limit]
```
98-99: Consider error handling for fire-and-forget tasks. The async task is created without error handling, which could lead to silent failures in WebSocket broadcasts.
Add error handling for the async task:
```diff
 def _notify_job_update(self, job: EvaluationJob):
-    asyncio.create_task(websocket_manager.broadcast_job_update(job))
+    task = asyncio.create_task(websocket_manager.broadcast_job_update(job))
+    task.add_done_callback(
+        lambda t: logger.error(f"WebSocket broadcast failed: {t.exception()}")
+        if t.exception()
+        else None
+    )
```

Or use a more robust background task manager for production use.
101-109: Move import to module level and add error handling. The local import and fire-and-forget pattern have the same concerns as the previous method.
```diff
+from ..models.api_models import WebSocketMessage

 def _notify_chat_update(self, job_id: str, chat_data: Any):
     """Send real-time chat updates via WebSocket"""
-    from ..models.api_models import WebSocketMessage
     message = WebSocketMessage(
         type="chat_update", job_id=job_id, data={"message": str(chat_data)}
     )
-    asyncio.create_task(websocket_manager.broadcast_to_job(job_id, message))
+    task = asyncio.create_task(websocket_manager.broadcast_to_job(job_id, message))
+    task.add_done_callback(
+        lambda t: logger.error(f"Chat broadcast failed: {t.exception()}")
+        if t.exception()
+        else None
+    )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
- .flake8 (1 hunks)
- AGENTS.md (1 hunks)
- docs/README.md (0 hunks)
- pyproject.toml (2 hunks)
- refactor_plan.md (1 hunks)
- rogue/__init__.py (1 hunks)
- rogue/run_cli.py (1 hunks)
- rogue/server/__main__.py (1 hunks)
- rogue/server/api/evaluation.py (1 hunks)
- rogue/server/api/health.py (1 hunks)
- rogue/server/main.py (1 hunks)
- rogue/server/models/api_models.py (1 hunks)
- rogue/server/services/evaluation_service.py (1 hunks)
- rogue/server/websocket/manager.py (1 hunks)
- rogue/services/evaluation_library.py (1 hunks)
- rogue/services/scenario_evaluation_service.py (0 hunks)
- rogue/ui/components/scenario_runner.py (2 hunks)
💤 Files with no reviewable changes (2)
- docs/README.md
- rogue/services/scenario_evaluation_service.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/clean-code.mdc:0-0
Timestamp: 2025-07-01T11:52:43.736Z
Learning: Refactor continuously
pyproject.toml (5)
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-07-01T11:54:30.386Z
Learning: Applies to {requirements.txt,pyproject.toml} : Use proper package versions
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-07-01T11:54:30.386Z
Learning: Applies to {requirements.txt,pyproject.toml} : Pin dependency versions
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-07-01T11:54:30.386Z
Learning: Applies to {requirements.txt,pyproject.toml} : Separate dev dependencies
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-07-01T11:54:30.386Z
Learning: Applies to {requirements.txt,pyproject.toml} : Store requirements in requirements.txt or pyproject.toml
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/tech-stack.mdc:0-0
Timestamp: 2025-07-01T11:55:48.082Z
Learning: Use FastAPI exclusively
AGENTS.md (1)
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/self_improve.mdc:0-0
Timestamp: 2025-07-01T11:54:49.379Z
Learning: Document breaking changes
rogue/server/api/health.py (1)
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/tech-stack.mdc:0-0
Timestamp: 2025-07-01T11:55:48.082Z
Learning: Use FastAPI exclusively
rogue/server/main.py (1)
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/tech-stack.mdc:0-0
Timestamp: 2025-07-01T11:55:48.082Z
Learning: Use FastAPI exclusively
🧬 Code Graph Analysis (5)
rogue/run_cli.py (1)
  rogue/services/scenario_evaluation_service.py (1)
    - evaluate_scenarios (35-66)

rogue/server/models/api_models.py (2)
  rogue/models/scenario.py (1)
    - Scenario (12-51)
  rogue/models/config.py (1)
    - AgentConfig (14-45)

rogue/server/websocket/manager.py (1)
  rogue/server/models/api_models.py (2)
    - EvaluationJob (26-35)
    - WebSocketMessage (49-52)

rogue/services/evaluation_library.py (5)
  rogue/models/config.py (2)
    - AgentConfig (14-45)
    - AuthType (7-11)
  rogue/models/scenario.py (3)
    - Scenarios (54-70)
    - Scenario (12-51)
    - ScenarioType (7-9)
  rogue/models/evaluation_result.py (1)
    - EvaluationResults (21-38)
  rogue/services/scenario_evaluation_service.py (2)
    - ScenarioEvaluationService (13-66)
    - evaluate_scenarios (35-66)
  rogue/server/services/evaluation_service.py (1)
    - progress_callback (65-76)

rogue/server/api/evaluation.py (2)
  rogue/server/models/api_models.py (5)
    - EvaluationRequest (11-15)
    - EvaluationResponse (38-41)
    - EvaluationJob (26-35)
    - EvaluationStatus (18-23)
    - JobListResponse (44-46)
  rogue/server/services/evaluation_service.py (7)
    - EvaluationService (11-109)
    - add_job (15-16)
    - run_evaluation (51-96)
    - get_jobs (21-32)
    - get_job_count (34-37)
    - get_job (18-19)
    - cancel_job (39-49)
🔇 Additional comments (32)
.flake8 (1)
11-11: LGTM! Standard practice for black compatibility. Adding E203 to the ignore list is a common and recommended practice when using the black formatter, as it conflicts with black's slice formatting style.
pyproject.toml (1)
10-10: LGTM! Appropriate dependencies for FastAPI server architecture. The addition of fastapi>=0.115.0 and uvicorn>=0.32.0 aligns well with the server refactor objectives. The version constraints are reasonable and follow the project's dependency management practices.

Also applies to: 19-19
rogue/run_cli.py (1)
119-134: LGTM! Clean separation of concerns in the refactor. The modification correctly separates evaluation logic from file output responsibilities. The CLI maintains backward compatibility by handling file output after evaluation completes, while the core service focuses purely on evaluation.
The logic flow is sound:
- Collect results from the async generator
- Write to file for CLI compatibility
- Return results for further processing
- Proper error handling when results are missing
rogue/server/api/health.py (1)
10-26: LGTM! Well-structured health check endpoint. The implementation follows FastAPI best practices with proper Pydantic modeling, an async handler, and comprehensive system information in the response.
rogue/server/main.py (2)
13-17: LGTM! Proper lifespan management implementation. The async context manager for lifespan events is correctly implemented with appropriate logging.

20-41: LGTM! Well-structured FastAPI application factory. The create_app factory pattern is excellent for testing and modularity. Router integration and app configuration are properly structured.
AGENTS.md (1)
1-66: LGTM! Comprehensive and well-structured documentation. This documentation provides excellent guidance for AI agents and developers working in the repository. It covers all essential aspects including dependencies, tooling, code style, project structure, and execution commands. The consistent use of uv and clear formatting makes it easy to follow.

rogue/server/models/api_models.py (4)
11-16: LGTM! Well-structured request model with proper reuse. The EvaluationRequest model properly reuses existing domain models (AgentConfig, Scenario) and includes sensible defaults for retry and timeout parameters.

18-24: LGTM! Comprehensive status enumeration. The EvaluationStatus enum covers all necessary job lifecycle states including error and cancellation handling.
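For reference, a hypothetical reconstruction of such a lifecycle enum (the actual members live in rogue/server/models/api_models.py and may differ in spelling):

```python
from enum import Enum


class EvaluationStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# The str mixin makes members serialize as plain strings in JSON responses
print(EvaluationStatus.PENDING.value)  # → pending
```

The str mixin is what lets Pydantic models emit these statuses directly in API payloads without custom encoders.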
26-36: LGTM! Complete job model with proper timestamps and progress tracking. The EvaluationJob model includes all necessary fields for job lifecycle management, with proper Optional typing for fields that may not be set initially.

49-53: LGTM! Generic WebSocket message structure. The WebSocketMessage model provides a flexible structure for real-time communication with proper typing.
rogue/server/api/evaluation.py (4)
19-41: LGTM! Well-structured job creation endpoint. The endpoint properly creates jobs with UUID generation, uses background tasks for async execution, and returns appropriate response models.

43-53: LGTM! Good pagination and filtering support. The list endpoint includes proper pagination parameters and optional status filtering with sensible defaults.

55-61: LGTM! Proper error handling for job retrieval. The get endpoint includes appropriate 404 error handling when jobs are not found.

63-69: LGTM! Proper cancellation endpoint with error handling. The cancel endpoint correctly handles job cancellation with appropriate error responses for non-existent jobs.
rogue/server/services/evaluation_service.py (2)
39-49: LGTM! Clean cancellation logic. The method correctly validates job existence and cancellable states before updating status and timestamps. The boolean return value provides clear success indication.

91-96: LGTM! Comprehensive error handling. The exception handling properly updates job status, captures error messages, and ensures completion timestamps are always set via the finally block.
rogue/__init__.py (3)
1-15: LGTM! Clean modular API design. The restructured __init__.py provides a clean, library-oriented API while maintaining backward compatibility by importing legacy submodules. The approach supports both new evaluation workflows and existing usage patterns.
27-29: Excellent use of function aliases for clean API. Creating top-level aliases for EvaluationLibrary methods provides a clean, functional API that's easy to use and discover.
32-57: Well-structured __all__ list with clear categorization. The explicit __all__ list with categorized comments makes the public API surface clear and maintainable.

refactor_plan.md (3)
1-48: Comprehensive architectural plan with clear inspiration. The refactor plan is well-structured and clearly inspired by OpenCode's successful patterns. The high-level architecture diagram effectively communicates the separation of concerns between frontends, server, and ADK agent.

122-176: Well-phased implementation approach. The implementation phases are logically sequenced, starting with core infrastructure and progressing through SDKs to frontends. The timeline appears realistic for the scope of work.

419-434: Thoughtful migration strategy prioritizing backward compatibility. The migration approach properly balances innovation with user continuity by maintaining existing functionality during the transition period.
rogue/services/evaluation_library.py (5)
14-23: LGTM! Clear class documentation and purpose. The docstring clearly explains the library's role as a core evaluation engine used by multiple components, which aligns well with the modular architecture goals.

25-67: Solid async evaluation method with good error handling. The method properly handles the async generator from the service, provides progress callbacks, and includes error handling for missing results.

69-98: Clean streaming interface. The streaming method provides a clean async generator interface that directly exposes the underlying service's updates.
119-126: Proper synchronous wrapper implementation. Using asyncio.run() is the correct approach for providing a synchronous interface to async functionality.
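The pattern, sketched with an illustrative stand-in rather than the real EvaluationLibrary method:

```python
import asyncio


async def _evaluate_async(x: int) -> int:
    # Stand-in for the real async evaluation pipeline
    await asyncio.sleep(0)
    return x * 2


def evaluate(x: int) -> int:
    """Synchronous facade: runs the async implementation to completion."""
    return asyncio.run(_evaluate_async(x))


print(evaluate(21))  # → 42
```

One caveat worth noting: asyncio.run() raises RuntimeError if called from code that is already inside a running event loop, so the sync wrapper is only for plain synchronous callers.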
152-175: No field name mismatch in AgentConfig instantiation. The aliases used in quick_evaluate (evaluated_agent_url, evaluated_agent_auth_type, evaluated_agent_credentials, judge_llm_model, and judge_llm_api_key) exactly match the Field(alias=…) definitions in AgentConfig. You can safely ignore the original comment.

Likely an incorrect or invalid review comment.
rogue/server/websocket/manager.py (4)
12-27: LGTM! Clean connection management with proper grouping. The connection management properly groups WebSockets by job ID and handles global connections. The logic for adding connections to the appropriate lists is correct.

47-59: Excellent error handling with connection cleanup. The broadcast methods properly handle connection failures and clean up disconnected WebSockets, preventing memory leaks and stale connections.
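The pattern being praised, as a standalone sketch (the connection list and send_text are stand-ins for the real WebSocketManager internals):

```python
async def broadcast(connections: list, payload: str) -> None:
    """Send to every connection; prune any that fail mid-send."""
    dead = []
    for ws in connections:
        try:
            await ws.send_text(payload)
        except Exception:
            # Don't mutate the list while iterating — collect, then remove
            dead.append(ws)
    for ws in dead:
        connections.remove(ws)
```

Collecting failures into a separate list avoids mutating the connection list during iteration, which is what keeps the cleanup safe.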
75-94: Well-structured job update broadcasting. The broadcast_job_update method properly constructs structured messages and broadcasts to both job-specific and global connections, enabling flexible client subscription patterns.
101-120: WebSocket endpoints follow FastAPI patterns correctly. Both endpoints properly handle connection lifecycle, use appropriate decorators, and implement the keep-alive pattern with proper exception handling.
```python
class EvaluationService:
    def __init__(self):
        self.jobs: Dict[str, EvaluationJob] = {}
```
🛠️ Refactor suggestion
Consider thread safety for concurrent access.
The in-memory jobs dictionary lacks synchronization mechanisms, which could lead to race conditions when multiple concurrent requests access or modify jobs simultaneously.
Consider using asyncio.Lock or threading.RLock to protect the jobs dictionary:
```diff
 class EvaluationService:
     def __init__(self):
         self.jobs: Dict[str, EvaluationJob] = {}
+        self._lock = asyncio.Lock()
```

Then protect all dictionary operations:
```diff
     def add_job(self, job: EvaluationJob):
+        async with self._lock:
             self.jobs[job.job_id] = job
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In rogue/server/services/evaluation_service.py around lines 11 to 13, the
in-memory jobs dictionary is accessed without synchronization, risking race
conditions in concurrent scenarios. Introduce a lock such as asyncio.Lock or
threading.RLock as a class attribute, then wrap all accesses and modifications
to the jobs dictionary within the lock's context to ensure thread safety.
🧩 Analysis chain
I'll help implement thread safety for the EvaluationService. Let me first examine the complete implementation to identify all methods that need protection.
@yuval-qf Here's the complete implementation with thread safety using asyncio.Lock:
```python
import asyncio
from typing import Dict, List, Optional, Any
from datetime import datetime, timezone
from ..models.api_models import EvaluationJob, EvaluationStatus
from ..websocket.manager import websocket_manager
from ...services.evaluation_library import EvaluationLibrary
from ...models.scenario import Scenarios


class EvaluationService:
    def __init__(self):
        self.jobs: Dict[str, EvaluationJob] = {}
        self._lock = asyncio.Lock()

    async def add_job(self, job: EvaluationJob):
        async with self._lock:
            self.jobs[job.job_id] = job

    async def get_job(self, job_id: str) -> Optional[EvaluationJob]:
        async with self._lock:
            return self.jobs.get(job_id)

    async def get_jobs(
        self,
        status: Optional[EvaluationStatus] = None,
        limit: int = 50,
        offset: int = 0,
    ) -> List[EvaluationJob]:
        async with self._lock:
            jobs = list(self.jobs.values())
        if status:
            jobs = [job for job in jobs if job.status == status]
        return jobs[offset : offset + limit]

    async def get_job_count(self, status: Optional[EvaluationStatus] = None) -> int:
        async with self._lock:
            jobs = list(self.jobs.values())
        if status:
            jobs = [job for job in jobs if job.status == status]
        return len(jobs)

    async def cancel_job(self, job_id: str) -> bool:
        async with self._lock:
            job = self.jobs.get(job_id)
            if job and job.status in [
                EvaluationStatus.pending,
                EvaluationStatus.running,
            ]:
                job.status = EvaluationStatus.cancelled
                job.completed_at = datetime.now(timezone.utc)
                return True
            return False

    async def run_evaluation(self, job_id: str):
        # Get job outside the lock since we'll be updating it during evaluation
        async with self._lock:
            job = self.jobs.get(job_id)
        if not job:
            return

        # Update job status to running
        async with self._lock:
            job.status = EvaluationStatus.running
            job.started_at = datetime.now(timezone.utc)

        try:
            # Convert scenarios
            scenarios = Scenarios([scenario for scenario in job.request.scenarios])

            # Define progress callback
            async def progress_callback(status: str, data: Any):
                if status == "progress":
                    async with self._lock:
                        job.progress = data.get("progress", 0.0)
                elif status == "chat":
                    await self._notify_chat_update(job_id, data)
                await self._notify_job_update(job)

            # Run evaluation
            results = await EvaluationLibrary.evaluate_agent(
                agent_config=job.request.agent_config,
                scenarios=scenarios,
                business_context="You are evaluating an AI agent's performance across various scenarios.",
                progress_callback=progress_callback,
            )

            # Update job with results
            async with self._lock:
                job.results = results
                job.status = EvaluationStatus.completed
                job.progress = 1.0
        except Exception as e:
            async with self._lock:
                job.status = EvaluationStatus.failed
                job.error_message = str(e)
        finally:
            async with self._lock:
                job.completed_at = datetime.now(timezone.utc)
            await self._notify_job_update(job)

    async def _notify_job_update(self, job: EvaluationJob):
        """Send job update via WebSocket"""
        asyncio.create_task(websocket_manager.broadcast_job_update(job))

    async def _notify_chat_update(self, job_id: str, chat_data: Any):
        """Send chat update via WebSocket"""
        from ..models.api_models import WebSocketMessage

        message = WebSocketMessage(
            type="chat",
            job_id=job_id,
            data=chat_data,
        )
        asyncio.create_task(websocket_manager.broadcast_to_job(job_id, message))
```

Key changes made:
- Added asyncio.Lock: added self._lock = asyncio.Lock() to the constructor
- Protected all dictionary access: wrapped all reads/writes to self.jobs with async with self._lock:
- Made methods async: all methods that access the jobs dictionary are now async
- Optimized filtering: in get_jobs and get_job_count, we copy the jobs list inside the lock, then do filtering outside to minimize lock time
- Careful lock management in run_evaluation: used multiple smaller lock acquisitions instead of holding the lock during the entire evaluation process
The API endpoints that call these methods will also need to be updated to use await when calling the service methods.
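The calling-convention change can be shown with a minimal stand-in (MiniService is illustrative; the real endpoints call EvaluationService):

```python
import asyncio
from typing import Dict, Optional


class MiniService:
    """Minimal stand-in mirroring the locked service's API shape."""

    def __init__(self) -> None:
        self.jobs: Dict[str, str] = {}
        self._lock = asyncio.Lock()

    async def add_job(self, job_id: str, payload: str) -> None:
        async with self._lock:
            self.jobs[job_id] = payload

    async def get_job(self, job_id: str) -> Optional[str]:
        async with self._lock:
            return self.jobs.get(job_id)


async def handler() -> Optional[str]:
    svc = MiniService()
    await svc.add_job("j1", "pending")  # was: svc.add_job(...)
    return await svc.get_job("j1")      # was: svc.get_job(...)


print(asyncio.run(handler()))  # → pending
```

Since FastAPI route handlers can themselves be async, awaiting the service inside them requires no other structural change.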
🏁 Script executed:
```bash
#!/bin/bash
# Get the complete EvaluationService implementation to see all methods that access the jobs dictionary
cat rogue/server/services/evaluation_service.py
```
Length of output: 9516
@yuval-qf Here's the complete updated EvaluationService implementation with thread safety using asyncio.Lock:
```python
import asyncio
from typing import Dict, List, Optional, Any
from datetime import datetime, timezone
from ..models.api_models import EvaluationJob, EvaluationStatus
from ..websocket.manager import websocket_manager
from ...services.evaluation_library import EvaluationLibrary
from ...models.scenario import Scenarios
from ...common.logging import get_logger, set_job_context


class EvaluationService:
    def __init__(self):
        self.jobs: Dict[str, EvaluationJob] = {}
        self.logger = get_logger(__name__)
        self._lock = asyncio.Lock()

    async def add_job(self, job: EvaluationJob):
        async with self._lock:
            self.jobs[job.job_id] = job

    async def get_job(self, job_id: str) -> Optional[EvaluationJob]:
        async with self._lock:
            return self.jobs.get(job_id)

    async def get_jobs(
        self,
        status: Optional[EvaluationStatus] = None,
        limit: int = 50,
        offset: int = 0,
    ) -> List[EvaluationJob]:
        async with self._lock:
            jobs = list(self.jobs.values())
        if status:
            jobs = [job for job in jobs if job.status == status]
        jobs.sort(key=lambda x: x.created_at, reverse=True)
        return jobs[offset : offset + limit]

    async def get_job_count(self, status: Optional[EvaluationStatus] = None) -> int:
        async with self._lock:
            if status:
                return len([job for job in self.jobs.values() if job.status == status])
            return len(self.jobs)

    async def cancel_job(self, job_id: str) -> bool:
        async with self._lock:
            job = self.jobs.get(job_id)
            if not job:
                return False
            if job.status in [EvaluationStatus.PENDING, EvaluationStatus.RUNNING]:
                job.status = EvaluationStatus.CANCELLED
                job.completed_at = datetime.now(timezone.utc)
        # Call _notify_job_update outside the lock since it doesn't access the jobs
        # dict; notifying outside the lock also avoids deadlock
        if job and job.status == EvaluationStatus.CANCELLED:
            self._notify_job_update(job)
            return True
        return False

    async def run_evaluation(self, job_id: str):
        # Get job with lock protection
        async with self._lock:
            job = self.jobs.get(job_id)
        if not job:
            return

        try:
            # Set job context for logging
            set_job_context(
                job_id=job_id,
                agent_url=str(job.request.agent_config.evaluated_agent_url),
            )
            self.logger.info(
                "Starting evaluation job",
                extra={
                    "job_status": "running",
                    "scenario_count": len(job.request.scenarios),
                    "agent_url": str(job.request.agent_config.evaluated_agent_url),
                    "judge_llm": job.request.agent_config.judge_llm_model,
                },
            )

            # Update job status with lock protection
            async with self._lock:
                job.status = EvaluationStatus.RUNNING
                job.started_at = datetime.now(timezone.utc)
            self._notify_job_update(job)

            # Convert SDK scenarios to legacy scenarios for the evaluation service
            from ...models.scenario import (
                Scenario as LegacyScenario,
                ScenarioType as LegacyScenarioType,
            )

            self.logger.info(
                f"Converting {len(job.request.scenarios)} SDK scenarios to "
                "legacy format"
            )
            legacy_scenarios = []
            for i, sdk_scenario in enumerate(job.request.scenarios):
                # Convert SDK scenario to legacy scenario
                legacy_scenario_type = (
                    LegacyScenarioType.POLICY
                    if sdk_scenario.scenario_type.value == "policy"
                    else LegacyScenarioType.PROMPT_INJECTION
                )
                legacy_scenario = LegacyScenario(
                    scenario=sdk_scenario.scenario,
                    scenario_type=legacy_scenario_type,
                    dataset=sdk_scenario.dataset,
                    dataset_sample_size=sdk_scenario.dataset_sample_size,
                    expected_outcome=sdk_scenario.expected_outcome,
                )
                legacy_scenarios.append(legacy_scenario)

            scenarios = Scenarios(scenarios=legacy_scenarios)
            self.logger.info(
                (
                    "Successfully created Scenarios object with "
                    f"{len(scenarios.scenarios)} scenarios"
                )
            )

            # Create progress callback for real-time updates
            def progress_callback(update_type: str, data: Any):
                self.logger.debug(
                    "Evaluation progress update",
                    extra={
                        "update_type": update_type,
                        "data_preview": str(data)[:100] if data else None,
                    },
                )
                if update_type == "status":
                    # Update progress based on status messages
                    if "Running scenarios" in str(data):
                        # Use asyncio.create_task to handle async lock in callback
                        asyncio.create_task(self._update_job_progress(job_id, 0.1))
                    elif "conversation" in str(data).lower():
                        # Estimate progress based on conversations
                        asyncio.create_task(
                            self._increment_job_progress(job_id, 0.1, 0.9)
                        )
                elif update_type == "chat":
                    # Real-time chat updates via WebSocket
                    self.logger.info(f"Received chat update for job {job_id}: {data}")
                    self._notify_chat_update(job_id, data)

            # Use the library interface directly
            self.logger.info(
                "Calling evaluation library",
                extra={
                    "business_context": "The agent provides customer service.",
                    "agent_url": str(job.request.agent_config.evaluated_agent_url),
                    "judge_llm": job.request.agent_config.judge_llm_model,
                    "scenario_count": len(scenarios.scenarios),
                },
            )
            evaluation_results = await EvaluationLibrary.evaluate_agent(
                agent_config=job.request.agent_config,
                scenarios=scenarios,
                # TODO: FIX THIS!@!!!!!!
                business_context="",
                progress_callback=progress_callback,
            )
            self.logger.info("Evaluation library returned results")

            # Update job with results using lock protection
            async with self._lock:
                job.results = evaluation_results.results
                job.status = EvaluationStatus.COMPLETED
                job.progress = 1.0
            self.logger.info(
                "Evaluation job completed successfully",
                extra={
                    "job_status": "completed",
                    "results_count": len(evaluation_results.results),
                    "duration_seconds": (
                        (datetime.now(timezone.utc) - job.started_at).total_seconds()
                        if job.started_at
                        else None
                    ),
                },
            )
        except Exception as e:
            error_type = type(e).__name__
            error_msg = str(e)
            # Provide user-friendly error messages
            if "APIError" in error_type and "Connection error" in error_msg:
                user_error = (
                    "LLM API Connection Error: Cannot connect to "
                    f"{job.request.agent_config.judge_llm_model}. "
                    "Please check your API key and network connection."
                )
            elif "APIError" in error_type and "authentication" in error_msg.lower():
                user_error = (
                    "LLM Authentication Error: Invalid API key for "
                    f"{job.request.agent_config.judge_llm_model}. "
                    "Please verify your judge_llm_api_key."
                )
            elif "timeout" in error_msg.lower():
                user_error = (
                    "Timeout Error: The evaluation took too long. "
                    "The agent under test may not be responding."
                )
            else:
                user_error = f"Evaluation error: {error_msg}"
            # Update job status with lock protection
            async with self._lock:
                job.status = EvaluationStatus.FAILED
                job.error_message = user_error
            self.logger.error(
                "Evaluation job failed",
                extra={
                    "job_status": "failed",
                    "error": error_msg,
                    "error_type": error_type,
                    "user_error": user_error,
                    "duration_seconds": (
                        (datetime.now(timezone.utc) - job.started_at).total_seconds()
                        if job.started_at
                        else None
                    ),
                },
            )
        finally:
            # Update completion time with lock protection
            async with self._lock:
                job.completed_at = datetime.now(timezone.utc)
            self._notify_job_update(job)

    async def _update_job_progress(self, job_id: str, progress: float):
        """Helper method to update job progress with lock protection"""
        async with self._lock:
            job = self.jobs.get(job_id)
            if job:
                job.progress = progress
        if job:
            self._notify_job_update(job)

    async def _increment_job_progress(
        self, job_id: str, increment: float, max_progress: float
    ):
        """Helper method to increment job progress with lock protection"""
        async with self._lock:
            job = self.jobs.get(job_id)
            if job:
                job.progress = min(max_progress, job.progress + increment)
        if job:
            self._notify_job_update(job)

    def _notify_job_update(self, job: EvaluationJob):
        asyncio.create_task(websocket_manager.broadcast_job_update(job))

    def _notify_chat_update(self, job_id: str, chat_data: Any):
        """Send real-time chat updates via WebSocket"""
        from ..models.api_models import WebSocketMessage

        # Ensure chat_data is in the expected format
        if isinstance(chat_data, dict):
            # Chat data is already a dict with role/content
            data = chat_data
        else:
            # Convert string to dict format
            data = {"role": "assistant", "content": str(chat_data)}
        message = WebSocketMessage(
            type="chat_update",
            job_id=job_id,
            data=data,
        )
        self.logger.debug(f"Sending chat update via WebSocket: {data}")
        asyncio.create_task(websocket_manager.broadcast_to_job(job_id, message))
```

Key changes made:
- Added asyncio.Lock: added self._lock = asyncio.Lock() to the constructor
- Made methods async: all methods that access self.jobs are now async and use await
- Protected all dictionary access: wrapped all reads/writes to self.jobs with async with self._lock:
- Added helper methods: created _update_job_progress and _increment_job_progress to handle progress updates in callbacks with proper locking
- Optimized lock usage: in get_jobs, we copy the jobs list inside the lock, then do filtering/sorting outside to minimize lock time
- Careful notification handling: moved _notify_job_update calls outside the lock context to avoid potential deadlocks
Important: The API endpoints in rogue/server/api/evaluation.py will need to be updated to use await when calling these service methods since they're now async.
### Custom HTTP Client

```python
from rogue_client import RogueHttpClient

async with RogueHttpClient(config) as http_client:
    health = await http_client.health()
    response = await http_client.create_evaluation(request)
```
This means nothing without reading the code. What's custom about the HTTP client if it is imported from Rogue?
### Type Checking

```bash
python -m mypy rogue_client/
```

### Code Formatting

```bash
python -m black rogue_client/
python -m flake8 rogue_client/
```
```python
class EvaluationService:
    def __init__(self):
        self.jobs: Dict[str, EvaluationJob] = {}
```
Co-authored-by: yuval-qf <yuval@qualifire.ai>
Actionable comments posted: 10
🔭 Outside diff range comments (9)
.github/actions/rogue/action.yml (1)
**83-85: Avoid echoing sensitive arguments in CI logs**

The `echo` prints the full command including potential secrets (e.g., `--evaluated-agent-credentials`, `--judge-llm-api-key`). Even with GitHub's masking, inputs might not be provided as "secrets", so values could leak.

Apply this diff to prevent leakage and improve masking:

```diff
-          echo "Running: 'uv run -m rogue cli "${ARGS[@]}"'"
-          uv run -m rogue cli "${ARGS[@]}"
+          # Mask sensitive inputs if provided
+          if [[ -n "${{ inputs.judge_llm_api_key }}" ]]; then
+            echo "::add-mask::${{ inputs.judge_llm_api_key }}"
+          fi
+          if [[ -n "${{ inputs.evaluated_agent_credentials }}" ]]; then
+            echo "::add-mask::${{ inputs.evaluated_agent_credentials }}"
+          fi
+
+          echo "Running: 'uv run -m rogue cli [args redacted]'"
+          uv run -m rogue cli "${ARGS[@]}"
```

rogue/models/cli_input.py (1)
**40-44: Ensure non-empty credentials when auth is required**

The current check allows `SecretStr("")` to pass, since a `SecretStr` instance is truthy. Validate that the underlying secret string is not empty.
Apply this diff:
```diff
-        if auth_type != AuthType.NO_AUTH and not auth_credentials:
+        if auth_type != AuthType.NO_AUTH and (
+            not auth_credentials
+            or not auth_credentials.get_secret_value().strip()
+        ):
             raise ValueError(
                 "Authentication Credentials cannot be empty for the selected auth type."
             )
```

sdks/python/rogue_sdk/types.py (1)
**382-387: Fix HttpUrl coercion: `HttpUrl(v)` is invalid in Pydantic v2 and will raise.**

`HttpUrl` isn't directly callable with a URL string. Use `TypeAdapter(HttpUrl).validate_python(v)` (or simplify by making `base_url: HttpUrl` and dropping the validator). As-is, constructing `RogueClientConfig` with a string `base_url` is likely to crash.

Apply this diff:

```diff
@@
-    @field_validator("base_url", mode="after")
-    def validate_base_url(cls, v: str | HttpUrl) -> HttpUrl:
-        if isinstance(v, str):
-            return HttpUrl(v)
-        return v
+    @field_validator("base_url", mode="after")
+    def validate_base_url(cls, v: str | HttpUrl) -> HttpUrl:
+        if isinstance(v, str):
+            from pydantic import TypeAdapter
+            return TypeAdapter(HttpUrl).validate_python(v)
+        return v
```
- Change
base_url: HttpUrl | strtobase_url: HttpUrl- Remove the validator entirely and let Pydantic coerce from str.
rogue/tests/models/test_evaluation_result.py (1)
**38-48: Fix: calling a `@staticmethod` in a class body raises (staticmethod object not callable).**

Within the class body, `get_evaluation_result` is a `staticmethod` object, not directly callable. This will fail at import time on Python < 3.10 (staticmethod objects only became directly callable in 3.10). Move the helper to module level (preferred) or build the `EvaluationResult` inline.

Apply this diff to move the helper out of the class and use it:

```diff
@@
-class TestEvaluationResults:
+def get_evaluation_result(
+    scenario: Scenario,
+    conversation: ConversationEvaluation,
+) -> EvaluationResult:
+    return EvaluationResult(
+        scenario=scenario,
+        conversations=[conversation],
+        passed=conversation.passed,
+    )
+
+
+class TestEvaluationResults:
@@
-    @staticmethod
-    def get_evaluation_result(
-        scenario: Scenario,
-        conversation: ConversationEvaluation,
-    ) -> EvaluationResult:
-        return EvaluationResult(
-            scenario=scenario,
-            conversations=[conversation],
-            passed=conversation.passed,
-        )
+    # helper moved to module level (see above)
```

rogue/common/workdir_utils.py (1)
**29-35: Critical: credentials are not excluded; PII may be written to disk.**

`sanitized_config` filters keys ending with `_key` and `auth_credentials`, but the actual field is `evaluated_agent_credentials`. This risks persisting credentials to `user_config.json`.

Apply this diff to robustly exclude keys and None values:

```diff
-    config_dict = config.model_dump(mode="json")
-    # Not storing any api keys or credentials
-    sanitized_config = {
-        k: v
-        for k, v in config_dict.items()
-        if not k.endswith("_key") and k != "auth_credentials"
-    }
+    # Not storing any API keys or credentials
+    sanitized_config = config.model_dump(
+        mode="json",
+        exclude={
+            "evaluated_agent_credentials",
+            "judge_llm_api_key",
+        },
+        exclude_none=True,
+    )
```

Optionally also exclude any key ending with `_credentials` generically.

rogue/ui/components/config_screen.py (2)
**134-139: Error label keys use outdated field names; validation hints won't display correctly.**

Pydantic errors will reference `evaluated_*` fields (e.g., `evaluated_agent_url`), but `error_labels` maps `"agent_url"`/`"auth_credentials"`, so the UI won't show the specific field errors.

```diff
-    error_labels = {
-        "agent_url": agent_url_error,
-        "auth_credentials": auth_credentials_error,
-        "judge_llm_api_key": judge_llm_api_key_error,
-        # "huggingface_api_key": huggingface_api_key_error,
-    }
+    error_labels = {
+        "evaluated_agent_url": agent_url_error,
+        "evaluated_agent_credentials": auth_credentials_error,
+        "judge_llm_api_key": judge_llm_api_key_error,
+        # "huggingface_api_key": huggingface_api_key_error,
+    }
```
**146-157: State update keys are inconsistent with the new `evaluated_*` schema.**

You read `evaluated_*` from config but write legacy keys (`"agent_url"`, `"auth_type"`, `"auth_credentials"`). Align keys to avoid confusion and ensure downstream consumers rely on one schema.

```diff
-    for component, key in [
-        (agent_url, "agent_url"),
+    for component, key in [
+        (agent_url, "evaluated_agent_url"),
         (interview_mode, "interview_mode"),
-        (auth_type, "auth_type"),
-        (auth_credentials, "auth_credentials"),
+        (auth_type, "evaluated_agent_auth_type"),
+        (auth_credentials, "evaluated_agent_credentials"),
         (service_llm, "service_llm"),
         (judge_llm, "judge_llm"),
         (judge_llm_api_key, "judge_llm_api_key"),
         # (huggingface_api_key, "huggingface_api_key"),
         (deep_test_mode, "deep_test_mode"),
         (parallel_runs, "parallel_runs"),
     ]:
```

rogue/run_cli.py (2)
**33-39: Fix argparse type/choices mismatch for `--evaluated-agent-auth-type`**

`type=AuthType` combined with string `choices` breaks validation: argparse applies `type` before comparing against `choices`, so the converted enum member never matches the string options. Use strings for parsing and let Pydantic convert later.

```diff
-    parser.add_argument(
+    parser.add_argument(
         "--evaluated-agent-auth-type",
         required=False,
-        type=AuthType,
-        choices=[e.value for e in AuthType],
+        type=str,
+        choices=[e.value for e in AuthType],
         help="How to authenticate with the evaluated agent (if needed)."
         f"Valid options are: {[e.value for e in AuthType]}",
     )
```
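The mismatch can be demonstrated in isolation. This sketch uses a hypothetical two-member `AuthType` enum, not the real one from the Rogue codebase, but the argparse behavior is the same:

```python
import argparse
from enum import Enum

class AuthType(Enum):
    # Illustrative values; the real enum lives in the Rogue codebase.
    NO_AUTH = "no_auth"
    API_KEY = "api_key"

parser = argparse.ArgumentParser()
# Parse as plain strings: argparse applies `type` BEFORE comparing against
# `choices`, so type=AuthType would turn the token into an enum member that
# never matches the string choices below.
parser.add_argument(
    "--auth-type",
    type=str,
    choices=[e.value for e in AuthType],
)

args = parser.parse_args(["--auth-type", "api_key"])
auth = AuthType(args.auth_type)  # convert afterwards (Pydantic does this too)
print(auth)  # → AuthType.API_KEY
```

An alternative that keeps enum conversion inside argparse is `type=AuthType` with `choices=list(AuthType)`, at the cost of uglier `--help` output.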
**317-319: Avoid logging full CLI input; may leak credentials**

`cli_input.model_dump()` can include API keys and agent credentials. Log only safe metadata.
[security]
```diff
-    logger.debug("Running CLI", extra=cli_input.model_dump())
+    logger.debug(
+        "Running CLI",
+        extra={
+            "evaluated_agent_url": cli_input.evaluated_agent_url.encoded_string(),
+            "deep_test_mode": cli_input.deep_test_mode,
+            "judge_llm": cli_input.judge_llm,
+        },
+    )
```
♻️ Duplicate comments (38)
.github/workflows/rogue.yml (3)
**29-34: Consolidate installs into one editable install transaction**

Running two separate editable installs can lead to resolution/order drift. Install both in one `uv pip` call from the repo root.

```diff
-      - name: Install rogue sdk
-        run: source .venv/bin/activate && uv pip install -e sdks/python
-
-      - name: Install rogue server
-        run: source .venv/bin/activate && uv pip install -e .
+      - name: Install rogue server and SDK (editable)
+        run: source .venv/bin/activate && uv pip install -e . -e sdks/python
```
**40-42: Remove trailing ampersand; background-action handles daemonizing**

The background-action manages lifecycle and log tailing. Keeping `&` can orphan processes and break cancellation.

```diff
-        run: uv run examples/tshirt_store_agent --host 0.0.0.0 --port 10001 &
+        run: uv run examples/tshirt_store_agent --host 0.0.0.0 --port 10001
```
44-51: Use background-action without ‘&’; server CLI path is valid in this PR
- Remove “&” as above.
- The earlier advice to switch to `-m rogue.server` is now optional: since this PR adds a top-level `server` subcommand in `rogue/__main__.py`, `uv run -m rogue server …` is valid. Keep it if you prefer unified CLI UX.

Apply:
```diff
-        run: uv run -m rogue server --host 0.0.0.0 --port 8000 &
+        run: uv run -m rogue server --host 0.0.0.0 --port 8000
```

Quick verification script (confirms the server subcommand exists and args are wired):

```bash
#!/bin/bash
set -euo pipefail
echo "=== Rogue CLI server subparser and args ==="
rg -n -C3 'add_parser\(\s*"server"' rogue/__main__.py
rg -n -C3 'def set_server_args' rogue/run_server.py
rg -n -C2 '\-\-host|\-\-port' rogue/run_server.py
```

rogue/server/main.py (2)
**56-62: CORS is overly permissive; restrict by environment**

Allowing any origin/headers with credentials is risky in production. Restrict origins based on environment and narrow headers/methods as needed.
Example tightening:
```diff
+    import os
+    allowed_origins = (
+        os.getenv("ALLOWED_ORIGINS", "").split(",")
+        if os.getenv("ALLOWED_ORIGINS")
+        else ["http://localhost:3000"]
+    )
     app.add_middleware(
         CORSMiddleware,
-        allow_origins=["*"],
-        allow_credentials=True,
-        allow_methods=["GET", "POST", "PUT", "DELETE"],
-        allow_headers=["*"],
+        allow_origins=allowed_origins,
+        allow_credentials=False,
+        allow_methods=["GET", "POST", "PUT", "DELETE"],
+        allow_headers=["Content-Type", "Authorization"],
     )
```

Note: ensure your chosen `allow_methods` covers actual usage; the CORS middleware will still handle preflight OPTIONS automatically.
**103-107: Avoid multiple entrypoints; remove the main guard here or in `rogue/server/main.py`**

There's already a CLI entry via `rogue/__main__.py` and likely a module entry via `rogue/server/__main__.py`. Keeping a second `if __name__ == "__main__"` here can diverge behavior (env vs CLI flags).

Remove the block here and consolidate on the module or CLI entrypoint:

```diff
-if __name__ == "__main__":
-    host = os.getenv("HOST", "127.0.0.1")
-    port = int(os.getenv("PORT", "8000"))
-    reload = os.getenv("RELOAD", "false").lower() in ("true", "1", "yes")
-    start_server(host, port, reload)
```

rogue/ui/components/interviewer.py (3)
**38-41: Don't hardcode the server URL; read from config/state.**

Use a configurable base URL (from `state["config"]["server_url"]` or a sensible default) to support different environments.

```diff
-    sdk_config = RogueClientConfig(
-        base_url="http://localhost:8000",
-        timeout=600.0,
-    )
+    base_url = (config or {}).get("server_url", "http://localhost:8000")
+    sdk_config = RogueClientConfig(
+        base_url=base_url,
+        timeout=600.0,
+    )
```
**60-64: Return only the bot text; avoid returning a tuple.**

The Chatbot history expects a single string for the bot message. Returning a tuple will leak implementation details into the UI and break downstream usage (e.g., `finalize_context` stores the tuple as business context).

```diff
-    return (
-        response.response,
-        response.is_complete,
-        response.message_count,
-    )
+    return response.response
```
**65-68: Provide a safe fallback on exceptions (avoid returning None).**

Currently on failure this returns `None`, which later gets assigned to `history[-1][1]`. Return a user-friendly string instead.

```diff
     except Exception:
         logger.exception("Failed to send interview message")
+        return "I encountered an error processing your message. Please try again."
     finally:
         await sdk.close()
```

rogue/ui/components/scenario_generator.py (4)
**40-45: Don't hardcode the server URL; read from config/state.**

Pull `base_url` from `state["config"]["server_url"]` (or a default) to support non-local environments.

```diff
-    sdk_config = RogueClientConfig(
-        base_url="http://localhost:8000",
-        timeout=600.0,
-    )
+    base_url = (config or {}).get("server_url", "http://localhost:8000")
+    sdk_config = RogueClientConfig(
+        base_url=base_url,
+        timeout=600.0,
+    )
```
**51-54: Improve error semantics: log accurately and re-raise to the outer handler.**

Returning `None` on error causes downstream `None` handling issues. Let the outer try/except display the error UI.

```diff
-    except Exception:
-        logger.exception("Failed to generate scenarios from LLM response.")
+    except Exception:
+        logger.exception("Failed to generate scenarios via SDK.")
+        raise
     finally:
         await sdk.close()
```
**56-57: Avoid `asyncio.run()` inside Gradio; make the handler async and await.**

This can raise "asyncio.run() cannot be called from a running event loop".

```diff
-    try:
-        scenarios = asyncio.run(generate_scenarios_async())
+    try:
+        scenarios = await generate_scenarios_async()
```

Outside this hunk, update the callback signature to async:

```python
# Change this:
def generate_and_display_scenarios(state, current_context):
# To this:
async def generate_and_display_scenarios(state, current_context):
```
**46-50: Guard against None model overriding the SDK default.**

If `service_llm` is missing, passing `model=None` can break the call. Provide a fallback.

```diff
-    return await sdk.generate_scenarios(
-        business_context=current_context,
-        model=service_llm,
-        api_key=api_key,
-    )
+    return await sdk.generate_scenarios(
+        business_context=current_context,
+        model=service_llm or "openai/gpt-4o-mini",
+        api_key=api_key,
+    )
```

sdks/python/rogue_sdk/websocket.py (1)
**152-172: Track async handler tasks and log exceptions correctly**

`create_task` without tracking can leak tasks; also `logger.exception(exc_info=t.exception())` doesn't attach the traceback in stdlib logging. Track tasks, add a done-callback that logs the exception with a proper `exc_info` tuple, and cancel/gather on disconnect.
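Before the full diff, here is the done-callback pattern in a standalone sketch. The handler and names are illustrative, not the SDK's real code; it shows the two fixes in isolation: keep a strong reference to each task, and pass a full `(type, value, traceback)` tuple as `exc_info` so stdlib logging renders the traceback:

```python
import asyncio
import logging

logger = logging.getLogger("ws_demo")
captured: list[BaseException] = []

async def failing_handler() -> None:
    # Stand-in for an async event handler that raises.
    raise ValueError("boom")

def _on_task_done(t: asyncio.Task) -> None:
    exc = t.exception()  # also marks the exception as retrieved
    if exc:
        # A full (type, value, traceback) tuple makes stdlib logging
        # render the traceback; passing the bare exception object does not.
        logger.error("handler error", exc_info=(type(exc), exc, exc.__traceback__))
        captured.append(exc)

async def main() -> None:
    tracked: list[asyncio.Task] = []
    task = asyncio.create_task(failing_handler())
    tracked.append(task)  # strong reference prevents premature GC
    task.add_done_callback(_on_task_done)
    # On shutdown: drain every tracked task instead of leaking them.
    await asyncio.gather(*tracked, return_exceptions=True)
    await asyncio.sleep(0)  # let queued done-callbacks run

asyncio.run(main())
print(type(captured[0]).__name__)  # → ValueError
```

Calling `t.exception()` in the callback also prevents asyncio's "Task exception was never retrieved" warning.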
```diff
 class RogueWebSocketClient:
@@
         self._stop_event = asyncio.Event()
         self._message_handler_task: Optional[asyncio.Task] = None
+        self._event_handler_tasks: List[asyncio.Task] = []
@@
     async def disconnect(self) -> None:
         """Disconnect from WebSocket."""
         self._stop_event.set()
         self.is_connected = False
+        # Cancel any pending event handler tasks
+        for task in list(self._event_handler_tasks):
+            if not task.done():
+                task.cancel()
+        if self._event_handler_tasks:
+            await asyncio.gather(*self._event_handler_tasks, return_exceptions=True)
+        self._event_handler_tasks.clear()
@@
     def _emit(self, event: WebSocketEventType, data: Any) -> None:
         """Emit event to handlers."""
         handlers = self.event_handlers.get(event, [])
         for handler in handlers:
             try:
                 if asyncio.iscoroutinefunction(handler):
-                    task = asyncio.create_task(handler(event, data))
-                    task.add_done_callback(
-                        lambda t: (
-                            logger.exception(
-                                "WS handler error",
-                                exc_info=t.exception(),
-                            )
-                            if t.exception()
-                            else None
-                        )
-                    )
+                    task = asyncio.create_task(handler(event, data))
+                    self._event_handler_tasks.append(task)
+
+                    def _on_task_done(t: asyncio.Task):
+                        exc = t.exception()
+                        if exc:
+                            logger.exception(
+                                "WS handler error",
+                                exc_info=(type(exc), exc, exc.__traceback__),
+                            )
+                        # prune finished tasks
+                        self._event_handler_tasks = [
+                            et for et in self._event_handler_tasks if not et.done()
+                        ]
+
+                    task.add_done_callback(_on_task_done)
                 else:
                     handler(event, data)
             except Exception:
                 logger.exception("Error in event handler")
```

rogue/server/services/evaluation_service.py (7)
**13-16: Prefer importing the shared websocket_manager singleton directly**

You can import `websocket_manager` instead of `get_websocket_manager()` to avoid confusion about multiple instances (the manager module already exposes a cached singleton).

```diff
-from ..websocket.manager import get_websocket_manager
+from ..websocket.manager import websocket_manager
@@
-        self.websocket_manager = get_websocket_manager()
+        self.websocket_manager = websocket_manager
```

Also applies to: 24-25
**211-213: Guard completion timestamp with the lock**

Serialize writes to `completed_at`.

```diff
-        job.completed_at = datetime.now(timezone.utc)
+        async with self._lock:
+            job.completed_at = datetime.now(timezone.utc)
         self._notify_job_update(job)
```
**148-157: Guard completion updates with the lock**

Avoid concurrent writes to results/status/progress fields.

```diff
-            # Update job with results
-            if final_results and final_results.results:
-                job.results = final_results.results
-                job.status = EvaluationStatus.COMPLETED
-                job.progress = 1.0
-            else:
-                # No results - mark as failed
-                job.status = EvaluationStatus.FAILED
-                job.error_message = "Evaluation completed but no results were generated"
+            # Update job with results
+            async with self._lock:
+                if final_results and final_results.results:
+                    job.results = final_results.results
+                    job.status = EvaluationStatus.COMPLETED
+                    job.progress = 1.0
+                else:
+                    # No results - mark as failed
+                    job.status = EvaluationStatus.FAILED
+                    job.error_message = "Evaluation completed but no results were generated"
```
**90-93: Protect job state updates with the async lock**

`run_job` mutates shared job state without holding the lock, risking races with concurrent readers/writers (e.g., `cancel_job`).

```diff
-        job.status = EvaluationStatus.RUNNING
-        job.started_at = datetime.now(timezone.utc)
+        async with self._lock:
+            job.status = EvaluationStatus.RUNNING
+            job.started_at = datetime.now(timezone.utc)
         self._notify_job_update(job)
```
**196-198: Guard failure updates with the lock**

Serialize failure state assignment.

```diff
-            job.status = EvaluationStatus.FAILED
-            job.error_message = user_error
+            async with self._lock:
+                job.status = EvaluationStatus.FAILED
+                job.error_message = user_error
```
**131-142: Guard progress mutations with the lock**

These updates should be serialized to avoid lost updates or inconsistent views.

```diff
             if update_type == "status":
                 # Update progress based on status messages
                 if "Running scenarios" in str(data):
-                    job.progress = 0.1
+                    async with self._lock:
+                        job.progress = 0.1
                     self._notify_job_update(job)
             elif update_type == "chat":
                 # Real-time chat updates via WebSocket
                 logger.info(f"Received chat update for job {job_id}: {data}")
                 self._notify_chat_update(job_id, data)
-                job.progress = min(0.9, job.progress + 0.1)
+                async with self._lock:
+                    job.progress = min(0.9, job.progress + 0.1)
                 self._notify_job_update(job)
```
**56-67: Fix cancel semantics: return True only when a cancellation happened**

Currently returns `True` even if no state change occurred. Guard with a lock and return based on the actual transition.
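Ahead of the diff, the transition-based return value can be sketched standalone. `Job` and `Service` here are hypothetical stand-ins with string statuses instead of the real enum:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Job:  # hypothetical stand-in for EvaluationJob
    status: str = "running"

class Service:
    def __init__(self) -> None:
        self.jobs: dict[str, Job] = {}
        self._lock = asyncio.Lock()

    async def cancel_job(self, job_id: str) -> bool:
        cancelled = False
        async with self._lock:
            job = self.jobs.get(job_id)
            if not job:
                return False
            if job.status in ("pending", "running"):
                job.status = "cancelled"
                cancelled = True
        # Report success only when a state transition actually happened;
        # cancelling a finished (or already-cancelled) job is a no-op.
        return cancelled

async def main():
    svc = Service()
    svc.jobs["a"] = Job(status="running")
    svc.jobs["b"] = Job(status="completed")
    return (
        await svc.cancel_job("a"),  # True: running -> cancelled
        await svc.cancel_job("a"),  # False: already cancelled
        await svc.cancel_job("b"),  # False: terminal state
        await svc.cancel_job("x"),  # False: unknown job
    )

print(asyncio.run(main()))  # → (True, False, False, False)
```

This makes the boolean meaningful to API callers: a second cancel request is idempotent but visibly a no-op.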
```diff
-    async def cancel_job(self, job_id: str) -> bool:
-        job = await self.get_job(job_id)
-        if not job:
-            return False
-
-        if job.status in [EvaluationStatus.PENDING, EvaluationStatus.RUNNING]:
-            job.status = EvaluationStatus.CANCELLED
-            job.completed_at = datetime.now(timezone.utc)
-            self._notify_job_update(job)
-
-        return True
+    async def cancel_job(self, job_id: str) -> bool:
+        cancelled = False
+        async with self._lock:
+            job = self.jobs.get(job_id)
+            if not job:
+                return False
+            if job.status in [EvaluationStatus.PENDING, EvaluationStatus.RUNNING]:
+                job.status = EvaluationStatus.CANCELLED
+                job.completed_at = datetime.now(timezone.utc)
+                cancelled = True
+        if cancelled:
+            self._notify_job_update(job)
+        return cancelled
```

sdks/python/rogue_sdk/__init__.py (1)
**6-40: Update the example in `sdks/python/rogue_sdk/__init__.py` to use `run_evaluation` and correctly construct `EvaluationRequest`**

The docstring's sample still references the removed `quick_evaluate` method and passes raw dicts to `run_evaluation_with_updates`, which no longer matches the SDK's API. Please update the example to use `run_evaluation` and to wrap parameters in the appropriate Pydantic models.

- File: `sdks/python/rogue_sdk/__init__.py`
- Lines: 6-40

Suggested diff:

```diff
-    import asyncio
-    from rogue_sdk import RogueSDK, RogueClientConfig, AuthType, ScenarioType
+    import asyncio
+    from rogue_sdk import (
+        RogueSDK,
+        RogueClientConfig,
+        AuthType,
+        ScenarioType,
+        EvaluationRequest,
+        AgentConfig,
+        Scenario,
+        Scenarios,
+    )
@@
-    async with RogueSDK(config) as client:
-        # Quick evaluation
-        result = await client.quick_evaluate(
-            agent_url="http://localhost:3000",
-            scenarios=["The agent should be polite", "No discounts allowed"]
-        )
-        print(f"Evaluation result: {result.status}")
+    async with RogueSDK(config) as client:
+        # Quick evaluation
+        result = await client.run_evaluation(
+            agent_url="http://localhost:3000",
+            scenarios=Scenarios(
+                scenarios=[
+                    Scenario(scenario="The agent should be polite", scenario_type=ScenarioType.POLICY),
+                    Scenario(scenario="No discounts allowed", scenario_type=ScenarioType.POLICY),
+                ]
+            ),
+            business_context="Customer service context",
+        )
+        print(f"Evaluation status: {result.status}")
@@
-        # Evaluation with real-time updates
-        job = await client.run_evaluation_with_updates(
-            agent_config=AgentConfig(
-                evaluated_agent_url="http://localhost:3000",
-                evaluated_agent_auth_type=AuthType.NO_AUTH,
-                judge_llm="openai/gpt-4o-mini",
-                business_context="Customer service context",
-            ),
-            scenarios=Scenarios(scenarios=[
-                Scenario(scenario="Test scenario", scenario_type=ScenarioType.POLICY)
-            ]),
-            on_update=lambda update: print(f"Status: {update.get('status')}"),
-            on_chat=lambda chat: print(f"Chat: {chat}"),
-        )
+        # Evaluation with real-time updates
+        job = await client.run_evaluation_with_updates(
+            request=EvaluationRequest(
+                agent_config=AgentConfig(
+                    evaluated_agent_url="http://localhost:3000",
+                    evaluated_agent_auth_type=AuthType.NO_AUTH,
+                    judge_llm="openai/gpt-4o-mini",
+                    business_context="Customer service context",
+                ),
+                scenarios=[
+                    Scenario(scenario="Test scenario", scenario_type=ScenarioType.POLICY)
+                ],
+            ),
+            on_update=lambda update: print(f"Status: {update['status']}"),
+            on_chat=lambda chat: print(f"Chat: {chat}"),
+        )
```

rogue/ui/components/scenario_runner.py (6)
**134-140: Clamp parallel_runs to UI capacity and enforce >0**

Avoid creating more groups than the UI renders and prevent zero/negative batches. This also avoids empty batches and index math issues.

```diff
-    parallel_runs = config.get("parallel_runs", 1)
+    parallel_runs = max(1, int(config.get("parallel_runs", 1)))
+    if parallel_runs > MAX_PARALLEL_RUNS:
+        logger.warning(
+            f"parallel_runs={parallel_runs} exceeds UI capacity; clamping to {MAX_PARALLEL_RUNS}"
+        )
+        parallel_runs = MAX_PARALLEL_RUNS
     logger.info(f"Setting up {parallel_runs} parallel runs")
```
**231-250: Synchronous callback schedules async task without error handling**

Uncaught exceptions in the created task get lost. At minimum, wrap `create_task` in try/except or attach a done callback to log task exceptions.

```diff
             def on_chat_update(chat_data):
                 """Handle real-time chat updates from SDK"""
                 logger.info(
                     f"Worker {worker_id}: Received chat update: {chat_data}",
                 )
-                # Use asyncio.create_task to schedule the async operation
-                asyncio.create_task(
-                    update_queue.put(
-                        (
-                            worker_id,
-                            "chat",
-                            {
-                                "role": chat_data.get("role", "assistant"),
-                                "content": chat_data.get("content", ""),
-                            },
-                        )
-                    )
-                )
+                try:
+                    task = asyncio.create_task(
+                        update_queue.put(
+                            (
+                                worker_id,
+                                "chat",
+                                {
+                                    "role": chat_data.get("role", "assistant"),
+                                    "content": chat_data.get("content", ""),
+                                },
+                            )
+                        )
+                    )
+                    task.add_done_callback(
+                        lambda t: logger.error(f"chat update task failed: {t.exception()}")
+                        if t.exception()
+                        else None
+                    )
+                except Exception as e:
+                    logger.error(f"Failed to queue chat update: {e}")
```
**252-269: Same issue for the status callback; handle task errors**

Mirror the error handling pattern used for chat updates.

```diff
             def on_status_update(status_data):
@@
-                # Use asyncio.create_task to schedule the async operation
-                asyncio.create_task(
-                    update_queue.put((worker_id, "status", status_msg))
-                )
+                try:
+                    task = asyncio.create_task(
+                        update_queue.put((worker_id, "status", status_msg))
+                    )
+                    task.add_done_callback(
+                        lambda t: logger.error(f"status update task failed: {t.exception()}")
+                        if t.exception()
+                        else None
+                    )
+                except Exception as e:
+                    logger.error(f"Failed to queue status update: {e}")
```
**195-200: Hardcoded server URL; make configurable**

Using `HttpUrl("http://localhost:8000")` hardcodes the server endpoint and breaks non-local deployments. Read from `worker_config` (or env) with a sensible default.

```diff
             logger.info(f"🔌 Worker {worker_id}: Initializing SDK connection")
-            sdk_config = RogueClientConfig(
-                base_url=HttpUrl("http://localhost:8000"),
-                timeout=600.0,
-            )
+            server_url = worker_config.get("server_url", os.getenv("ROGUE_SERVER_URL", "http://localhost:8000"))
+            sdk_config = RogueClientConfig(
+                base_url=server_url,
+                timeout=600.0,
+            )
             sdk = RogueSDK(sdk_config)
```

Note: add `import os` at the top of the file.
**214-221: `scenarios` is undefined here; pass the batch wrapped in `Scenarios`**

This will raise `NameError`. Use the current worker's batch and wrap it with the SDK `Scenarios` type.

```diff
-            # Create evaluation request
-            request = EvaluationRequest(
-                agent_config=agent_config,
-                scenarios=scenarios,
-            )
+            # Create evaluation request from this worker's batch
+            sdk_scenarios = Scenarios(scenarios=batch)
+            request = EvaluationRequest(
+                agent_config=agent_config,
+                scenarios=sdk_scenarios,
+            )
```
**477-493: Summary generation: hardcoded URL, wrong model, resource leak, and undefined `summary` on failure**
- Hardcoded base_url makes deployments brittle.
- Model should be the judge model; using service_llm is misleading.
- sdk.close() should be tied to a context or finally.
- ‘summary’ is used even if an exception occurs → UnboundLocalError.
```diff
-    # Generate summary using SDK (server-based)
-    try:
-        sdk_config = RogueClientConfig(
-            base_url="http://localhost:8000",
-            timeout=600.0,
-        )
-        sdk = RogueSDK(sdk_config)
-
-        summary = await sdk.generate_summary(
-            results=all_results,
-            model=config.get("service_llm"),
-            api_key=config.get("judge_llm_api_key"),
-        )
-
-        await sdk.close()
-    except Exception:
-        logger.exception("Summary generation failed")
+    # Generate summary using SDK (server-based)
+    summary = "Failed to generate summary."
+    try:
+        server_url = config.get("server_url", os.getenv("ROGUE_SERVER_URL", "http://localhost:8000"))
+        sdk_config = RogueClientConfig(
+            base_url=server_url,
+            timeout=600.0,
+        )
+        async with RogueSDK(sdk_config) as sdk:
+            summary = await sdk.generate_summary(
+                results=all_results,
+                model=(config.get("judge_llm") or config.get("judge_llm_model") or config.get("service_llm")),
+                api_key=config.get("judge_llm_api_key"),
+            )
+    except Exception:
+        logger.exception("Summary generation failed")
```

Note: ensure `import os` is present at the top.

rogue/evaluator_agent/evaluator_agent.py (1)
**559-572: Return the agent's response text, not the serialized object**

The docstring promises `"response"` is a string. Returning `response.model_dump_json()` breaks that contract and confuses downstream consumers.

```diff
         logger.info(
             "✅ A2A call successful - received response from evaluated agent",
             extra={
                 "response_length": len(agent_response_text),
                 "response_preview": (
                     agent_response_text[:100] + "..."
                     if len(agent_response_text) > 100
                     else agent_response_text
                 ),
                 "context_id": context_id,
             },
         )
-        return {"response": response.model_dump_json()}
+        return {"response": agent_response_text}
```

rogue/server/api/evaluation.py (2)
**27-33: Use HTTP 202 for async job creation and include a Location header**

This endpoint enqueues background work. 202 Accepted with a Location pointing to the job resource is more appropriate.

```diff
-@router.post("", response_model=EvaluationResponse)
+@router.post("", response_model=EvaluationResponse, status_code=202)
 async def create_evaluation(
     request: EvaluationRequest,
     background_tasks: BackgroundTasks,
+    response,
     evaluation_service: EvaluationService = Depends(get_evaluation_service),
 ):
@@
     await evaluation_service.add_job(job)
     background_tasks.add_task(evaluation_service.run_job, job_id)
@@
-    return EvaluationResponse(
+    # Point clients to the job status endpoint
+    try:
+        response.headers["Location"] = f"/evaluations/{job_id}"
+    except Exception:
+        # best-effort; do not fail the call on header issues
+        pass
+    return EvaluationResponse(
         job_id=job_id,
         status=EvaluationStatus.PENDING,
         message="Evaluation job created successfully",
     )
```

Add the missing import at the top (`from fastapi import Response, status`) and annotate `response: Response` in the signature.

Also applies to: 64-76
**101-105: Do not log entire job objects; risk of leaking secrets/PII**

Serializing the full job can expose credentials from `agent_config` or API keys. Log only safe metadata.
[security]
```diff
-    job = await evaluation_service.get_job(job_id)
-    logger.info(f"Job: {job}")
+    job = await evaluation_service.get_job(job_id)
+    logger.info(
+        "Retrieved evaluation job",
+        extra={"job_id": job_id, "status": getattr(job, "status", None)},
+    )
```

rogue/run_cli.py (3)
**147-151: Handle health check failures gracefully**

Awaiting `sdk.health()` can raise; exit with a clear message instead of a stack trace.

```diff
-    # Check if server is available
-    await sdk.health()
+    # Check if server is available
+    try:
+        await sdk.health()
+    except Exception as e:
+        logger.error(f"Unable to reach Rogue server at {rogue_server_url}: {e}")
+        return None
```
**261-263: Default judge model likely has a typo ("o4-mini" → "gpt-4o-mini")**

Other parts of the codebase and providers use `gpt-4o-mini`. Align the default for consistency and fewer surprises.

```diff
-    if data.get("judge_llm") is None:
-        data["judge_llm"] = "openai/o4-mini"
+    if data.get("judge_llm") is None:
+        data["judge_llm"] = "openai/gpt-4o-mini"
```
**171-176: Ensure the output directory exists before writing results**

Without this, `write_text` can raise `FileNotFoundError` when parent dirs are missing.

```diff
-    evaluation_results_output_path.write_text(
+    evaluation_results_output_path.parent.mkdir(parents=True, exist_ok=True)
+    evaluation_results_output_path.write_text(
         results.model_dump_json(indent=2, exclude_none=True),
         encoding="utf-8",
     )
```

rogue/server/services/evaluation_library.py (2)
**148-161: Add error handling to the streaming evaluator**

The streaming variant lacks the robust try/except of `evaluate_agent`. Mirror that to emit/log failures consistently.

```diff
-        async for update_type, data in service.evaluate_scenarios():
-            yield update_type, data
+        try:
+            async for update_type, data in service.evaluate_scenarios():
+                yield update_type, data
+        except Exception:
+            logger.exception("💥 EvaluationLibrary.evaluate_agent_streaming failed")
+            raise
```
**181-188: Guard against running inside an existing event loop**

`asyncio.run()` will raise in Jupyter/async contexts. Detect this and provide a clearer error.
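The loop-detection guard is easy to get wrong: if the "already running" error is raised inside the same `try` that catches `RuntimeError` from `asyncio.get_running_loop()`, it gets swallowed by its own handler. A standalone sketch of a correct shape (`run_sync` and its message are hypothetical names, not the library's API):

```python
import asyncio

def run_sync(coro):
    """Hypothetical helper: refuse to nest asyncio.run() inside a live loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop; safe to start one.
        return asyncio.run(coro)
    # A loop IS running: raise OUTSIDE the try so our own error isn't caught.
    coro.close()  # avoid "coroutine was never awaited" warning
    raise RuntimeError(
        "run_sync cannot be used inside an active event loop; "
        "await the coroutine directly instead."
    )

async def work() -> int:
    return 42

# Outside a loop: runs normally.
print(run_sync(work()))  # → 42

# Inside a loop: raises a clear, domain-specific error.
async def caller() -> str:
    try:
        run_sync(work())
    except RuntimeError as e:
        return str(e)
    return "no error"

print(asyncio.run(caller()))
```

The `try/except/else` form (`else: raise ...`) works equally well; the point is that the explanatory `RuntimeError` must not be raised where the bare `except RuntimeError` can catch it.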
```diff
-        return asyncio.run(
+        try:
+            asyncio.get_running_loop()
+        except RuntimeError:
+            # No running loop; safe to proceed
+            pass
+        else:
+            raise RuntimeError(
+                "evaluate_agent_sync cannot run inside an active event loop. "
+                "Use evaluate_agent instead."
+            )
+        return asyncio.run(
             EvaluationLibrary.evaluate_agent(
                 agent_config=agent_config,
                 scenarios=scenarios,
                 business_context=business_context,
                 progress_callback=progress_callback,
             )
         )
```

sdks/python/rogue_sdk/sdk.py (3)
**100-101: Normalize base_url for the WebSocket client; strip the trailing slash (and drop the no-op `+ ""`).**

Inconsistent trailing-slash handling can break WS URL composition and differs from the HTTP client's normalization.

```diff
-        self.ws_client = RogueWebSocketClient(str(self.config.base_url) + "", job_id)
+        self.ws_client = RogueWebSocketClient(str(self.config.base_url).rstrip("/"), job_id)
```
**248-257: Bug: incorrect use of Pydantic `HttpUrl` — do not call it as a constructor.**

`HttpUrl` is a type for validation, not a callable. This will raise at runtime. Pass the raw string and let `AgentConfig` validate it.

```diff
-        agent_config = AgentConfig(
-            evaluated_agent_url=HttpUrl(agent_url),
+        agent_config = AgentConfig(
+            evaluated_agent_url=agent_url,
             evaluated_agent_auth_type=auth_type,
             evaluated_agent_credentials=auth_credentials,
             judge_llm=judge_model,
             deep_test_mode=deep_test,
             interview_mode=True,
             parallel_runs=1,
             business_context=business_context,
         )
```
**324-333: Docstring is incorrect; the method returns `SendMessageResponse`, not a tuple.**

This is user-facing API documentation and will mislead consumers.

```diff
     async def send_interview_message(
         self,
         session_id: str,
         message: str,
     ) -> SendMessageResponse:
-        """
-        Send a message in an interview session.
-
-        Returns:
-            tuple: (response, is_complete, message_count)
-        """
+        """
+        Send a message in an interview session.
+
+        Returns:
+            SendMessageResponse
+        """
         return await self.http_client.send_interview_message(
             session_id=session_id,
             message=message,
         )
```
```python
evaluate_agent = EvaluationLibrary.evaluate_agent
evaluate_agent_streaming = EvaluationLibrary.evaluate_agent_streaming
evaluate_agent_sync = EvaluationLibrary.evaluate_agent_sync
```
Exported callables are unbound methods; will require a self argument at call time.
Assigning class methods directly exposes unbound functions that still expect self. Callers invoking evaluate_agent(...) will fail.
Use an instance-bound interface:
```diff
-from .server.services.evaluation_library import EvaluationLibrary
+from .server.services.evaluation_library import EvaluationLibrary
+
+_evaluation_library = EvaluationLibrary()
@@
-evaluate_agent = EvaluationLibrary.evaluate_agent
-evaluate_agent_streaming = EvaluationLibrary.evaluate_agent_streaming
-evaluate_agent_sync = EvaluationLibrary.evaluate_agent_sync
+evaluate_agent = _evaluation_library.evaluate_agent
+evaluate_agent_streaming = _evaluation_library.evaluate_agent_streaming
+evaluate_agent_sync = _evaluation_library.evaluate_agent_sync
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from .server.services.evaluation_library import EvaluationLibrary

_evaluation_library = EvaluationLibrary()

evaluate_agent = _evaluation_library.evaluate_agent
evaluate_agent_streaming = _evaluation_library.evaluate_agent_streaming
evaluate_agent_sync = _evaluation_library.evaluate_agent_sync
```
🤖 Prompt for AI Agents
In rogue/__init__.py around lines 23-25 the module assigns class methods
directly (evaluate_agent = EvaluationLibrary.evaluate_agent etc.), which exports
unbound functions that still expect a self argument; instantiate
EvaluationLibrary and export bound instance methods instead (e.g., create a
single EvaluationLibrary() object and set evaluate_agent,
evaluate_agent_streaming, evaluate_agent_sync to that object's methods) or
alternatively convert those methods to @staticmethod/@classmethod in the
class—pick the instance-bound approach to avoid changing class semantics.
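The bound-vs-unbound distinction driving this fix can be demonstrated with a toy class (`EvaluationLibrarySketch` is a stand-in, not the real `EvaluationLibrary`):

```python
class EvaluationLibrarySketch:
    """Hypothetical stand-in for EvaluationLibrary."""

    def evaluate_agent(self, agent_url: str) -> str:
        return f"evaluated {agent_url}"

# Accessing the method via the class yields a plain function
# that still expects `self` as its first argument.
unbound = EvaluationLibrarySketch.evaluate_agent
try:
    # The URL is consumed as `self`, leaving agent_url missing
    unbound("http://localhost:10001")
except TypeError as exc:
    print(f"TypeError: {exc}")

# Accessing it via an instance binds `self`, so callers can
# invoke it with just the intended arguments.
bound = EvaluationLibrarySketch().evaluate_agent
print(bound("http://localhost:10001"))
```

This is why the module-level `evaluate_agent = EvaluationLibrary.evaluate_agent` alias breaks for consumers unless the methods are static or the alias is taken from an instance.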
```python
evaluated_agent_url: HttpUrl
evaluated_agent_auth_type: AuthType = AuthType.NO_AUTH
```
Use AnyHttpUrl to allow localhost/IPs
HttpUrl is strict and often rejects localhost/IPs. Your docs and examples use http://localhost:10001, which will likely fail validation with HttpUrl in Pydantic. Switch to AnyHttpUrl.
Apply this diff:
```diff
-from pydantic import BaseModel, Field, HttpUrl, SecretStr, model_validator
+from pydantic import BaseModel, Field, AnyHttpUrl, SecretStr, model_validator
@@
-    evaluated_agent_url: HttpUrl
+    evaluated_agent_url: AnyHttpUrl
@@
-    evaluated_agent_url: HttpUrl | None = Field(default=None)
+    evaluated_agent_url: AnyHttpUrl | None = Field(default=None)
```

Also applies to: 54-54
🤖 Prompt for AI Agents
In rogue/models/cli_input.py around lines 13-14 (and also update the similar
declaration at line 54), the field typed as HttpUrl is too strict for
localhost/IP addresses used in docs; change the type annotation from HttpUrl to
AnyHttpUrl (importing AnyHttpUrl from pydantic if not already imported) for
evaluated_agent_url (and the other occurrence) so validation accepts localhost
and IP forms.
```python
input_scenarios_file: Path = workdir / "scenarios.json"
output_report_file: Path = workdir / "report.md"
```
🛠️ Refactor suggestion
Make file path defaults derive from the actual workdir
input_scenarios_file and output_report_file are computed from the class-level default workdir, so if a caller provides a custom workdir, these won’t update accordingly. Mirror the PartialCLIInput approach so they depend on the instance’s workdir.
Apply this diff to make them optional and set them post-init:
```diff
-    input_scenarios_file: Path = workdir / "scenarios.json"
-    output_report_file: Path = workdir / "report.md"
+    input_scenarios_file: Path | None = None
+    output_report_file: Path | None = None
```

Then add this method (outside the selected range) to CLIInput:
```python
def model_post_init(self, __context):
    if self.input_scenarios_file is None:
        self.input_scenarios_file = self.workdir / "scenarios.json"
    if self.output_report_file is None:
        self.output_report_file = self.workdir / "report.md"
```

Also applies to: 65-74
🤖 Prompt for AI Agents
In rogue/models/cli_input.py around lines 18-19 (and similarly for the fields
defined at lines 65-74), the default paths are computed from the class-level
workdir so they won't reflect a caller-provided workdir; make
input_scenarios_file and output_report_file Optional[Path] with a default of
None, remove the class-level Path joins, and implement a post-init method on
CLIInput named model_post_init(self, __context) that sets each field to
self.workdir / "scenarios.json" and self.workdir / "report.md" respectively only
if the field is None; apply the same pattern to the other occurrence at lines
65-74.
```diff
 from pydantic import BaseModel, Field

-from .chat_history import ChatHistory
+from rogue_sdk.types import ChatHistory
```
💡 Verification agent
🧩 Analysis chain
Switch to centralized ChatHistory import looks correct; verify packaging to avoid runtime ImportError
Good move consolidating types to rogue_sdk.types. Please confirm the rogue package declares a runtime dependency on the SDK (e.g., rogue-sdk) so that from rogue_sdk.types import ChatHistory works for end-users who install only the server/package (outside CI where both are installed in editable mode).
Run this script to verify that the root package declares a dependency on the SDK:
If rogue-sdk (or the actual package name for the SDK) isn’t listed in the root pyproject.toml under [project.dependencies], users doing pip install -e . will hit an ImportError here.
🏁 Script executed:
```bash
#!/bin/bash
# Check for 'rogue-sdk' (or similar SDK package name) in project dependencies
echo "Searching for SDK dependency in pyproject files..."
rg -n -C2 -g 'pyproject.toml' -P '(?i)^\s*dependencies\s*=\s*\[|^\s*optional-dependencies|rogue[-_ ]sdk'
# Check that the SDK importable module exists in the repo (sanity)
echo "Verifying SDK module presence..."
fd -H 'types.py' | rg -n 'sdks/python/.*/rogue_sdk/types\.py'
```

Length of output: 1138
Ensure the SDK is declared as a runtime dependency
The root pyproject.toml currently lists
```toml
[project]
dependencies = [
    "a2a-sdk==0.2.10",
    "backoff>=2.2.1",
    …
]
```

but does not include "rogue-sdk". Since rogue/models/prompt_injection.py does
```python
from rogue_sdk.types import ChatHistory
```

users installing your package (outside of editable mode) will get an ImportError.
• In pyproject.toml under [project.dependencies], add the SDK, e.g.:
```diff
 [project]
 dependencies = [
+    "rogue-sdk>=1.0.0",
     "a2a-sdk==0.2.10",
     "backoff>=2.2.1",
     …
 ]
```

• Verify the version constraint matches the SDK’s release (see sdks/python/pyproject.toml).
• Run pip install . in a clean venv to confirm from rogue_sdk.types import ChatHistory succeeds at runtime.
🤖 Prompt for AI Agents
In rogue/models/prompt_injection.py around line 2 the module imports ChatHistory
from rogue_sdk but the root pyproject.toml does not list "rogue-sdk" as a
runtime dependency; add an appropriate rogue-sdk entry under
[project.dependencies] in the root pyproject.toml (use the version constraint
consistent with sdks/python/pyproject.toml), save, then verify by creating a
clean virtualenv and running pip install . and importing from rogue_sdk.types to
confirm the ImportError is resolved.
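Beyond a clean-venv install, the importability check can be automated with `importlib.util.find_spec`, which resolves a module without importing it; the module names below are illustrative:

```python
import importlib.util

def runtime_dep_available(module_name: str) -> bool:
    """Return True if a declared dependency is importable at runtime."""
    return importlib.util.find_spec(module_name) is not None

# A stdlib module resolves; a missing SDK would not
print(runtime_dep_available("json"))                         # True
print(runtime_dep_available("definitely_not_installed_sdk"))  # False
```

A one-line smoke test like `python -c "import rogue_sdk.types"` in CI against a wheel install would catch the missing-dependency regression described above.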
```python
Usage:
    python -m rogue.server

Environment Variables:
    HOST: Server host (default: 127.0.0.1)
    PORT: Server port (default: 8000)
    RELOAD: Enable auto-reload for development (default: False)
"""
```
💡 Verification agent
🧩 Analysis chain
Docstring usage may mislead
This file can’t be executed via python -m rogue.server; that targets rogue/server/__main__.py. Either ensure that main delegates here (preferred) or revise usage text to python -m rogue.server.main.
Quick check:
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "=== Does rogue/server/__main__.py exist? ==="
fd -a '__main__.py' | rg -n 'rogue/server/__main__\.py' || true
```

Length of output: 181
Fix module entrypoint in docstring or add __main__.py
The current usage text in rogue/server/main.py claims you can run the server via:
python -m rogue.server
—but there is no rogue/server/__main__.py, so that command will error. To resolve this, choose one of the following:
• Add an entrypoint under rogue/server/__main__.py that imports and invokes main.py, for example:
```python
# rogue/server/__main__.py
from .main import main

if __name__ == "__main__":
    main()
```

• Or, update the usage section in rogue/server/main.py (lines 6–13) to reference the actual module path:
```diff
-    python -m rogue.server
+    python -m rogue.server.main
```

🤖 Prompt for AI Agents
In rogue/server/main.py around lines 6 to 13, the docstring claims you can run
the server with "python -m rogue.server" but there is no
rogue/server/__main__.py entrypoint; either add a simple __main__.py that
imports and calls main() from main.py (so python -m rogue.server works), or
change the usage text to a correct command (for example "python -m
rogue.server.main" or "python rogue/server/main.py") to reflect the actual
entrypoint; implement one of these two fixes and update the docstring
accordingly.
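The `-m` behavior the comment describes can be reproduced end-to-end with a throwaway package (`demo_server` here is hypothetical, standing in for `rogue.server`):

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_server"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "main.py").write_text("def main():\n    print('server up')\n")

    # Without __main__.py, `python -m demo_server` cannot find an entrypoint
    before = subprocess.run(
        [sys.executable, "-m", "demo_server"],
        cwd=tmp, capture_output=True, text=True,
    )

    # A __main__.py that delegates to main() makes `python -m demo_server` work
    (pkg / "__main__.py").write_text("from .main import main\n\nmain()\n")
    after = subprocess.run(
        [sys.executable, "-m", "demo_server"],
        cwd=tmp, capture_output=True, text=True,
    )

print(before.returncode != 0, after.stdout.strip())
```

The first run fails with "No module named demo_server.\_\_main\_\_"; the second prints the delegated output, which is exactly the delegation pattern the suggested `rogue/server/__main__.py` would use.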
```diff
 cli_input = CLIInputWithMockScenarios(
     evaluated_agent_url=HttpUrl("https://example.com"),
-    judge_llm_model="example-model",
+    judge_llm="example-model",
     business_context="example-context",
     input_scenarios_file=input_scenarios_file_mock,
 )
```
Don’t instantiate HttpUrl directly; pass a string and let Pydantic parse.
HttpUrl("https://example.com") is invalid for Pydantic v2 and will raise. Provide a plain string; the model will validate/coerce it.
```diff
-    cli_input = CLIInputWithMockScenarios(
-        evaluated_agent_url=HttpUrl("https://example.com"),
+    cli_input = CLIInputWithMockScenarios(
+        evaluated_agent_url="https://example.com",
         judge_llm="example-model",
         business_context="example-context",
         input_scenarios_file=input_scenarios_file_mock,
     )
```

📝 Committable suggestion
```python
cli_input = CLIInputWithMockScenarios(
    evaluated_agent_url="https://example.com",
    judge_llm="example-model",
    business_context="example-context",
    input_scenarios_file=input_scenarios_file_mock,
)
```
🤖 Prompt for AI Agents
In rogue/tests/models/test_cli_input.py around lines 81 to 86, the test
instantiates HttpUrl directly which is incompatible with Pydantic v2; instead
pass a plain string (e.g., "https://example.com") for evaluated_agent_url so
Pydantic can validate/parse it when constructing CLIInputWithMockScenarios.
```python
# Run async function in sync context
bot_message = asyncio.run(handle_interview_message())
```
🛠️ Refactor suggestion
Avoid asyncio.run() inside a running event loop; make the handler async.
Gradio supports async handlers. Using asyncio.run here is prone to "asyncio.run() cannot be called from a running event loop".
```diff
-        # Run async function in sync context
-        bot_message = asyncio.run(handle_interview_message())
+        # Run async in Gradio's event loop
+        bot_message = await handle_interview_message()
```

Outside this hunk, change the callback signature to async so Gradio can await it:
```python
# Change this:
def respond(message, history, state):
# To this:
async def respond(message, history, state):
```

🤖 Prompt for AI Agents
In rogue/ui/components/interviewer.py around lines 70-71, the handler uses
asyncio.run(handle_interview_message()) which will fail if an event loop is
already running; change the callback to an async function (e.g., async def
respond(...)) and remove asyncio.run, instead await handle_interview_message()
directly; ensure any call sites and Gradio registration use the async handler
and that handle_interview_message and any downstream functions are awaitable
(add awaits or make them async as needed).
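Why the async handler works and `asyncio.run()` does not can be shown with stub handlers; the names mirror the review, but the bodies are stand-ins for the real interview flow:

```python
import asyncio

async def handle_interview_message() -> str:
    # Stand-in for the real handler, which would call the interview API
    return "bot reply"

async def respond(message: str, history: list, state: dict) -> str:
    # Async handler: the host framework awaits this on its own event loop
    return await handle_interview_message()

async def respond_with_asyncio_run(message: str, history: list, state: dict) -> str:
    # Calling asyncio.run() while a loop is already running raises RuntimeError
    try:
        return asyncio.run(handle_interview_message())
    except RuntimeError:
        return "asyncio.run() failed inside running loop"

async def main() -> tuple[str, str]:
    ok = await respond("hi", [], {})
    bad = await respond_with_asyncio_run("hi", [], {})
    return ok, bad

print(asyncio.run(main()))
```

The `await`-based handler returns normally, while the `asyncio.run()` variant hits the nested-loop RuntimeError the review warns about.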
```python
# All types
from .types import *
from .websocket import RogueWebSocketClient

# Version
__version__ = "1.0.0"
```
🛠️ Refactor suggestion
Avoid wildcard import; export symbols explicitly and keep `__all__` in sync

Explicit imports make the public API clear and prevent namespace pollution. Also, `__all__` is missing Scenarios, EvaluationResults, and WebSocketEventType.
```diff
-# All types
-from .types import *
-from .websocket import RogueWebSocketClient
+# All types - explicit exports
+from .types import (
+    RogueClientConfig,
+    EvaluationRequest,
+    EvaluationResponse,
+    EvaluationJob,
+    EvaluationResult,
+    EvaluationResults,
+    AgentConfig,
+    Scenario,
+    Scenarios,
+    AuthType,
+    ScenarioType,
+    EvaluationStatus,
+    HealthResponse,
+    JobListResponse,
+    WebSocketMessage,
+    WebSocketEventType,
+)
+from .websocket import RogueWebSocketClient
@@
 __all__ = [
     "RogueSDK",
     "RogueHttpClient",
     "RogueWebSocketClient",
     "RogueClientConfig",
     "EvaluationRequest",
     "EvaluationResponse",
     "EvaluationJob",
     "EvaluationResult",
+    "EvaluationResults",
     "AgentConfig",
     "Scenario",
+    "Scenarios",
     "AuthType",
     "ScenarioType",
     "EvaluationStatus",
     "HealthResponse",
     "JobListResponse",
     "WebSocketMessage",
+    "WebSocketEventType",
 ]
```

Also applies to: 56-74
🤖 Prompt for AI Agents
In sdks/python/rogue_sdk/__init__.py around lines 49 to 55 (and similarly apply
to lines 56 to 74), remove the wildcard import from .types and explicitly import
the public symbols you intend to export (e.g., import Scenarios,
EvaluationResults, WebSocketEventType, and any other public types) and keep an
explicit __all__ list that includes those symbols plus RogueWebSocketClient;
update the top-level exports to only the named imports so the package API is
explicit and __all__ stays in sync with the exported names.
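Keeping `__all__` in sync can also be spot-checked mechanically. This sketch uses a stub module rather than the real `rogue_sdk`; the same comprehension works in a unit test against the actual package:

```python
import types

# Stub module: two names are defined, but __all__ declares four
mod = types.ModuleType("rogue_sdk_stub")
mod.RogueSDK = object
mod.Scenario = object
mod.__all__ = ["RogueSDK", "Scenario", "EvaluationResults", "WebSocketEventType"]

# Names declared in __all__ but never imported show up as missing
missing = [name for name in mod.__all__ if not hasattr(mod, name)]
print(missing)  # ['EvaluationResults', 'WebSocketEventType']
```

A CI assertion that this list is empty catches exactly the drift flagged in the comment (names listed in `__all__` that the module never imports, or vice versa with the reverse check).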
```python
response = await self._client.request(method, endpoint, **kwargs)
response.raise_for_status()
return response.json()
```
Handle 204/empty and non-JSON responses to avoid runtime errors.
response.json() will raise for 204 No Content or non-JSON content types (e.g., some DELETE endpoints). This can break cancel/end flows.
```diff
-        response = await self._client.request(method, endpoint, **kwargs)
-        response.raise_for_status()
-        return response.json()
+        response = await self._client.request(method, endpoint, **kwargs)
+        response.raise_for_status()
+
+        # Gracefully handle no-content and non-JSON responses
+        content_type = (response.headers.get("content-type") or "").lower()
+        if response.status_code == 204 or not response.content:
+            return {}
+        if "application/json" in content_type:
+            return response.json()
+        # Fallback: return empty dict to keep downstream parsing safe
+        return {}
```

📝 Committable suggestion
```python
response = await self._client.request(method, endpoint, **kwargs)
response.raise_for_status()

# Gracefully handle no-content and non-JSON responses
content_type = (response.headers.get("content-type") or "").lower()
if response.status_code == 204 or not response.content:
    return {}
if "application/json" in content_type:
    return response.json()
# Fallback: return empty dict to keep downstream parsing safe
return {}
```
🤖 Prompt for AI Agents
In sdks/python/rogue_sdk/client.py around lines 80 to 83, response.json() can
raise for 204 No Content or non-JSON responses; change the return logic to
safely handle those cases by: after response.raise_for_status(), check if
response.status_code == 204 or not response.content and return None (or an empty
value appropriate for callers), otherwise inspect
response.headers.get("Content-Type","") for "application/json" and attempt to
parse JSON inside a try/except (catch JSON decoding errors and return None or
response.text), so non-JSON or empty responses don’t raise runtime errors.