Conversation
Note: Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Qualifire API key support and job_id propagation to the CLI reporting flow, updates the CLI input model, exports new SDK summary types, bumps the Python SDK version, adjusts CI install commands, and expands README documentation with detailed mode arguments, config, and examples.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant CLI as rogue run_cli
    participant SDK as Python SDK
    participant S as Rogue Server
    participant Q as Qualifire Service
    U->>CLI: Run CLI with options (may include --qualifire-api-key)
    CLI->>SDK: run_scenarios(...)
    SDK->>S: Execute scenarios
    S-->>SDK: EvaluationResults + job_id?
    SDK-->>CLI: Results, job_id (optional)
    alt qualifire_api_key provided
        CLI->>SDK: generate_summary(results, job_id, api_key)
        SDK->>Q: Persist/report summary
        Q-->>SDK: Summary/ACK
        SDK-->>CLI: Summary created
    else no API key
        CLI-->>U: Warn: persistence will not occur
        CLI->>SDK: generate_summary(results, job_id, api_key=None)
        SDK-->>CLI: Local summary only
    end
    CLI-->>U: Output report path / summary info
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
✨ Finishing touches

🧪 Generate unit tests
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
rogue/models/cli_input.py (1)
12-12: Defaults capture `workdir` at class definition time (won't follow a custom `workdir`).

`input_scenarios_file = workdir / ...` and `output_report_file = workdir / ...` are bound once at import, not per-instance. If a caller sets `workdir`, these paths won't update. Apply this diff to make them instance-derived:

```diff
-    input_scenarios_file: Path = workdir / "scenarios.json"
-    output_report_file: Path = workdir / "report.md"
+    input_scenarios_file: Path | None = None
+    output_report_file: Path | None = None
```

Add a per-instance initializer to `CLIInput` (outside the selected lines):

```python
from typing import Any  # add to imports

def model_post_init(self, __context: Any) -> None:
    if self.input_scenarios_file is None:
        self.input_scenarios_file = self.workdir / "scenarios.json"
    if self.output_report_file is None:
        self.output_report_file = self.workdir / "report.md"
```

Also applies to: 18-19
rogue/run_cli.py (1)

377-385: Pass `qualifire_api_key_secret` to `create_report`

The `create_report` function signature was modified to accept `qualifire_api_key_secret`, but the call site doesn't pass this parameter. This means the Qualifire API key will always be `None` even if the user provides it via CLI. Apply this diff to pass the `qualifire_api_key_secret` parameter:

```diff
 logger.info("Creating report")
 report_summary = await create_report(
     rogue_server_url=args.rogue_server_url,
     judge_llm=cli_input.judge_llm,
     results=results,
     output_report_file=cli_input.output_report_file,
     judge_llm_api_key_secret=cli_input.judge_llm_api_key,
+    qualifire_api_key_secret=cli_input.qualifire_api_key,
     deep_test_mode=cli_input.deep_test_mode,
     judge_model=cli_input.judge_llm,
 )
```

README.md (1)
362-370: Documentation lists non-existent OpenAI models

The README lists several OpenAI models that don't exist (gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, etc.). This could confuse users trying to configure the tool.
Update the model list to reflect actual available OpenAI models. The currently available models should be:
- gpt-4o
- gpt-4o-mini
- gpt-4-turbo
- gpt-3.5-turbo
Remove the non-existent model names to avoid user confusion.
🧹 Nitpick comments (5)
rogue/models/cli_input.py (2)
24-35: Harden file I/O and validation error handling.

`read_text()` and JSON validation can fail (permissions, encoding, invalid JSON). Wrap with try/except and return actionable errors. Apply this diff:

```diff
-        return Scenarios.model_validate_json(self.input_scenarios_file.read_text())
+        try:
+            content = self.input_scenarios_file.read_text(encoding="utf-8")
+        except OSError as e:
+            raise ValueError(
+                f"Failed to read input scenarios file: {self.input_scenarios_file}"
+            ) from e
+        try:
+            return Scenarios.model_validate_json(content)
+        except Exception as e:
+            raise ValueError(
+                f"Invalid or un-parseable JSON in scenarios file: {self.input_scenarios_file}"
+            ) from e
```
61-67: Type-hint cleanup: remove `type: ignore` and annotate `model_post_init`.

Keep mypy happy and align with guidelines. Apply this diff:

```diff
-    business_context_file: Path = None  # type: ignore # fixed in model_post_init
-    input_scenarios_file: Path = None  # type: ignore # fixed in model_post_init
-    output_report_file: Path = None  # type: ignore # fixed in model_post_init
+    business_context_file: Path | None = None  # fixed in model_post_init
+    input_scenarios_file: Path | None = None  # fixed in model_post_init
+    output_report_file: Path | None = None  # fixed in model_post_init
@@
-    def model_post_init(self, __context):
+    def model_post_init(self, __context: Any) -> None:
```

Note: ensure `from typing import Any` is imported once at the top (module-level).

rogue/run_cli.py (1)
209-213: Remove duplicate `qualifire_api_key` extraction

`qualifire_api_key` is extracted twice from `qualifire_api_key_secret`: once at lines 209-213 and again at lines 223-227. The second extraction inside the try block is redundant since the value was already extracted before the try block. Apply this diff to remove the duplicate extraction:

```diff
 async def create_report(
     rogue_server_url: str,
     judge_llm: str,
     results: EvaluationResults,
     output_report_file: Path,
     judge_llm_api_key_secret: SecretStr | None = None,
     qualifire_api_key_secret: SecretStr | None = None,
     deep_test_mode: bool = False,
     judge_model: str | None = None,
 ) -> str:
     judge_llm_api_key = (
         judge_llm_api_key_secret.get_secret_value()
         if judge_llm_api_key_secret
         else None
     )
     qualifire_api_key = (
         qualifire_api_key_secret.get_secret_value()
         if qualifire_api_key_secret
         else None
     )

     # Use SDK for summary generation (server-based)
     sdk_config = RogueClientConfig(
         base_url=rogue_server_url,
         timeout=600.0,
     )
     sdk = RogueSDK(sdk_config)

     try:
-        qualifire_api_key = (
-            qualifire_api_key_secret.get_secret_value()
-            if qualifire_api_key_secret
-            else None
-        )
         summary, _ = await sdk.generate_summary(
             results=results,
             model=judge_llm,
             api_key=judge_llm_api_key,
             qualifire_api_key=qualifire_api_key,
             deep_test=deep_test_mode,
             judge_model=judge_model,
         )
```

Also applies to: 223-227
README.md (2)
3-3: Add alt text to the tracking pixel imageThe image should have alternate text for accessibility, even if it's a tracking pixel.
Apply this diff:
- +
137-137: Wrap bare URL in angle bracketsBare URLs should be wrapped in angle brackets for better Markdown compatibility.
Apply this diff:
```diff
-- `--rogue-server-url URL` - Rogue server URL (default: http://localhost:8000)
+- `--rogue-server-url URL` - Rogue server URL (default: <http://localhost:8000>)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- README.md (4 hunks)
- rogue/models/cli_input.py (1 hunks)
- rogue/run_cli.py (4 hunks)
🔥 Files not summarized due to errors (1)
- rogue/models/cli_input.py: Error: Server error: no LLM provider could handle the message
🧰 Additional context used
📓 Path-based instructions (2)
{rogue,tests,examples}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
{rogue,tests,examples}/**/*.py: Format Python code with Black
Organize imports according to isort conventions
Use type hints for all function signatures
Follow PEP 8 naming (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Ensure code passes flake8 linting
Ensure code passes mypy type checks per .mypy.ini
Address Bandit-reported security issues in Python code
Files:
- rogue/models/cli_input.py
- rogue/run_cli.py
rogue/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place application/package code under the rogue/ directory
Files:
- rogue/models/cli_input.py
- rogue/run_cli.py
🪛 markdownlint-cli2 (0.17.2)
README.md
3-3: Images should have alternate text (alt text)
(MD045, no-alt-text)
137-137: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor Bugbot
- GitHub Check: codestyle
```diff
 # Set defaults for required fields that are missing
 if data.get("judge_llm") is None:
-    data["judge_llm"] = "openai/o4-mini"
+    data["judge_llm"] = "openai/gpt-5"
```
🧩 Analysis chain
Verify that the openai/gpt-5 model exists
The default judge LLM model has been changed to openai/gpt-5. This model doesn't appear to be a valid OpenAI model name (as of my knowledge cutoff). This could cause runtime failures for users who don't explicitly specify a judge LLM model.
Consider using a valid model like openai/gpt-4o or openai/gpt-4o-mini as the default:
```diff
- data["judge_llm"] = "openai/gpt-5"
+ data["judge_llm"] = "openai/gpt-4o-mini"
```

🌐 Web query:
What are the current valid OpenAI GPT model names available through their API?
💡 Result:
Short answer — common GPT models currently offered in the OpenAI API (families and typical variants):
- GPT-4 family: gpt-4, gpt-4-turbo. [1]
- GPT-4o family: gpt-4o, gpt-4o-mini (and realtime/voice/vision variants where available). [1][2]
- GPT-4.1 family: gpt-4.1, gpt-4.1-mini (newer 4.1 releases). [4]
- GPT-3.5 family: gpt-3.5-turbo and its context-window variants (e.g., 16k/4k variants). [1]
Notes: OpenAI frequently adds/removes model snapshots and introduces variant names (mini, turbo, realtime, nano, etc.), so check the official API models list or the OpenAI docs/models endpoint for the authoritative, up‑to‑date names before integrating. [1][4]
Sources:
[1] OpenAI Help Center — How can I access GPT-4, GPT-4 Turbo, GPT-4o, and GPT-4o mini?
[2] OpenAI blog — GPT‑4o mini announcement.
[4] OpenAI Help Center / Model release notes — GPT‑4.1 & related releases.
Replace invalid default judge LLM "openai/gpt-5" with a supported OpenAI model
openai/gpt-5 is not a recognized OpenAI model and will cause runtime failures for users who don't override the default.
File: rogue/run_cli.py Lines: 290-290
```diff
- data["judge_llm"] = "openai/gpt-5"
+ data["judge_llm"] = "openai/gpt-4o-mini"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- data["judge_llm"] = "openai/gpt-5"
+ data["judge_llm"] = "openai/gpt-4o-mini"
```
🤖 Prompt for AI Agents
In rogue/run_cli.py around line 290, the default judge LLM is set to the invalid
model string "openai/gpt-5"; replace it with a supported OpenAI model identifier
(for example "openai/gpt-4") so the default will work at runtime, and ensure any
related tests or documentation that reference the old default are updated
accordingly.
```python
if cli_input.qualifire_api_key is None:
    console.print(
        """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
    )
    return 1
```
Consider making the exit code configurable or warning-only
Returning exit code 1 when the Qualifire API key is missing might break existing CI/CD pipelines where users don't require report persistence. While the warning is helpful, forcing a non-zero exit code for a non-critical feature might be disruptive.
Consider one of these alternatives:
- Keep the warning but return 0 (success) if the evaluation itself passed
- Add a CLI flag to control whether a missing Qualifire key should fail the process (see the sketch after the committable suggestion below)
- Only fail if explicitly required via configuration
Example of option 1:
```diff
     if cli_input.qualifire_api_key is None:
         console.print(
             """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
         )
-        return 1

     return get_exit_code(results)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 if cli_input.qualifire_api_key is None:
     console.print(
         """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
     )
-    return 1
```
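A minimal sketch of option 2, assuming a hypothetical `--require-qualifire-key` flag and the existing `console`, `args`, `cli_input`, and `get_exit_code` objects from `run_cli.py` (the flag name and wiring are illustrative, not the repository's actual API):

```python
# Hypothetical opt-in strictness flag (name is an assumption, not in the PR).
parser.add_argument(
    "--require-qualifire-key",
    action="store_true",
    help="Fail the run (exit code 1) when no Qualifire API key is provided.",
)

# Later, when deciding the exit code:
if cli_input.qualifire_api_key is None:
    console.print(
        "[yellow]Qualifire API key is not set; the report will not persist.[/yellow]"
    )
    if args.require_qualifire_key:
        return 1  # only fail when the user explicitly asked for strictness

return get_exit_code(results)
```

This keeps the default behavior non-breaking for existing pipelines while letting strict environments opt into failing fast.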
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
rogue/run_cli.py (4)
34-39: Fix argparse Enum type/choices mismatch (runtime parse bug)

`type=AuthType` returns an Enum but `choices=[e.value for e in AuthType]` are strings, so valid inputs will be rejected.

```diff
-    parser.add_argument(
-        "--evaluated-agent-auth-type",
-        required=False,
-        type=AuthType,
-        choices=[e.value for e in AuthType],
+    parser.add_argument(
+        "--evaluated-agent-auth-type",
+        required=False,
+        type=str,
+        choices=[e.value for e in AuthType],
         help="How to authenticate with the evaluated agent (if needed)."
         f"Valid options are: {[e.value for e in AuthType]}",
     )
```
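An alternative sketch that keeps the parsed value as an Enum rather than a string; it assumes `AuthType` is a standard `Enum` whose member values are the CLI strings (the stand-in class below is illustrative, not the project's actual definition):

```python
from argparse import ArgumentParser
from enum import Enum


class AuthType(Enum):  # stand-in for the project's AuthType
    NO_AUTH = "no_auth"
    BEARER = "bearer"


parser = ArgumentParser()
parser.add_argument(
    "--evaluated-agent-auth-type",
    required=False,
    type=AuthType,           # argparse calls AuthType("<raw string>") -> member lookup by value
    choices=list(AuthType),  # compare against members, so the converted value passes the check
    help=f"Valid options are: {[e.value for e in AuthType]}",
)

args = parser.parse_args(["--evaluated-agent-auth-type", "bearer"])
assert args.evaluated_agent_auth_type is AuthType.BEARER
```

The trade-off is that `--help` prints the Enum reprs instead of the bare strings, which is why the `type=str` fix above may read better for end users.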
17-28: Add missing --workdir flag (prevents AttributeError later)

`get_cli_input` and `run_cli` use `args.workdir` but the parser doesn't define it.

```diff
 def set_cli_args(parser: ArgumentParser) -> None:
+    parser.add_argument(
+        "--workdir",
+        type=Path,
+        default=Path(".") / ".rogue",
+        help="Working directory for CLI artifacts (scenarios.json, report.md, etc.)",
+    )
     parser.add_argument(
         "--config-file",
         type=Path,
         help="Path to config file",
     )
```
347-360: Use str() for HttpUrl rather than .encoded_string()

HttpUrl/Url in Pydantic doesn't guarantee an `encoded_string()` method; this will raise at runtime. Use `str(url)`.

```diff
-    get_agent_card(cli_input.evaluated_agent_url.encoded_string())
+    get_agent_card(str(cli_input.evaluated_agent_url))

-        evaluated_agent_url=cli_input.evaluated_agent_url.encoded_string(),
+        evaluated_agent_url=str(cli_input.evaluated_agent_url),
```
343-345: Avoid logging secrets in debug output

`model_dump()` may serialize secret fields; don't include API keys/credentials in logs.

```diff
-    logger.debug("Running CLI", extra=cli_input.model_dump())
+    logger.debug(
+        "Running CLI",
+        extra=cli_input.model_dump(
+            exclude={"judge_llm_api_key", "evaluated_agent_credentials", "qualifire_api_key"}
+        ),
+    )
```
♻️ Duplicate comments (2)
rogue/models/cli_input.py (2)
22-22: Resolved: Qualifire API key now accepted in CLIInput

Adding `qualifire_api_key` here unblocks CLI usage. Looks good.
65-65: Resolved: Mirrored field in PartialCLIInput

Including `qualifire_api_key` in the partial model fixes prior validation/merge issues.
🧹 Nitpick comments (5)
rogue/models/cli_input.py (2)
61-66: Tighten typing for file-path optionals (drop type: ignore)

Use explicit optional types to satisfy mypy without ignores.

```diff
-    business_context_file: Path = None  # type: ignore # fixed in model_post_init
-    input_scenarios_file: Path = None  # type: ignore # fixed in model_post_init
-    output_report_file: Path = None  # type: ignore # fixed in model_post_init
+    business_context_file: Path | None = None  # set in model_post_init
+    input_scenarios_file: Path | None = None  # set in model_post_init
+    output_report_file: Path | None = None  # set in model_post_init
```
67-67: Annotate model_post_init signature

Add the missing type for `__context` to satisfy type-checkers.

```diff
-    def model_post_init(self, __context):
+    def model_post_init(self, __context: object) -> None:
```

rogue/run_cli.py (3)
3-3: Prefer built-in generics over typing.Tuple

Modern Python supports `tuple[...]`. Drop the import and update hints.

```diff
-from typing import Tuple
...
-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:
...
-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:
```

Also applies to: 108-108, 146-146
234-236: Preserve traceback when re-raising

Use bare `raise` after logging to keep the original traceback.

```diff
-    except Exception as e:
+    except Exception:
         logger.exception("Failed to generate summary")
-        raise e
+        raise
```
392-396: Fix grammar in user-facing warning

Tweak the message.

```diff
-            """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,
+            """[red]Qualifire API key is not set. This report will not persist. It is highly recommended to set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- rogue/models/cli_input.py (2 hunks)
- rogue/run_cli.py (10 hunks)
- rogue/server/api/llm.py (0 hunks)
💤 Files with no reviewable changes (1)
- rogue/server/api/llm.py
🧰 Additional context used
📓 Path-based instructions (2)
{rogue,tests,examples}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
{rogue,tests,examples}/**/*.py: Format Python code with Black
Organize imports according to isort conventions
Use type hints for all function signatures
Follow PEP 8 naming (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Ensure code passes flake8 linting
Ensure code passes mypy type checks per .mypy.ini
Address Bandit-reported security issues in Python code
Files:
- rogue/models/cli_input.py
- rogue/run_cli.py
rogue/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place application/package code under the rogue/ directory
Files:
- rogue/models/cli_input.py
- rogue/run_cli.py
🧬 Code graph analysis (1)
rogue/run_cli.py (2)
sdks/python/rogue_sdk/types.py (1)
- EvaluationResults (192-208)

rogue/server/services/qualifire_service.py (1)

- report_summary (10-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor Bugbot
- GitHub Check: codestyle
🔇 Additional comments (2)
rogue/run_cli.py (2)
371-385: Plumbing looks correct: key + job_id threaded to summary generation

Passing `qualifire_api_key_secret` and `job_id` into `create_report` and down to the SDK is correct. Please confirm the SDK's `generate_summary` signature matches `(results, model, api_key, qualifire_api_key, deep_test, judge_model, job_id)` in the version pinned by this repo.
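A quick local check of that assumption, using `inspect` against whatever SDK version is installed; it assumes `RogueSDK.generate_summary` is importable from the package root, as `run_cli.py` uses it:

```python
import inspect

from rogue_sdk import RogueSDK  # assumes the pinned SDK exposes RogueSDK at the package root

expected = {
    "results",
    "model",
    "api_key",
    "qualifire_api_key",
    "deep_test",
    "judge_model",
    "job_id",
}
actual = set(inspect.signature(RogueSDK.generate_summary).parameters) - {"self"}

print("missing from SDK:", sorted(expected - actual) or "none")
print("extra in SDK:", sorted(actual - expected) or "none")
```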
160-185: SDK eval flow and results write-out LGTM

Health check, job run, wait, serialize results, and return `(results, job_id)` look good.
```diff
 # Set defaults for required fields that are missing
 if data.get("judge_llm") is None:
-    data["judge_llm"] = "openai/o4-mini"
+    data["judge_llm"] = "openai/gpt-5"
```
🧩 Analysis chain
Validate default judge LLM model name
"openai/gpt-5" is likely invalid and will break defaults for users. Please verify against official docs and choose a supported default (e.g., "openai/gpt-4o-mini").
```diff
- if data.get("judge_llm") is None:
-     data["judge_llm"] = "openai/gpt-5"
+ if data.get("judge_llm") is None:
+     data["judge_llm"] = "openai/gpt-4o-mini"
```

To confirm the current valid model names:
🌐 Web query:
What OpenAI GPT model IDs are currently valid for the API, and is "openai/gpt-5" available?
💡 Result:
Short answer: Yes — GPT‑5 is available in the API, but the model ID is gpt-5 (and related IDs), not openai/gpt-5.
Key valid OpenAI GPT model IDs (examples from OpenAI docs/announcements):
- GPT‑5 family: gpt-5, gpt-5-mini, gpt-5-nano (API release described by OpenAI). [1][2]
- GPT‑4.1 family: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano (API). [3]
- GPT‑4o / GPT‑4 families and earlier: gpt-4o, gpt-4, gpt-4o-mini, gpt-3.5-turbo, etc., remain in use per OpenAI docs/notes. [3][4]
Notes:
- OpenAI publishes exact, canonical model IDs in their product pages and release notes — use those (e.g., "gpt-5"), not a prefixed ID like "openai/gpt-5". [1][3]
- For the most up-to-date complete list and any availability/region or rate-limit details, check OpenAI’s model release notes and the GPT‑5 announcement pages. [3][4]
Sources:
[1] OpenAI — Introducing GPT‑5 for developers (API release details).
[2] OpenAI — GPT‑5 System Card / Model notes.
[3] OpenAI — Introducing GPT‑4.1 in the API (model IDs/pricing).
[4] OpenAI — Model release notes / Help Center.
Replace invalid default model ID "openai/gpt-5" with canonical "gpt-5".
rogue/run_cli.py (lines 286–289): data["judge_llm"] currently defaults to "openai/gpt-5" — change it to "gpt-5" because OpenAI model IDs use the canonical form (no "openai/" prefix).
🤖 Prompt for AI Agents
In rogue/run_cli.py around lines 286 to 289, the default judge_llm value is set
to the invalid model ID "openai/gpt-5"; update the assignment so the default
uses the canonical OpenAI model identifier "gpt-5" instead (replace
"openai/gpt-5" with "gpt-5") to ensure the configured default is valid.
Actionable comments posted: 1
🧹 Nitpick comments (5)
rogue/run_cli.py (5)
3-3: Prefer builtin generics; drop unused Tuple import.

Use `tuple[...]` (PEP 585) and remove `from typing import Tuple`. Apply:

```diff
-from typing import Tuple
```

Then update the two annotations at Lines 108 and 146 (see comments there).
78-84: CLI flag addition looks good; clarify help text.

Minor copy tweak to explicitly name the env var.

```diff
-        help="API key to use when reporting summary to Qualifire. "
-        "Can be left unset if env is used.",
+        help="API key used when reporting summary to Qualifire. "
+        "Can be left unset when using the QUALIFIRE_API_KEY environment variable.",
```
146-146: Same: use builtin `tuple[...]`.

```diff
-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:
```
392-396: Good: no longer forcing non‑zero exit without key; fix warning grammar.

Message has a grammatical error; tighten copy.

```diff
-            """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
+            """[red]Qualifire API key is not set. This report will not persist. It is highly recommended to set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red]""",  # noqa: E501
```
108-108: Return type hint: use builtin tuple[...]

Modernize typing per PEP 585 — callers unpack two values (results, job_id) at rogue/run_cli.py:357, so the change is safe.

```diff
-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- rogue/models/cli_input.py (1 hunks)
- rogue/run_cli.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- rogue/models/cli_input.py
🧰 Additional context used
📓 Path-based instructions (2)
{rogue,tests,examples}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
{rogue,tests,examples}/**/*.py: Format Python code with Black
Organize imports according to isort conventions
Use type hints for all function signatures
Follow PEP 8 naming (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Ensure code passes flake8 linting
Ensure code passes mypy type checks per .mypy.ini
Address Bandit-reported security issues in Python code
Files:
rogue/run_cli.py
rogue/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place application/package code under the rogue/ directory
Files:
rogue/run_cli.py
🧬 Code graph analysis (1)
rogue/run_cli.py (2)
sdks/python/rogue_sdk/types.py (1)
- EvaluationResults (192-208)

rogue/server/services/qualifire_service.py (1)

- report_summary (10-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor Bugbot
- GitHub Check: codestyle
🔇 Additional comments (2)
rogue/run_cli.py (2)
211-216: Resolved prior duplication; single derivation of Qualifire key.

This addresses the earlier "redundant API key assignment" report. Good.
384-385: API key now plumbed into create_report — fixes prior bug.

This resolves the "API key not passed to report function" issue.
```diff
 # Set defaults for required fields that are missing
 if data.get("judge_llm") is None:
-    data["judge_llm"] = "openai/o4-mini"
+    data["judge_llm"] = "openai/gpt-5"
```
🧩 Analysis chain
Default judge model likely invalid; avoid “openai/gpt-5” as a default.
High risk of runtime failure if this ID isn’t recognized by your router/SDK. Prefer a widely available default (e.g., openai/gpt-4o-mini) or validate against the server’s supported models.
```diff
- if data.get("judge_llm") is None:
-     data["judge_llm"] = "openai/gpt-5"
+ if data.get("judge_llm") is None:
+     data["judge_llm"] = "openai/gpt-4o-mini"
```

🌐 Web query:
What are the current, valid OpenAI Chat/Omni model IDs for API usage in September 2025, and is "openai/gpt-5" a supported identifier? Please cite official OpenAI docs.
💡 Result:
Short answer:

- Current official API model IDs for the GPT‑5 family (Sept 2025): gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat-latest, and related "thinking" variants (e.g., gpt-5-thinking, gpt-5-thinking-nano / GPT‑5 Pro variants per OpenAI docs). [1][2][3]
- "openai/gpt-5" is not the documented/supported model identifier in OpenAI's official docs; use the IDs above (e.g., "gpt-5" or "gpt-5-chat-latest") as shown in OpenAI's API / release pages. [1][2]
Sources
- OpenAI — “Introducing GPT‑5 for developers” (model names & API availability). [link].
- OpenAI — “GPT‑5 System Card” (model labels including gpt-5-thinking, gpt-5-thinking-nano). [link].
- OpenAI Help Center — “API Model Availability by Usage Tier and Verification Status” / Model Release Notes (lists gpt-5, gpt-5-mini and access). [link].
Do not default judge_llm to "openai/gpt-5"; use a documented model ID or validate at runtime.
rogue/run_cli.py lines 286–288 — remove the default "openai/gpt-5" (not a documented API model ID). Set a documented ID (e.g., "gpt-5-chat-latest" or "gpt-5") or add startup validation/fallback to a server-supported model.
🤖 Prompt for AI Agents
In rogue/run_cli.py around lines 286 to 288, the code sets a hard default
judge_llm to the undocumented "openai/gpt-5"; remove that literal default and
either set a documented model ID (for example "gpt-5-chat-latest" or a
project-specific approved ID) or implement runtime validation with a fallback:
check the configured judge_llm and if absent/invalid call a validator (or query
the server-supported models) and set a safe documented default or raise a clear
startup error; update the message/error to guide users to choose a supported
model.
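A minimal sketch of the validation-with-fallback idea mentioned above. The `fetch_supported_models` helper and the fallback model name are assumptions for illustration; the actual server endpoint and a confirmed-valid default would need to be verified against the repo and OpenAI's docs:

```python
import logging

logger = logging.getLogger(__name__)

FALLBACK_JUDGE_LLM = "openai/gpt-4o-mini"  # assumption: a model the LLM router is known to accept


def resolve_judge_llm(configured: str | None, supported: set[str]) -> str:
    """Return the configured judge model if the server supports it, else a safe fallback."""
    if configured and configured in supported:
        return configured
    if configured:
        logger.warning(
            "Configured judge LLM %r is not in the server's supported models; "
            "falling back to %s",
            configured,
            FALLBACK_JUDGE_LLM,
        )
    return FALLBACK_JUDGE_LLM


# Usage at startup (hypothetical helper that queries the Rogue server's model list):
# supported = fetch_supported_models(rogue_server_url)
# data["judge_llm"] = resolve_judge_llm(data.get("judge_llm"), supported)
```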
Actionable comments posted: 0
🧹 Nitpick comments (1)
sdks/python/rogue_sdk/__init__.py (1)
76-81: Exports resolve; optional: derive __all__ from types to avoid drift

Confirmed: sdks/python/rogue_sdk/types.py defines StructuredSummary, SummaryGenerationRequest, SummaryGenerationResponse, ReportSummaryRequest, and ReportSummaryResponse, and sdks/python/rogue_sdk/__init__.py imports them via `from .types import *` and lists them in `__all__`. Optional: replace the static `__all__` with the suggested pattern (keep explicit base exports, then extend with names from `types.__all__`) to prevent future drift.
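A sketch of that pattern for `sdks/python/rogue_sdk/__init__.py`; the explicit base export names below are placeholders, not the module's real export list:

```python
# sdks/python/rogue_sdk/__init__.py (sketch)
from . import types
from .types import *  # noqa: F401,F403

# Names defined directly in this package (placeholders for illustration).
_base_exports = [
    "RogueSDK",
    "RogueClientConfig",
    "types",
]

# Extend with whatever types declares, so newly added summary/report models
# are re-exported automatically instead of being listed in two places.
__all__ = _base_exports + [
    name for name in getattr(types, "__all__", []) if name not in _base_exports
]
```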
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- sdks/python/rogue_sdk/__init__.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (1)
sdks/python/rogue_sdk/__init__.py (1)
51-52: Expose `types` submodule at package root — LGTM

This enables `rogue_sdk.types` for namespace usage and IDE discovery. No functional concerns.
Actionable comments posted: 0
🧹 Nitpick comments (7)
.github/workflows/rogue.yml (5)
29-31: Avoid shell activation; target the venv explicitly with uv for reliability.

Shell activation is unnecessary and can be brittle in CI. Point uv at the venv's python instead.

```diff
-      run: source .venv/bin/activate && uv pip install -e sdks/python --force-reinstall
+      run: uv pip --python .venv/bin/python install -e sdks/python --force-reinstall
```
44-55: Second EXIT trap overrides the first; only one process gets cleaned up.

Define a single EXIT trap that handles both PIDs to prevent orphaned processes and reduce flakiness.

```diff
-          trap 'echo "🛑 Stopping agent..."; kill $AGENT_PID' EXIT
+          trap 'echo "🛑 Stopping processes..."; [[ -n "${AGENT_PID:-}" ]] && kill "$AGENT_PID"; [[ -n "${ROGUE_PID:-}" ]] && kill "$ROGUE_PID"' EXIT
@@
-          trap 'echo "🛑 Stopping rogue server..."; kill $ROGUE_PID' EXIT
+          # (single EXIT trap above handles both AGENT_PID and ROGUE_PID)
```
35-66: Gate secret-dependent steps for forked PRs to avoid failures.

OPENAI_API_KEY isn't available on forked PRs; the agent/server run will fail. Skip this section for forks.

```diff
-      - name: Run rogue
+      - name: Run rogue
+        if: ${{ github.event_name != 'pull_request' || github.event.pull_request.head.repo.fork == false }}
         shell: bash
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

If you prefer to skip the entire job for forks, put the same `if:` at the job level.
1-9: Optional: add a concurrency group to cancel in-progress runs on new commits.

Prevents queued/duplicate "Rogue" jobs from stepping on each other.

```diff
 name: Rogue
+
+concurrency:
+  group: rogue-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
```
60-60: Nit: extraneous blank line.

Harmless, but you can drop it to keep the script tight.
.github/workflows/test.yml (2)
36-38: Avoid shell activation; point uv at the venv directly.

Keeps steps shell-agnostic and a bit faster.

```diff
-      run: source .venv/bin/activate && uv pip install -e sdks/python --force-reinstall
+      run: uv pip --python .venv/bin/python install -e sdks/python --force-reinstall
```
10-21: Optional: add a concurrency group for matrix runs.

Cancels older in-flight matrices on new pushes to the same ref.

```diff
 jobs:
   test:
     runs-on: ubuntu-latest
+    concurrency:
+      group: test-${{ github.workflow }}-${{ github.ref }}
+      cancel-in-progress: true
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/rogue.yml (2 hunks)
- .github/workflows/test.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
.github/**
📄 CodeRabbit inference engine (AGENTS.md)
Keep CI/CD workflow files under .github/
Files:
- .github/workflows/test.yml
- .github/workflows/rogue.yml
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/actions/run-tests/action.yml (2)
14-14: Make test runner resilient to missing/cross‑platform venvs.

Sourcing `.venv/bin/activate` assumes a POSIX venv exists. The one‑liner below picks the venv Python on Linux/macOS or Windows, and falls back to system Python. Apply this diff (also includes the coverage fix above):

```diff
-  run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+  run: V=python; if [ -x .venv/bin/python ]; then V=.venv/bin/python; elif [ -x .venv/Scripts/python.exe ]; then V=.venv/Scripts/python.exe; fi; "$V" -m pytest --junitxml=test-results.xml --cov=. --cov-report=xml
```
16-26: Always upload reports even on test failures.

Add `if: ${{ always() }}` so artifacts are uploaded when pytest fails. Suggested snippet:

```yaml
- name: "Upload Test Results"
  if: ${{ always() }}
  uses: actions/upload-artifact@v4
  with:
    name: junit-results-${{ inputs.python-version }}
    path: test-results.xml

- name: "Upload Coverage Results"
  if: ${{ always() }}
  uses: actions/upload-artifact@v4
  with:
    name: coverage-report-${{ inputs.python-version }}
    path: coverage.xml
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/actions/run-tests/action.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
.github/**
📄 CodeRabbit inference engine (AGENTS.md)
Keep CI/CD workflow files under .github/
Files:
.github/actions/run-tests/action.yml
🔇 Additional comments (1)
.github/actions/run-tests/action.yml (1)
14-14: Confirmed — pytest and pytest-cov are installed upstream; no change required.
.github/workflows/test.yml creates a venv and runs `uv pip install -e .` before calling `./.github/actions/run-tests`; pyproject.toml's dev group includes `pytest` and `pytest-cov` (uv.lock contains pytest-cov).
.github/actions/run-tests/action.yml (Outdated)
```diff
 - name: "Run tests"
   shell: bash
-  run: source .venv/bin/activate && uv run --dev pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+  run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
```
Decouple coverage target from package rename (use project-wide coverage).
Hardcoding --cov=rogue will break or silently zero out coverage after the rename. Run coverage on the repo instead.
Apply this diff:
```diff
-  run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+  run: V=python; if [ -x .venv/bin/python ]; then V=.venv/bin/python; elif [ -x .venv/Scripts/python.exe ]; then V=.venv/Scripts/python.exe; fi; "$V" -m pytest --junitxml=test-results.xml --cov=. --cov-report=xml
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-  run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+  run: V=python; if [ -x .venv/bin/python ]; then V=.venv/bin/python; elif [ -x .venv/Scripts/python.exe ]; then V=.venv/Scripts/python.exe; fi; "$V" -m pytest --junitxml=test-results.xml --cov=. --cov-report=xml
```
🤖 Prompt for AI Agents
In .github/actions/run-tests/action.yml around line 14, the pytest invocation
hardcodes --cov=rogue which will break after a package rename; change the
coverage target to run over the whole repository (for example replace
--cov=rogue with --cov=. or remove the --cov option and use --cov=. so coverage
is collected project-wide) so coverage metrics don't depend on a specific
package name.
Summary by CodeRabbit
New Features
Documentation
Chores