feature/cli report results FIRE 778 #47

Merged
drorIvry merged 14 commits into main from feature/cli-report-results-FIRE-778 on Sep 22, 2025

Conversation

drorIvry (Contributor) commented Sep 18, 2025

  • rename rogue
  • wip

Summary by CodeRabbit

  • New Features

    • CLI now supports a Qualifire API key option, with a warning shown if omitted to indicate results won’t be persisted.
    • Default judge model updated to openai/gpt-5.
    • Python SDK exposes additional summary-related types for easier integration.
  • Documentation

    • Expanded README with hero image, status badge, reorganized Quick Start, detailed mode arguments and usage examples, config file guidance, and a comprehensive example workflow.
  • Chores

    • Python SDK version bumped to 0.1.3.
    • CI updated to force-reinstall SDK and streamline installation steps.

coderabbitai bot (Contributor) commented Sep 18, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Qualifire API key support and job_id propagation to CLI reporting flow, updates CLI input model, exports new SDK summary types, bumps Python SDK version, adjusts CI install commands, and expands README documentation with detailed mode arguments, config, and examples.

Changes

Cohort / File(s): Summary

• Documentation: README.md
  Adds header image and tests badge; expands Quick Start; introduces detailed Mode Arguments with code blocks; documents new CLI flags; adds Config File section and Notes; enriches Examples and CLI workflows; formatting tweaks.
• CLI pipeline updates: rogue/run_cli.py
  Adds --qualifire-api-key; threads qualifire_api_key_secret and an optional job_id through run_scenarios, _run_scenarios_with_sdk, and create_report; updates return types to Tuple[EvaluationResults | None, str | None]; changes the default judge model to openai/gpt-5; warns when the API key is missing; refactors summary generation to use an early-derived key.
• CLI input model: rogue/models/cli_input.py
  Adds optional field qualifire_api_key: SecretStr | None to PartialCLIInput.
• Python SDK public exports: sdks/python/rogue_sdk/__init__.py
  Exposes new summary-related symbols in __all__: StructuredSummary, SummaryGenerationRequest, SummaryGenerationResponse, ReportSummaryResponse, ReportSummaryRequest.
• Python SDK packaging: sdks/python/pyproject.toml
  Bumps version from 0.1.0 to 0.1.3.
• CI workflows: .github/workflows/rogue.yml, .github/workflows/test.yml
  Ensures the SDK install uses --force-reinstall; updates the server install to uv sync --dev; whitespace tweak before the run step.
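
To make the new plumbing concrete, here is a minimal sketch of the reporting flow (names mirror the walkthrough and the call site reviewed below; the run_scenarios parameters are assumptions, not verified against the repo):

# Hypothetical glue code illustrating the new (results, job_id) tuple return
# and the qualifire_api_key_secret threading described above.
async def run_and_report(args, cli_input) -> int:
    results, job_id = await run_scenarios(args, cli_input)  # assumed signature
    if results is None:
        return 1
    await create_report(
        rogue_server_url=args.rogue_server_url,
        judge_llm=cli_input.judge_llm,
        results=results,
        output_report_file=cli_input.output_report_file,
        judge_llm_api_key_secret=cli_input.judge_llm_api_key,
        qualifire_api_key_secret=cli_input.qualifire_api_key,  # new optional key
        deep_test_mode=cli_input.deep_test_mode,
        judge_model=cli_input.judge_llm,
    )
    return 0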

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant CLI as rogue run_cli
  participant SDK as Python SDK
  participant S as Rogue Server
  participant Q as Qualifire Service

  U->>CLI: Run CLI with options (may include --qualifire-api-key)
  CLI->>SDK: run_scenarios(...)
  SDK->>S: Execute scenarios
  S-->>SDK: EvaluationResults + job_id?
  SDK-->>CLI: Results, job_id (optional)

  alt qualifire_api_key provided
    CLI->>SDK: generate_summary(results, job_id, api_key)
    SDK->>Q: Persist/report summary
    Q-->>SDK: Summary/ACK
    SDK-->>CLI: Summary created
  else no API key
    CLI-->>U: Warn: persistence will not occur
    CLI->>SDK: generate_summary(results, job_id, api_key=None)
    SDK-->>CLI: Local summary only
  end

  CLI-->>U: Output report path / summary info

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • yuval-qf
  • osher-qualifire

Poem

I thump my paws on terminal ground,
A key of Qualifire now newly found.
Job IDs hop along the trail,
Reports persist without a fail.
Docs grown tall like carrot rows—
CI burrows cleanly flows.
Happy hops—the summary glows! 🥕🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
• Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
• Title Check: ✅ Passed. The title "feature/cli report results FIRE 778" directly references CLI report results and matches the PR's primary changes around CLI reporting and related flags, so it is relevant and informative to reviewers. It does include branch-prefix noise ("feature/") and a ticket identifier ("FIRE 778"), which are acceptable but slightly reduce clarity when reading the changelog or history.
• Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/cli-report-results-FIRE-778

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 054dbd9 and f03c277.

📒 Files selected for processing (1)
  • .github/workflows/test.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: codestyle

Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
rogue/models/cli_input.py (1)

12-12: Defaults capture workdir at class definition time (won’t follow a custom workdir).

input_scenarios_file = workdir / ... and output_report_file = workdir / ... are bound once at import, not per-instance. If a caller sets workdir, these paths won’t update.

Apply this diff to make them instance-derived:

-    input_scenarios_file: Path = workdir / "scenarios.json"
-    output_report_file: Path = workdir / "report.md"
+    input_scenarios_file: Path | None = None
+    output_report_file: Path | None = None

Add a per-instance initializer to CLIInput (outside the selected lines):

from typing import Any  # add to imports

def model_post_init(self, __context: Any) -> None:
    if self.input_scenarios_file is None:
        self.input_scenarios_file = self.workdir / "scenarios.json"
    if self.output_report_file is None:
        self.output_report_file = self.workdir / "report.md"

Also applies to: 18-19
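
For illustration, a minimal repro of the pitfall (a sketch; the model shape mirrors the description above rather than the exact file):

from pathlib import Path

from pydantic import BaseModel

workdir = Path(".rogue")

class CLIInput(BaseModel):
    workdir: Path = workdir
    # Computed once when the class body executes, not per instance:
    input_scenarios_file: Path = workdir / "scenarios.json"

inp = CLIInput(workdir=Path("/tmp/custom"))
print(inp.input_scenarios_file)  # .rogue/scenarios.json (the custom workdir is ignored)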

rogue/run_cli.py (1)

377-385: Pass qualifire_api_key_secret to create_report

The create_report function signature was modified to accept qualifire_api_key_secret, but the call site doesn't pass this parameter. This means the Qualifire API key will always be None even if the user provides it via CLI.

Apply this diff to pass the qualifire_api_key_secret parameter:

     logger.info("Creating report")
     report_summary = await create_report(
         rogue_server_url=args.rogue_server_url,
         judge_llm=cli_input.judge_llm,
         results=results,
         output_report_file=cli_input.output_report_file,
         judge_llm_api_key_secret=cli_input.judge_llm_api_key,
+        qualifire_api_key_secret=cli_input.qualifire_api_key,
         deep_test_mode=cli_input.deep_test_mode,
         judge_model=cli_input.judge_llm,
     )
README.md (1)

362-370: Documentation lists non-existent OpenAI models

The README lists several OpenAI models that don't exist (gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, etc.). This could confuse users trying to configure the tool.

Update the model list to reflect actual available OpenAI models. The currently available models should be:

  • gpt-4o
  • gpt-4o-mini
  • gpt-4-turbo
  • gpt-3.5-turbo

Remove the non-existent model names to avoid user confusion.

🧹 Nitpick comments (5)
rogue/models/cli_input.py (2)

24-35: Harden file I/O and validation error handling.

read_text() and JSON validation can fail (permissions, encoding, invalid JSON). Wrap with try/except and return actionable errors.

Apply this diff:

-        return Scenarios.model_validate_json(self.input_scenarios_file.read_text())
+        try:
+            content = self.input_scenarios_file.read_text(encoding="utf-8")
+        except OSError as e:
+            raise ValueError(
+                f"Failed to read input scenarios file: {self.input_scenarios_file}"
+            ) from e
+        try:
+            return Scenarios.model_validate_json(content)
+        except Exception as e:
+            raise ValueError(
+                f"Invalid or un-parseable JSON in scenarios file: {self.input_scenarios_file}"
+            ) from e

61-67: Type-hint cleanup: remove type: ignore and annotate model_post_init.

Keep mypy happy and align with guidelines.

Apply this diff:

-    business_context_file: Path = None  # type: ignore # fixed in model_post_init
-    input_scenarios_file: Path = None  # type: ignore # fixed in model_post_init
-    output_report_file: Path = None  # type: ignore # fixed in model_post_init
+    business_context_file: Path | None = None  # fixed in model_post_init
+    input_scenarios_file: Path | None = None  # fixed in model_post_init
+    output_report_file: Path | None = None  # fixed in model_post_init
@@
-    def model_post_init(self, __context):
+    def model_post_init(self, __context: Any) -> None:

Note: ensure from typing import Any is imported once at the top (module-level).

rogue/run_cli.py (1)

209-213: Remove duplicate qualifire_api_key extraction

The qualifire_api_key is extracted twice from qualifire_api_key_secret: once at lines 209-213 and again at lines 223-227. The second extraction inside the try block is redundant since the value was already extracted before the try block.

Apply this diff to remove the duplicate extraction:

 async def create_report(
     rogue_server_url: str,
     judge_llm: str,
     results: EvaluationResults,
     output_report_file: Path,
     judge_llm_api_key_secret: SecretStr | None = None,
     qualifire_api_key_secret: SecretStr | None = None,
     deep_test_mode: bool = False,
     judge_model: str | None = None,
 ) -> str:
     judge_llm_api_key = (
         judge_llm_api_key_secret.get_secret_value()
         if judge_llm_api_key_secret
         else None
     )
 
     qualifire_api_key = (
         qualifire_api_key_secret.get_secret_value()
         if qualifire_api_key_secret
         else None
     )
 
     # Use SDK for summary generation (server-based)
     sdk_config = RogueClientConfig(
         base_url=rogue_server_url,
         timeout=600.0,
     )
     sdk = RogueSDK(sdk_config)
 
     try:
-        qualifire_api_key = (
-            qualifire_api_key_secret.get_secret_value()
-            if qualifire_api_key_secret
-            else None
-        )
         summary, _ = await sdk.generate_summary(
             results=results,
             model=judge_llm,
             api_key=judge_llm_api_key,
             qualifire_api_key=qualifire_api_key,
             deep_test=deep_test_mode,
             judge_model=judge_model,
         )

Also applies to: 223-227

README.md (2)

3-3: Add alt text to the tracking pixel image

The image should have alternate text for accessibility, even if it's a tracking pixel.

Apply this diff:

-![](https://pixel.qualifire.ai/api/record/rogue)
+![Qualifire Analytics](https://pixel.qualifire.ai/api/record/rogue)

137-137: Wrap bare URL in angle brackets

Bare URLs should be wrapped in angle brackets for better Markdown compatibility.

Apply this diff:

-- `--rogue-server-url URL` - Rogue server URL (default: http://localhost:8000)
+- `--rogue-server-url URL` - Rogue server URL (default: <http://localhost:8000>)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc4beb9 and 0d34ed0.

📒 Files selected for processing (3)
  • README.md (4 hunks)
  • rogue/models/cli_input.py (1 hunks)
  • rogue/run_cli.py (4 hunks)
🔥 Files not summarized due to errors (1)
  • rogue/models/cli_input.py: Error: Server error: no LLM provider could handle the message
🧰 Additional context used
📓 Path-based instructions (2)
{rogue,tests,examples}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

{rogue,tests,examples}/**/*.py: Format Python code with Black
Organize imports according to isort conventions
Use type hints for all function signatures
Follow PEP 8 naming (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Ensure code passes flake8 linting
Ensure code passes mypy type checks per .mypy.ini
Address Bandit-reported security issues in Python code

Files:

  • rogue/models/cli_input.py
  • rogue/run_cli.py
rogue/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Place application/package code under the rogue/ directory

Files:

  • rogue/models/cli_input.py
  • rogue/run_cli.py
🪛 markdownlint-cli2 (0.17.2)
README.md

3-3: Images should have alternate text (alt text)

(MD045, no-alt-text)


137-137: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: codestyle

     # Set defaults for required fields that are missing
     if data.get("judge_llm") is None:
-        data["judge_llm"] = "openai/o4-mini"
+        data["judge_llm"] = "openai/gpt-5"
coderabbitai bot (Contributor) commented

⚠️ Potential issue

🧩 Analysis chain

Verify that the openai/gpt-5 model exists

The default judge LLM model has been changed to openai/gpt-5. This model doesn't appear to be a valid OpenAI model name (as of my knowledge cutoff). This could cause runtime failures for users who don't explicitly specify a judge LLM model.

Consider using a valid model like openai/gpt-4o or openai/gpt-4o-mini as the default:

-        data["judge_llm"] = "openai/gpt-5"
+        data["judge_llm"] = "openai/gpt-4o-mini"

🌐 Web query:

What are the current valid OpenAI GPT model names available through their API?

💡 Result:

Short answer — common GPT models currently offered in the OpenAI API (families and typical variants):

  • GPT-4 family: gpt-4, gpt-4-turbo. [1]
  • GPT-4o family: gpt-4o, gpt-4o-mini (and realtime/voice/vision variants where available). [1][2]
  • GPT-4.1 family: gpt-4.1, gpt-4.1-mini (newer 4.1 releases). [4]
  • GPT-3.5 family: gpt-3.5-turbo and its context-window variants (e.g., 16k/4k variants). [1]

Notes: OpenAI frequently adds/removes model snapshots and introduces variant names (mini, turbo, realtime, nano, etc.), so check the official API models list or the OpenAI docs/models endpoint for the authoritative, up‑to‑date names before integrating. [1][4]

Sources:
[1] OpenAI Help Center — How can I access GPT-4, GPT-4 Turbo, GPT-4o, and GPT-4o mini?
[2] OpenAI blog — GPT‑4o mini announcement.
[4] OpenAI Help Center / Model release notes — GPT‑4.1 & related releases.


Replace invalid default judge LLM "openai/gpt-5" with a supported OpenAI model

openai/gpt-5 is not a recognized OpenAI model and will cause runtime failures for users who don't override the default.

File: rogue/run_cli.py, line 290

-        data["judge_llm"] = "openai/gpt-5"
+        data["judge_llm"] = "openai/gpt-4o-mini"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        data["judge_llm"] = "openai/gpt-5"
+        data["judge_llm"] = "openai/gpt-4o-mini"
🤖 Prompt for AI Agents
In rogue/run_cli.py around line 290, the default judge LLM is set to the invalid
model string "openai/gpt-5"; replace it with a supported OpenAI model identifier
(for example "openai/gpt-4") so the default will work at runtime, and ensure any
related tests or documentation that reference the old default are updated
accordingly.

Comment on lines 392 to 397

    if cli_input.qualifire_api_key is None:
        console.print(
            """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
        )
        return 1

coderabbitai bot (Contributor) commented

⚠️ Potential issue

Consider making the exit code configurable or warning-only

Returning exit code 1 when the Qualifire API key is missing might break existing CI/CD pipelines where users don't require report persistence. While the warning is helpful, forcing a non-zero exit code for a non-critical feature might be disruptive.

Consider one of these alternatives:

  1. Keep the warning but return 0 (success) if the evaluation itself passed
  2. Add a CLI flag to control whether missing Qualifire key should fail the process
  3. Only fail if explicitly required via configuration

Example of option 1:

     if cli_input.qualifire_api_key is None:
         console.print(
             """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
         )
-        return 1
 
     return get_exit_code(results)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    if cli_input.qualifire_api_key is None:
-        console.print(
-            """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
-        )
-        return 1
+    if cli_input.qualifire_api_key is None:
+        console.print(
+            """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
+        )

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
rogue/run_cli.py (4)

34-39: Fix argparse Enum type/choices mismatch (runtime parse bug)

type=AuthType returns an Enum but choices=[e.value for e in AuthType] are strings, so valid inputs will be rejected.

-    parser.add_argument(
-        "--evaluated-agent-auth-type",
-        required=False,
-        type=AuthType,
-        choices=[e.value for e in AuthType],
+    parser.add_argument(
+        "--evaluated-agent-auth-type",
+        required=False,
+        type=str,
+        choices=[e.value for e in AuthType],
         help="How to authenticate with the evaluated agent (if needed)."
         f"Valid options are: {[e.value for e in AuthType]}",
     )
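
To see why this fails at parse time: argparse converts the raw string with type first, then tests the converted value against choices, so an Enum member gets compared with plain strings. A minimal repro (this AuthType is a stand-in for the real enum):

import argparse
from enum import Enum

class AuthType(Enum):  # stand-in for the real enum in the rogue codebase
    NO_AUTH = "no_auth"
    API_KEY = "api_key"

parser = argparse.ArgumentParser()
parser.add_argument("--auth", type=AuthType, choices=[e.value for e in AuthType])

# argparse turns "api_key" into AuthType.API_KEY, then checks membership in
# ["no_auth", "api_key"], which fails and exits with:
# error: argument --auth: invalid choice: <AuthType.API_KEY: 'api_key'>
parser.parse_args(["--auth", "api_key"])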

17-28: Add missing --workdir flag (prevents AttributeError later)

get_cli_input and run_cli use args.workdir but the parser doesn’t define it.

 def set_cli_args(parser: ArgumentParser) -> None:
+    parser.add_argument(
+        "--workdir",
+        type=Path,
+        default=Path(".") / ".rogue",
+        help="Working directory for CLI artifacts (scenarios.json, report.md, etc.)",
+    )
     parser.add_argument(
         "--config-file",
         type=Path,
         help="Path to config file",
     )

347-360: Use str() for HttpUrl rather than .encoded_string()

HttpUrl/Url in Pydantic doesn’t guarantee an encoded_string() method; this will raise at runtime. Use str(url).

-    get_agent_card(cli_input.evaluated_agent_url.encoded_string())
+    get_agent_card(str(cli_input.evaluated_agent_url))

-        evaluated_agent_url=cli_input.evaluated_agent_url.encoded_string(),
+        evaluated_agent_url=str(cli_input.evaluated_agent_url),
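
As a quick reference, str() on a pydantic v2 HttpUrl serializes the URL reliably (note that v2 normalizes a bare host with a trailing slash); a small sketch:

from pydantic import BaseModel, HttpUrl

class Endpoint(BaseModel):
    url: HttpUrl

ep = Endpoint(url="http://localhost:8000")
# str() is stable across pydantic releases; encoded_string() is not guaranteed:
print(str(ep.url))  # http://localhost:8000/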

343-345: Avoid logging secrets in debug output

model_dump() may serialize secret fields; don’t include API keys/credentials in logs.

-    logger.debug("Running CLI", extra=cli_input.model_dump())
+    logger.debug(
+        "Running CLI",
+        extra=cli_input.model_dump(exclude={"judge_llm_api_key", "evaluated_agent_credentials", "qualifire_api_key"}),
+    )
♻️ Duplicate comments (2)
rogue/models/cli_input.py (2)

22-22: Resolved: Qualifire API key now accepted in CLIInput

Adding qualifire_api_key here unblocks CLI usage. Looks good.


65-65: Resolved: Mirrored field in PartialCLIInput

Including qualifire_api_key in the partial model fixes prior validation/merge issues.

🧹 Nitpick comments (5)
rogue/models/cli_input.py (2)

61-66: Tighten typing for file-path optionals (drop type: ignore)

Use explicit optional types to satisfy mypy without ignores.

-    business_context_file: Path = None  # type: ignore # fixed in model_post_init
-    input_scenarios_file: Path = None  # type: ignore # fixed in model_post_init
-    output_report_file: Path = None  # type: ignore # fixed in model_post_init
+    business_context_file: Path | None = None  # set in model_post_init
+    input_scenarios_file: Path | None = None  # set in model_post_init
+    output_report_file: Path | None = None  # set in model_post_init

67-67: Annotate model_post_init signature

Add the missing type for __context to satisfy type-checkers.

-    def model_post_init(self, __context):
+    def model_post_init(self, __context: object) -> None:
rogue/run_cli.py (3)

3-3: Prefer built-in generics over typing.Tuple

Modern Python supports tuple[...]. Drop the import and update hints.

-from typing import Tuple
...
-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:
...
-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:

Also applies to: 108-108, 146-146


234-236: Preserve traceback when re-raising

Use bare raise after logging to keep the original traceback.

-    except Exception as e:
+    except Exception:
         logger.exception("Failed to generate summary")
-        raise e
+        raise

392-396: Fix grammar in user-facing warning

Tweak the message.

-            """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,
+            """[red]Qualifire API key is not set. This report will not persist. It is highly recommended to set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d34ed0 and 74bf5b1.

📒 Files selected for processing (3)
  • rogue/models/cli_input.py (2 hunks)
  • rogue/run_cli.py (10 hunks)
  • rogue/server/api/llm.py (0 hunks)
💤 Files with no reviewable changes (1)
  • rogue/server/api/llm.py
🧰 Additional context used
📓 Path-based instructions (2)
{rogue,tests,examples}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

{rogue,tests,examples}/**/*.py: Format Python code with Black
Organize imports according to isort conventions
Use type hints for all function signatures
Follow PEP 8 naming (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Ensure code passes flake8 linting
Ensure code passes mypy type checks per .mypy.ini
Address Bandit-reported security issues in Python code

Files:

  • rogue/models/cli_input.py
  • rogue/run_cli.py
rogue/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Place application/package code under the rogue/ directory

Files:

  • rogue/models/cli_input.py
  • rogue/run_cli.py
🧬 Code graph analysis (1)
rogue/run_cli.py (2)
sdks/python/rogue_sdk/types.py (1)
  • EvaluationResults (192-208)
rogue/server/services/qualifire_service.py (1)
  • report_summary (10-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: codestyle
🔇 Additional comments (2)
rogue/run_cli.py (2)

371-385: Plumbing looks correct: key + job_id threaded to summary generation

Passing qualifire_api_key_secret and job_id into create_report and down to the SDK is correct.

Please confirm the SDK’s generate_summary signature matches (results, model, api_key, qualifire_api_key, deep_test, judge_model, job_id) in the version pinned by this repo.
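
One quick way to check this (a sketch; the parameter names come from this comment, and the import path is assumed):

import inspect

from rogue_sdk import RogueSDK  # assumed import path

expected = {"results", "model", "api_key", "qualifire_api_key",
            "deep_test", "judge_model", "job_id"}
params = set(inspect.signature(RogueSDK.generate_summary).parameters)
missing = expected - params
assert not missing, f"generate_summary is missing parameters: {missing}"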


160-185: SDK eval flow and results write-out LGTM

Health check, job run, wait, serialize results, and return (results, job_id) look good.

Comment on lines 286 to 289

     # Set defaults for required fields that are missing
     if data.get("judge_llm") is None:
-        data["judge_llm"] = "openai/o4-mini"
+        data["judge_llm"] = "openai/gpt-5"

coderabbitai bot (Contributor) commented

⚠️ Potential issue

🧩 Analysis chain

Validate default judge LLM model name

"openai/gpt-5" is likely invalid and will break defaults for users. Please verify against official docs and choose a supported default (e.g., "openai/gpt-4o-mini").

-    if data.get("judge_llm") is None:
-        data["judge_llm"] = "openai/gpt-5"
+    if data.get("judge_llm") is None:
+        data["judge_llm"] = "openai/gpt-4o-mini"

To confirm the current valid model names:


🌐 Web query:

What OpenAI GPT model IDs are currently valid for the API, and is "openai/gpt-5" available?

💡 Result:

Short answer: Yes — GPT‑5 is available in the API, but the model ID is gpt-5 (and related IDs), not openai/gpt-5.

Key valid OpenAI GPT model IDs (examples from OpenAI docs/announcements):

  • GPT‑5 family: gpt-5, gpt-5-mini, gpt-5-nano (API release described by OpenAI). [1][2]
  • GPT‑4.1 family: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano (API). [3]
  • GPT‑4o / GPT‑4 families and earlier: gpt-4o, gpt-4, gpt-4o-mini, gpt-3.5-turbo, etc., remain in use per OpenAI docs/notes. [3][4]

Notes:

  • OpenAI publishes exact, canonical model IDs in their product pages and release notes — use those (e.g., "gpt-5"), not a prefixed ID like "openai/gpt-5". [1][3]
  • For the most up-to-date complete list and any availability/region or rate-limit details, check OpenAI’s model release notes and the GPT‑5 announcement pages. [3][4]

Sources:
[1] OpenAI — Introducing GPT‑5 for developers (API release details).
[2] OpenAI — GPT‑5 System Card / Model notes.
[3] OpenAI — Introducing GPT‑4.1 in the API (model IDs/pricing).
[4] OpenAI — Model release notes / Help Center.


Replace invalid default model ID "openai/gpt-5" with canonical "gpt-5".
rogue/run_cli.py (lines 286–289): data["judge_llm"] currently defaults to "openai/gpt-5" — change it to "gpt-5" because OpenAI model IDs use the canonical form (no "openai/" prefix).

🤖 Prompt for AI Agents
In rogue/run_cli.py around lines 286 to 289, the default judge_llm value is set
to the invalid model ID "openai/gpt-5"; update the assignment so the default
uses the canonical OpenAI model identifier "gpt-5" instead (replace
"openai/gpt-5" with "gpt-5") to ensure the configured default is valid.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
rogue/run_cli.py (5)

3-3: Prefer builtin generics; drop unused Tuple import.

Use tuple[...] (PEP 585) and remove from typing import Tuple.

Apply:

-from typing import Tuple

Then update the two annotations at Lines 108 and 146 (see comments there).


78-84: CLI flag addition looks good; clarify help text.

Minor copy tweak to explicitly name the env var.

-        help="API key to use when reporting summary to Qualifire. "
-        "Can be left unset if env is used.",
+        help="API key used when reporting summary to Qualifire. "
+        "Can be left unset when using the QUALIFIRE_API_KEY environment variable.",

146-146: Same: use builtin tuple[...].

-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:

392-396: Good: no longer forcing non‑zero exit without key; fix warning grammar.

Message has a grammatical error; tighten copy.

-            """[red]Qualifire API key is not set. This report will not persist, it if highly recommended set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red] """,  # noqa: E501
+            """[red]Qualifire API key is not set. This report will not persist. It is highly recommended to set it using the --qualifire-api-key flag or the QUALIFIRE_API_KEY environment variable.[/red]""",  # noqa: E501

108-108: Return type hint: use builtin tuple[...]

Modernize typing per PEP 585 — callers unpack two values (results, job_id) at rogue/run_cli.py:357, so change is safe.

-) -> Tuple[EvaluationResults | None, str | None]:
+) -> tuple[EvaluationResults | None, str | None]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74bf5b1 and 1546e28.

📒 Files selected for processing (2)
  • rogue/models/cli_input.py (1 hunks)
  • rogue/run_cli.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • rogue/models/cli_input.py
🧰 Additional context used
📓 Path-based instructions (2)
{rogue,tests,examples}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

{rogue,tests,examples}/**/*.py: Format Python code with Black
Organize imports according to isort conventions
Use type hints for all function signatures
Follow PEP 8 naming (snake_case for variables and functions, PascalCase for classes)
Use try/except blocks for code that may raise exceptions
Ensure code passes flake8 linting
Ensure code passes mypy type checks per .mypy.ini
Address Bandit-reported security issues in Python code

Files:

  • rogue/run_cli.py
rogue/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Place application/package code under the rogue/ directory

Files:

  • rogue/run_cli.py
🧬 Code graph analysis (1)
rogue/run_cli.py (2)
sdks/python/rogue_sdk/types.py (1)
  • EvaluationResults (192-208)
rogue/server/services/qualifire_service.py (1)
  • report_summary (10-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: codestyle
🔇 Additional comments (2)
rogue/run_cli.py (2)

211-216: Resolved prior duplication; single derivation of Qualifire key.

This addresses the earlier “redundant API key assignment” report. Good.


384-385: API key now plumbed into create_report — fixes prior bug.

This resolves the “API key not passed to report function” issue.

Comment on lines 286 to +288

     # Set defaults for required fields that are missing
     if data.get("judge_llm") is None:
-        data["judge_llm"] = "openai/o4-mini"
+        data["judge_llm"] = "openai/gpt-5"
coderabbitai bot (Contributor) commented

⚠️ Potential issue

🧩 Analysis chain

Default judge model likely invalid; avoid “openai/gpt-5” as a default.

High risk of runtime failure if this ID isn’t recognized by your router/SDK. Prefer a widely available default (e.g., openai/gpt-4o-mini) or validate against the server’s supported models.

-    if data.get("judge_llm") is None:
-        data["judge_llm"] = "openai/gpt-5"
+    if data.get("judge_llm") is None:
+        data["judge_llm"] = "openai/gpt-4o-mini"

🌐 Web query:

What are the current, valid OpenAI Chat/Omni model IDs for API usage in September 2025, and is "openai/gpt-5" a supported identifier? Please cite official OpenAI docs.

💡 Result:

Short answer

  • Current official API model IDs for the GPT‑5 family (Sept 2025): gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat-latest, and related “thinking” variants (e.g., gpt-5-thinking, gpt-5-thinking-nano / GPT‑5 Pro variants per OpenAI docs). [1][2][3]

  • "openai/gpt-5" is not the documented/supported model identifier in OpenAI’s official docs; use the IDs above (e.g., "gpt-5" or "gpt-5-chat-latest") as shown in OpenAI’s API / release pages. [1][2]

Sources

  1. OpenAI — “Introducing GPT‑5 for developers” (model names & API availability). [link].
  2. OpenAI — “GPT‑5 System Card” (model labels including gpt-5-thinking, gpt-5-thinking-nano). [link].
  3. OpenAI Help Center — “API Model Availability by Usage Tier and Verification Status” / Model Release Notes (lists gpt-5, gpt-5-mini and access). [link].

Do not default judge_llm to "openai/gpt-5"; use a documented model ID or validate at runtime.

rogue/run_cli.py lines 286–288 — remove the default "openai/gpt-5" (not a documented API model ID). Set a documented ID (e.g., "gpt-5-chat-latest" or "gpt-5") or add startup validation/fallback to a server-supported model.

🤖 Prompt for AI Agents
In rogue/run_cli.py around lines 286 to 288, the code sets a hard default
judge_llm to the undocumented "openai/gpt-5"; remove that literal default and
either set a documented model ID (for example "gpt-5-chat-latest" or a
project-specific approved ID) or implement runtime validation with a fallback:
check the configured judge_llm and if absent/invalid call a validator (or query
the server-supported models) and set a safe documented default or raise a clear
startup error; update the message/error to guide users to choose a supported
model.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
sdks/python/rogue_sdk/__init__.py (1)

76-81: Exports resolve; optional: derive all from types to avoid drift

Confirmed: sdks/python/rogue_sdk/types.py defines StructuredSummary, SummaryGenerationRequest, SummaryGenerationResponse, ReportSummaryRequest, and ReportSummaryResponse, and sdks/python/rogue_sdk/__init__.py imports them via from .types import * and lists them in __all__. Optional: replace the static __all__ with the suggested pattern (keep explicit base_exports, then extend with the names from types.__all__) to prevent future drift.
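
A sketch of that pattern (the base_exports entries are illustrative; keep the package's real explicit exports):

# sdks/python/rogue_sdk/__init__.py (sketch)
from . import types
from .types import *  # noqa: F401,F403

base_exports = [
    "RogueSDK",  # illustrative names; keep the existing explicit exports
    "RogueClientConfig",
]

# Extend with everything types declares so new symbols cannot drift out of sync:
__all__ = base_exports + list(getattr(types, "__all__", []))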

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbdaadf and 863f21b.

📒 Files selected for processing (1)
  • sdks/python/rogue_sdk/__init__.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (1)
sdks/python/rogue_sdk/__init__.py (1)

51-52: Expose types submodule at package root — LGTM

This enables rogue_sdk.types for namespace usage and IDE discovery. No functional concerns.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
.github/workflows/rogue.yml (5)

29-31: Avoid shell activation; target the venv explicitly with uv for reliability.

Shell activation is unnecessary and can be brittle in CI. Point uv at the venv’s python instead.

-        run: source .venv/bin/activate && uv pip install -e sdks/python --force-reinstall
+        run: uv pip --python .venv/bin/python install -e sdks/python --force-reinstall

44-55: Second EXIT trap overrides the first; only one process gets cleaned up.

Define a single EXIT trap that handles both PIDs to prevent orphaned processes and reduce flakiness.

-          trap 'echo "🛑 Stopping agent..."; kill $AGENT_PID' EXIT
+          trap 'echo "🛑 Stopping processes..."; [[ -n "${AGENT_PID:-}" ]] && kill "$AGENT_PID"; [[ -n "${ROGUE_PID:-}" ]] && kill "$ROGUE_PID"' EXIT
@@
-          trap 'echo "🛑 Stopping rogue server..."; kill $ROGUE_PID' EXIT
+          # (single EXIT trap above handles both AGENT_PID and ROGUE_PID)

35-66: Gate secret-dependent steps for forked PRs to avoid failures.

OPENAI_API_KEY isn’t available on forked PRs; the agent/server run will fail. Skip this section for forks.

-      - name: Run rogue
+      - name: Run rogue
+        if: ${{ github.event_name != 'pull_request' || github.event.pull_request.head.repo.fork == false }}
         shell: bash
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

If you prefer to skip the entire job for forks, put the same if: at the job level.


1-9: Optional: add a concurrency group to cancel in-progress runs on new commits.

Prevents queued/duplicate “Rogue” jobs from stepping on each other.

 name: Rogue
+
+concurrency:
+  group: rogue-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true

60-60: Nit: extraneous blank line.

Harmless, but you can drop it to keep the script tight.

.github/workflows/test.yml (2)

36-38: Avoid shell activation; point uv at the venv directly.

Keeps steps shell-agnostic and a bit faster.

-        run: source .venv/bin/activate && uv pip install -e sdks/python --force-reinstall
+        run: uv pip --python .venv/bin/python install -e sdks/python --force-reinstall

10-21: Optional: add a concurrency group for matrix runs.

Cancels older in-flight matrices on new pushes to the same ref.

 jobs:
   test:
     runs-on: ubuntu-latest
+    concurrency:
+      group: test-${{ github.workflow }}-${{ github.ref }}
+      cancel-in-progress: true
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 863f21b and 1ac39c1.

📒 Files selected for processing (2)
  • .github/workflows/rogue.yml (2 hunks)
  • .github/workflows/test.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
.github/**

📄 CodeRabbit inference engine (AGENTS.md)

Keep CI/CD workflow files under .github/

Files:

  • .github/workflows/test.yml
  • .github/workflows/rogue.yml

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/actions/run-tests/action.yml (2)

14-14: Make test runner resilient to missing/cross‑platform venvs.

Sourcing .venv/bin/activate assumes a POSIX venv exists. The one‑liner below picks the venv Python on Linux/macOS or Windows, and falls back to system Python.

Apply this diff (also includes the coverage fix above):

-      run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+      run: V=python; if [ -x .venv/bin/python ]; then V=.venv/bin/python; elif [ -x .venv/Scripts/python.exe ]; then V=.venv/Scripts/python.exe; fi; "$V" -m pytest --junitxml=test-results.xml --cov=. --cov-report=xml

16-26: Always upload reports even on test failures.

Add if: ${{ always() }} so artifacts are uploaded when pytest fails.

Suggested snippet:

    - name: "Upload Test Results"
      if: ${{ always() }}
      uses: actions/upload-artifact@v4
      with:
        name: junit-results-${{ inputs.python-version }}
        path: test-results.xml

    - name: "Upload Coverage Results"
      if: ${{ always() }}
      uses: actions/upload-artifact@v4
      with:
        name: coverage-report-${{ inputs.python-version }}
        path: coverage.xml
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac39c1 and 054dbd9.

📒 Files selected for processing (1)
  • .github/actions/run-tests/action.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
.github/**

📄 CodeRabbit inference engine (AGENTS.md)

Keep CI/CD workflow files under .github/

Files:

  • .github/actions/run-tests/action.yml
🔇 Additional comments (1)
.github/actions/run-tests/action.yml (1)

14-14: Confirmed — pytest and pytest-cov are installed upstream; no change required.
.github/workflows/test.yml creates a venv and runs uv pip install -e . before calling ./.github/actions/run-tests; pyproject.toml's dev group includes pytest and pytest-cov (uv.lock contains pytest-cov).

     - name: "Run tests"
       shell: bash
-      run: source .venv/bin/activate && uv run --dev pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+      run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
coderabbitai bot (Contributor) commented

⚠️ Potential issue

Decouple coverage target from package rename (use project-wide coverage).

Hardcoding --cov=rogue will break or silently zero out coverage after the rename. Run coverage on the repo instead.

Apply this diff:

-      run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+      run: V=python; if [ -x .venv/bin/python ]; then V=.venv/bin/python; elif [ -x .venv/Scripts/python.exe ]; then V=.venv/Scripts/python.exe; fi; "$V" -m pytest --junitxml=test-results.xml --cov=. --cov-report=xml
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      run: source .venv/bin/activate && pytest --junitxml=test-results.xml --cov=rogue --cov-report=xml
+      run: V=python; if [ -x .venv/bin/python ]; then V=.venv/bin/python; elif [ -x .venv/Scripts/python.exe ]; then V=.venv/Scripts/python.exe; fi; "$V" -m pytest --junitxml=test-results.xml --cov=. --cov-report=xml
🤖 Prompt for AI Agents
In .github/actions/run-tests/action.yml around line 14, the pytest invocation
hardcodes --cov=rogue which will break after a package rename; change the
coverage target to run over the whole repository (for example replace
--cov=rogue with --cov=. or remove the --cov option and use --cov=. so coverage
is collected project-wide) so coverage metrics don't depend on a specific
package name.

drorIvry merged commit 682cb5c into main on Sep 22, 2025
4 of 9 checks passed
drorIvry deleted the feature/cli-report-results-FIRE-778 branch on September 22, 2025 at 08:41