Conversation
Summary by CodeRabbit
Walkthrough

Adds Azure OpenAI support end-to-end: TUI UI fields and provider entry; propagates Azure endpoint/version/deployment through TUI/CLI, services, orchestrators, evaluator/red-team agents, the LLM wrapper, reporting paths, and SDK types; wires api_base/api_version into litellm completion calls.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant TUI as TUI/Client
    participant API as Server API
    participant Orch as Orchestrator
    participant Agent as EvaluatorAgent
    participant LLM as LiteLlm / litellm
    TUI->>API: start evaluation / request (includes api_base, api_version, deployment)
    API->>Orch: create orchestrator(judge_llm_api_base, judge_llm_api_version, attacker_llm_api_base, ...)
    Orch->>Agent: arun_evaluator_agent(..., judge_llm_api_base, judge_llm_api_version)
    Agent->>LLM: get_llm_from_model(model, api_base, api_version)
    LLM-->>Agent: completion response (uses api_base/api_version)
    Agent-->>Orch: evaluation results
    Orch-->>API: final results / summary
```
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ❌ 3 failed checks (2 warnings, 1 inconclusive)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches

🧪 Generate unit tests (beta)

Comment
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@rogue/common/agent_model_wrapper.py`:
- Around line 66-70: The code unconditionally calls litellm._turn_on_debug()
when kwargs exist; change this to only enable debug when an explicit flag or
environment variable is set (e.g., check os.environ["LITELLM_DEBUG"] or a debug
kwarg). Update the branch around litellm._turn_on_debug() in the kwargs handling
so that litellm._turn_on_debug() is called only if the debug flag is truthy,
otherwise instantiate LiteLlm(model=model, **kwargs) without enabling debug;
ensure the debug condition checks are explicit and documented near the LiteLlm
instantiation.
In `@rogue/evaluator_agent/policy_evaluation.py`:
- Around line 374-377: The file unconditionally calls litellm._turn_on_debug(),
which enables verbose logging; change this to respect an environment flag (e.g.,
LITELLM_DEBUG) so debug is only enabled when the env var is truthy: import os,
read os.environ.get("LITELLM_DEBUG") (or similar), and call
litellm._turn_on_debug() only when that value indicates debugging; mirror the
same guarded approach used in agent_model_wrapper.py and ensure the symbol
litellm._turn_on_debug() is invoked conditionally based on the env var.
In `@rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py`:
- Around line 76-77: The constructor stores _attacker_llm_api_base and
_attacker_llm_api_version but they are never used; either thread these values
into the attacker LLM client creation used by the attack generation logic
(analogous to how _judge_llm_api_base/_judge_llm_api_version are used in
_create_response_evaluator) or remove the unused attributes and constructor
parameters; specifically, update the attack generation path (the method(s)
responsible for creating the attacker LLM client or building attacker prompts —
e.g. any generate_attack / create_attacker_client / attack generation methods in
BaseRedTeamAttackerAgent and subclasses) to accept and pass
_attacker_llm_api_base and _attacker_llm_api_version when
instantiating/configuring the attacker LLM, or delete the attributes and adjust
subclasses/callers accordingly.
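The threading option described in the prompt above could look roughly like the sketch below. The class and method names here are illustrative, not the real `BaseRedTeamAttackerAgent` API; it only shows the pattern of forwarding the stored attributes as optional completion kwargs, analogous to the judge-side usage.

```python
from typing import Any, Dict, Optional


class RedTeamAttackerSketch:
    """Illustrative stand-in for the attacker agent's constructor and client setup."""

    def __init__(
        self,
        attacker_llm: str,
        attacker_llm_api_base: Optional[str] = None,
        attacker_llm_api_version: Optional[str] = None,
    ) -> None:
        self._attacker_llm = attacker_llm
        self._attacker_llm_api_base = attacker_llm_api_base
        self._attacker_llm_api_version = attacker_llm_api_version

    def _attacker_llm_kwargs(self) -> Dict[str, Any]:
        # Mirror the judge-side pattern: only pass values that are set,
        # so non-Azure providers see no extra kwargs.
        kwargs: Dict[str, Any] = {}
        if self._attacker_llm_api_base:
            kwargs["api_base"] = self._attacker_llm_api_base
        if self._attacker_llm_api_version:
            kwargs["api_version"] = self._attacker_llm_api_version
        return kwargs
```

The attack generation path would then spread `self._attacker_llm_kwargs()` into its LLM client construction, the same way the judge evaluator does.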
In `@rogue/server/services/llm_service.py`:
- Around line 181-185: Remove the direct call to litellm._turn_on_debug()
(present near the top-level imports and again around lines 252-256) because it
enables global verbose logging that can expose secrets and causes global side
effects; instead either rely on the documented LITELLM_LOG environment variable
(e.g., LITELLM_LOG=DEBUG/INFO) to control logging or, if dynamic runtime control
is required, wrap any invocation in a try/except and gate it behind a safe
feature flag (e.g., an internal DEBUG flag or config) so that
litellm._turn_on_debug() is only called when explicitly allowed and any
exceptions are caught and logged safely without printing credentials. Ensure you
update both occurrences in llm_service.py and remove the bare/global call from
module scope to avoid side effects during import.
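The guarded approach the prompts above describe could be factored into one helper shared by all call sites. This is a sketch: `maybe_enable_litellm_debug` is an assumed name, and the `LITELLM_DEBUG` variable is the one proposed in the review's own suggestions, not an existing flag in the codebase.

```python
import logging
import os

logger = logging.getLogger(__name__)


def maybe_enable_litellm_debug() -> None:
    """Enable litellm debug logging only when explicitly requested via env var."""
    if os.getenv("LITELLM_DEBUG", "").lower() not in ("true", "1"):
        return
    try:
        import litellm

        litellm._turn_on_debug()
    except Exception:
        # A logging toggle must never break the request path.
        logger.exception("Failed to enable litellm debug logging")
```

Calling this from each site replaces the bare `litellm._turn_on_debug()` calls, keeps the side effect out of module scope, and swallows any failure safely.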
🧹 Nitpick comments (10)
packages/tui/internal/tui/common_controller.go (1)
347-359: Consider handling models without the "azure/" prefix.

The `azure_deployment` is only saved when the model has the "azure/" prefix. If a user configures a model name without this prefix, `azure_deployment` won't be set, which could cause issues downstream.

Suggested improvement

```diff
 // Save deployment name (strip "azure/" prefix from model)
 if strings.HasPrefix(msg.Model, "azure/") {
 	m.config.APIKeys["azure_deployment"] = strings.TrimPrefix(msg.Model, "azure/")
+} else {
+	// Model doesn't have prefix, use as-is for deployment name
+	m.config.APIKeys["azure_deployment"] = msg.Model
 }
```

rogue/common/agent_model_wrapper.py (1)
10-26: Remove dead code or explain why it's commented out.

The `_normalize_azure_endpoint` function is defined but its usage at line 54 is commented out. If this normalization is not needed, consider removing the function entirely. If it's planned for future use, add a comment explaining why.

rogue/evaluator_agent/run_evaluator_agent.py (2)
105-133: Docstring missing documentation for new parameters.

The new `judge_llm_api_base` and `judge_llm_api_version` parameters are not documented in the Args section of the docstring.

📝 Add documentation for new parameters

```diff
         judge_llm_aws_access_key_id: AWS access key ID for judge LLM
         judge_llm_aws_secret_access_key: AWS secret access key for judge LLM
         judge_llm_aws_region: AWS region for judge LLM
         evaluation_mode: Should be POLICY for this function
+        judge_llm_api_base: Optional API base URL for judge LLM (e.g., for Azure OpenAI)
+        judge_llm_api_version: Optional API version for judge LLM (e.g., for Azure OpenAI)

     Yields:
```
278-285: Inconsistent error logging context.

The top-level exception logging (lines 278-285) omits `judge_llm_api_base` and `judge_llm_api_version` from the extra payload, while the inner exception handler (lines 221-236) includes them. For consistent debugging, consider adding these fields here as well.

📝 Add missing context to error logging

```diff
     except Exception:
         logger.exception(
             "💥 arun_evaluator_agent failed",
             extra={
                 "evaluated_agent_url": evaluated_agent_url,
                 "judge_llm": judge_llm,
+                "judge_llm_api_base": judge_llm_api_base,
+                "judge_llm_api_version": judge_llm_api_version,
             },
         )
         raise
```

rogue/evaluator_agent/red_team/factory.py (1)
55-97: Docstring missing documentation for new API base/version parameters.

The function docstring doesn't document the four new parameters: `attacker_llm_api_base`, `attacker_llm_api_version`, `judge_llm_api_base`, and `judge_llm_api_version`.

📝 Add documentation for new parameters

```diff
         judge_llm_aws_access_key_id: AWS credentials for judge LLM
         judge_llm_aws_secret_access_key: AWS credentials for judge LLM
         judge_llm_aws_region: AWS region for judge LLM
+        attacker_llm_api_base: Optional API base URL for attacker LLM (e.g., for Azure OpenAI)
+        attacker_llm_api_version: Optional API version for attacker LLM (e.g., for Azure OpenAI)
+        judge_llm_api_base: Optional API base URL for judge LLM (e.g., for Azure OpenAI)
+        judge_llm_api_version: Optional API version for judge LLM (e.g., for Azure OpenAI)
         qualifire_api_key: Optional API key for premium features
         python_entrypoint_file: Path to Python file with call_agent function
```

sdks/python/rogue_sdk/types.py (1)
905-906: Consider adding Field descriptions for API documentation.

The new `api_base` and `api_version` fields in `ScenarioGenerationRequest` and `SummaryGenerationRequest` lack `Field()` descriptions, which could improve API documentation and OpenAPI schema generation. This is optional but would improve consistency with some other fields in the codebase.

📝 Add Field descriptions (optional)

```diff
-    api_base: Optional[str] = None
-    api_version: Optional[str] = None
+    api_base: Optional[str] = Field(
+        default=None,
+        description="API base URL (e.g., for Azure OpenAI)",
+    )
+    api_version: Optional[str] = Field(
+        default=None,
+        description="API version (e.g., for Azure OpenAI)",
+    )
```

Also applies to: 926-927
packages/tui/internal/components/llm_config_dialog.go (3)
413-432: Consider extracting common input handling logic to reduce duplication.

The Bedrock and Azure input handling (cursor movement, backspace, delete, character input, paste) follows nearly identical patterns with only the field mappings differing. This creates a maintenance burden if behavior needs to change.

💡 Example refactor approach

Consider creating a helper method that returns the active field's input pointer and cursor pointer based on provider and active field index:

```go
func (d *LLMConfigDialog) getActiveInputAndCursor() (*string, *int) {
	provider := d.Providers[d.SelectedProvider]
	if provider.Name == "bedrock" {
		switch d.ActiveInputField {
		case 0:
			return &d.AWSAccessKeyInput, &d.AWSAccessKeyCursor
		case 1:
			return &d.AWSSecretKeyInput, &d.AWSSecretKeyCursor
		case 2:
			return &d.AWSRegionInput, &d.AWSRegionCursor
		}
	} else if provider.Name == "azure" {
		switch d.ActiveInputField {
		case 0:
			return &d.APIKeyInput, &d.APIKeyCursor
		case 1:
			return &d.AzureEndpointInput, &d.AzureEndpointCursor
		case 2:
			return &d.AzureAPIVersionInput, &d.AzureAPIVersionCursor
		case 3:
			return &d.AzureDeploymentInput, &d.AzureDeploymentCursor
		}
	}
	return &d.APIKeyInput, &d.APIKeyCursor
}
```

This could consolidate the backspace, delete, cursor movement, and character input logic into single implementations.

Also applies to: 476-495, 604-627, 657-676, 718-745, 1016-1031
177-180: `LLMConfigResultMsg` lacks an explicit Azure deployment field.

The Azure deployment is currently embedded in the `Model` field as `"azure/" + deployment`. If downstream code ever needs to extract the deployment name separately, it would require string parsing. Consider adding an explicit `AzureDeployment` field for clarity.

💡 Suggested addition

```diff
 type LLMConfigResultMsg struct {
 	Provider           string
 	APIKey             string
 	AWSAccessKeyID     string
 	AWSSecretAccessKey string
 	AWSRegion          string
 	AzureEndpoint      string
 	AzureAPIVersion    string
+	AzureDeployment    string
 	Model              string
 	Action             string
 }
```
826-827: Hardcoded Azure API version "2025-01-01-preview" may become outdated.

The default API version is set in two places (lines 826-827 for reconfiguration, 854-855 for new configuration). Consider defining this as a constant to ensure consistency and easier updates.

💡 Suggested fix

```diff
+const DefaultAzureAPIVersion = "2025-01-01-preview"
+
 // In handleEnter, for reconfiguration:
-	d.AzureAPIVersionInput = "2025-01-01-preview"
+	d.AzureAPIVersionInput = DefaultAzureAPIVersion

 // And for new configuration:
-	d.AzureAPIVersionInput = "2025-01-01-preview"
+	d.AzureAPIVersionInput = DefaultAzureAPIVersion
```

Also applies to: 854-855
rogue/server/services/llm_service.py (1)
170-179: Document new `api_base`/`api_version` parameters.

These public methods now accept Azure/OpenAI endpoint details, but the docs don't mention them. Please update the docstring (and/or add one to `generate_summary_from_results`) to avoid confusion for SDK and API callers.

Also applies to: 241-250
```python
if kwargs:
    import litellm

    litellm._turn_on_debug()
    llm = LiteLlm(model=model, **kwargs)
```
Debug mode should not be unconditionally enabled in production.
litellm._turn_on_debug() enables verbose debug logging which may expose sensitive information and create excessive log volume in production. This should be controlled by an environment variable or debug flag.
Proposed fix

```diff
 if kwargs:
     import litellm
+    import os

-    litellm._turn_on_debug()
+    if os.getenv("LITELLM_DEBUG", "").lower() in ("true", "1"):
+        litellm._turn_on_debug()
     llm = LiteLlm(model=model, **kwargs)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if kwargs:
    import litellm
    import os

    if os.getenv("LITELLM_DEBUG", "").lower() in ("true", "1"):
        litellm._turn_on_debug()
    llm = LiteLlm(model=model, **kwargs)
```
```python
import litellm
from litellm import completion

litellm._turn_on_debug()
```
Debug mode should not be unconditionally enabled.
Same issue as in agent_model_wrapper.py - litellm._turn_on_debug() enables verbose debug logging unconditionally. This should be controlled by an environment variable.
Proposed fix

```diff
 # litellm import takes a while, importing here to reduce startup time.
 import litellm
 from litellm import completion

-litellm._turn_on_debug()
+if os.getenv("LITELLM_DEBUG", "").lower() in ("true", "1"):
+    litellm._turn_on_debug()
```
+ litellm._turn_on_debug()🤖 Prompt for AI Agents
In `@rogue/evaluator_agent/policy_evaluation.py` around lines 374 - 377, The file
unconditionally calls litellm._turn_on_debug(), which enables verbose logging;
change this to respect an environment flag (e.g., LITELLM_DEBUG) so debug is
only enabled when the env var is truthy: import os, read
os.environ.get("LITELLM_DEBUG") (or similar), and call litellm._turn_on_debug()
only when that value indicates debugging; mirror the same guarded approach used
in agent_model_wrapper.py and ensure the symbol litellm._turn_on_debug() is
invoked conditionally based on the env var.
```python
self._attacker_llm_api_base = attacker_llm_api_base
self._attacker_llm_api_version = attacker_llm_api_version
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if attacker_llm_api_base/version are used in the orchestrator or related files
rg -n "attacker_llm_api_base|attacker_llm_api_version" --type py
```

Repository: qualifire-dev/rogue
Length of output: 2181

🏁 Script executed:

```bash
#!/bin/bash
# Read the base_red_team_attacker_agent.py file to check usage
cat -n rogue/evaluator_agent/red_team/base_red_team_attacker_agent.py
```

Repository: qualifire-dev/rogue
Length of output: 15246

🏁 Script executed:

```bash
#!/bin/bash
# Search for subclasses of BaseRedTeamAttackerAgent
rg -n "BaseRedTeamAttackerAgent" --type py -A 2
```

Repository: qualifire-dev/rogue
Length of output: 3082

🏁 Script executed:

```bash
#!/bin/bash
# Check all subclass files for usage of the stored attributes
rg -n "_attacker_llm_api_base|_attacker_llm_api_version" --type py \
  rogue/evaluator_agent/red_team/a2a_red_team_attacker_agent.py \
  rogue/evaluator_agent/red_team/mcp_red_team_attacker_agent.py \
  rogue/evaluator_agent/red_team/python_red_team_attacker_agent.py
```

Repository: qualifire-dev/rogue
Length of output: 45
Attacker LLM API base/version are stored but not used.
The _attacker_llm_api_base and _attacker_llm_api_version attributes are stored at lines 76-77 but never referenced in this class or its subclasses. In contrast, the similar _judge_llm_api_base and _judge_llm_api_version attributes are actively used in the _create_response_evaluator() method (lines 308-309). Consider whether these attacker LLM parameters should be passed through to the attack generation logic or removed if not needed.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@rogue/common/agent_model_wrapper.py`:
- Around line 10-23: The _normalize_azure_endpoint function currently appends
"/openai/deployments/{deployment}" which produces malformed URLs; change it so
when model starts with "azure/" it returns the Azure resource root only (strip
trailing slashes and any extra path segments the user may have added) instead of
constructing the deployment path—i.e., in _normalize_azure_endpoint, detect
model.startswith("azure/") and return a cleaned api_base (e.g., using
rstrip("/") or otherwise removing extra path segments) so LiteLLM can build the
full path from the model value itself.
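The endpoint cleanup described above can be sketched as follows. This is a minimal illustration, not the real `_normalize_azure_endpoint`; the name `normalize_azure_endpoint` and its two-argument signature are assumptions for the example.

```python
from urllib.parse import urlsplit


def normalize_azure_endpoint(model: str, api_base: str) -> str:
    """Return the bare Azure resource root for "azure/<deployment>" models.

    LiteLLM builds "/openai/deployments/<deployment>/..." from the model
    string itself, so api_base must not already contain that path.
    """
    if not model.startswith("azure/"):
        return api_base
    parts = urlsplit(api_base.rstrip("/"))
    # Keep only scheme + host, dropping any path segments the user pasted in.
    return f"{parts.scheme}://{parts.netloc}"
```

With this, a pasted endpoint like `https://my-resource.openai.azure.com/openai/deployments/gpt-4o/` collapses back to the resource root instead of producing a doubled deployment path.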
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@rogue/common/agent_model_wrapper.py`:
- Around line 67-79: Wrap the LiteLlm instantiation in a try/except to catch and
log configuration errors: around the call to LiteLlm(model=model, **kwargs) in
agent_model_wrapper.py, catch Exception, call logger.exception("Failed to create
LiteLlm instance", extra={"model": model}) and re-raise, then proceed to the
existing logger.debug and return llm; ensure the debug logging still redacts
api_key in kwargs and includes getattr(llm, "_additional_args", {}) as before
and do not log optional api_base/api_version in the exception handler.
```python
if kwargs:
    llm = LiteLlm(model=model, **kwargs)
    logger.debug(
        "LiteLlm instance created",
        extra={
            "model": model,
            "kwargs": {
                k: v if k != "api_key" else "***" for k, v in kwargs.items()
            },
            "additional_args": getattr(llm, "_additional_args", {}),
        },
    )
    return llm
```
🧩 Analysis chain

🏁 Script executed:

```bash
cat -n rogue/common/agent_model_wrapper.py | sed -n '50,95p'
```

Repository: qualifire-dev/rogue
Length of output: 1346

🏁 Script executed:

```bash
cat -n rogue/common/agent_model_wrapper.py | sed -n '1,70p' | head -50
```

Repository: qualifire-dev/rogue
Length of output: 1835

🏁 Script executed:

```bash
rg "LiteLlm" -A 3 -B 3 rogue/ --type py | head -60
```

Repository: qualifire-dev/rogue
Length of output: 3848

🏁 Script executed:

```bash
rg "except.*:" rogue/common/agent_model_wrapper.py -A 5
```

Repository: qualifire-dev/rogue
Length of output: 249
Wrap LiteLlm instantiation in try/except to surface configuration errors.
Per coding guidelines, use try/except blocks for code that may raise exceptions. LiteLlm(model=model, **kwargs) at line 68 can fail with configuration errors; without handling, these bubble without context.
Add exception handling with structured logging, but log only what's reliably available (model and kwargs contents):

```python
if kwargs:
    try:
        llm = LiteLlm(model=model, **kwargs)
    except Exception:
        logger.exception(
            "Failed to create LiteLlm instance",
            extra={"model": model},
        )
        raise
    logger.debug(
        "LiteLlm instance created",
        extra={
            "model": model,
            "kwargs": {
                k: v if k != "api_key" else "***" for k, v in kwargs.items()
            },
            "additional_args": getattr(llm, "_additional_args", {}),
        },
    )
    return llm
```

Avoid logging api_base and api_version in the exception handler; they are optional parameters that may be None, and api_base is reassigned at line 55 if truthy (normalized value).
Description
Motivation and Context
Type of Change
Changes Made
Screenshots/Examples (if applicable)
Checklist

- Ran `uv run black .` to format my code
- Ran `uv run flake8 .` and fixed all issues
- Ran `uv run mypy --config-file .mypy.ini .` and addressed type checking issues
- Ran `uv run bandit -c .bandit.yaml -r .` for security checks
- Ran `uv run pytest` and all tests pass

Testing
Test Configuration:
Test Steps:
1.
2.
3.
Additional Notes
Related Issues/PRs