Merged
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
### dotenv template
python/.env

test_results

### Go template
# If you prefer the allow list template instead of the deny list, see community template:
# https://github.com/github/gitignore/blob/main/community/Golang/Go.AllowList.gitignore
3 changes: 2 additions & 1 deletion python/prompts/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
from ._istio_crd import get_istio_crd_prompt
from .models import IstioCrdType

__all__ = ["get_istio_crd_prompt"]
__all__ = ["get_istio_crd_prompt"]
1 change: 1 addition & 0 deletions python/prompts/_istio_crd.py
Original file line number Diff line number Diff line change
@@ -551,6 +551,7 @@ def __init__(self, crd_type: "IstioCrdType"):
        super().__init__(
            name=f"Istio {crd_type.value} Template",
            description=f"Template for generating {crd_type.value} configurations",
            version="0.0.1",
            sections=[
                PromptSection(
                    name="Role Definition",
2 changes: 2 additions & 0 deletions python/prompts/base.py
Original file line number Diff line number Diff line change
@@ -49,11 +49,13 @@ def __init__(
        self,
        name: str,
        description: str,
        version: str,
        sections: list[PromptSection] | None,
        variables: list[TemplateVariable] | None = None,
    ):
        self.name = name
        self.description = description
        self.version = version
        self.sections = sections
        self.variables = variables or []
        self._validate_variables()
115 changes: 115 additions & 0 deletions python/tests/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
## Artifacts

The test data is split into two groups: test cases and agent configurations.

### Test cases

A test cases file contains one or more test cases. Each test case has an input prompt (the query sent directly to the agent) and an expected output (the response the agent should return).

```yaml
version: "1.0"
metadata:
  description: "Authorization Policy Test Cases"

test_cases:
  - name: deny_post_8080
    input: "Deny results with POST method on port 8080 on all workloads in the foo namespace"
    category: AuthorizationPolicy
    expected_output:
      apiVersion: security.istio.io/v1
      kind: AuthorizationPolicy
      metadata:
        name: httpbin
        namespace: foo
      spec:
        action: DENY
        rules:
          - to:
              - operation:
                  methods:
                    - POST
                  ports:
                    - "8080"
```
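
As a rough illustration of how an agent response can be scored against `expected_output`, the sketch below mirrors the approach used by `_calculate_similarity` in `agent_tester.py`: both objects are serialized with sorted keys and compared as strings with `difflib.SequenceMatcher`. The standalone function name `similarity` and the sample dictionaries are assumptions made for this example only.

```python
import difflib
import json


def similarity(expected: dict, actual: dict) -> float:
    """Return a 0-100 similarity score between two JSON-like dicts."""
    # Serialize with sorted keys so key order does not affect the score.
    expected_str = json.dumps(expected, sort_keys=True)
    actual_str = json.dumps(actual, sort_keys=True)
    return difflib.SequenceMatcher(None, expected_str, actual_str).ratio() * 100


# Hypothetical sample data, loosely based on the test case above.
expected = {
    "spec": {
        "action": "DENY",
        "rules": [{"to": [{"operation": {"methods": ["POST"], "ports": ["8080"]}}]}],
    }
}
reordered = {
    "spec": {
        "rules": [{"to": [{"operation": {"ports": ["8080"], "methods": ["POST"]}}]}],
        "action": "DENY",
    }
}
different = {"spec": {"action": "ALLOW", "rules": [{}]}}

print(similarity(expected, reordered))  # same content, different key order
print(similarity(expected, different))  # partially matching content
```

Because the score is a character-level string ratio, it measures textual closeness rather than semantic equivalence, so a small score change can still hide a meaningful field difference.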

### Agent configuration

The agent configuration specifies the system prompt for the agent. This is the context that the agent uses to generate the response.

```yaml
version: "1.0"
name: "istio_authpolicy_crd_agent"
metadata:
  description: "Agent for generating Istio Authorization Policy CRDs"
  version: "0.0.1"

system_messages:
  - |
    You're an Istio CRD agent. You modify or create a new JSON based on the UQ. The JSON must conform to the PROTO SPEC. The response must only include one or more AuthorizationPolicy resource type.

    PROTO...
```
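
To make it easier to tell whether the system prompt changed between two runs, the tester in `agent_tester.py` joins the system messages with newlines and stores a SHA-256 hash of the result as `prompt_hash`. The sketch below reproduces that idea in isolation; the function name `prompt_hash` and the sample messages are assumptions for illustration.

```python
import hashlib


def prompt_hash(system_messages: list[str]) -> str:
    """Hash the joined system messages so prompt edits are easy to detect."""
    joined = "\n".join(system_messages)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical prompts: the second differs from the first by one sentence.
baseline = prompt_hash(["You're an Istio CRD agent."])
unchanged = prompt_hash(["You're an Istio CRD agent."])
edited = prompt_hash(["You're an Istio CRD agent. Respond with JSON only."])

print(baseline == unchanged)  # identical prompts hash to the same value
print(baseline == edited)     # any edit produces a different hash
```
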
## Running tests

To run the tests, pass in the test cases file and the agent file, and specify the model you want to use:

```bash
uv run main.py run test_cases/authpolicy_test_cases.yaml agents/istio_crd_agent_0.yaml --model gpt-4o-mini
```

Once you've created the baseline results (or first test results), you can modify the prompt or model and run the tests again to see if the results change.

To check for changes in the results, you can use the `compare` command:

```bash
uv run main.py compare test_results/results1.json test_results/results2.json
```

The `compare` command will output the differences between the two test results files and provide a summary of the changes between the two runs:

```console
Comparing results_20250122_153628.json with results_20250122_153642.json
Model changed: True
Prompt changed: False

Analyzing 2 test cases:

=== Test 1 ===
Input: Deny results with POST method on port 8080 on all workloads in the foo namespace
Duration delta: 6569.08ms
Similarity: 95.46% → 95.46% (Δ: +0.00%)
No differences in output

=== Test 2 ===
Input: Allow GET requests on port 3000 for service-a in the bar namespace
Duration delta: 3134.77ms
Similarity: 87.52% → 83.91% (Δ: -3.62%)
Output differences:
--- results_20250122_153628.json (Test 2)
+++ results_20250122_153642.json (Test 2)
@@ -2,16 +2,16 @@
"apiVersion": "security.istio.io/v1",
"kind": "AuthorizationPolicy",
"metadata": {
- "name": "allow-get-service-a",
+ "name": "allow-get-port-3000",
"namespace": "bar"
},
"spec": {
+ "action": "ALLOW",
"selector": {
- "matchLabels": {
+ "match_labels": {
"app": "service-a"
}
},
- "action": "ALLOW",
"rules": [
{
"to": [

=== Summary Statistics ===
Tests with differences: 1 of 2
Average similarity delta: -1.81%
Average duration delta: +4851.93ms
```
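
The implementation of `compare` is not shown here, so the following is only a minimal sketch of how such a comparison could work. It assumes the results file layout written by `_save_results` (a `config` with `model` and `prompt_hash`, plus a `results` list with `input`, `actual_output`, `duration_ms`, and `similarity`). The function name `compare_runs` and the sample data are hypothetical.

```python
import difflib
import json


def compare_runs(old: dict, new: dict, old_name: str = "old", new_name: str = "new") -> dict:
    """Summarize what changed between two test run result dictionaries."""
    summary = {
        "model_changed": old["config"]["model"] != new["config"]["model"],
        "prompt_changed": old["config"]["prompt_hash"] != new["config"]["prompt_hash"],
        "tests": [],
    }
    # Pair results by position; this assumes both runs used the same test cases.
    for before, after in zip(old["results"], new["results"]):
        before_lines = json.dumps(before["actual_output"], indent=2).splitlines()
        after_lines = json.dumps(after["actual_output"], indent=2).splitlines()
        diff = list(
            difflib.unified_diff(
                before_lines, after_lines, fromfile=old_name, tofile=new_name, lineterm=""
            )
        )
        summary["tests"].append(
            {
                "input": before["input"],
                "similarity_delta": after["similarity"] - before["similarity"],
                "duration_delta_ms": after["duration_ms"] - before["duration_ms"],
                "diff": diff,  # empty list when the outputs are identical
            }
        )
    return summary


# Hypothetical runs: same prompt, different model, one changed output.
run_a = {
    "config": {"model": "model-a", "prompt_hash": "abc"},
    "results": [
        {"input": "q1", "actual_output": {"kind": "X"}, "duration_ms": 100.0, "similarity": 95.0},
        {"input": "q2", "actual_output": {"name": "a"}, "duration_ms": 200.0, "similarity": 90.0},
    ],
}
run_b = {
    "config": {"model": "model-b", "prompt_hash": "abc"},
    "results": [
        {"input": "q1", "actual_output": {"kind": "X"}, "duration_ms": 150.0, "similarity": 95.0},
        {"input": "q2", "actual_output": {"name": "b"}, "duration_ms": 250.0, "similarity": 85.0},
    ],
}

summary = compare_runs(run_a, run_b)
changed = sum(1 for test in summary["tests"] if test["diff"])
print(f"Model changed: {summary['model_changed']}")
print(f"Prompt changed: {summary['prompt_changed']}")
print(f"Tests with differences: {changed} of {len(summary['tests'])}")
```

Keys are deliberately not sorted before diffing in this sketch, so a reordered field shows up as a difference, which is consistent with the moved `"action"` line in the sample output above.
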
Empty file added python/tests/__init__.py
Empty file.
93 changes: 93 additions & 0 deletions python/tests/agent_tester.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
from autogen_agentchat.agents import AssistantAgent
from schema import TestCase, TestRunResult, TestResult
from datetime import datetime
from pathlib import Path
from dataclasses import asdict
import json
import hashlib
import difflib

class AgentTester:
    def __init__(self, agent: "AssistantAgent", test_cases: list[TestCase], results_dir: str = "test_results"):
        self.agent = agent
        self.test_cases = test_cases
        self.results_dir = Path(results_dir)
        self.results_dir.mkdir(exist_ok=True)

    async def run_tests(self) -> TestRunResult:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

        model = self.agent._model_client._to_config().model_dump().get('model')

        # Get all system messages once for the entire test run
        system_msg = "\n".join([msg.content for msg in self.agent._system_messages])

        # Create config once for the entire test run
        config = {
            "agent_name": self.agent.name,
            "tools": "\n".join([tool.name for tool in self.agent._tools]),
            "model": model,
            "prompt": system_msg,
            "prompt_hash": hashlib.sha256(system_msg.encode('utf-8')).hexdigest(),
        }

        results = []
        for test_case in self.test_cases:
            print(".", end="", flush=True)
            start = datetime.now()
            # Run the agent with the test input
            response = await self.agent.run(task=test_case.input)
            end = datetime.now()

            duration_ms = (end - start).total_seconds() * 1000

            # Only get the last TextMessage and get the content of it
            output = response.messages[-1].content

            similarity = self._calculate_similarity(test_case.expected_output, json.loads(output))

            # Create test result without config
            result = TestResult(
                category=test_case.category,
                input=test_case.input,
                expected_output=test_case.expected_output,
                actual_output=json.loads(output),
                duration_ms=duration_ms,
                similarity=similarity
            )
            results.append(result)

        print()
        # Create the test run result that combines config and individual results
        test_run_result = TestRunResult(
            timestamp=timestamp,
            config=config,
            results=results
        )

        # Save results
        self._save_results(test_run_result)

        return test_run_result

    def _calculate_similarity(self, dict1: dict, dict2: dict):
        # Convert both dictionaries to strings with consistent formatting
        str1 = json.dumps(dict1, sort_keys=True)
        str2 = json.dumps(dict2, sort_keys=True)

        return difflib.SequenceMatcher(None, str1, str2).ratio() * 100

    def _save_results(self, test_run_result: TestRunResult):
        # Convert results to JSON-serializable format
        results_dict = {
            "timestamp": test_run_result.timestamp,
            "config": test_run_result.config,
            "results": [asdict(result) for result in test_run_result.results]
        }

        # Save to JSON file
        results_file = self.results_dir / f"results_{test_run_result.timestamp}_{test_run_result.config.get('model')}.json"
        with open(results_file, "w") as f:
            json.dump(results_dict, f, indent=2)

        print(f"Results saved to: {results_file}")
35 changes: 35 additions & 0 deletions python/tests/agents/istio_crd_agent_0.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
version: "1.0"
name: "istio_authpolicy_crd_agent"
metadata:
  description: "Agent for generating Istio Authorization Policy CRDs (no protos)"
  version: "0.0.1"

system_messages:
  - |
    You're an Istio CRD agent. You modify or create a new JSON based on the UQ. The response must only include one or more AuthorizationPolicy resource type.

    EXAMPLES:
    UQ: Deny requests from dev namespace to POST method on all workloads in the foo namespace
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "deny-dev-post", "namespace": "foo"}, "spec": {"action": "DENY", "rules": [{"from": [{"source": {"namespaces": ["dev"]}}], "to": [{"operation": {"methods": ["POST"]}}]}]}}

    UQ: Create a deny policy to deny all requests with POST method on port 8080 on all workloads in the foo namespace
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "deny-post-8080", "namespace": "foo"}, "spec": {"action": "DENY", "rules": [{"to": [{"operation": {"methods": ["POST"], "ports": ["8080"]}}]}]}}

    UQ: Audit any GET requests to the path with the prefix /user/profile
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "audit-user-profile", "namespace": "ns1"}, "spec": {"selector": {"matchLabels": {"app": "myapi"}}, "action": "AUDIT", "rules": [{"to": [{"operation": {"methods": ["GET"], "paths": ["/user/profile/*"]}}]}]}}

    UQ: Deny all requests to workloads in namespace foo
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "deny-all", "namespace": "foo"}, "spec": {}}

    UQ: Allow all requests to workloads in namespace foo
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "allow-all", "namespace": "foo"}, "spec": {"rules": [{}]}}

    UQ: Allow requests to workloads labeled with app=customers in the customers namespace if the request is from the service account cluster.local/ns/orders/orders or from the payments namespace, and the request header "foo" has the value "bar" or the request header "user" has the value "peterj".
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "allow-customers", "namespace": "customers"}, "spec": {"action": "ALLOW", "selector": {"matchLabels": {"app": "customers"}}, "rules": [{"from": [{"source": {"principals": ["cluster.local/ns/orders/sa/orders"]}}, {"source": {"namespaces": ["payments"]}}], "to": [{"operation": {"when": [{"key": "request.headers[foo]", "values": ["bar"]}, {"key": "request.headers[user]", "values": ["peterj"]}]}}]}]}}

    UQ: Allow IP address 1.2.3.4 and IPs from block 5.6.7.0/24 to access the apps labeled with app=payments.
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "ingress-policy", "namespace": "foo"}, "spec": {"selector": {"matchLabels": {"app": "payments"}}, "action": "ALLOW", "rules": [{"from": [{"source": {"ipBlocks": ["1.2.3.4", "5.6.7.0/24"]}}]}]}}

    UQ: Apply the policy to all workloads in the foo namespace and allows GET requests to prefix /info or POST requests to /data for workloads using cluster.local/ns/default/sleep service account or workloads in test namespace when the issuer claim is set to https://accounts.google.com
    JSON: {"apiVersion": "security.istio.io/v1", "kind": "AuthorizationPolicy", "metadata": {"name": "allow-info-data", "namespace": "foo"}, "spec": {"action": "ALLOW", "rules": [{"from": [{"source": {"principals": ["cluster.local/ns/default/sa/sleep"]}}, {"source": {"namespaces": ["test"]}}], "to": [{"operation": {"methods": ["GET"], "paths": ["/info*"]}}, {"operation": {"methods": ["POST"], "paths": ["/data"]}}], "when": [{"key": "request.auth.claims[iss]", "values": ["https://accounts.google.com"]}]}]}}
tools: []