Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzer 2.0 #51

Merged
merged 19 commits into from
Jul 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 22 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
![ci](https://github.com/prompt-security/ps-fuzz/actions/workflows/ci.yml/badge.svg)
![GitHub contributors](https://img.shields.io/github/contributors/prompt-security/ps-fuzz)
![Last release](https://img.shields.io/github/v/release/prompt-security/ps-fuzz)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/148n5M1wZXp-ojhnh-_KP01OYtUwJwlUl?usp=sharing)
</h2>


Expand Down Expand Up @@ -190,11 +191,29 @@ Run tests against the system prompt (in non-interactive batch mode):
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt
```

#### 📺 Custom Benchmark!
Run tests against the system prompt with a custom benchmark

```
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt --custom-benchmark=ps_fuzz/attack_data/custom_benchmark1.csv
```

#### 🐹 Run only a subset of attacks!
Run tests against the system prompt with a subset of attacks

```
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt --custom-benchmark=ps_fuzz/attack_data/custom_benchmark1.csv --tests='["ucar","amnesia"]'
```

<br>
<br>
<br>


<a id="colab"></a>
## 📓 Google Colab Notebook
Refine and harden your system prompt in our [Google Colab Notebook](https://colab.research.google.com/drive/148n5M1wZXp-ojhnh-_KP01OYtUwJwlUl?usp=sharing)<br><br>
<img src="./resources/PromptFuzzer.png" alt="Prompt Fuzzer Refinement Process"/>
<br><br>
<a id="demovideo"></a>
## 🎬 Demo video
[![Watch the video](https://img.youtube.com/vi/8RtqtPI_bsE/hqdefault.jpg)](https://www.youtube.com/watch?v=8RtqtPI_bsE)
Expand Down Expand Up @@ -245,9 +264,9 @@ We use a dynamic testing approach, where we get the necessary context from your
<a id="roadmap"></a>
## :rainbow: What’s next on the roadmap?

- [ ] Google Colab Notebook
- [X] Google Colab Notebook
- [X] Adjust the output evaluation mechanism for prompt dataset testing
- [ ] More attack types
- [ ] Adjust the output evaluation mechanism for prompt dataset testing
- [ ] Better reporting capabilites
- [ ] Hardening recommendations

Expand Down
58 changes: 52 additions & 6 deletions ps_fuzz/app_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,16 @@
import json
import sys, os
import colorama
import pandas as pd
from .util import wrap_text
from .results_table import print_table
import logging
logger = logging.getLogger(__name__)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
console_handler = logging.StreamHandler()
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
logger.propagate = False

class AppConfig:
default_config = {
Expand All @@ -17,14 +23,19 @@ class AppConfig:
'num_threads': 4,
'attack_temperature': 0.6,
'system_prompt': '',
'custom_benchmark': '',
'tests': []
}

def __init__(self, config_state_file: str):
self.config_state_file = config_state_file
try:
self.load()
except Exception as e:
logger.warning(f"Failed to load config state file {self.config_state_file}: {e}")
def __init__(self, config_state_file: str, config_state: dict = None):
if config_state:
self.config_state = config_state
else:
self.config_state_file = config_state_file
try:
self.load()
except Exception as e:
logger.warning(f"Failed to load config state file {self.config_state_file}: {e}")

def get_attributes(self):
return self.config_state
Expand Down Expand Up @@ -109,6 +120,39 @@ def target_model(self, value: str):
self.config_state['target_model'] = value
self.save()

@property
def custom_benchmark(self) -> str:
return self.config_state['custom_benchmark']

@custom_benchmark.setter
def custom_benchmark(self, value: str):
if not value: raise ValueError("Custom benchmark file cannot be empty, has to be a path to file")
if not os.path.exists(value): raise ValueError("Custom benchmark file does not exist")
if not os.path.isfile(value): raise ValueError("Custom benchmark file is not a file")
if not os.access(value, os.R_OK): raise ValueError("Custom benchmark file is not readable")
if os.path.getsize(value) == 0: raise ValueError("Custom benchmark file is empty")
if not value.endswith('.csv'): raise ValueError("Custom benchmark file must be a CSV file")
df = pd.read_csv(value)
if 'prompt' not in df.columns: raise ValueError("Custom benchmark file must have a 'prompt' column")
if 'response' not in df.columns: raise ValueError("Custom benchmark file must have a 'response' column")
self.config_state['custom_benchmark'] = value
self.save()

@property
def tests(self) -> [str]:
return self.config_state['tests']

@tests.setter
def tests(self, value: str):
try:
if len(value) > 0:
self.config_state['tests'] = json.loads(value)
else:
self.config_state['tests'] = []
except Exception as e:
self.config_state['tests'] = []
self.save()

@property
def num_attempts(self) -> int:
return self.config_state['num_attempts']
Expand Down Expand Up @@ -164,6 +208,8 @@ def parse_cmdline_args():
parser.add_argument('--attack-model', type=str, default=None, help="Attack model")
parser.add_argument('--target-provider', type=str, default=None, help="Target provider")
parser.add_argument('--target-model', type=str, default=None, help="Target model")
parser.add_argument('--custom-benchmark', type=str, default=None, help="Custom benchmark file")
parser.add_argument('--tests', type=str, default='', help="Custom test configuration (LIST)")
parser.add_argument('-n', '--num-attempts', type=int, default=None, help="Number of different attack prompts")
parser.add_argument('-t', '--num-threads', type=int, default=None, help="Number of worker threads")
parser.add_argument('-a', '--attack-temperature', type=float, default=None, help="Temperature for attack model")
Expand Down
Loading
Loading