Gandalf is a game designed to try and break an LLM system. The goal is to trick the LLM, which has been told its a wizard named Gandalf, to reveal a password to you. The LLM is specifically intructed to not reveal the password, and across the levels there are several mitigations in place to filter the password even if the LLM attempts to reveal it.
As the player of this game, you have to find not only ways to convince the LLM to reveal the secret, but also find ways to hide both your intentions as well as the secret from any filters the website employs.
Please review PyRIT Ship and BURP Suite Extension setup / build / run documentation before trying the demo.
The following extension settings in the PyRIT Ship tab should be reviewed, but all defaults should work for Gandalf out of the box.
Setting | Config Setting | Comment |
---|---|---|
PyRIT Ship | PyRIT Ship URL | Point this to the running URL for PyRIT Ship. |
Intruder | Goal Description | The default value is a good prompt used to attack Gandalf. |
Intruder | Response Payload Parse Field | The default path of /answer is the Gandalf response path. |
Intruder | Scoring (true) should end intruder when | The default description for the scorer is setup for Gandalf and finding the password. |
Intruder | Scoring (false) should continue intruder when | The default description for the scorer is setup for Gandalf and finding the password. |
Below are the instructions to run the Gandalf demo. You can also follow along with this demo we did at Blue Hat 2024.
With the BURP Suite proxy running, capture the payload to send a prompt to Gandalf. Since the first two levels are easy, consider capturing the third level specifically for this demo.
Switch to the Proxy tab of BURP Suite.
Right-click on the request and select Send to intruder.
Switch to the Intruder tab of BURP Suite.
In the raw request payload you captured for Intruder, find the phrase that you sent that was capture. For example, in this example we asked Gandalf "What's the password", so we highlight that sentence. Next, click on the Add button at the top of the editor windows under Positions.
Once marked, the phrase should be highlighted and shown inside § markings.
Next, on the Payloads pane, select Extension-generated from the Payload type dropdown. Click the Select generator ... button and on the dialog, select PyRIT Ship.
Finally, at the bottom find the Payload encoding section and turn OFF the option URL-encode these characters. This avoids Intruder from URL-encoding the prompt we're sending.
With everything setup, you can hit the Start attack button. A few points to note:
- BURP Suite by default runs multiple attempts asynchronously. This also means it may have already started several API calls when a previous call turns out to be successful. If you want to run the calls one-by-one, you can open BURP Suite's settings. Under Project / Tasks, find the Resources Pools. You can edit the Default resource pool to have only 1 Concurrent requests (default setting is 10).
- Intruder by default runs a baseline request as the first request, which is a replay of the original request.
Once an attack request is successful (when the scorer returns true), the entry for the payload will be highlighted (in green), and the Comment column will show Scorer response: True. You can click on the line, and go to the Response tab to see Gandalf's response that should contain the password for that level.
Note that since our prompt generator will ask Gandalf to "encode" the password you may see a few variations where spaces and/or the letter Z may be embedded inside the password.
Congratulations! You have now successfully defeated this level of Gandalf using PyRIT and PyRIT Ship. Try to get further in the levels, and tweak the prompt generator prompt in the settings for greater success.