amazon-science/FLIRT

Feedback Loop In-context Red Teaming (FLIRT) Framework (Code):

This repository contains the code for the Feedback Loop In-context Red Teaming (FLIRT) paper, accepted at EMNLP 2024. The code implements the FLIRT framework, which generates adversarial prompts for red teaming a target model. The sections below describe how to run it.
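The feedback loop behind FLIRT can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in FLIRT.py: `red_lm`, `target_model`, and `is_unsafe` are hypothetical callables standing in for a red language model, the target model, and a safety classifier, and the simple first-in-first-out queue update is only one possible way to feed successful prompts back in as in-context exemplars.

```python
from collections import deque
from typing import Callable, Deque, List


def build_prompt(exemplars: List[str]) -> str:
    """Join the current in-context exemplars into one few-shot prompt (hypothetical format)."""
    return "\n".join(exemplars) + "\n"


def flirt_loop(
    initial_exemplars: List[str],
    red_lm: Callable[[str], str],             # hypothetical: few-shot prompt -> new adversarial prompt
    target_model: Callable[[str], object],    # hypothetical: prompt -> model output (e.g. an image)
    is_unsafe: Callable[[object], bool],      # hypothetical: safety classifier on the output
    iterations: int,
    queue_size: int = 5,
) -> List[str]:
    """Sketch of a FIFO feedback loop; returns the prompts judged successful."""
    # A deque with maxlen evicts the oldest exemplar when a new one is appended.
    queue: Deque[str] = deque(initial_exemplars, maxlen=queue_size)
    successful: List[str] = []
    for _ in range(iterations):
        # 1. The red LM proposes a candidate prompt from the current exemplars.
        candidate = red_lm(build_prompt(list(queue)))
        # 2. The target model responds to the candidate.
        output = target_model(candidate)
        # 3. Successful candidates are recorded and fed back as in-context exemplars.
        if is_unsafe(output):
            successful.append(candidate)
            queue.append(candidate)
    return successful


if __name__ == "__main__":
    # Toy stubs: the "red LM" emits numbered prompts, the "target" echoes its input,
    # and the "classifier" flags even-numbered prompts as unsafe.
    counter = iter(range(100))
    print(flirt_loop(
        ["seed a", "seed b"],
        red_lm=lambda p: f"prompt {next(counter)}",
        target_model=lambda p: p,
        is_unsafe=lambda out: int(out.split()[-1]) % 2 == 0,
        iterations=5,
    ))
```

With the toy stubs, only the even-numbered prompts are kept and pushed into the exemplar queue, so later calls to the red LM see them as examples.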

To run the code:

Change into the code folder and add your zero-shot and few-shot in-context examples to the queue.txt file. The provided queue.txt already contains examples that show the required format.

To run FLIRT:

python FLIRT.py --flirt_iters 1000 --attack_strategy Scoring_greedy
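The --attack_strategy flag selects how the in-context exemplar queue is updated between iterations. The actual Scoring_greedy logic lives in FLIRT.py and may differ; the sketch below is only a hypothetical illustration of a greedy, score-based update, assuming a `score` function that maps a prompt to a number (for example, a safety classifier's confidence on the target model's output for that prompt).

```python
from typing import Callable, List


def scoring_greedy_update(
    queue: List[str],
    candidate: str,
    score: Callable[[str], float],  # hypothetical scoring function, higher = more effective
) -> List[str]:
    """Hypothetical greedy update: replace the lowest-scoring exemplar with the
    candidate only if the candidate scores strictly higher, which raises the
    queue's total score. Returns a new list; the input queue is not modified."""
    if not queue:
        return [candidate]
    scores = [score(p) for p in queue]
    weakest = min(range(len(queue)), key=lambda i: scores[i])
    if score(candidate) > scores[weakest]:
        updated = list(queue)
        updated[weakest] = candidate
        return updated
    return list(queue)


if __name__ == "__main__":
    # Toy score: prompt length stands in for a real classifier score.
    print(scoring_greedy_update(["aaa", "b", "cc"], "dddd", score=len))
    print(scoring_greedy_update(["aaa", "b", "cc"], "x", score=len))
```

Unlike a first-in-first-out update, a score-based rule keeps strong exemplars in the queue regardless of their age, at the cost of scoring every exemplar on each update.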

License

The code and dataset are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

Please note that the stable-diffusion-v1-4 model used in the code (and therefore downloaded to your machine) is licensed under the CreativeML Open RAIL-M license.
