This is a general repository for the research, projects, and reference code we produce at dreadnode.
Implementation of "Universal and Transferable Adversarial Attacks on Aligned Language Models" for Mistral 7B.
Implementation of "Fast Adversarial Attacks on Language Models In One GPU Minute" for Mistral 7B. At the time of release, the authors had not posted the reference code from the paper, so this implementation is likely incorrect.
Implementation of "Attacking Large Language Models with Projected Gradient Descent" for Llama model variants with LitGPT. At the time of release, the authors had not posted any reference code, so use this implementation with caution.
Research in partnership with OpenSSF for the AIxCC Event.