Name	Name	Last commit message	Last commit date
Latest commit History 25 Commits
README.md	README.md
ethics-and-policy.md	ethics-and-policy.md

Curated List of Resources

The following is a curated list of resources to learn more about Adversarial Attacks on AI Systems.

General

https://drive.google.com/file/d/1-Gw1QsZEVhPYSeeNYnlrcgk_FbwuUTwq/
https://docs.google.com/document/d/1bEQM1W-1fzSVWNbS4ne5PopB2b7j8zD4Jc3nm4rbK-U/mobilebasic
https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html

Demos

http://jlin.xyz/advis/

Large Language Models

General

https://arxiv.org/abs/2202.03286
https://openai.com/research/gpt-4
https://arxiv.org/abs/2209.15259
https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training

Prompt Injection

https://simonwillison.net/2022/Sep/12/prompt-injection/
https://simonwillison.net/2023/Apr/14/worst-that-can-happen/
https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/
https://arstechnica.com/information-technology/2023/02/ai-powered-bing-chat-spills-its-secrets-via-prompt-injection-attack/
https://learnprompting.org/docs/prompt_hacking/injection
https://medium.com/seeds-for-the-future/tricking-chatgpt-do-anything-now-prompt-injection-a0f65c307f6b
https://greshake.github.io/
https://en.wikipedia.org/wiki/Prompt_engineering
https://analyticsindiamag.com/prompt-injection-threat-is-real-will-turn-llms-into-monsters/
https://arxiv.org/abs/2307.15043
https://github.com/dropbox/llm-security
https://imprompter.ai/

Data Poisoning

https://arxiv.org/abs/2305.00944
https://softwarecrisis.dev/letters/the-poisoning-of-chatgpt/
https://arxiv.org/abs/2302.10149

Surveys

https://arxiv.org/abs/2406.13843

Training Data Extraction

https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html

Toxicity

https://www.cs.princeton.edu/courses/archive/fall22/cos597G/lectures/lec14.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Curated List of Resources

General

Demos

Differential Privacy Attacks

Adversarial Perturbance Text

Model Stealing

Risk Management

Large Language Models

General

Prompt Injection

Data Poisoning

Surveys

Training Data Extraction

Toxicity

About

Releases

Packages

Contributors 3

rzhade3/adversarial-ai-reading-list

Folders and files

Latest commit

History

Repository files navigation

Curated List of Resources

General

Demos

Differential Privacy Attacks

Adversarial Perturbance Text

Model Stealing

Risk Management

Large Language Models

General

Prompt Injection

Data Poisoning

Surveys

Training Data Extraction

Toxicity

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Packages