DUPE: Detection Undermining via Prompt Engineering for Deepfake Text

James Weichert and Chinecherem Dimobi

CS 5914 Fall 2023


Data

final_data.csv (in the Data/ folder) contains all 420 human and GPT-generated essays used in this research project.

Data Replication

The data in final_data.csv can be replicated using the following steps:

1. Generate GPT-written essays using the standardized prompt "Write a College [DISCIPLINE] class essay titled '[TITLE]'", replacing [DISCIPLINE] and [TITLE] with the discipline and title, respectively, of the corresponding human-written essay in human_essays.csv. To replicate our methodology, use ChatGPT 3.5.
2. Use the ZeroGPT and GPTZero web interfaces to evaluate each essay (human or AI), recording ZeroGPT's "AI GPT %" score and GPTZero's "AI Probability" score. ZeroGPT can only be used manually; GPTZero offers a paid API, although we ran this step manually for GPTZero as well.
3. Paraphrase the GPT-generated essays using the desired paraphrasing prompt, then repeat step 2.
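The prompt construction in the first replication step is a plain string substitution. The sketch below only illustrates that substitution and is not the project's own code: it assumes human_essays.csv has columns named "discipline" and "title" (hypothetical names; check the file's real header), and the resulting prompts would still be submitted to ChatGPT by hand.

```python
import csv
import io

# Standardized prompt quoted from the README; the bracketed placeholders
# are replaced with each human-written essay's discipline and title.
PROMPT_TEMPLATE = "Write a College [DISCIPLINE] class essay titled '[TITLE]'"


def build_prompt(discipline: str, title: str) -> str:
    """Fill the standardized prompt for one essay."""
    return PROMPT_TEMPLATE.replace("[DISCIPLINE]", discipline).replace("[TITLE]", title)


def prompts_from_csv(csv_text: str) -> list[str]:
    """Build one prompt per CSV row.

    The column names "discipline" and "title" are hypothetical; rename them
    to match the real header of human_essays.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [build_prompt(row["discipline"], row["title"]) for row in reader]


if __name__ == "__main__":
    sample = "discipline,title\nHistory,The Causes of World War I\n"
    for prompt in prompts_from_csv(sample):
        print(prompt)
```

To use the real file, pass the contents of human_essays.csv (opened with newline="") to prompts_from_csv instead of the inline sample.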

Watermarked Data

To regenerate the watermarked data, use the generate_watermark.ipynb notebook. Each watermarked text is then passed to the detector, which outputs the number of tokens scored, the number of green-list tokens, and a p-value (p_value). The results, including the watermarked text and its detection values, are in the watermark_detect.csv file. An essay is considered watermarked when its p_value is below 0.05, i.e. when it contains significantly more green-list tokens than unwatermarked text would contain by chance.
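The detector's outputs are related by a one-sided z-test. The sketch below assumes the green-list watermark of Kirchenbauer et al. (2023), whose detector reports the same three quantities; the green-list fraction gamma = 0.25 is an assumed default, not a value confirmed for this project, and the function names are hypothetical. Under the null hypothesis "no watermark", each token lands on the green list with probability gamma, so an excess of green tokens yields a large z-score and a small p-value.

```python
import math


def watermark_z_score(num_green: int, num_tokens: int, gamma: float = 0.25) -> float:
    """z = (green count - expected count) / standard deviation.

    gamma (fraction of the vocabulary on the green list) is an assumed value.
    """
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1 - gamma))
    return (num_green - expected) / std


def watermark_p_value(z: float) -> float:
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def looks_watermarked(p_value: float, alpha: float = 0.05) -> bool:
    """Reject "no watermark" when green tokens are too frequent to be chance."""
    return p_value < alpha


if __name__ == "__main__":
    for green in (50, 100):
        z = watermark_z_score(green, 200)
        p = watermark_p_value(z)
        print(green, round(z, 3), p, looks_watermarked(p))
```

With 200 scored tokens and gamma = 0.25, 50 green tokens is exactly the chance level (z = 0, p = 0.5), while 100 green tokens gives z of about 8.2 and a vanishingly small p-value.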

Experiments

Most of the work relied on web interfaces (e.g. ZeroGPT, GPTZero, ChatGPT), so the full findings are available in the final_data.csv file and cannot be regenerated programmatically (see the Data Replication section above for replication instructions). Nevertheless, the following Python notebooks contain the exploratory data analysis (EDA), supporting experiments, and the watermark generation and detection infrastructure used for this project:

perplexity-eda.ipynb contains the code used to calculate the perplexity of each text, which relies on the Language Model Perplexity module. It also includes visualizations of the distribution of perplexity scores and of the relationship between perplexity and the ZeroGPT "AI %" score.
semantic-similarity.ipynb contains the infrastructure for using the Universal Sentence Encoder to compute semantic similarity scores between two texts.
generate_watermark.ipynb was used to generate the 212 watermarked GPT-Neo essays.
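The two scores computed in perplexity-eda.ipynb and semantic-similarity.ipynb reduce to short formulas. The sketch below is illustrative and does not reproduce the notebooks: it assumes the per-token natural-log probabilities have already been obtained from a language model, and that the two texts have already been embedded (for example by the Universal Sentence Encoder). Cosine similarity is a common choice for comparing sentence embeddings, but whether the notebook uses cosine, a raw inner product, or another variant is an assumption here.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood.

    Each entry is log p(x_i | x_<i), the natural-log probability a language
    model assigns to token i given the preceding tokens.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(perplexity([math.log(0.25)] * 4))
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

For example, a model that assigns each of four tokens probability 0.25 yields a perplexity of 4: lower perplexity means the text was more predictable to the model.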

Folders and files

Data/
__pycache__/
.DS_Store
README.md
generate_watermark.ipynb
list_generate.py
perplexity-eda.ipynb
perplexity.py
semantic-similarity.ipynb
watermark_prompts.txt
watermark_test.ipynb

Repository: james-weichert/dupe