Effort: Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection (Paper; Checkpoints)
Our paper has been accepted as an Oral at ICML 2025!
Welcome to our work Effort for detecting AI-generated images (AIGIs).
In this work, we propose: (1) a very simple yet effective method for generalizable AIGI detection; and (2) a novel analysis tool for quantifying the "degree of a model's overfitting".
The figure below provides a brief introduction to our method: it can be inserted in a plug-and-play manner into any ViT-based large model such as CLIP.
For a very high-level code sketch of our method, see below.
import torch

# `module` is a linear layer inside the ViT (e.g., an attention/MLP projection); `r` is the kept rank.
# Perform SVD on the original weight
U, S, Vh = torch.linalg.svd(module.weight.data, full_matrices=False)

# Keep the top-r singular components (main weight)
U_r = U[:, :r]      # shape: (out_features, r)
S_r = S[:r]         # shape: (r,)
Vh_r = Vh[:r, :]    # shape: (r, in_features)

# Reconstruct the main weight (kept fixed)
weight_main = U_r @ torch.diag(S_r) @ Vh_r

# Residual components (trainable), where n = min(out_features, in_features)
U_residual = U[:, r:]       # shape: (out_features, n - r)
S_residual = S[r:]          # shape: (n - r,)
Vh_residual = Vh[r:, :]     # shape: (n - r, in_features)
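To make the decomposition above concrete, here is a minimal, self-contained sketch of how such a decomposed layer could be wrapped so that only the residual subspace receives gradients. The class name EffortLinear and the argument rank are illustrative choices of ours, not names from this repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EffortLinear(nn.Module):
    """Illustrative wrapper: frozen main weight plus a trainable residual subspace."""

    def __init__(self, linear: nn.Linear, rank: int):
        super().__init__()
        U, S, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        # Frozen main weight built from the top-`rank` singular components
        self.register_buffer("weight_main", U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :])
        # Trainable residual factors built from the remaining components
        self.U_res = nn.Parameter(U[:, rank:].clone())
        self.S_res = nn.Parameter(S[rank:].clone())
        self.Vh_res = nn.Parameter(Vh[rank:, :].clone())
        self.bias = linear.bias  # reuse the original bias (may be None)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Full weight = frozen main part + trainable residual part
        weight = self.weight_main + self.U_res @ torch.diag(self.S_res) @ self.Vh_res
        return F.linear(x, weight, self.bias)

In practice, one would replace each linear weight to be adapted inside the ViT with such a wrapper and pass only the residual parameters to the optimizer.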
For more method-specific implementation details, please refer to the file effort_implementation.py.
The following two tables display partial results of our method on both the (face) deepfake detection benchmark and the (natural) AIGI detection benchmark. Please refer to our paper for the full results.
Please run the following script to install the required libraries:
sh install.sh
If you are more interested in face deepfake detection:
- The checkpoint of "CLIP-L14 + our Effort" trained on FaceForensics++ is released at Google Drive.
If you are interested in detecting general AI-generated images, we provide two checkpoints trained on the GenImage and Chameleon datasets, respectively:
- The checkpoint of "CLIP-L14 + our Effort" trained on GenImage (sdv1.4) is released at Google Drive.
- The checkpoint of "CLIP-L14 + our Effort" trained on Chameleon is released at Google Drive.
You can then run inference on a single image, or on a folder containing several images, using the pretrained weights.
Specifically, run the following line:
cd DeepfakeBench/
python3 training/demo.py --detector_config training/config/detector/effort.yaml --weights ./training/weights/{NAME_OF_THE_CKPT}.pth --image {IMAGE_PATH or IMAGE_FOLDER}
After running the above line, you can obtain the prediction results (fake probabilities) for each image.
Note: if you are processing a face image, please add --landmark_model ./preprocessing/shape_predictor_81_face_landmarks.dat to extract the facial region for inference, as our model (trained on face deepfakes) used this face extractor when processing faces.
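For example, a face-image inference call combining the command above with the landmark flag would look like this (the checkpoint and image names are placeholders):

cd DeepfakeBench/
python3 training/demo.py \
  --detector_config training/config/detector/effort.yaml \
  --weights ./training/weights/{NAME_OF_THE_CKPT}.pth \
  --image {IMAGE_PATH} \
  --landmark_model ./preprocessing/shape_predictor_81_face_landmarks.dat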
Since I am the creator and developer of DeepfakeBench, the data, codebase, and benchmarking protocols are mainly adopted from it. If you are a researcher in (face) deepfake detection, I highly recommend you try DeepfakeBench.
If you want to try other codebases, such as UnivFD, we provide the folder UniversalFakeDetect_Benchmark/, which implements our method on top of that codebase; with it, you can reproduce the results of Table 2 of our manuscript.
Below, we provide the detailed procedure to use DeepfakeBench to reproduce the results of our paper, such as Table 1.
If you want to reproduce the results for each deepfake dataset, you can download the processed datasets (preprocessing such as frame extraction and face cropping has already been done) from DeepfakeBench. For evaluating more diverse forgery methods (such as SimSwap, BlendFace, DeepFaceLab, etc.), we recommend the recently released DF40 dataset (with 40 distinct forgery methods implemented).
If you only want to use the processed data we provided, you can skip this step.
Otherwise, you need to run the data preprocessing strictly following DeepfakeBench; a sketch of the typical commands is given after this paragraph.
"Rearrangement" here means creating a JSON file for each dataset that collects all the frames scattered across different folders. Please refer to DeepfakeBench and DF40 for the provided JSON files of each dataset.
After running the preprocessing and rearrangement steps, you will obtain the JSON files for each dataset in the ./preprocessing/dataset_json folder. The rearranged structure organizes the data hierarchically, grouping videos by their labels and data splits (i.e., train, test, validation). Each video is represented as a dictionary entry containing relevant metadata, including file paths, labels, compression levels (if applicable), etc.
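As a rough sketch, the two steps are typically run from the preprocessing folder as follows. The script names (preprocess.py and rearrange.py) are assumed from DeepfakeBench's standard pipeline, so please verify them against the DeepfakeBench documentation:

cd preprocessing/
# Frame extraction and face cropping
python3 preprocess.py
# Rearrangement: generate the per-dataset JSON files used by the dataloaders
python3 rearrange.py

For intuition only, the resulting hierarchy might look roughly like the Python dictionary below; all keys and values are purely illustrative and do not reflect the exact schema:

{
    "FaceForensics++": {                      # dataset name
        "FF-FS": {                            # label / forgery type
            "train": {                        # data split
                "000_003": {                  # video name
                    "label": "FF-FS",
                    "compression": "c23",
                    "frames": ["frames/000_003/000.png", "frames/000_003/001.png"],
                }
            }
        }
    }
}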
First, you can run the following lines to train the model:
- For multiple GPUs:
python3 -m torch.distributed.launch --nproc_per_node=4 training/train.py \
--detector_path ./training/config/detector/effort.yaml \
--train_dataset FaceForensics++ \
--test_dataset Celeb-DF-v2 \
--ddp
- For a single GPU:
python3 training/train.py \
--detector_path ./training/config/detector/effort.yaml \
--train_dataset FaceForensics++ \
--test_dataset Celeb-DF-v2
Once you finish training, you can test the model on several deepfake datasets such as DF40.
python3 training/test.py \
--detector_path ./training/config/detector/effort.yaml \
--test_dataset simswap_ff blendface_ff uniface_ff fomm_ff deepfacelab \
--weights_path ./training/weights/{CKPT}.pth
You should then obtain evaluation results similar to those reported in our manuscript.
If you find our work helpful to your research, please consider citing our paper as follows:
@article{yan2024effort,
title={Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection},
author={Yan, Zhiyuan and Wang, Jiangming and Wang, Zhendong and Jin, Peng and Zhang, Ke-Yue and Chen, Shen and Yao, Taiping and Ding, Shouhong and Wu, Baoyuan and Yuan, Li},
journal={arXiv preprint arXiv:2411.15633},
year={2024}
}


