This project aims to detect image forgeries using a Convolutional Neural Network (CNN) implemented in PyTorch. Inspired by the work of Y. Rao et al. on A Deep Learning Approach to Detection of Splicing and Copy-Move Forgeries in Images, our approach involves extracting features using a CNN, followed by feature fusion, and finally classification using an SVM from scikit-learn. The datasets used in this project are the CASIA2 and the NC2016 datasets.
- Install Python and Anaconda (we recommend Python 3.9 for compatibility with the libraries used in this project).
- Create a virtual environment to keep the project dependencies isolated, then activate it. This can be done with the following commands:
conda create -n <name> python=3.9
conda activate <name>
- Install the required packages for this project. First install pip into the environment, then install the requirements:
conda install pip
pip install -r requirements.txt
- After completing the setup above, clone the repository to get the project (link provided in the report):
git clone <repository_url>
- Acquire the CASIA2 and NC2016 datasets online and place them in the "data" folder of the project.
- Navigate to the 'src/' folder:
cd src
extract_patches.py
- Extracts the training patches that will be fed into the CNN.
train_cnn.py
- Trains the entire CNN model and produces a .pt model file.
feature_extraction.py
- Takes the trained CNN model from the previous step and extracts the features, which are stored in a CSV file.
svm_classification.py
- Performs the cross-validation and outputs the evaluation metrics.
- Run the patch extraction script to create image patches for both tampered and untampered regions.
python extract_patches.py
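The patch extraction step can be sketched as a sliding window that labels each patch by whether it overlaps a tampered region in the ground-truth mask. The patch size, stride, and labeling rule below are assumptions for illustration; the actual `extract_patches.py` parameters may differ.

```python
import numpy as np

def extract_patches(image, mask, patch_size=32, stride=32):
    """Slide a window over the image; label a patch tampered (1) if its
    mask region contains any tampered pixels, otherwise authentic (0)."""
    patches, labels = [], []
    h, w = image.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patch = image[y:y + patch_size, x:x + patch_size]
            tampered = mask[y:y + patch_size, x:x + patch_size].any()
            patches.append(patch)
            labels.append(int(tampered))
    return np.stack(patches), np.array(labels)

# Synthetic 64x64 RGB image whose top-left quadrant is marked tampered.
img = np.random.rand(64, 64, 3)
msk = np.zeros((64, 64), dtype=bool)
msk[:32, :32] = True
patches, labels = extract_patches(img, msk)
```

With these settings the 64x64 image yields four non-overlapping 32x32 patches, one of which (the top-left) is labeled tampered.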
- Use the extracted image patches to train the CNN model and save it in the 'data/output/pre_trained_cnn' directory.
python train_cnn.py
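The training step amounts to fitting a patch classifier and checkpointing it as a .pt file. The tiny network below is a stand-in, not the architecture from the paper or `train_cnn.py`; it only sketches the training loop and the checkpoint step.

```python
import torch
import torch.nn as nn

# Minimal stand-in CNN for patch classification (authentic vs. tampered).
class PatchCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, 2)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = PatchCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy training step on random 32x32 RGB patches (batch of 8).
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 2, (8,))
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()

# Analogous to the .pt model file written to data/output/pre_trained_cnn.
torch.save(model.state_dict(), "cnn_sketch.pt")
```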
- Execute the feature extraction script. It generates a 400-D feature representation for each image using the trained CNN model, and saves the fused feature vectors for each image in the 'data/output/features' folder.
python feature_extraction.py
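Feature fusion can be sketched as pooling the per-patch CNN feature vectors of an image into a single 400-D image-level descriptor and writing it out as a CSV row. Mean pooling is assumed here for illustration; the fusion rule used by `feature_extraction.py` may differ.

```python
import numpy as np

def fuse_patch_features(patch_features):
    """Fuse per-patch feature vectors into one image-level descriptor
    (mean pooling assumed)."""
    return patch_features.mean(axis=0)

# e.g. 20 patches per image, each with a 400-D CNN feature vector
feats = np.random.rand(20, 400)
image_feature = fuse_patch_features(feats)

# Rows of such vectors are what end up in the features CSV files.
np.savetxt("features_sketch.csv", image_feature[None, :], delimiter=",")
```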
- The last step is to run the SVM classification script.
python svm_classification.py
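The SVM stage with 10-fold cross-validation maps directly onto scikit-learn's `SVC` and `cross_val_score`. The synthetic features below stand in for the real 400-D vectors from the CSVs in data/output/features, and the RBF kernel is an assumption.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the extracted 400-D image features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 400))
y = rng.integers(0, 2, size=100)

clf = SVC(kernel="rbf")
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
mean_accuracy = scores.mean()
```

`scores` holds one accuracy value per fold; averaging them gives the cross-validation accuracy reported per dataset.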
This will train and test the SVM classifier on the extracted features and report the accuracy and cross-entropy loss per epoch for each dataset.
After executing the SVM classification script, we obtain the 10-fold cross-validation accuracy for both datasets; the final generated files can be inspected in the output folder.
Output directory: data\output
It contains all the data generated as output during the execution of the project. This directory has subdirectories for different types of output data, including accuracy, features, loss function, and trained models.
data\output\accuracy
The accuracy directory contains two CSV files - CASIA2_Accuracy.csv and NC16_Accuracy.csv - that record the accuracy of the models trained on the CASIA2 and NC16 datasets, respectively.
data\output\features
The features directory contains two CSV files - CASIA2_extracted_features.csv and NC16_extracted_features.csv - that store the extracted features for each dataset. These features can be used for further analysis or model training.
data\output\loss_function
The loss_function directory contains two CSV files - CASIA2_Loss.csv and NC16_Loss.csv - that record the loss function trend values for the models trained on the CASIA2 and NC16 datasets, respectively.
data\output\trained_models
The trained_models directory contains the final trained models saved in PyTorch format - Cnn_casia2.pt and Cnn_nc16.pt - for the CASIA2 and NC16 datasets, respectively. These models can be used for prediction or further fine-tuning.
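Restoring one of these checkpoints for prediction follows the standard PyTorch state-dict pattern. The `TinyCNN` class below is a hypothetical stand-in; the real Cnn_casia2.pt / Cnn_nc16.pt files must be loaded into the matching CNN class defined in the project's source.

```python
import torch
import torch.nn as nn

# Hypothetical minimal module standing in for the project's CNN class.
class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

# Save and reload a state dict, mirroring how a trained model from
# data/output/trained_models would be restored for prediction.
model = TinyCNN()
torch.save(model.state_dict(), "tiny_sketch.pt")

restored = TinyCNN()
restored.load_state_dict(torch.load("tiny_sketch.pt"))
restored.eval()  # switch to inference mode before prediction
```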
A final report is available in the Report section of the repository.
- Sejal Chopra (40164708)
- Praveen Singh (40199511)
- Elvin Rejimone (40193868)
- Anushka Sharma (40159259)