Initial commit

RUB-SysSec · Mar 19, 2020 · f7c35c0
commit f7c35c0

Showing 30 changed files with 2,889 additions and 0 deletions.
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,5 @@
+log
+ckpt
+final_models
+output
+archive
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,183 @@
+
+# Created by https://www.gitignore.io/api/vim,linux,macos,python
+# Edit at https://www.gitignore.io/?templates=vim,linux,macos,python
+
+### Linux ###
+*~
+
+# temporary files which can be created if a process still has a handle open of a deleted file
+.fuse_hidden*
+
+# KDE directory preferences
+.directory
+
+# Linux trash folder which might appear on any partition or disk
+.Trash-*
+
+# .nfs files are created when an open file is removed but is still being accessed
+.nfs*
+
+### macOS ###
+# General
+.DS_Store
+.AppleDouble
+.LSOverride
+
+# Icon must end with two \r
+Icon
+
+# Thumbnails
+._*
+
+# Files that might appear in the root of a volume
+.DocumentRevisions-V100
+.fseventsd
+.Spotlight-V100
+.TemporaryItems
+.Trashes
+.VolumeIcon.icns
+.com.apple.timemachine.donotpresent
+
+# Directories potentially created on remote AFP share
+.AppleDB
+.AppleDesktop
+Network Trash Folder
+Temporary Items
+.apdisk
+
+### Python ###
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# Mr Developer
+.mr.developer.cfg
+.project
+.pydevproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+### Vim ###
+# Swap
+[._]*.s[a-v][a-z]
+[._]*.sw[a-p]
+[._]s[a-rt-v][a-z]
+[._]ss[a-gi-z]
+[._]sw[a-p]
+
+# Session
+Session.vim
+Sessionx.vim
+
+# Temporary
+.netrwhist
+
+# Auto-generated tag files
+tags
+
+# Persistent undo
+[._]*.un~
+
+# Coc configuration directory
+.vim
+
+# End of https://www.gitignore.io/api/vim,linux,macos,python
+
+log
+ckpt
+final_models
+output
+archive
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,6 @@
+FROM tensorflow/tensorflow:2.1.0-gpu-py3
+
+COPY requirements.txt /tmp/requirements.txt
+RUN pip install -U Pillow scipy pytest && pip install -r /tmp/requirements.txt
+
+WORKDIR /dct
diff --git a/README.md b/README.md
@@ -0,0 +1,170 @@
+# Leveraging Frequency Analysis for Deep Fake Image Recognition
+![logo](media/header.png)
+
+> Deep neural networks can generate images that are astonishingly realistic, 
+> so much so that it is often hard for untrained humans to distinguish them from actual photos.
+> These achievements have been largely made possible by Generative Adversarial Networks (GANs).
+> While these deepfake images have been thoroughly investigated in the image domain (a classical approach from the area of image forensics), an
+> analysis in the frequency domain has been missing. This paper addresses this shortcoming and
+> our results reveal that, in frequency space, GAN-generated images exhibit severe artifacts that
+> can be easily identified. We perform a comprehensive analysis, showing that these artifacts are
+> consistent across different neural network architectures, data sets, and resolutions.
+> In a further investigation, we demonstrate that these artifacts are caused by upsampling operations
+> found in all current GAN architectures, indicating a structural and fundamental problem in the way
+> images are generated via GANs. Based on this analysis, we demonstrate how the frequency representation
+> can be used to automatically identify deep fake images, surpassing state-of-the-art methods.
+
+## Prerequisites
+
+For ease of use, we provide a Dockerfile that builds a container in which you can execute all experiments.
+Additionally, we provide a shell script, `docker.sh`, that wraps the common Docker commands:
+
+```
+Choose: docker.sh {build|download|convert|shell}
+    build - Build the Dockerfile.
+    shell - Spawn a shell inside the docker container.
+    tests - Spawn Docker instance for pytest.
+    clean - Cleanup directories from training.
+
+```
+
+Otherwise, you will need a recent Python 3 version and TensorFlow 2.0+ with CUDA support. See `requirements.txt` for the required packages.
+
+## Datasets
+
+We utilize these three popular datasets:
+*  [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
+*  [FFHQ](https://github.com/NVlabs/ffhq-dataset)
+*  [LSUN bedroom](https://github.com/fyu/lsun)
+
+
+Additionally, we utilize the pre-trained models from these repositories:
+* [StyleGAN](https://github.com/NVlabs/stylegan/)
+* [GANFingerprints](https://github.com/ningyu1991/GANFingerprints/)
+
+
+### Dataset preparation
+
+The datasets have to be converted beforehand. First, run `crop_celeba.py` or `crop_lsun.py`, depending on your dataset. This creates a new folder containing the training images cropped to `128x128`. Then run `prepare_dataset.py`; the input it expects depends on the selected mode.
+Note that FFHQ is already distributed in a cropped version.
+
+The script expects one directory as input, containing multiple subdirectories, each with at least 27,000 images.
+These subdirectories are assigned labels in order of appearance, i.e., as follows:
+
+```
+data
+ |--- A_lsun 	-> label 0
+ |--- B_ProGAN 	-> label 1
+ 	...
+```
+The script converts all images to DCT-encoded NumPy arrays or TFRecords, depending on the selected mode, and saves the output in three directories: train (100,000), val (10,000), and test (25,000).
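
The label assignment described above can be sketched as follows. This is a minimal illustration, not code from `prepare_dataset.py`: it assumes that "order of appearance" means sorted subdirectory names (consistent with the `A_`/`B_` prefixes in the example layout), and `label_directories` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def label_directories(root):
    """Map each class subdirectory of `root` to an integer label.

    Assumption: labels follow the sorted order of the subdirectory names.
    """
    subdirs = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: label for label, name in enumerate(subdirs)}


if __name__ == "__main__":
    # Build a throwaway layout that mirrors the example tree.
    with tempfile.TemporaryDirectory() as root:
        for name in ("B_ProGAN", "A_lsun"):
            (Path(root) / name).mkdir()
        print(label_directories(root))  # {'A_lsun': 0, 'B_ProGAN': 1}
```

Prefixing the directory names, as in the example, makes the resulting label order explicit regardless of how the file system lists the entries.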
+
+```
+usage: prepare_dataset.py [-h] [--raw] [--log] [--color] [--normalize]
+                          DIRECTORY {normal,tfrecords} ...
+
+positional arguments:
+  DIRECTORY           Directory to convert.
+  {normal,tfrecords}  Select the mode {normal|tfrecords}
+
+optional arguments:
+  -h, --help          show this help message and exit
+  --raw, -r           Save image data as raw image.
+  --log, -l           Log scale Images.
+  --color, -c         Compute as color instead.
+  --normalize, -n     Normalize data.
+
+Example:
+python prepare_dataset.py ~/datasets/GANFingerprints/perturbed_experiments/lsun/blur/ -lnc normal
+```
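
To make the encoding options concrete, here is a minimal NumPy sketch of a 2D DCT followed by log scaling and normalization. It is not the repository's implementation: the function names are hypothetical, the orthonormal type-II DCT is an assumed choice, and normalizing to zero mean and unit variance is only a guess at what `--normalize` does.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)      # rescale the DC row for orthonormality
    return basis


def dct2(image):
    """2D DCT-II of an (H, W) array: transform along both axes."""
    h, w = image.shape
    return dct_matrix(h) @ image @ dct_matrix(w).T


def log_scale(coeffs, eps=1e-12):
    """Log-magnitude of the coefficients; eps avoids log(0)."""
    return np.log(np.abs(coeffs) + eps)


def normalize(data, mean, std):
    """Assumed normalization: shift and scale with precomputed statistics."""
    return (data - mean) / std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((128, 128))  # stand-in for a cropped grayscale image
    encoded = log_scale(dct2(image))
    encoded = normalize(encoded, encoded.mean(), encoded.std())
    print(encoded.shape)  # (128, 128)
```

With SciPy installed, `scipy.fft.dctn(image, type=2, norm="ortho")` computes the same transform; the explicit matrix form is used here only to keep the sketch dependency-light. For color input, the same transform would be applied to each channel separately.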
+
+## Computing Statistics
+
+To compute all of our statistics, we use the `compute_statistics.py` script, which runs on the raw (cropped) image files.
+```
+usage: compute_statistics.py [-h] [--output OUTPUT] [--color]
+                             AMOUNT [DATASETS [DATASETS ...]]
+
+positional arguments:
+  AMOUNT                The amount of images to load.
+  DATASETS              Path to datasets. The first entry is assumed to be the
+                        reference one.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --output OUTPUT, -o OUTPUT
+                        Output directory. Default: {output_default}.
+  --color, -c           Plot each color channel separately.
+  
+Example:
+python compute_statistics.py 10000 ~/datasets/ffhq/real,REAL ~/datasets/ffhq/fake,FAKE
+```
+
+## Experiments
+
+### Training your own models
+
+After you have converted the data files as laid out above, you can train a new classifier:
+```
+usage: classifer.py train [-h] [--debug] [--epochs EPOCHS]
+                          [--image_size IMAGE_SIZE]
+                          [--early_stopping EARLY_STOPPING]
+                          [--classes CLASSES] [--grayscale]
+                          [--batch_size BATCH_SIZE] [--l1 L1] [--l2 L2]
+                          MODEL TRAIN_DATASET VAL_DATASET
+
+positional arguments:
+  MODEL                 Select model to train {resnet, cnn, nn, log, log1,
+                        log2, log3}.
+  TRAIN_DATASET         Dataset to load.
+  VAL_DATASET           Dataset to load.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --debug, -d           Debug mode.
+  --epochs EPOCHS, -e EPOCHS
+                        Epochs to train for; Default: 50.
+  --image_size IMAGE_SIZE
+                        Image size. Default: [128, 128, 3]
+  --early_stopping EARLY_STOPPING
+                        Early stopping criteria. Default: 5
+  --classes CLASSES     Classes. Default: 5
+  --grayscale, -g       Train on grayscaled images.
+  --batch_size BATCH_SIZE, -b BATCH_SIZE
+                        Batch size. Default: 32
+  --l1 L1               L1 regularizer intensity. Default: 0.01
+  --l2 L2               L2 regularizer intensity. Default: 0.01
+ 
+Example:
+
+python classifer.py train log2 datasets/ffhq/data_raw_color_train_tf/data.tfrecords datasets/ffhq/data_raw_color_val_tf/data.tfrecords -b 32 -e 100 --l2 0.01 --classes 1 --image_size 1024
+```
+
+### Testing
+
+You can also use our [pre-trained models](https://drive.google.com/open?id=1QjQnqMQnQOoIPwgzdJVJGwYzdReKqc0N).
+
+```
+usage: classifer.py test [-h] [--image_size IMAGE_SIZE] [--grayscale]
+                         [--batch_size BATCH_SIZE]
+                         MODEL TEST_DATASET
+
+positional arguments:
+  MODEL                 Path to model.
+  TEST_DATASET          Dataset to load.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --image_size IMAGE_SIZE
+                        Image size. Default: [128, 128, 3]
+  --grayscale, -g       Test on grayscaled images.
+  --batch_size BATCH_SIZE, -b BATCH_SIZE
+                        Batch size. Default: 32
+                        
+Example:
+python classifer.py test path/to/classifier datasets/ffhq/data_raw_color_test_tf/data.tfrecords -b 32 --image_size 1024
+```
+
+### Baselines
+
+Baseline experiments are located in the `baselines` directory. Each experiment (i.e., PRNU, Eigenfaces, and kNN) is separated into an individual script (with a common `Classifier` base class). Usage examples showing how to train and test each classifier can be found in the corresponding `main` function. To repeat the experiments, set the constants `DATASETS_DIR` and `RESULTS_DIR` (at the beginning of the file) accordingly.
diff --git a/baselines/__init__.py b/baselines/__init__.py