This repository contains a PyTorch implementation of the paper "Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News", accepted at EMNLP 2020. If you find this implementation or the paper helpful, please consider citing:
@InProceedings{tanDIDAN2020,
author={Reuben Tan and Bryan A. Plummer and Kate Saenko},
title={Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News},
booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
year={2020} }
- Python 3.6
- PyTorch 1.2.0
Please follow the instructions here (https://cs-people.bu.edu/rxtan/projects/didan/) to download the NeuralNews dataset. In particular, download this file (https://drive.google.com/file/d/1vD4DtyJOIjRzchPtCQu-KPrUjgTiWSmo/view?usp=drive_link) and place it into the data folder.
For each image, we extract 36 region features using a Faster-RCNN model (https://github.com/peteanderson80/bottom-up-attention) that is pretrained on Visual Genome. The region features for each image are stored separately as a .npy file.
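As a rough illustration of the one-file-per-image layout described above, the sketch below saves and loads a `(36, D)` feature array with NumPy. The helper names, the `<image_id>.npy` naming scheme, and the feature dimension `D` are assumptions made for this example, not details confirmed by this repository.

```python
import os

import numpy as np

NUM_REGIONS = 36  # number of region features per image, as stated above


def save_region_features(features, image_id, out_dir):
    """Store one image's region features as <out_dir>/<image_id>.npy.

    The file naming scheme is an assumption for this sketch.
    """
    features = np.asarray(features, dtype=np.float32)
    if features.ndim != 2 or features.shape[0] != NUM_REGIONS:
        raise ValueError(
            "expected an array of shape (%d, D), got %s"
            % (NUM_REGIONS, features.shape)
        )
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "%s.npy" % image_id)
    np.save(path, features)
    return path


def load_region_features(image_id, feat_dir):
    """Load the (36, D) region-feature array saved for one image."""
    return np.load(os.path.join(feat_dir, "%s.npy" % image_id))
```

The shape check is there only to catch extraction mistakes early; the actual feature dimension depends on the Faster-RCNN checkpoint used.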
To convert the articles and captions into the required input format, please go to https://github.com/nlpyang/PreSumm/blob/master/README.md and carry out steps 3 to 5 of data preparation.
We use the spaCy Python library to parse the articles and captions and detect named entities. We store this information as a dictionary where the keys are the article names and the values are sets of detected named entities.
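A minimal sketch of how such a dictionary could be built is shown below. The function names, the `en_core_web_sm` model, and the use of `pickle` for storage are assumptions for illustration; the pipeline and file format actually used by the authors may differ.

```python
import pickle


def load_spacy_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline; the model name is an assumption."""
    import spacy  # imported lazily so the helpers below work without spaCy

    return spacy.load(model_name)


def extract_entities(text, nlp):
    """Return the set of named-entity strings that `nlp` detects in `text`."""
    return {ent.text for ent in nlp(text).ents}


def build_entity_dict(texts_by_name, nlp):
    """Map each article (or caption) name to its set of named entities."""
    return {
        name: extract_entities(text, nlp)
        for name, text in texts_by_name.items()
    }


def save_entity_dict(entity_dict, path):
    """Pickle the dictionary; sets are not JSON-serializable as-is."""
    with open(path, "wb") as f:
        pickle.dump(entity_dict, f)
```

For example, `build_entity_dict({"article_1": text}, load_spacy_pipeline())` would return a dictionary with a single key whose value is the set of entity strings found in `text`.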
- captioning_dataset_path: Path to the GoodNews captioning dataset JSON file
- fake_articles: Path to generated articles
- image_representations_dir: Directory which contains the object representations of images
- real_articles_dir: Directory which contains the preprocessed Torch text files for real articles
- fake_articles_dir: Directory which contains the preprocessed Torch text files for generated articles
- real_captions_dir: Directory which contains the preprocessed Torch text files for real captions
- ner_dir: Directory which contains a dictionary of named entities for each article and caption
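If these options are exposed as command-line flags (an assumption; they may instead be set in a config file or script), they could be declared with `argparse` roughly as follows. The `build_parser` function and the parser description are hypothetical; only the option names and their descriptions come from the list above.

```python
import argparse

# Option names and help strings taken from the list above.
PATH_OPTIONS = {
    "captioning_dataset_path": "Path to GoodNews captioning dataset json file",
    "fake_articles": "Path to generated articles",
    "image_representations_dir": "Directory which contains the object representations of images",
    "real_articles_dir": "Directory which contains the preprocessed Torch text files for real articles",
    "fake_articles_dir": "Directory which contains the preprocessed Torch text files for generated articles",
    "real_captions_dir": "Directory which contains the preprocessed Torch text files for real captions",
    "ner_dir": "Directory which contains a dictionary of named entities for each article and caption",
}


def build_parser():
    """Build a parser with one required string flag per path option."""
    parser = argparse.ArgumentParser(description="Data paths (illustrative sketch)")
    for name, help_text in PATH_OPTIONS.items():
        parser.add_argument("--" + name, type=str, required=True, help=help_text)
    return parser
```

Calling `build_parser().parse_args()` in a script would then make each path available as an attribute, for example `args.ner_dir`.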