- Authors: Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
- CVPR'22
- This repository contains the code to reproduce the dataset collection and training in our paper. Detailed instructions can be found in each subdirectory.
Misinformation is now a major problem due to the high risks it poses to core democratic and societal values and orders. Out-of-context misinformation is one of the easiest and most effective ways for adversaries to spread viral false stories. In this threat, a real image is re-purposed to support other narratives by misrepresenting its context and/or elements. People routinely turn to the internet to verify information using different sources and modalities. Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-caption pairing using Web evidence. To integrate evidence and cues from both modalities, we introduce the concept of 'multi-modal cycle-consistency check'; starting from the image/caption, we gather textual/visual evidence, which will be compared against the other paired caption/image, respectively. Moreover, we propose a novel architecture, Consistency-Checking Network (CCN), that mimics the layered human reasoning across the same and different modalities: the caption vs. textual evidence, the image vs. visual evidence, and the image vs. caption. Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking, and significantly outperforms previous baselines that did not leverage external evidence.
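As a concrete illustration of the cycle-consistency idea, the sketch below expresses the check over precomputed embeddings: textual evidence retrieved with the image is compared against the paired caption, and visual evidence retrieved with the caption is compared against the paired image. The function names, the cosine-similarity scoring, and the max-pooling over evidence items are hypothetical simplifications for illustration, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(query, evidence):
    """Cosine similarity between a query vector (d,) and an evidence matrix (n, d)."""
    query = query / np.linalg.norm(query)
    evidence = evidence / np.linalg.norm(evidence, axis=1, keepdims=True)
    return evidence @ query

def cycle_consistency_scores(caption_emb, image_emb,
                             textual_evidence_embs, visual_evidence_embs):
    """Simplified multi-modal cycle-consistency check.

    Textual evidence (gathered by searching with the image) is compared against
    the paired caption; visual evidence (gathered by searching with the caption)
    is compared against the paired image. Returns the best match per direction.
    """
    caption_vs_textual = cosine_sim(caption_emb, textual_evidence_embs).max()
    image_vs_visual = cosine_sim(image_emb, visual_evidence_embs).max()
    return caption_vs_textual, image_vs_visual

# Toy usage with random vectors standing in for real text/image embeddings.
rng = np.random.default_rng(0)
print(cycle_consistency_scores(rng.normal(size=768), rng.normal(size=512),
                               rng.normal(size=(5, 768)), rng.normal(size=(3, 512))))
```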
Here we share our dataset collection pipeline, which you can use to download the dataset from scratch or to download other subsets of the NewsCLIPpings dataset.
We share the links returned by the Google searches we performed with the query images and captions. More details about how to obtain them and their format can be found here. You can adapt the crawler pipeline to extract and download the evidence from these links.
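If you adapt or rebuild the crawler, a minimal sketch of fetching one evidence page and pulling out its title, text, and image URLs could look like the following (using requests and BeautifulSoup; the example link and returned field names are placeholders, not the repository's actual schema):

```python
import requests
from bs4 import BeautifulSoup

def fetch_evidence(url):
    """Download one evidence page and extract its title, paragraph text, and image URLs."""
    resp = requests.get(url, timeout=10, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    image_urls = [img["src"] for img in soup.find_all("img") if img.get("src")]
    return {"url": url, "title": title, "text": paragraphs, "images": image_urls}

if __name__ == "__main__":
    # Replace with links loaded from the shared search-result files.
    for link in ["https://example.com/article"]:
        try:
            print(fetch_evidence(link)["title"])
        except requests.RequestException as err:
            print(f"Skipping {link}: {err}")
```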
If you would like to access our already-collected evidence (along with the preprocessing and precomputed embeddings), please find more details under curated_dataset.
You can find our pipeline for preprocessing the data and computing the embeddings under data_preprocessing. If you are using our collected evidence dataset, you can skip this step.
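As a rough sketch of this step, the snippet below computes sentence embeddings for captions and textual evidence with the sentence-transformers library and saves them for training. The model name, example sentences, and output file layout are assumptions; the actual data_preprocessing code may use different encoders and formats.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical encoder choice; the repository may use a different text encoder.
encoder = SentenceTransformer("all-mpnet-base-v2")

captions = ["A protest in front of the city hall.",
            "Flooding after the storm hit the coast."]
evidence_sentences = ["City hall protest draws thousands.",
                      "Coastal town recovers after the storm."]

caption_embs = encoder.encode(captions, convert_to_numpy=True, normalize_embeddings=True)
evidence_embs = encoder.encode(evidence_sentences, convert_to_numpy=True, normalize_embeddings=True)

# Save precomputed embeddings so training does not have to re-encode the text.
np.save("caption_embeddings.npy", caption_embs)
np.save("evidence_embeddings.npy", evidence_embs)
```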
We share our training and evaluation code for two setups: 1) training using sentence embeddings, and 2) training using BERT+LSTM.
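For orientation, here is a minimal sketch of the first setup: a classifier over precomputed caption, image, and evidence embeddings with simple attention over the evidence. It is a hypothetical simplification for illustration, not the paper's Consistency-Checking Network; the layer sizes and the assumption that image features are already projected to the text-embedding dimension are ours.

```python
import torch
import torch.nn as nn

class EvidenceClassifier(nn.Module):
    """Simplified consistency classifier over precomputed embeddings (illustrative only)."""

    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(3 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, caption_emb, image_emb, evidence_embs):
        # caption_emb, image_emb: (batch, dim); evidence_embs: (batch, n_evidence, dim)
        attn = torch.softmax(
            torch.einsum("bd,bnd->bn", caption_emb, evidence_embs), dim=-1)
        pooled = torch.einsum("bn,bnd->bd", attn, evidence_embs)  # attention-weighted evidence
        features = torch.cat([caption_emb, image_emb, pooled], dim=-1)
        return self.classifier(features)  # logits: pristine vs. falsified

# Toy forward pass with random tensors standing in for real embeddings.
model = EvidenceClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 10, 768))
print(logits.shape)  # torch.Size([4, 2])
```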
You can find our code to fine-tune CLIP in finetuning_clip.
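A minimal sketch of fine-tuning CLIP with its contrastive loss on image-caption pairs is shown below, using the Hugging Face transformers implementation. The model name, placeholder batch, and optimizer settings are assumptions; the code under finetuning_clip may use a different CLIP implementation or training objective.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

# Placeholder batch: in practice, load matched image-caption pairs from the dataset.
images = [Image.new("RGB", (224, 224)) for _ in range(2)]
captions = ["A protest in front of the city hall.", "Flooding after the storm."]

model.train()
inputs = processor(text=captions, images=images, return_tensors="pt", padding=True).to(device)
outputs = model(**inputs, return_loss=True)  # symmetric image-text contrastive loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```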
Checkpoints can be found here.
- If you find this code or dataset helpful, please cite our paper:
@inproceedings{abdelnabi22cvpr,
title = {Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources},
author = {Sahar Abdelnabi and Rakibul Hasan and Mario Fritz},
year = {2022},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)}
}