This repository contains the code and instructions for reproducing the plots from "Right and left, partisanship predicts vulnerability to misinformation" by Dimitar Nikolov, Alessandro Flammini, and Filippo Menczer.
To start, clone the repo:
$ git clone https://github.com/dimitargnikolov/twitter-misinformation.git
Run all subsequent commands from the root of the cloned repo (the twitter-misinformation directory that git clone creates).
There are three datasets you need to obtain. Before you begin, create a data directory at the root of the repo.
The first dataset contains a set of link sharing actions that occurred on Twitter during the month of June 2017. It is available on the Harvard Dataverse. Save it as domain-shares.data in the data directory.
The second dataset, from Facebook, assigns political valence scores to several popular news sites. You can request access to it from Dataverse. Once you have access, put the top500.csv file into the data directory.
The third is a dataset of manually curated sources of misinformation available at OpenSources.co. Clone it from GitHub into your data directory.
$ git clone https://github.com/BigMcLargeHuge/opensources.git data/opensources
Once you obtain all data as described above, your data directory should look like this:
data
├── domain-shares.data
├── opensources
│   ├── CONTRIBUTING.md
│   ├── LICENSE
│   ├── README.md
│   ├── badges.txt
│   ├── releasenotes.txt
│   └── sources
│       ├── sources.csv
│       └── sources.json
└── top500.csv
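Before moving on, it may help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name missing_data_files and the list EXPECTED_FILES are hypothetical, and the paths are taken only from the listing above. It assumes you run it from the root of the repo.

```python
from pathlib import Path

# Paths copied from the directory listing above, relative to the data directory.
# Only the data files are checked, not the opensources documentation files.
EXPECTED_FILES = [
    "domain-shares.data",
    "top500.csv",
    "opensources/sources/sources.csv",
    "opensources/sources/sources.json",
]


def missing_data_files(data_dir="data"):
    """Return the expected relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected data files are present.")
```

If anything is reported missing, revisit the corresponding dataset step above before running the workflow.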
Make sure you have Python 3 installed on your system. Then, set up a virtualenv with the required modules at the root of the cloned repository:
$ virtualenv -p python3 VENV
$ source VENV/bin/activate
$ pip install -r requirements.txt
From now on, any time you want to run the analysis, activate your virtual environment with:
$ source VENV/bin/activate
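To confirm that the environment is active before running the analysis, you could use a small check like the one below. This is only a sketch and not part of the repository: the helper name in_virtualenv is hypothetical, and the check relies on the general way Python exposes virtual environments through the sys module.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Python 3:", sys.version_info[0] >= 3)
    print("Virtual environment active:", in_virtualenv())
```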
The replication code is contained in the .py files in the scripts directory. You can automate their execution with the provided snakemake workflow:
$ cd workflow
$ snakemake -p
Because of the -p flag, snakemake prints each shell command as it executes it, so you can also run the commands individually if you want. Inspect the workflow/Snakefile file to see how the inputs and outputs for each script are specified. In addition, you can execute each script with
$ python <script_name.py> --help
to learn about what it does.
At the end of the execution, the generated plots will be in the data directory.
To regenerate the plots from scratch, run the following in the workflow directory:
$ snakemake clean
$ snakemake -p
If you have any questions about running this code or obtaining the data, please open an issue in this repository and we will get back to you as soon as possible.