This repository contains the dataset used in the study of controversy in Wikipedia reported in the papers cited below.
The dataset comprises 3 folders and 61 files, totalling 152 GiB uncompressed. The archive was created with the 7-Zip tool on Linux and is split into multiple parts because of the file size restrictions in git.
To access the files, see https://github.com/U-Alberta/wikipedia_controversy_dataset/releases/tag/v1.
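Once every part of the archive has been downloaded into the same directory, extraction can be scripted. The sketch below is a minimal example, not an official procedure. It assumes the parts follow 7-Zip's usual split-volume naming (`<name>.7z.001`, `<name>.7z.002`, ...), that the `7z` command is on the PATH, and that roughly 152 GiB of free disk space is available. The directory names `downloads` and `wikipedia_controversy` and both helper functions are hypothetical; check the actual part names on the release page.

```python
import subprocess
from pathlib import Path


def find_first_volume(download_dir):
    """Return the first split volume (*.001) found in download_dir.

    Assumption: the parts use 7-Zip's default volume suffixes (.001, .002, ...).
    """
    candidates = sorted(Path(download_dir).glob("*.001"))
    if not candidates:
        raise FileNotFoundError(f"no *.001 volume found in {download_dir}")
    return candidates[0]


def build_extract_command(first_volume, output_dir):
    """Build the 7z command line that extracts the whole multipart archive.

    Pointing `7z x` at the first volume is enough: 7-Zip reads the remaining
    volumes from the same directory. The -o switch takes the output directory
    with no space after it.
    """
    return ["7z", "x", str(first_volume), f"-o{output_dir}"]


if __name__ == "__main__":
    # Hypothetical locations; adjust to where the parts were downloaded.
    first = find_first_volume("downloads")
    subprocess.run(build_extract_command(first, "wikipedia_controversy"), check=True)
```

Only the first volume is passed to `7z`, so a missing or truncated part surfaces as an extraction error rather than a silently incomplete dataset.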
Please acknowledge the source of the dataset by citing either of the papers below.
@article{DBLP:journals/tist/RadB15,
author = {Hoda Sepehri Rad and
Denilson Barbosa},
title = {Identifying Controversial Wikipedia Articles Using Editor Collaboration
Networks},
journal = {{ACM} Trans. Intell. Syst. Technol.},
volume = {6},
number = {1},
pages = {5:1--5:24},
year = {2015}
}
@inproceedings{DBLP:conf/ht/RadMRB12,
author = {Hoda Sepehri Rad and
Aibek Makazhanov and
Davood Rafiei and
Denilson Barbosa},
title = {Leveraging editor collaboration patterns in wikipedia},
booktitle = {{HT}},
pages = {13--22},
publisher = {{ACM}},
year = {2012}
}