This repository contains the code for our project "On detecting cherry-picked trendlines" [1].
Poorly supported stories can be told based on data by cherry-picking the data points included. While such stories may be technically accurate, they are misleading. In this paper, we build a system for detecting cherry-picking, with a focus on trendlines extracted from temporal data. We define a support metric for detecting such trendlines. Given a dataset and a statement made based on a trendline, we compute a support score that indicates how cherry-picked it is. Studying different types of trendlines and formalizing terms, we propose efficient and effective algorithms for computing the support measure. We also study the problem of discovering the most supported statements. Besides theoretical analysis, we conduct extensive experiments on real-world data that demonstrate the validity of our proposed techniques.
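To make the idea of a support score concrete, here is a minimal sketch, assuming a simple reading of the description above: the statement compares one beginning point with one end point, and its support is the fraction of alternative (beginning, end) pairs drawn from nearby regions whose trend satisfies the same condition. The function name `naive_support`, its parameters, and the sample data are hypothetical and are not taken from this repository's code. The loop is a quadratic baseline for illustration only, not one of the efficient algorithms proposed in the paper.

```python
from itertools import product


def naive_support(values, begin_region, end_region, predicate):
    """Fraction of (b, e) index pairs with b < e, b taken from begin_region
    and e from end_region, whose trend values[e] - values[b] satisfies
    predicate. Hypothetical helper; brute force over all pairs."""
    valid = 0
    supporting = 0
    for b, e in product(begin_region, end_region):
        if b >= e:
            continue  # a trendline must run forward in time
        valid += 1
        if predicate(values[e] - values[b]):
            supporting += 1
    # With no valid pairs there is nothing to support the statement.
    return supporting / valid if valid else 0.0


# Made-up series; the statement under test is "the value increased".
temps = [3.0, 1.0, 4.0, 2.0, 5.0, 2.5]
score = naive_support(temps, range(0, 3), range(3, 6), lambda d: d > 0)
print(score)
```

In this toy series, 5 of the 9 candidate pairs show an increase, so the score is about 0.56. A statement built on the single pair (index 1, index 4) would look much stronger than the data supports overall.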
The following example scripts show how to use the code: ExampleScript.py, test.py, test2.py, tmp.py.
[1] Abolfazl Asudeh, H. V. Jagadish, You Wu, and Cong Yu. "On detecting cherry-picked trendlines". Proceedings of the VLDB Endowment, Vol. 13(6), pages 1276--1288, 2020, VLDB Endowment.
[2] Abolfazl Asudeh, You (Will) Wu, Cong Yu, and H. V. Jagadish. "Perturbation-based Detection and Resolution of Cherry-picking". Data Engineering Bulletin, Vol. 45(3), pages 39--51, 2021, Special Issue on Challenges in Combating Misinformation.
This project is licensed under the MIT License; see the LICENSE.md file for details.