This repo is part of the report "Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws". 🔥
We extract information from GDPR-like documents from different countries, written in natural language, and construct well-structured data.
The structured data have 4 columns: chapter, section, article and recital. This could benefit any future work that explores GDPR-like laws using computational methods. 🚀
This project is inspired by COSC-824 Data Protection by Design, Department of Computer Science at Georgetown University.
We convert each document from PDF to Docx to CSV in a well-structured format. Our data currently include GDPR-like documents from:
- European Union 🇪🇺
- Brazil 🇧🇷
- India 🇮🇳
- What next? 😉
Simply load the data into a pandas DataFrame in Python with the following code:
import pandas as pd
file_path = "data/LGPD-ES-Brazil-converted.csv"
df = pd.read_csv(file_path) # columns: ["chapter", "section", "article", "recital"]
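Once the data are loaded, a quick sanity check on the four columns can help before any comparison work. The snippet below is a minimal sketch, not part of the original pipeline: the helper names `load_privacy_law` and `rows_per_chapter` are hypothetical, and the tiny in-memory CSV is made-up placeholder data used only so the example runs without the repo files.

```python
import io

import pandas as pd

# The four columns described in this README.
EXPECTED_COLUMNS = ["chapter", "section", "article", "recital"]


def load_privacy_law(source):
    """Hypothetical helper: read a CSV and check the expected columns exist."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df[EXPECTED_COLUMNS]


def rows_per_chapter(df):
    """Hypothetical helper: count rows per chapter, keeping document order."""
    return df.groupby("chapter", sort=False).size()


# Made-up placeholder rows (NOT taken from any real law), for illustration only.
sample_csv = io.StringIO(
    "chapter,section,article,recital\n"
    "Chapter I,Section 1,Article 1,Placeholder text A\n"
    "Chapter I,Section 1,Article 2,Placeholder text B\n"
    "Chapter II,Section 1,Article 3,Placeholder text C\n"
)

sample_df = load_privacy_law(sample_csv)
counts = rows_per_chapter(sample_df)
print(counts)
```

To try this on the real data, pass a path such as "data/LGPD-ES-Brazil-converted.csv" to `load_privacy_law` instead of the in-memory sample. The actual cell contents in the repo files may differ from the placeholder values shown here.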
- Kornraphop Kawintiranon - GitHub
- Yaguang Liu - GitHub
- Prof. Benjamin E. Ujcich (Instructor) - Personal
If you find our paper and resources useful, please consider citing our work! 🙏
@article{kawintiranon2021automatic,
title={Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws},
author={Kawintiranon, Kornraphop and Liu, Yaguang},
journal={arXiv preprint arXiv:2105.10117},
year={2021},
url={https://arxiv.org/abs/2105.10117}
}
- PDF to Docx: https://smallpdf.com/pdf-to-word