Cooooookiiies
This project focuses only on websites owned by the Dutch government. The domains used for this project were fetched from the Dutch government website. The script in this repository can fetch all of these domain names automatically.
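The repository's fetch script is not reproduced here. As a minimal sketch, assuming the overview page lists each site as an ordinary `<a href>` link, the domains could be pulled out of the downloaded HTML as shown below. The `.nl` filter and the names `LinkCollector` and `extract_domains` are hypothetical. The sketch uses only the standard library, although the project's bs4 dependency would work just as well.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_domains(html: str, suffix: str = ".nl") -> list[str]:
    """Return the sorted, de-duplicated hostnames of absolute links ending in `suffix`."""
    collector = LinkCollector()
    collector.feed(html)
    domains = set()
    for href in collector.hrefs:
        host = urlparse(href).hostname  # None for relative links such as "/contact"
        if host and host.endswith(suffix):
            # Treat "www.example.nl" and "example.nl" as the same domain.
            domains.add(host[4:] if host.startswith("www.") else host)
    return sorted(domains)
```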
TODO: add universities, hospitals, police & banks to the list
This project uses Python 3. To run the code in this repository, the dependencies of the Pythia library are required. Run the following command to install all dependencies:
$ pip3 install --upgrade ipwhois tldextract wordsegment selenium bs4 dnspython intervaltree netaddr nltk psutil
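Some of these packages are imported under a different name than the one passed to pip (dnspython, for example, is imported as `dns`). A small check script can confirm that the install worked. This is a hypothetical helper, not part of the repository; it only looks the modules up and does not import them.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "ipwhois": "ipwhois",
    "tldextract": "tldextract",
    "wordsegment": "wordsegment",
    "selenium": "selenium",
    "bs4": "bs4",
    "dnspython": "dns",  # dnspython is imported as `dns`
    "intervaltree": "intervaltree",
    "netaddr": "netaddr",
    "nltk": "nltk",
    "psutil": "psutil",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing, install with: pip3 install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```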
Google Chrome and the corresponding ChromeDriver must also be installed on the OS. Make sure to download the correct driver version: it should match the version of the installed Google Chrome browser. The ChromeDriver executable should be located in the Pythia folder, and the Chrome path should be the following: C:\Program Files\Google\Chrome\Application\chrome.exe
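A version mismatch between Chrome and ChromeDriver is a common reason for the browser failing to start. The usual rule is that the two should share the same major version. The sketch below is a hypothetical helper that compares the output of `chromedriver --version` with the version string shown on `chrome://settings/help`.

```python
import re

# Matches the first dotted four-part version number, e.g. "114.0.5735.90",
# and captures the major version.
VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output: str) -> int:
    """Extract the major version from output such as 'ChromeDriver 114.0.5735.90 (...)'."""
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def driver_matches_browser(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```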
Get some links on the front page to collect the cookies
Also: if the user clicks something, does the user get more cookies?
Compare the front page vs. inner pages
Get all the links -> take up to 20
Something with https://
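The crawl steps in these notes (collect the links on the front page, take up to 20, compare the front page with inner pages) could look roughly as follows. This is a sketch under our own assumptions: only https links on the same host as the front page count as inner pages (one possible reading of the https:// note), and cookies are compared by (domain, name) in the list-of-dicts format returned by Selenium's `driver.get_cookies()`. The function names are hypothetical.

```python
from operator import itemgetter
from urllib.parse import urldefrag, urljoin, urlparse

MAX_LINKS = 20  # "take up to 20"


def select_inner_links(front_page_url: str, hrefs: list[str], limit: int = MAX_LINKS) -> list[str]:
    """Resolve hrefs against the front page and keep up to `limit` unique same-host https links."""
    front_host = urlparse(front_page_url).hostname
    selected = []
    for href in hrefs:
        url, _fragment = urldefrag(urljoin(front_page_url, href))
        parsed = urlparse(url)
        if parsed.scheme != "https" or parsed.hostname != front_host:
            continue  # skip http://, mailto:, javascript: and links to other sites
        if url.rstrip("/") == front_page_url.rstrip("/") or url in selected:
            continue  # skip the front page itself and duplicates
        selected.append(url)
        if len(selected) == limit:
            break
    return selected


# Identify a cookie by the domain that set it and its name.
cookie_key = itemgetter("domain", "name")


def extra_cookies(front_cookies: list[dict], inner_cookies: list[dict]) -> set[tuple[str, str]]:
    """(domain, name) pairs seen on an inner page but not on the front page."""
    return {cookie_key(c) for c in inner_cookies} - {cookie_key(c) for c in front_cookies}
```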
Decisions on tools, and why
What worked and what didn't work
For the list of trackers, we used these sources
Rank the websites by number of third parties
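The notes do not say how third parties are counted. As one possible approach, assuming the crawler records for each website the hostnames it contacted (or the domains of the cookies it received), the ranking could be computed as below. `site_of` is a deliberately naive, hypothetical stand-in that keeps the last two labels of a hostname; tldextract, already a dependency, would handle multi-part suffixes such as `co.uk` correctly.

```python
def site_of(host: str) -> str:
    """Naive registrable domain: the last two labels ('www.example.nl' -> 'example.nl').

    Simplification: wrong for public suffixes such as 'co.uk'.
    """
    host = host.lstrip(".").lower()  # cookie domains may start with a dot
    return ".".join(host.split(".")[-2:])


def rank_by_third_parties(observed: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Map each website to its number of distinct third-party sites, most first."""
    counts = {}
    for website, hosts in observed.items():
        first_party = site_of(website)
        third_parties = {site_of(h) for h in hosts if site_of(h) != first_party}
        counts[website] = len(third_parties)
    # Sort by count (descending), then by name for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, a site whose pages contact `www.google-analytics.com` and `stats.g.doubleclick.net` counts two third parties, while requests to its own subdomains are ignored.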