Authors: Valentin Peter, Ramona Kühn, Jelena Mitrović, Michael Granitzer and Hannah Schmid-Petri.
To install the required dependencies and packages, follow these steps:
- Run the setup script
sh setup.sh
- Activate the virtual environment
source televenv/bin/activate
- Start Jupyter Notebook
jupyter-notebook
The Scraper uses a technique called discriminative snowball sampling to sample Telegram channels, starting from a given set of seed channels. It can be used from within the repository as follows:
# This is an example of how the code could be used
from scraper.scraper import Scraper
from scraper.client import Client

# Telegram API credentials (replace the placeholders with your own)
Client(api_id=123456,
       api_hash="***********",
       phone="+49***********")

scraper = Scraper(num_messages=None,
                  step_size=200,
                  maximum_iterations=1)

# Start sampling from the seed channels
scraper.scrape(["some_channel", "some_other_channel"])
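To illustrate the idea behind discriminative snowball sampling, here is a minimal, self-contained sketch on toy data. It is not the repository's implementation. It assumes that new channels are discovered through forwarded messages and that, in each iteration, only the most frequently referenced candidates are kept. The names `TOY_FORWARDS`, `snowball_sample` and `keep_per_iteration` are hypothetical, and `maximum_iterations` only mirrors the parameter name used above without claiming the same semantics.

```python
from collections import Counter

# Hypothetical toy data: channel -> channels whose messages it forwarded.
# In a real scraper this information would come from the Telegram API.
TOY_FORWARDS = {
    "seed_a": ["chan_b", "chan_b", "chan_c"],
    "seed_b": ["chan_b", "chan_d"],
    "chan_b": ["chan_e", "chan_e", "seed_a"],
    "chan_c": ["chan_e"],
    "chan_d": [],
    "chan_e": [],
}


def snowball_sample(seed, forwards, maximum_iterations=1, keep_per_iteration=1):
    """Return the sampled channels in order of discovery.

    Assumed selection rule (not taken from the repository): candidates are
    ranked by how often the current frontier forwarded from them.
    """
    sampled = list(seed)
    frontier = list(seed)
    for _ in range(maximum_iterations):
        # Snowball step: count every not-yet-sampled channel that the
        # current frontier forwarded messages from.
        counts = Counter()
        for channel in frontier:
            for source in forwards.get(channel, []):
                if source not in sampled:
                    counts[source] += 1
        if not counts:
            break
        # Discriminative step: keep only the most referenced candidates
        # (ties are broken alphabetically to keep the result deterministic).
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        frontier = [name for name, _ in ranked[:keep_per_iteration]]
        sampled.extend(frontier)
    return sampled


if __name__ == "__main__":
    print(snowball_sample(["seed_a", "seed_b"], TOY_FORWARDS, maximum_iterations=2))
```

Keeping only a subset of the discovered channels per iteration is what distinguishes the discriminative variant from plain snowball sampling, where every referenced channel would be followed.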
The analysis of over 50 million messages is conducted in the two notebooks [Data Preprocessing](Data Preprocessing.ipynb) and final_analysis. The Data Preprocessing notebook needs to be executed first, followed by the analysis notebook. For privacy reasons, the data will not be published. However, the data set can be reproduced by running the scraper with the same seed as described in the thesis. The code for reproducing the data is provided in the file example.py.
The output of the original execution is still preserved in final_analysis, so code and results can be inspected side by side.