A corpus of Ukrainian Twitter texts, plus instructions for downloading and filtering the texts.
There are 4 files:
corpus.tar.xz
- ready-to-use corpus of 1,854,993 Ukrainian Twitter texts with the .csv extension
Corpus-Downloading.ipynb
- Jupyter Notebook file with instructions for downloading
Corpus-Filtering.ipynb
- Jupyter Notebook file with instructions for filtering
twitter_scraper.py
- Python script for downloading (modified version of Kenneth Reitz's scraper: https://github.com/kennethreitz/twitter-scraper)
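A minimal sketch of how the corpus archive might be read without unpacking it to disk first. The function name iter_corpus_rows is hypothetical. The sketch assumes the .csv file inside corpus.tar.xz is UTF-8 encoded and comma-separated. The column layout is not described here, so each row is returned as a plain list of strings.

```python
import csv
import io
import tarfile


def iter_corpus_rows(archive_path):
    """Yield (member_name, row) for each row of each .csv file in the archive.

    Hypothetical helper: the layout inside corpus.tar.xz is an assumption.
    """
    # "r:xz" makes tarfile decompress the LZMA stream while reading.
    with tarfile.open(archive_path, mode="r:xz") as archive:
        for member in archive:
            # Skip directories and anything that is not a .csv file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            # Assumption: the CSV is UTF-8 encoded; adjust if it is not.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                yield member.name, row
```

Because the function is a generator, a preview of the first few rows can be taken with itertools.islice(iter_corpus_rows("corpus.tar.xz"), 5) without reading all 1,854,993 texts into memory.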
Reference:
Bobrovnyk K. (2019) Automated Building and Analysis of Ukrainian Twitter Corpus for Toxic Text Detection. In Proc. of the 3rd International Conference, COLINS 2019, Kharkiv, Ukraine, 2019.