kateryna-bobrovnyk/ukr-twi-corpus

ukr-twi-corpus

A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.

The repository contains four files:

  • corpus.tar.xz - ready-to-use corpus of 1,854,993 Ukrainian Twitter texts in .csv format.
  • Corpus-Downloading.ipynb - Jupyter Notebook with instructions for downloading texts.
  • Corpus-Filtering.ipynb - Jupyter Notebook with instructions for filtering texts.
  • twitter_scraper.py - Python script for downloading (a modified version of Kenneth Reitz's twitter-scraper: https://github.com/kennethreitz/twitter-scraper).
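
The archive can be read directly from Python without unpacking it to disk. The following is a minimal sketch, not part of the repository: the function name `iter_csv_rows` is hypothetical, and it assumes that the archive holds one or more UTF-8 encoded `.csv` members. The column layout and the presence of a header row are not documented here, so rows are yielded unchanged as lists of strings.

```python
import csv
import io
import tarfile
from typing import Iterator, List


def iter_csv_rows(archive_path: str) -> Iterator[List[str]]:
    """Yield every row of every .csv member in an xz-compressed tar archive.

    Assumptions (not confirmed by the repository): members are UTF-8
    encoded, and any file whose name ends in ".csv" belongs to the corpus.
    """
    with tarfile.open(archive_path, mode="r:xz") as archive:
        for member in archive:
            # Skip directories and anything that is not a .csv file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # newline="" lets the csv module handle line breaks inside quoted fields.
            with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                yield from csv.reader(text)
```

Calling `iter_csv_rows("corpus.tar.xz")` returns a generator, so the first few rows can be inspected (for example with `itertools.islice`) without loading all texts into memory.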

Reference:

Bobrovnyk K. (2019). Automated Building and Analysis of Ukrainian Twitter Corpus for Toxic Text Detection. In Proc. of the 3rd International Conference, COLINS 2019, Kharkiv, Ukraine.
