Multilingual cyber abuse detection using advanced transformer architecture (presented at IEEE TENCON 2019)

This repo contains the source code for pre-processing code-mixed text and training the models used in our paper:

Aditya Malte, Pratik Ratadiya, "Multilingual cyber abuse detection using advanced transformer architecture", IEEE TENCON 2019

Dataset:

TRAC-1 code-mixed dataset for detection of cyber abuse

Model:

BERT (Base/Large/Multilingual) and XLNet, trained with various hyperparameter settings

Preprocessing:

Demojization (replacing emojis with text), transliteration, normalization, and so on.
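
As a rough illustration of the demojization and normalization steps, here is a minimal, standard-library-only sketch. It is not the code used in the paper: the function names (`demojize`, `normalize`, `preprocess`), the `:name:` token format, and the specific normalization rules are all assumptions made for this example. Transliteration is not covered here, since it needs a language-specific mapping or an external library.

```python
import re
import unicodedata


def demojize(text: str) -> str:
    """Replace emoji-like symbols with a :name: token built from the Unicode name.

    Assumption: Unicode category "So" (Symbol, other) is used as a rough
    heuristic for "is an emoji"; it is not an exact emoji test.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                out.append(" :" + name.lower().replace(" ", "_") + ": ")
                continue
        out.append(ch)
    return "".join(out)


def normalize(text: str) -> str:
    """Lowercase, squeeze runs of 3+ identical characters to 2, collapse whitespace.

    These rules are illustrative choices, not the paper's exact normalization.
    """
    text = text.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str) -> str:
    """Hypothetical pipeline: demojize first, then normalize."""
    return normalize(demojize(text))


if __name__ == "__main__":
    print(preprocess("Sooooo   GOOD \U0001F600"))
```

Demojizing before normalizing keeps the emoji's meaning available to the tokenizer as plain words, which is useful when the model's vocabulary does not cover the emoji itself.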

Results:

  1. State-of-the-art performance on the Hindi dataset
  2. Excellent performance (top-5) on the English dataset

Note:

Colaboratory notebooks will be added soon.