Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
ACL 2018, SRW
Link to paper
Repository contains:
(i) Seq2seq-based transliterator (Roman to Devanagari)
(ii) Language identification tool for Hindi-English code-switched text (English, Hindi, Rest)
(iii) CRF-based Named Entity Recognition tool for Hindi-English code-switched text (Person, Location, Organisation)
Check http://precog.iiitd.edu.in/resources.html for the annotated corpus.
- Install the dependencies listed in requirements.txt inside a virtualenv.
- Check the README in the transliteration directory and follow its instructions to set it up.
- Export the following environment variables before running the demo files:
export TRANSLITERATION_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner/transliteration
export HINGLISH_ROOT_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner
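
A wrong or missing path in either variable is an easy mistake to make, so it can help to check both before running the demo files. The snippet below is a minimal sketch of such a check. The function name `check_hinglish_env` is hypothetical and not part of this repository; it only assumes that both variables should point at existing directories.

```shell
# Hypothetical helper (not shipped with this repository): checks that the two
# environment variables named in the setup steps are set and point at
# existing directories. Returns 0 if both are fine, 1 otherwise.
check_hinglish_env() {
    _status=0
    for _name in TRANSLITERATION_DIR HINGLISH_ROOT_DIR; do
        # Look up the value of the variable whose name is stored in $_name.
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "$_name is not set" >&2
            _status=1
        elif [ ! -d "$_value" ]; then
            echo "$_name is not a directory: $_value" >&2
            _status=1
        fi
    done
    return "$_status"
}
```

After exporting the variables, running `check_hinglish_env && echo "environment looks OK"` prints the confirmation only when both directories exist; otherwise the problem is reported on stderr.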