Added Telugu support to the language identification model #78

nitkannen · 2021-08-31T10:51:18Z

This PR addresses issue #57 in the inltk repo by adding Telugu support to the language identification model.

Generated new Telugu data and appended it to the existing data for 13 languages.
Retrained the tokenizer to support all 14 languages (the original 13 plus Telugu).
Retrained the language identification model on the combined data with the new tokenizer.
Updated the Dropbox links in config.py.
Added the code used for the above tasks to the inltk repo as the folder inltk/add language support, to assist future work on extending language support.

goru001 · 2021-10-02T07:31:59Z

Thanks @nitkannen for your contribution. I have a couple of comments:

It'll be great if you can remove the training code (both model and tokenizer) and push it to either this repo or your own repo, with a README containing all the details regarding the train-test dataset, training procedure, results and links to download the models. You'll only need to change the Dropbox link here in the iNLTK repo.
It'll be great if you can share a script or notebook which shows that your added functionality is working as expected. You can take cues from the Testing section in this PR.
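
For illustration only, a check script of the kind requested above could look roughly like the sketch below. It assumes `identify_language` from `inltk.inltk` is the entry point and that it returns lowercase language names such as `telugu`. The label strings, the sample sentences and the helper names `check_language_id` and `main` are hypothetical and not taken from this PR.

```python
from typing import Callable, Dict, List, Tuple


def check_language_id(
    identify: Callable[[str], str],
    samples: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (text, expected, predicted) for every misclassified sample."""
    failures = []
    for text, expected in samples.items():
        predicted = identify(text)
        # Compare case-insensitively; the exact label format is an assumption.
        if predicted.strip().lower() != expected.lower():
            failures.append((text, expected, predicted))
    return failures


def main() -> None:
    # Imported here so the helper above can be used without iNLTK installed.
    # Assumes the updated identification model is available to iNLTK.
    from inltk.inltk import identify_language

    # Hypothetical sample sentences with assumed labels.
    samples = {
        "నేను తెలుగు మాట్లాడతాను": "telugu",
        "मैं हिंदी बोलता हूँ": "hindi",
    }
    failures = check_language_id(identify_language, samples)
    for text, expected, predicted in failures:
        print(f"MISMATCH: {text!r} expected={expected} predicted={predicted}")
    print(f"{len(samples) - len(failures)}/{len(samples)} samples identified correctly")
```

Calling `main()` in an environment with iNLTK installed would run the check against the real model. Passing the identifier in as a function keeps the comparison logic testable with a stub.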

Again, thanks for your contribution and great work. Apologies for the delayed response.

added telugu support for identifying model

eef1b24
