Added Telugu support for language identifying model #78

Open
wants to merge 1 commit into
base: master

Conversation

nitkannen

In response to issue #57 in the inltk repo, Telugu support has been added to the language identification model:

  1. Generated new Telugu data and appended it to the existing data for 13 languages
  2. Retrained the tokenizer to support all 14 languages (the original 13 plus Telugu)
  3. Retrained the language identification model with the appended data and the new tokenizer
  4. Updated the Dropbox links in config.py
  5. Added the code used for the above tasks to the inltk repo as a folder, inltk/add language support, to assist future work on extending language support

@goru001
Owner

goru001 commented Oct 2, 2021

Thanks @nitkannen for your contribution. I have a couple of comments:

  1. It would be great if you could remove the training code (for both the model and the tokenizer) and push it to either this repo or your own repo, with a README containing all the details on the train-test dataset, training procedure, results, and links to download the models. You would then only need to change the Dropbox link here in the iNLTK repo.
  2. It would be great if you could share a script or notebook showing that the added functionality works as expected. You can take cues from the Testing section in this PR.

Again, thanks for your contribution and great work. Apologies for the delayed response.
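
A verification script of the kind requested in point 2 might look roughly like the sketch below. It is not taken from the PR: the helper `check_language_id`, the sample sentences, and the label strings `"telugu"` and `"hindi"` are all hypothetical, and `stub_identify` is a stand-in classifier (it only inspects Unicode blocks) so the harness can run without downloading any model.

```python
from typing import Callable, Iterable, List, Tuple


def check_language_id(
    identify: Callable[[str], str],
    samples: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (text, expected, predicted) for every misclassified sample."""
    failures = []
    for text, expected in samples:
        # Normalise the label so case and stray whitespace do not cause false failures.
        predicted = str(identify(text)).strip().lower()
        if predicted != expected.lower():
            failures.append((text, expected, predicted))
    return failures


# Hypothetical samples; the expected label strings are assumptions.
SAMPLES = [
    ("నేను తెలుగు మాట్లాడతాను", "telugu"),
    ("मैं हिंदी बोलता हूँ", "hindi"),
]


def stub_identify(text: str) -> str:
    """Stand-in classifier, NOT the iNLTK model: guesses from the Unicode block."""
    if any("\u0c00" <= ch <= "\u0c7f" for ch in text):  # Telugu block
        return "telugu"
    if any("\u0900" <= ch <= "\u097f" for ch in text):  # Devanagari block
        return "hindi"
    return "unknown"


failures = check_language_id(stub_identify, SAMPLES)
print("all samples identified correctly" if not failures else failures)
```

To exercise the retrained model instead of the stub, the same harness could presumably be given iNLTK's own language identification function after the usual setup step. The exact label strings the model returns would need to be checked against its output.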
