Merge pull request #35 from goru001/add_english_to_inltk
add en docs
goru001 authored Jan 17, 2020
2 parents d42dfa2 + 0747b7e commit f066b6b
Showing 3 changed files with 8 additions and 2 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -28,7 +28,7 @@ Checkout detailed docs along with Installation instructions
| Bengali | bn |
| Tamil | ta |
| Urdu | ur |
-
+| English | en |

#### Repositories containing models used in iNLTK
| Language | Repository | Dataset used for Language modeling | Perplexity of ULMFiT LM | Perplexity of TransformerXL LM | Dataset used for Classification | Classification Accuracy | Classification Kappa score | ULMFiT Embeddings visualization | TransformerXL Embeddings visualization |
@@ -45,6 +45,9 @@ Checkout detailed docs along with Installation instructions
| Bengali | [NLP for Bengali](https://github.com/goru001/nlp-for-bengali) | [Bengali Wikipedia Articles](https://www.kaggle.com/disisbig/bengali-wikipedia-articles) | 41.2 | 39.3 | [Bengali News Dataset](https://www.kaggle.com/disisbig/bengali-news-dataset) | 93.8 | 92 | [Bengali Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/goru001/nlp-for-bengali/master/language-model/embedding_projector_config.json) | [Bengali Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/goru001/nlp-for-bengali/master/language-model/embedding_projector_transformer_config.json) |
| Tamil | [NLP for Tamil](https://github.com/goru001/nlp-for-tamil) | [Tamil Wikipedia Articles](https://www.kaggle.com/disisbig/tamil-wikipedia-articles) | 19.80 | 17.22 | [Tamil News Dataset](https://www.kaggle.com/disisbig/tamil-news-dataset) | 96.78 | 95.09 | [Tamil Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/goru001/nlp-for-tamil/master/language-model/embedding_projector_config.json) | [Tamil Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/goru001/nlp-for-tamil/master/language-model/embedding_projector_transformer_config.json) |
| Urdu | [NLP for Urdu](https://github.com/anuragshas/nlp-for-urdu) | [Urdu Wikipedia Articles](https://www.kaggle.com/disisbig/urdu-wikipedia-articles) | 13.19 | 12.55 | [Urdu News Dataset](https://www.kaggle.com/disisbig/urdu-news-dataset) | 95.28 | 91.58 | [Urdu Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/anuragshas/nlp-for-urdu/master/language-model/embedding_projector_config.json) | [Urdu Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/anuragshas/nlp-for-urdu/master/language-model/embedding_projector_transformer_config.json) |
+
+Note: English model has been directly taken from [fast.ai](https://github.com/fastai/fastai)
+
### Contributing

##### Add a new language support
3 changes: 3 additions & 0 deletions docs/api_docs.md
@@ -31,6 +31,7 @@ The first command above will install pytorch for cpu, which, as the name suggest
| Bengali | bn |
| Tamil | ta |
| Urdu | ur |
+| English | en |

### API

@@ -230,5 +231,7 @@ Example:
| Tamil | [NLP for Tamil](https://github.com/goru001/nlp-for-tamil) | [Tamil Wikipedia Articles](https://www.kaggle.com/disisbig/tamil-wikipedia-articles) | 19.80 | 17.22 | [Tamil News Dataset](https://www.kaggle.com/disisbig/tamil-news-dataset) | 96.78 | 95.09 | [Tamil Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/goru001/nlp-for-tamil/master/language-model/embedding_projector_config.json) | [Tamil Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/goru001/nlp-for-tamil/master/language-model/embedding_projector_transformer_config.json) |
| Urdu | [NLP for Urdu](https://github.com/anuragshas/nlp-for-urdu) | [Urdu Wikipedia Articles](https://www.kaggle.com/disisbig/urdu-wikipedia-articles) | 13.19 | 12.55 | [Urdu News Dataset](https://www.kaggle.com/disisbig/urdu-news-dataset) | 95.28 | 91.58 | [Urdu Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/anuragshas/nlp-for-urdu/master/language-model/embedding_projector_config.json) | [Urdu Embeddings projection](https://projector.tensorflow.org/?config=https://raw.githubusercontent.com/anuragshas/nlp-for-urdu/master/language-model/embedding_projector_transformer_config.json) |
+Note: English model has been directly taken from [fast.ai](https://github.com/fastai/fastai)
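The language table this commit extends is a plain two-column Markdown table, so a newly listed code such as `en` can be looked up mechanically. Below is a minimal sketch, assuming the `| Language | Code |` header used in the sample table (only the rows visible in this diff are reproduced). The helper `parse_language_table` is hypothetical and is not part of iNLTK.

```python
from typing import Dict


# Hypothetical helper (not part of iNLTK): read a two-column Markdown table
# of "| Language | Code |" rows into a {language: code} dict.
def parse_language_table(markdown: str) -> Dict[str, str]:
    codes: Dict[str, str] = {}
    for line in markdown.splitlines():
        line = line.strip()
        # Only table rows start and end with a pipe.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue
        language, code = cells
        # Skip the (assumed) header row and the |---|---| separator row.
        if language == "Language" or set(code) <= set("-: "):
            continue
        codes[language] = code
    return codes


# Sample table: the header row is an assumption; the rows come from this diff.
TABLE = """
| Language | Code |
|---|---|
| Bengali | bn |
| Tamil | ta |
| Urdu | ur |
| English | en |
"""

languages = parse_language_table(TABLE)
print(languages["English"])  # prints: en
```

The check runs without installing iNLTK or downloading any model, so it only confirms that a code is listed in the docs, not that the corresponding model loads.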
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setuptools.setup(
name="inltk",
-version="0.7.5",
+version="0.8",
author="Gaurav",
author_email="contactgauravforwork@gmail.com",
description="Natural Language Toolkit for Indian Languages (iNLTK)",
