[model_cards] Add model cards for Urduhack model (roberta-urdu-small) (huggingface#6536)

* [model_cards] roberta-urdu-small added.

* [model_cards] typo fixed.

* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali <mrikram1989>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2 people authored and fabiocapsouza committed Nov 15, 2020
1 parent 48d352d commit 5dc30d3
Showing 1 changed file with 30 additions and 0 deletions.
30 changes: 30 additions & 0 deletions model_cards/urduhack/roberta-urdu-small/README.md
@@ -0,0 +1,30 @@
---
language: ur
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png
tags:
- roberta-urdu-small
- urdu
- transformers
license: mit
---
## roberta-urdu-small

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/urduhack/urduhack/blob/master/LICENSE)
### Overview
**Language model:** roberta-urdu-small
**Model size:** 125M
**Language:** Urdu
**Training data:** News data from Urdu news sources in Pakistan
### About roberta-urdu-small
roberta-urdu-small is a language model for the Urdu language.
```
from transformers import pipeline
# Load the model and its tokenizer into a fill-mask (masked word prediction) pipeline.
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
```
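
As a rough sketch of how the pipeline above might be used, the snippet below masks one word of a sentence and returns the model's top suggestions for it. The helper names `mask_word`, `best_tokens`, and `predict` are hypothetical and not part of urduhack or transformers; the sketch assumes the pipeline returns a list of dicts with `score` and `token_str` keys, as the fill-mask pipeline does for an input with a single mask.

```python
def mask_word(sentence, word, mask_token="<mask>"):
    """Replace the first occurrence of `word` with the mask token."""
    if word not in sentence:
        raise ValueError("word does not occur in sentence")
    return sentence.replace(word, mask_token, 1)


def best_tokens(predictions, k=3):
    """Return the k highest-scoring candidate tokens, whitespace stripped."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def predict(sentence, word, k=3):
    """Mask `word` in `sentence` and ask the model to fill it back in."""
    # Imported lazily: the first call downloads the model from the Hub.
    from transformers import pipeline

    fill_mask = pipeline(
        "fill-mask",
        model="urduhack/roberta-urdu-small",
        tokenizer="urduhack/roberta-urdu-small",
    )
    # Use the tokenizer's own mask token rather than assuming "<mask>".
    masked = mask_word(sentence, word, fill_mask.tokenizer.mask_token)
    return best_tokens(fill_mask(masked), k)
```

Calling `predict(sentence, word)` with an Urdu sentence and one of its words returns up to three candidate replacements, ordered by score.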
## Training procedure
roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized with the normalization module from
urduhack to eliminate characters from other languages, such as Arabic.
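
To illustrate the idea behind this normalization step, here is a minimal standard-library sketch that maps a few Arabic code points to their visually similar Urdu counterparts. It is a stand-in for illustration only, not urduhack's actual normalization module; the function name `normalize_arabic_chars` and the two-entry mapping table are assumptions chosen for this example.

```python
# Hypothetical, deliberately tiny mapping: Arabic code point -> Urdu code point.
ARABIC_TO_URDU = str.maketrans({
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
})


def normalize_arabic_chars(text):
    """Replace the mapped Arabic characters; leave everything else unchanged."""
    return text.translate(ARABIC_TO_URDU)
```

A real normalizer covers many more characters; this sketch only shows the character-substitution mechanism.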

### About Urduhack
Urduhack is a Natural Language Processing (NLP) library for the Urdu language.
GitHub: https://github.com/urduhack/urduhack
