create README.md #8682

Merged on Nov 23, 2020 (2 commits)
34 changes: 34 additions & 0 deletions model_cards/NlpHUST/vibert4news-base-cased/README.md
@@ -0,0 +1,34 @@
# BERT for Vietnamese, trained on more than 20 GB of news data

Applied to the sentiment analysis task using [AIViVN's comments dataset](https://www.aivivn.com/contests/6).

The model achieved 0.90268 on the public leaderboard (the winner's score is 0.90087).
Bert4news is also used in a Vietnamese toolkit (segmentation and named entity recognition), [ViNLP](https://github.com/bino282/ViNLP).

***************New Mar 11, 2020 ***************

**[BERT](https://github.com/google-research/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).

We use word-level SentencePiece with basic BERT tokenization, and the same config as BERT base with lowercase = False.
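
A note on why `lowercase = False` matters for Vietnamese: in the reference BERT basic tokenizer, lowercasing is coupled with accent stripping, which would remove Vietnamese diacritics. The following is a minimal standard-library sketch of that effect, not this project's tokenizer; the function names `strip_accents` and `basic_normalize` are hypothetical and exist only for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def basic_normalize(text: str, lowercase: bool) -> str:
    """Mimic the coupling in BERT's basic tokenizer: lowercasing also strips accents."""
    if lowercase:
        return strip_accents(text.lower())
    # Cased setting: the text is left untouched, so diacritics survive.
    return text


print(basic_normalize("Việt Nam", lowercase=True))   # viet nam
print(basic_normalize("Việt Nam", lowercase=False))  # Việt Nam
```

With the cased setting, words that differ only in tone marks stay distinct in the vocabulary.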

You can download the trained model:
- [TensorFlow](https://drive.google.com/file/d/1X-sRDYf7moS_h61J3L79NkMVGHP-P-k5/view?usp=sharing)
- [PyTorch](https://drive.google.com/file/d/11aFSTpYIurn-oI2XpAmcCTccB_AonMOu/view?usp=sharing)



Run training with the base config:

``` bash

python train_pytorch.py \
--model_path=bert4news.pytorch \
--max_len=200 \
--batch_size=16 \
--epochs=6 \
--lr=2e-5

```
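
The source of `train_pytorch.py` is not part of this card. As a rough sketch of how a script might accept the flags shown above, assuming it uses Python's `argparse`, the parser could look like the following. The function name `build_parser`, the help strings, and the defaults (copied from the command above) are assumptions, not the author's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags used in the training command (hypothetical)."""
    parser = argparse.ArgumentParser(description="Fine-tuning flags (illustrative only)")
    parser.add_argument("--model_path", type=str, required=True,
                        help="Path to the pretrained checkpoint")
    parser.add_argument("--max_len", type=int, default=200,
                        help="Maximum sequence length in tokens")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="Examples per training batch")
    parser.add_argument("--epochs", type=int, default=6,
                        help="Number of passes over the training data")
    parser.add_argument("--lr", type=float, default=2e-5,
                        help="Learning rate")
    return parser


# Parse the same arguments as the shell command, passed as an explicit list.
args = build_parser().parse_args([
    "--model_path=bert4news.pytorch",
    "--max_len=200",
    "--batch_size=16",
    "--epochs=6",
    "--lr=2e-5",
])
print(args)
```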

### Contact information
For personal communication related to this project, please contact Nha Nguyen Van (nha282@gmail.com).