Fine-tune a variety of pre-trained Transformer-based models to solve the Vietnamese Reliable Intelligence Identification (ReINTEL) problem in the VLSP2020 shared task.
In this project, we use several pre-trained language models, including vELECTRA, vBERT, PhoBERT, Bert Multilingual Cased, and XLM-RoBERTa, to identify reliable information shared on social network sites.
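As a rough illustration of fine-tuning setup with Hugging Face transformers, the sketch below loads a tokenizer and a sequence-classification head for one of the models named above. The Hub checkpoint names, the `CHECKPOINTS` mapping, the `load_classifier` helper, and the choice of two labels (reliable vs. unreliable) are all assumptions made for this example, not details taken from this project. vELECTRA and vBERT are left out because their checkpoint names are not given here.

```python
# Hypothetical mapping from model names used in this README to Hub checkpoints.
# These identifiers are assumptions; the project may use different ones.
CHECKPOINTS = {
    "PhoBERT": "vinai/phobert-base",
    "Bert Multilingual Cased": "bert-base-multilingual-cased",
    "XLM-RoBERTa": "xlm-roberta-base",
}


def load_classifier(model_key: str, num_labels: int = 2):
    """Return (tokenizer, model) for the chosen pre-trained checkpoint.

    num_labels=2 assumes a binary reliable/unreliable setup.
    """
    # Imported lazily so the mapping above can be inspected
    # without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = CHECKPOINTS[model_key]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `load_classifier("XLM-RoBERTa")` downloads the checkpoint on first use, so it needs network access.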
We evaluate models with different input lengths: 256, 512, and multiple 512 (long documents).
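One way to picture the three input-length settings is shown below: truncation to a fixed length for the 256 and 512 cases, and splitting into several segments of at most 512 for long documents. This is a minimal sketch under assumptions. The helper names `truncate` and `split_into_segments` and the optional `stride` overlap are hypothetical, special tokens are ignored, and how per-segment predictions are combined is not specified here.

```python
from typing import List


def truncate(token_ids: List[int], max_len: int) -> List[int]:
    """Keep only the first max_len token ids (the 256 or 512 setting)."""
    return token_ids[:max_len]


def split_into_segments(
    token_ids: List[int], max_len: int = 512, stride: int = 0
) -> List[List[int]]:
    """Split a long document into consecutive segments of at most max_len ids.

    stride is the number of ids shared between neighbouring segments
    (0 means no overlap). Special tokens are not accounted for.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")

    step = max_len - stride
    segments: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        segments.append(token_ids[start:start + max_len])
        # Stop once this segment reaches the end of the document.
        if start + max_len >= len(token_ids):
            break
    return segments


# Usage: a 1200-id document yields segments of length 512, 512, and 176.
example_ids = list(range(1200))
example_segments = split_into_segments(example_ids)
```

With a non-zero `stride`, neighbouring segments share context, at the cost of producing more segments per document.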
To reproduce our experiments, please install the following requirements from requirements.txt:
- huggingface transformers
- emoji
- vncorenlp
- nltk
- pytorch
- python3
pip install -r requirements.txt
The dataset is provided by the VLSP2020 organizers. Please access this site for more information.
Hieu Tran - heraclex12@gmail.com
Project Link: https://github.com/heraclex12/VLSP2020-Fake-News-Detection
@misc{tran2020leveraging,
title={Leveraging Transfer Learning for Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL)},
author={Trung-Hieu Tran and Long Phan and Truong-Son Nguyen},
year={2020},
eprint={2012.07557},
archivePrefix={arXiv},
primaryClass={cs.CL}
}