ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization
Our paper was accepted as a long paper at the EMNLP 2021 main conference; the preprint version is available here.
Abstractive text summarization is one of the areas influenced by the emergence of pre-trained language models. Current pre-training works in abstractive summarization assign higher scores to summaries that share more words with the source text and pay less attention to the semantic similarity between generated sentences and the original document. We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue. In ARMAN, salient sentences from a document are selected according to a modified semantic score to be masked and form a pseudo summary. To summarize more accurately and in a way closer to human writing patterns, we applied modified sentence reordering in the best setting. We evaluated our proposed models on six downstream Persian summarization tasks. Experimental results show that our proposed model achieves state-of-the-art performance on all six summarization tasks measured by ROUGE and BERTScore. Our models also outperform prior works in Textual Entailment, Question Paraphrasing, and Multiple Choice Question Answering. Finally, we conducted a human evaluation and show that using the semantic score significantly improves summarization results.
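The selection-and-masking idea described above can be sketched roughly as follows. This is only an illustration, not ARMAN's implementation: the scoring function here is a plain token-overlap stand-in (the paper uses a modified semantic score), the sentence splitter is naive, the `<mask_1>` token and the `ratio` parameter are assumptions, and the sentence reordering objective is not shown.

```python
import re

MASK_TOKEN = "<mask_1>"  # hypothetical mask token, not taken from the ARMAN vocabulary


def split_sentences(text):
    """Naive splitter on '.', '!', '?' and the Persian question mark."""
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


def overlap_score(sentence, others):
    """Stand-in salience score: Jaccard overlap with the rest of the document.

    ARMAN uses a modified semantic score instead; plug it in via `score_fn`.
    """
    a = set(sentence.lower().split())
    b = set(" ".join(others).lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def make_pseudo_summary(text, ratio=0.3, score_fn=overlap_score):
    """Mask the top-scoring sentences and return (masked_input, target)."""
    sentences = split_sentences(text)
    if not sentences:
        return "", ""
    k = max(1, round(len(sentences) * ratio))
    scored = [
        (score_fn(s, sentences[:i] + sentences[i + 1:]), i)
        for i, s in enumerate(sentences)
    ]
    # Keep the k best sentences; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    selected = sorted(i for _, i in top)
    masked = [MASK_TOKEN if i in selected else s for i, s in enumerate(sentences)]
    target = " ".join(sentences[i] for i in selected)
    return " ".join(masked), target
```

Passing a different `score_fn` (for example one based on sentence embeddings) changes which sentences end up in the pseudo summary without touching the masking logic.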
Our model ARMAN(MSR) achieved state-of-the-art results on 5 out of 6 Persian abstractive summarization datasets under the ROUGE metric, and on 6 out of 6 under BERTScore.
In the following table, the results are reported as ROUGE-1/ROUGE-2/ROUGE-L.
Dataset | ARMAN(MSR) | ARMAN(SS-100) | ARMAN(SH) | ARMAN(SS-80) | PEGASUS |
---|---|---|---|---|---|
PN-Summary | 46.19/28.41/40.27 | 46.33/28.57/40.38 | 45.89/28.03/39.89 | 45.98/28.2/40.09 | 45.67/27.81/39.71 |
Wiki-Summary | 32.48/11.86/24.08 | 32.36/11.78/24.1 | 32.04/11.78/23.83 | 32.27/11.72/23.91 | 31.98/11.63/23.79 |
VOA | 48.23/29.52/44.27 | 47.73/28.95/43.89 | 46.96/27.88/42.93 | 47.91/28.9/43.75 | 47.55/28.68/43.57 |
Perkey(summary) | 63.59/52.87/60.3 | 62.83/51.92/59.53 | 63.47/52.71/60.16 | 62.97/52.11/59.64 | 62.82/51.96/59.48 |
Perkey(title) | 54.81/40.17/52.51 | 54.25/39.51/51.92 | 54.5/39.9/52.19 | 54.18/39.39/51.84 | 53.99/39.3/51.72 |
Tebyan | 37.79/21.85/31.98 | 37.64/21.78/31.94 | 37.6/21.77/31.82 | 37.53/21.73/31.77 | 37.2/21.23/31.47 |
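For readers unfamiliar with the ROUGE columns, the following is a minimal sketch of how ROUGE-N and ROUGE-L F1 scores can be computed. It assumes simple whitespace tokenization and a single reference, and it is not the evaluation script used for the numbers above; the scores it returns are in the range 0 to 1, whereas the table values appear to be scaled to 0 to 100.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(match, cand_total, ref_total):
    """Harmonic mean of precision (match/cand_total) and recall (match/ref_total)."""
    if match == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    p = match / cand_total
    r = match / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 based on clipped n-gram overlap."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    match = sum((cand & ref).values())  # Counter intersection clips the counts
    return f1(match, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    c, r = candidate.split(), reference.split()
    return f1(lcs_length(c, r), len(c), len(r))
```

Because the score depends only on surface overlap, two summaries with the same meaning but different wording can score very differently, which is the motivation for also reporting BERTScore.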
In the following table, the results are reported as Precision-BERTScore/Recall-BERTScore/F1-BERTScore.
Dataset | ARMAN(MSR) | ARMAN(SH) | ARMAN(SS-80) | PEGASUS |
---|---|---|---|---|
PN-Summary | 80.14/79.84/79.93 | 79.95/79.69/79.76 | 80.08/79.74/79.85 | 79.86/79.67/79.7 |
Wiki-Summary | 74.67/71.55/72.95 | 74.25/71.43/72.68 | 74.24/71.48/72.71 | 74.29/71.31/72.64 |
VOA | 81.1/81.35/81.16 | 80.64/80.91/80.71 | 81.02/81.13/81 | 80.84/81.13/80.92 |
Perkey(summary) | 86.54/86.24/86.33 | 86.46/86.22/86.29 | 86.27/86.01/86.09 | 86.13/86.01/86.01 |
Perkey(title) | 83.93/83.59/83.71 | 83.85/83.49/83.62 | 83.65/83.36/83.46 | 83.68/83.31/83.45 |
Tebyan | 75.49/75.46/75.4 | 75.48/75.28/75.29 | 75.48/75.32/75.32 | 75.26/75.17/75.14 |
Furthermore, we fine-tuned our models on the ParsiNLU dataset, and the results show that ARMAN models can also be used as language models. Our models achieve state-of-the-art results in 3 out of 4 tasks (mainly on the natural part of the dataset). The results are reported in the following table; the results for other models are available in the ParsiNLU paper. Bold results are those that were better than other reported models with at most 400M parameters; our models have around 220M.
Task | Textual Entailment | Question Paraphrasing | Sentiment | Multiple-Choice Question Answering |
---|---|---|---|---|
Model | natural - translated | natural - translated | food - movie | literature - common knowledge - math & logic |
ARMAN(SS-80) | 54.5 - 50.6 | 82.5 - 74.8 | 51.4 - 47 | 37.7 - 25.7 - 47.7 |
ARMAN(SS-100) | 54.2 - 53 | 79.9 - 72.8 | 50 - 52.9 | 41.4 - 27.4 - 43.1 |
ARMAN(SH) | 55.5 - 52.9 | 82.6 - 75.1 | 56.7 - 42 | 34.6 - 28.6 - 45.4 |
ARMAN(MSR) | 54.8 - 51.8 | 79.9 - 75.9 | 52 - 46 | 36.57 - 21.7 - 49.14 |
PEGASUS | 54.5 - 52.6 | 80 - 76.1 | 51.9 - 56 | 40 - 27.7 - 45.1 |
Further results on the models' ability to summarize in low-resource scenarios are reported in our paper. Briefly, our model needs around 1K data points and 2K training steps to perform well on most summarization tasks.
This table lists the models that we pre-trained.
model | pre-trained | vocab |
---|---|---|
ARMAN(SS-80) | download | download |
ARMAN(SS-100) | download | download |
ARMAN(SH) | download | download |
ARMAN(MSR) | download | download |
PEGASUS | download | download |
This table lists the models that we fine-tuned on summarization tasks.
model | Perkey(summary) | Perkey(title) | Tebyan | Wiki Summary | VOA headlines | PN Summary | Vocab |
---|---|---|---|---|---|---|---|
ARMAN(SS-80) | download | download | download | download | download | download | download |
ARMAN(SS-100) | download | download | download | download | download | download | download |
ARMAN(SH) | download | download | download | download | download | download | download |
ARMAN(MSR) | download | download | download | download | download | download | download |
PEGASUS | download | download | download | download | download | download | download |
TRANSFORMER | download | download | download | download | download | download | download |
mT5 | download | download | download | download | download | download | download |
This table lists the models that we fine-tuned on NLU tasks.
model | Entailment | Question Paraphrasing | Multiple Choice | Sentiment (Food) | Sentiment (Movie) | vocab |
---|---|---|---|---|---|---|
ARMAN(SS-80) | download | download | download | download | download | download |
ARMAN(SS-100) | download | download | download | download | download | download |
ARMAN(SH) | download | download | download | download | download | download |
ARMAN(MSR) | download | download | download | download | download | download |
PEGASUS | download | download | download | download | download | download |
The Tebyan cultural institute, which is affiliated with the organization "Sazman-e Tablighat-e Eslami", is one of the biggest and best-known cultural institutes in Iran, and has cooperated with other cultural institutes in different fields to support cultural festivals and broadcast their activities in the media. The institute's activities take place not only in Tehran but also in the provincial centers, and 1,600,000 users visit its website each day. The Iranian deputy minister supported its activities for sport and youth on the website tebyan.net in Tehran.
We created the dataset by crawling pages of the Tebyan website and then split it into train/validation/test sets. The dataset is publicly available for research purposes.
train | validation | test |
---|---|---|
78445 | 6922 | 6922 |
download | download | download |
The code and guidelines for pre-training or fine-tuning the model are available in the pretraining and models folders.
We have converted our TF1 models into PyTorch models using the Hugging Face library. You can find them here. Note that the results reported in our paper were produced with the TF1 models, so we cannot guarantee that you will get the same results with the converted models.
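A minimal sketch of how a converted checkpoint might be loaded and used for summarization is shown below. It assumes the `transformers` and `torch` packages are installed and that the converted checkpoint can be resolved by the generic `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes; the `summarize` helper, the `MODEL_PATH` placeholder, and the generation settings are illustrative assumptions and are not taken from this repository.

```python
# Placeholder only: replace with the local directory or hub id of a converted model.
MODEL_PATH = "path/to/converted-arman-model"


def summarize(text, model_path=MODEL_PATH, max_input_tokens=512, max_summary_tokens=128):
    """Generate a summary with a converted seq2seq checkpoint (illustrative helper)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

    # Truncate long documents to the assumed maximum input length.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    # Beam search with assumed settings; tune these for your task.
    output_ids = model.generate(
        **inputs, num_beams=4, max_length=max_summary_tokens
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The helper reloads the model on every call for simplicity; when summarizing many documents, load the tokenizer and model once and reuse them.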
If you use this code, please consider citing our paper:
@misc{salemi2021arman,
title={ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization},
author={Alireza Salemi and Emad Kebriaei and Ghazal Neisi Minaei and Azadeh Shakery},
year={2021},
eprint={2109.04098},
archivePrefix={arXiv},
primaryClass={cs.CL}
}