UnsupNTS: Unsupervised Neural Text Simplification

This is the original implementation of the Unsupervised Neural Text Simplification system and its semi-supervised variants, described in the ACL 2019 long paper:

Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, and Karthik Sankaranarayanan. Unsupervised Neural Text Simplification. arXiv preprint arXiv:1810.07931 (2018).

Training

Download tsdata.zip from link and extract it:

unzip tsdata.zip

tsdata.zip contains:

Unsupervised sets of easy and difficult sentences, judged by readability ease scores.
Dict2vec embeddings trained on the above unsupervised sets.
10k parallel pairs of difficult sentences and their simplified variants.
Test set and references: eight tab-separated references for each test sentence.
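As a rough illustration of the reference layout described above (one line per test sentence, eight tab-separated references), the following sketch parses such a file and checks the reference count. The function name `read_references` and the sample data are hypothetical; the actual file names inside tsdata.zip are not stated here, so the example reads from an in-memory string instead.

```python
import io


def read_references(handle, expected=8):
    """Return a list with one entry per test sentence; each entry is a list of references."""
    references = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        refs = line.split("\t")
        if len(refs) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} references, found {len(refs)}"
            )
        references.append(refs)
    return references


# Hypothetical sample standing in for the real reference file:
# two test sentences with eight references each.
sample = io.StringIO(
    "\t".join(f"ref {i} for sentence A" for i in range(8)) + "\n"
    + "\t".join(f"ref {i} for sentence B" for i in range(8)) + "\n"
)
refs = read_references(sample)
print(len(refs), len(refs[0]))
```

To use it on the extracted data, pass an open file handle for the reference file in place of `sample`.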

Train the models using

bash train.sh

train.sh trains:

The UNTS system on the unsupervised simplification data, using the exact settings described in the paper.
The UNTS-10k system, which uses an additional 10k supervised pairs: a mixture of split-rephrase and simplification parallel pairs.
The UNMT system on the unsupervised simplification data.
Ablations on the adversarial and separation/classifier losses.

For more details and additional options, run the above scripts with the --help flag. Alternatively, open the ipynb notebook in Google Colaboratory to reproduce the results. To access the pretrained models, visit link. The predictions folder contains the generations from the pretrained models.

Note: The pretrained models were trained with PyTorch 0.3.1.

Generation and Evaluation of Simplifications

bash translate.sh

translate.sh is used for

Generating simplifications of test dataset.
Computing standalone metrics such as the Flesch readability ease score difference, tree similarity, and document similarity.
Computing SARI, BLEU and Word-diff metrics.
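To make the Flesch readability ease score difference more concrete, here is a minimal sketch using the standard Flesch reading ease formula. It is not the code that translate.sh runs: the syllable counter is a naive vowel-group heuristic, the function names are hypothetical, and taking the difference as output score minus input score is an assumption made for illustration.

```python
import re


def count_syllables(word):
    # Naive heuristic (assumption): one syllable per group of consecutive vowels,
    # with a minimum of one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    # Standard formula:
    # 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Illustrative sentences, not taken from the test set.
complex_text = "The committee deliberated extensively regarding the controversial legislation."
simple_text = "The group talked a lot about the new law."

# Assumed direction: a positive value means the output is easier to read.
fe_diff = flesch_reading_ease(simple_text) - flesch_reading_ease(complex_text)
print(fe_diff)
```

Higher Flesch scores indicate easier text, so under this convention a larger positive difference suggests a stronger simplification. A dictionary-based syllable counter would give more reliable scores than the heuristic used here.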

Acknowledgements

Our code uses functions from https://github.com/artetxem/undreamt and https://github.com/senisioi/NeuralTextSimplification extensively.

If you use our system for academic research, please cite the following paper:

@inproceedings{surya-etal-2019-unsupervised,
    title = "Unsupervised Neural Text Simplification",
    author = "Surya, Sai  and
      Mishra, Abhijit  and
      Laha, Anirban  and
      Jain, Parag  and
      Sankaranarayanan, Karthik",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1198",
    doi = "10.18653/v1/P19-1198",
    pages = "2058--2068"
}
