Chameleon text: Exploring ways to increase variety in artificial data

Proposer

Date updated

22/11/2018

Description

In Natural Language Processing (NLP), training robust neural models requires a large amount of data, which can be created artificially. However, models that generate data artificially may end up reproducing text segments from the text they were trained on. This is a particular problem when training instances contain sensitive information, such as the names of patients. Increasing the lexical variety of artificial text is an exciting research subject.

The objective of this work is to study two existing text generation approaches: a variational learning approach (http://aclweb.org/anthology/D18-1354) and an adversarial learning approach (http://aclweb.org/anthology/N18-1122). You will apply both to the generation of Amazon Product Reviews on Electronics (http://jmcauley.ucsd.edu/data/amazon). As the first approach is specifically designed to generate diverse text, you will investigate how to increase lexical variety using the second approach. More precisely, you will change its reinforced training objective to address semantic similarity.

The main activities in the project will be: study the two approaches, re-implement the first approach using a deep learning framework (for instance, PyTorch), and extend the second approach to enrich the variety of the generated text.
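
The two concerns raised above, low lexical variety and verbatim reuse of training text, could be quantified with simple n-gram statistics. The following is a minimal sketch under stated assumptions, not part of the project specification or of either cited paper: the function names `distinct_n` and `copied_ngram_rate` are hypothetical, and tokenization is plain lowercased whitespace splitting.

```python
from typing import Iterable, List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts: Iterable[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams over a set of texts.

    Higher values suggest more lexical variety. Tokenization here is a
    simplifying assumption (lowercase + whitespace split).
    """
    all_ngrams: List[Ngram] = []
    for text in texts:
        all_ngrams.extend(ngrams(text.lower().split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def copied_ngram_rate(generated: Iterable[str],
                      training: Iterable[str],
                      n: int = 4) -> float:
    """Fraction of generated n-grams that also occur in the training text.

    Higher values suggest the generator is reproducing training segments.
    """
    train_set: Set[Ngram] = set()
    for text in training:
        train_set.update(ngrams(text.lower().split(), n))

    gen_ngrams: List[Ngram] = []
    for text in generated:
        gen_ngrams.extend(ngrams(text.lower().split(), n))
    if not gen_ngrams:
        return 0.0
    return sum(g in train_set for g in gen_ngrams) / len(gen_ngrams)
```

For example, `distinct_n(generated_reviews, n=2)` and `copied_ngram_rate(generated_reviews, training_reviews, n=4)` could be reported side by side for both approaches, where `generated_reviews` and `training_reviews` are placeholder lists of review strings. Note that these are surface-level measures; they do not capture the semantic similarity that the modified training objective is meant to address.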

Requirements

No special requirements.

Possible meeting times

Contact the supervisor directly.

Number of places

2