- Unsupervised Data Augmentation
- Unsupervised Question Answering by Cloze Translation
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
- It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations
- A Visual Survey of Data Augmentation in NLP
- Task-independent data augmentation for NLP
- Robust, Unbiased Natural Language Processing
- General
- Random insertion, random deletion, and word or sentence shuffling
- Replacing words with synonyms
- Replacing words with words from a dictionary of the same label
- Perturbations (letter, word, or sentence level)
- Language model
- Back translation
- Round-trip translation
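
The token-level operations in the list above (random deletion, shuffling by swapping, random insertion, and synonym replacement) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names, the `synonyms` dictionary, and the parameters `p`, `n_swaps`, `n_inserts`, and `n` are all assumptions made for the example, and a real setup would draw synonyms from a thesaurus or embedding neighbours.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token with probability p; keep one token if all are dropped."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Shuffle lightly by swapping two random positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_insertion(tokens, n_inserts, rng, synonyms):
    """Insert a synonym of a random known token at a random position."""
    out = list(tokens)
    candidates = [t for t in tokens if t in synonyms]
    if not candidates:
        return out
    for _ in range(n_inserts):
        word = rng.choice(candidates)
        out.insert(rng.randint(0, len(out)), rng.choice(synonyms[word]))
    return out


def synonym_replacement(tokens, n, rng, synonyms):
    """Replace up to n tokens that have an entry in the synonym dictionary."""
    out = list(tokens)
    positions = [i for i, t in enumerate(out) if t in synonyms]
    rng.shuffle(positions)
    for i in positions[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


# Hypothetical toy data, for illustration only.
rng = random.Random(0)
tokens = "the quick brown fox jumps".split()
synonyms = {"quick": ["fast", "rapid"], "jumps": ["leaps"]}
augmented = [
    random_deletion(tokens, 0.2, rng),
    random_swap(tokens, 1, rng),
    random_insertion(tokens, 1, rng, synonyms),
    synonym_replacement(tokens, 1, rng, synonyms),
]
```

Passing an explicit `random.Random` instance keeps the augmentations reproducible, which helps when comparing runs with and without augmentation.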
- Leverage External Data
- Using external data derived from Wikipedia by linking Wikipedia articles to arbitrary input text. The idea is that if the input text were on Wikipedia, it would link to other Wikipedia articles that are semantically related and provide additional information.
- Break the input text into n-grams
- Check whether each n-gram exists as a Wikipedia article to create a set of ‘candidate links’
- Prune the candidate links by computing the similarity between the input text and the abstract of each candidate
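
The three steps above can be sketched as follows. This is a rough illustration under stated assumptions: `articles` is a hypothetical in-memory mapping from lowercase article title to abstract that stands in for a real Wikipedia lookup, and bag-of-words cosine similarity with a fixed `threshold` is an assumed choice, since the notes do not say which similarity measure is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n):
    """Yield every n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def link_candidates(text, articles, max_n=3, threshold=0.1):
    """Return article titles that occur in the text and pass the similarity check."""
    tokens = tokenize(text)
    # Steps 1 and 2: n-grams that match an article title become candidate links.
    candidates = {g for g in ngrams(tokens, max_n) if g in articles}
    # Step 3: keep candidates whose abstract is similar enough to the input text.
    return sorted(
        title
        for title in candidates
        if cosine(tokens, tokenize(articles[title])) >= threshold
    )


# Hypothetical stand-in for Wikipedia: lowercase title -> abstract.
articles = {
    "machine learning": "machine learning builds models that learn from data",
    "patterns": "a pattern is a regularity in wallpaper or fabric design",
}
text = "Machine learning models learn patterns from data."
links = link_candidates(text, articles)
```

In this toy example the `patterns` candidate is pruned because its abstract shares no tokens with the input, which is the kind of spurious match the similarity step is meant to remove.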
- Conversational Systems
- Reading Comprehension