NLPAUG Change Log

1.1.10 Dec 23, 2021

KeyboardAug supports Turkish
Fix FrequencyMasking time range
Remove unnecessary printout
Roll back ContextualWordEmbsForSentenceAug and AbstSummAug to use custom transformers API to reduce execution time

1.1.9 Dec 1, 2021

1.1.8, Oct 18, 2021

Added RandomSentAug
Added skip_check parameter for WordEmbsAug
OCRAug supports custom mapping / JSON file
Fix slow word2vec loading issue
Solve transformers compatibility issue

1.1.7, Jul 20, 2021

Fixed missing document bug

1.1.6, Jul 16, 2021

Fixed the missing library dependency issue

1.1.5, Jul 15, 2021

Added LambadaAug (https://arxiv.org/pdf/1911.03118.pdf) under the sentence augmenter group
ContextualWordEmbsAug, BackTranslationAug, ContextualWordEmbsForSentenceAug and AbstSummAug support batch model generation.

1.1.4, Jun 20, 2021

Fix performance issue when using single thread
Fix wordnet interface change issue
Adopt HuggingFace API for ContextualWordEmbsAug
Change model source from Fairseq to HuggingFace for BackTranslationAug
Adopt HuggingFace API for AbstSummAug

1.1.3, Mar 7, 2021

Add multi-language (DE, ES, FR, HE, IT, NL, PL and UK) support to KeyboardAug (Special thanks to Binoy Dalal)

1.1.2, Jan 4, 2021

Add NormalizeAug (audio) and PolarityInverseAug (audio)
Fix #191, #192, #194, #196

1.1.1, Dec 12, 2020

Fix #182, #184, #185, #187

1.1.0, Nov 13, 2020

Fix audio augmenter's documentation error #158
Introduced ReservedAug.
Fix #161, #166, #167, #168, #175

1.0.1 Sep 25, 2020

Added Spectrogram's Loudness augmenter #156

1.0.0 Sep 24, 2020

Upgraded to use AutoModel and AutoTokenizer for ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug and AbstSummAug. Fix #133, #105
Refactoring audio and spectrogram augmenters
Added LoudnessAug into spectrogram augmenters
Support single forward data input for deep learning models (i.e. ContextualWordEmbsAug, BackTranslationAug, ContextualWordEmbsForSentenceAug, AbstSummAug). #146
Fix missing re-assignment of model parameters (e.g. device) for deep learning models
BackTranslation supports loading a model from local storage #149
Fix device parameter bug #150
Deprecated include_detail feature

0.0.20 Aug 22, 2020

Update MANIFEST file to include txt resource

0.0.19 Aug 22, 2020

Add back English misspelling dictionary

0.0.18 Aug 21, 2020

Fix PPDB model incorrectly loading the nltk module #144

0.0.17 Aug 20, 2020

Enhance default tokenizer and reverse tokenizer #143
Introduce Abstractive Summarization in sentence augmenter

0.0.16 Aug 10, 2020

Fix #142

0.0.15 Aug 10, 2020

Support crop action in RandomWordAug #126
Fix #130
Fix #132
Fix #134
Upgraded and verified torch (1.6.0) and transformers (3.0.2) libraries
Add new Back Translation Augmenter #75 #102 #131

0.0.14 Apr 24, 2020

Remove QwertyAug example (Replaced by KeyboardAug) #110
Fix #117, #114, #111, #105
Support Change Log #116
Fix typo #123
Support accepting candidates in RandomCharAug #125

0.0.13 Feb 25, 2020

Fix spectrogram tutorial notebook #98
Fix RandomWordAug missed aug_max parameter #100
Fix loading KeyboardAug model problem #101
Fix performance issue when sampling candidate in ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug #107

0.0.12 Feb 5, 2020

ContextualWordEmbsAug supports bert-base-multilingual-uncased (for non English inputs)
Fix missing library dependency #74
Fix single token error when using RandomWordAug #76
Fix replacing character in RandomCharAug error #77
Enhance word's augmenter to support regular expression stopwords #81
Enhance char's augmenter to support regular expression stopwords #86
KeyboardAug supports Thai language #92
Fix word casing issue #82
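
The regular expression stopwords entries above (#81, #86) can be illustrated with a minimal sketch. It does not use nlpaug itself: the function names and the `stopwords_regex` argument below are hypothetical, and matching the whole token with `re.fullmatch` is an assumption about the intended behaviour, not a description of the library's implementation.

```python
import re

def is_protected(token, stopwords=(), stopwords_regex=None):
    """Return True if the token should be left untouched by an augmenter.

    Hypothetical helper: a token is protected when it is in the literal
    stopword list (case-insensitive) or fully matches the regex pattern.
    """
    if token.lower() in {s.lower() for s in stopwords}:
        return True
    if stopwords_regex is not None and re.fullmatch(stopwords_regex, token):
        return True
    return False

def candidate_indexes(tokens, stopwords=(), stopwords_regex=None):
    """Return indexes of tokens that are eligible for augmentation."""
    return [
        i for i, token in enumerate(tokens)
        if not is_protected(token, stopwords, stopwords_regex)
    ]

tokens = ["Visit", "http://example.com", "today", "the"]
# Protect the literal stopword "the" and anything that looks like a URL.
eligible = candidate_indexes(tokens, stopwords=["the"], stopwords_regex=r"https?://\S+")
print(eligible)
```

A pattern is useful where a fixed word list cannot enumerate every protected token, such as URLs, numbers or user handles.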

0.0.11 Dec 6, 2019

Support color noise (pink, blue, red and violet noise) in audio's NoiseAug
Support given background noise in audio's NoiseAug
Support injecting noise into only a portion of the audio in audio's NoiseAug
Introduce zone and coverage to all audio augmenters, so that only a portion of the audio input is augmented
Add VTLP augmentation methods (Audio's augmenter)
Adopt latest transformer's interface #59
Support RoBERTa (including DistilRoBERTa) and DistilBERT (ContextualWordEmbsAug)
Support DistilGPT2 (ContextualWordEmbsForSentenceAug)
Fix librosa hard dependency #62
Introduce optimize attribute in ContextualWordEmbsForSentenceAug #63
Optimize word selection for ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug (Speed up around 30%)
Add retry mechanism into ContextualWordEmbsAug insert action #68

0.0.10 Nov, 2019

Add aug_max to control maximum number of augmented items
Fix ContextualWordEmbsAug (for BERT) error when input is longer than max sequence length
Add RandomWordAug Substitute action
Fix ContextualWordEmbsAug error when no augmented data
Support multi thread processing (for CPU only) to speed up the augmentation
Fix KeyboardAug error #55

0.0.9 Sep 30, 2019

Added Swap Mode (adjacent, middle and random) for RandomAug (character level)
Added SynonymAug (WordNet/ PPDB) and AntonymAug (WordNet)
WordNetAug is deprecated. Use SynonymAug instead
Introduce parameter n to return more than 1 augmented data. The output format changes from text (or numpy) to list of text (or numpy) if n > 1
Introduce parameter temperature in ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug to control the randomness
aug_n parameter is deprecated. This parameter will be replaced by top_k parameter
Fixed tokenization issue #48
Upgraded transformers dependency (or pytorch_transformer) to 2.0.0
Upgraded PyTorch dependency to 1.2.0
Added SplitAug
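
The output format change tied to the n parameter in this release can be sketched as follows. This is a stand-in written for illustration, not nlpaug code: the `augment` function, its `seed` argument and the adjacent-word swap it performs are all hypothetical. It only demonstrates the return shape described in the entry above, a single text when n is 1 and a list of texts when n > 1.

```python
import random

def augment(text, n=1, seed=None):
    """Hypothetical augmenter call showing the n-dependent return type.

    Each output swaps one random pair of adjacent words. Returns a single
    string when n == 1 and a list of strings when n > 1.
    """
    rng = random.Random(seed)
    tokens = text.split()
    outputs = []
    for _ in range(n):
        swapped = tokens[:]
        if len(swapped) > 1:
            i = rng.randrange(len(swapped) - 1)
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        outputs.append(" ".join(swapped))
    # Mirror the format change described in the change log entry.
    return outputs[0] if n == 1 else outputs

single = augment("the quick brown fox", n=1, seed=0)
several = augment("the quick brown fox", n=3, seed=0)
print(type(single).__name__, type(several).__name__, len(several))
```

Callers that handle both cases typically check whether the result is a list before iterating over it.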

0.0.8 Sep 4, 2019

BertAug is replaced by ContextualWordEmbsAug
Support GPU (for ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug only) #26
Upgraded pytorch_transformer to 1.1.0 version #33
ContextualWordEmbsAug supports both BERT and XLNet models
Removed librosa dependency
Add ContextualWordEmbsForSentenceAug for generating next sentence
Fix sampling issue #38

0.0.7 Aug 21, 2019

Add new augmenter (CropAug, LoudnessAug, MaskAug)
QwertyAug is deprecated. It will be replaced by KeyboardAug
Remove StopWordsAug. It will be replaced by RandomWordAug
Code refactoring
Added model download function for word2vec, GloVe and fasttext

0.0.6 Jul 29, 2019

Added TF-IDF based word replacement augmenter (TfIdfAug)
Added spelling mistake simulation augmenter (SpellingAug)
Added stopword dropout augmenter (StopWordsAug)
Fixed #14

0.0.5 Jul 2, 2019

Fixed #3, #4, #5, #7, #10

0.0.4 Jun 7, 2019

Added stopwords feature in character and word augmenter.
Added character's swap augmenter.
Added word's swap augmenter.
Added validation rule for #1.
Fixed BERT reverse tokenization for #2.

0.0.3 May 23, 2019

Added Speed, Noise, Shift and Pitch augmenters for Audio

0.0.2 Apr 30, 2019

Added Frequency Masking and Time Masking for Speech Recognition (Spectrogram).
Added librosa library dependency for converting wav to spectrogram.

0.0.1 Mar 20, 2019

Project initialization