Skip to content

Latest commit

 

History

History
executable file
·
189 lines (156 loc) · 11.9 KB

CHANGE.md

File metadata and controls

executable file
·
189 lines (156 loc) · 11.9 KB

NLPAUG Change Log

1.1.10 Dec 23, 2021

1.1.9 Dec 1, 2021

1.1.8, Oct 18, 2021

1.1.7, Jul 20, 2021

1.1.6, Jul 16, 2021

1.1.5, Jul 15, 2021

  • Added LambadaAug(https://arxiv.org/pdf/1911.03118.pdf) under sentencen augmenter group
  • ContextualWordEmbsAug, BackTranslationAug, ContextualWordEmbsForSentenceAug and AbstSummAug support batch model generation.

1.1.4, Jun 20, 2021

1.1.3, Mar 7, 2021

  • Add multi language (DE, ES, FR, HE, IT, NL, PL and UK) support to KeyboardAug (Special thanks to Binoy Dalal)

1.1.2, Jan 4, 2021

  • Add NormalizeAug (audio) and PolarityInverseAug (audio)
  • Fix #191, #192, #194, Fix #196

1.1.1, Dec 12, 2020

1.1.0, Nov 13, 2020

1.0.1 Sep 25, 2020

  • Added Spectrogram's Loudness augmenter #156

1.0.0 Sep 24, 2020

  • Upgraded to use AutoModel and AutoTokeizer for ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug and AbstSummAug. Fix #133, #105
  • Refactoring audio and spectrogram augmenters
  • Added LoudnessAug into spectrogram augmenters
  • Support single forward data input for deep learning models (i.e. ContextualWordEmbsAug, BackTranslationAug, ContextualWordEmbsForSentenceAug, AbstSummAug). #146
  • Fix missing re-assing model paramters (e.g. device) for deep learning model
  • BackTranslation supports to load model from local #149
  • Fix device parameter bug #150
  • Deprecated include_detail feature

0.0.20 Aug 22, 2020

  • Update MANIFECT file to include txt resource

0.0.19 Aug 22, 2020

  • Add back English mispelling dictionary

0.0.18 Aug 21, 2020

  • Fix PPDB model misloaded nltk module#144

0.0.17 Aug 20, 2020

  • Enhance default tokenizer and reverse tokenizer#143
  • Introduce Abstractive Summarization in sentence ausgmenter (Check out example from here)

0.0.16 Aug 10, 2020

0.0.15 Aug 10, 2020

  • Support crop action in RandomWordAug #126
  • Fix #130
  • Fix #132
  • Fix #134
  • Upgraded and verified torch (1.6.0) and transformers (3.0.2) libraies
  • Add new Back Translation Augmenter #75 #102 #131

0.0.14 Apr 24, 2020

0.0.13 Feb 25, 2020

  • Fix spectrogram tutorial notebook [#98] (makcedward#98)
  • Fix RandomWordAug missed aug_max parameter [#100] (makcedward#100)
  • Fix loading KeyboardAug model problem [#101] (makcedward#101)
  • Fix performance issue when sampling candidate in ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug #107

0.0.12 Feb 5, 2020

  • ContextualWordEmbsAug supports bert-base-multilingual-uncased (for non English inputs)
  • Fix missing library dependency #74
  • Fix single token error when using RandomWordAug #76
  • Fix replacing character in RandomCharAug error #77
  • Enhance word's augmenter to support regular expression stopwords #81
  • Enhance char's augmenter to support regular expression stopwords #86
  • KeyboardAug supports Thai language #92
  • Fix word casing issue #82

0.0.11 Dec 6, 2019

  • Support color noise (pink, blue, red and violet noise) in audio's NoiseAug
  • Support given background noise in audio's NoiseAug
  • Support inject noise to portion of audio only in audio's NoiseAug
  • Introduce zone, coverage to all audio augmenter. Support only augmented portion of audio input
  • Add VTLP augmentation methods (Audio's augmenter)
  • Adopt latest transformer's interface #59
  • Support RoBERTa (including DistilRoBERTa) and DistilBERT (ContextualWordEmbsAug)
  • Support DistilGPT2 (ContextualWordEmbsForSentenceAug)
  • Fix librosa hard dependency #62
  • Introduce optimize attribute ContextualWordEmbsForSentenceAug #63
  • Optimize word selection for ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug (Speed up around 30%)
  • Add retry mechanism into ContextualWordEmbsAug insert action #68

0.0.10 Nov, 2019

  • Add aug_max to control maximum number of augmented item
  • Fix ContextualWordEmbsAug (for BERT) error when input is longer than max sequence length
  • Add RandomWordAug Substitute action
  • Fix ContextualWordEmbsAug error when no augmented data
  • Support multi thread processing (for CPU only) to speed up the augmentation
  • Fix KeyboardAug error #55

0.0.9### Sep 30, 2019

  • Added Swap Mode (adjacent, middle and random) for RandomAug (character level)
  • Added SynonymAug (WordNet/ PPDB) and AntonymAug (WordNet)
  • WordNetAug is deprecated. Uses SynonymAug instead
  • Introduce parameter n. Returning more than 1 augmented data. Changing output format from text (or numpy) to list of text (or numpy) if n > 1
  • Introduce parameter temperature in ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug to control the randomness
  • aug_n parameter is deprecated. This parameter will be replaced by top_k parameter
  • Fixed tokenization issue #48
  • Upgraded transformers dependency (or pytorch_transformer) to 2.0.0
  • Upgraded PyTorch dependency to 1.2.0
  • Added SplitAug

0.0.8### Sep 4, 2019

  • BertAug is replaced by ContextualWordEmbsAug
  • Support GPU (for ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug only) #26
  • Upgraded pytorch_transformer to 1.1.0 version #33
  • ContextualWordEmbsAug suuports both BERT and XLNet model
  • Removed librosa dependency
  • Add ContextualWordEmbsForSentenceAug for generating next sentence
  • Fix sampling issue #38

0.0.7### Aug 21, 2019

  • Add new augmenter (CropAug, LoudnessAug, MaskAug)
  • QwertyAug is deprecated. It will be replaced by KeyboardAug
  • Remove StopWordsAug. It will be replaced by RandomWordAug
  • Code refactoring
  • Added model download function for word2vec, GloVe and fasttext

0.0.6### Jul 29, 2019:

0.0.5### Jul 2, 2019:

0.0.4### Jun 7, 2019:

  • Added stopwords feature in character and word augmenter.
  • Added character's swap augmenter.
  • Added word's swap augmenter.
  • Added validation rule for #1.
  • Fixed BERT reverse tokenization for #2.

0.0.3### May 23, 2019:

  • Added Speed, Noise, Shift and Pitch augmenters for Audio

0.0.2### Apr 30, 2019:

  • Added Frequency Masking and Time Masking for Speech Recognition (Spectrogram).
  • Added librosa library dependency for converting wav to spectrogram.

0.0.1### Mar 20, 2019: Project initialization