- KeyboardAug supports Turkish
- Fix FrequencyMasking time range
- Remove unnecessary printout
- Roll back ContextualWordEmbsForSentenceAug and AbstSummAug to use custom transformers API to reduce execution time
- ReservedAug supports generating all combinations
- Roll back from the HuggingFace pipeline to the native HuggingFace API to fix slow performance
- Added a description explaining that the WordEmbsAug model is a custom class
- Change random behavior to produce more augmentation samples
- Fix SpeedAug random factor issue
- Added RandomSentAug
- Added skip_check parameter for WordEmbsAug
- OCRAug supports custom mapping/JSON file
- Fix slow word2vec loading
- Fix transformers compatibility issue
- Added LambadaAug (https://arxiv.org/pdf/1911.03118.pdf) under the sentence augmenter group
- ContextualWordEmbsAug, BackTranslationAug, ContextualWordEmbsForSentenceAug and AbstSummAug support batch model generation.
- Fix performance issue when using single thread
- Fix wordnet interface change issue
- Adopt HuggingFace API for ContextualWordEmbsAug
- Change model source from Fairseq to HuggingFace for BackTranslationAug
- Adopt HuggingFace API for AbstSummAug
- Add multi language (DE, ES, FR, HE, IT, NL, PL and UK) support to KeyboardAug (Special thanks to Binoy Dalal)
- Fix audio augmenter's documentation error #158
- Introduced ReservedAug.
- Fix #161, #166, #167, #168, #175
- Added Spectrogram's Loudness augmenter #156
- Upgraded to use AutoModel and AutoTokenizer for ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug and AbstSummAug. Fix #133, #105
- Refactoring audio and spectrogram augmenters
- Added LoudnessAug into spectrogram augmenters
- Support single forward data input for deep learning models (i.e. ContextualWordEmbsAug, BackTranslationAug, ContextualWordEmbsForSentenceAug, AbstSummAug). #146
- Fix model parameters (e.g. device) not being re-assigned for deep learning models
- BackTranslationAug supports loading a local model #149
- Fix device parameter bug #150
- Deprecated include_detail feature
- Update MANIFEST file to include txt resource
- Add back English misspelling dictionary
- Fix PPDB model incorrectly loading the nltk module #144
- Enhance default tokenizer and reverse tokenizer #143
- Introduce Abstractive Summarization in sentence augmenter (see the accompanying example)
- Fix #142
- Support crop action in RandomWordAug #126
- Fix #130
- Fix #132
- Fix #134
- Upgraded and verified torch (1.6.0) and transformers (3.0.2) libraries
- Add new Back Translation Augmenter #75 #102 #131
- Remove QWERTAug example (replaced by KeyboardAug) #110
- Fix #117, #114, #111, #105
- Support Change Log #116
- Fix typo #123
- Support accepting candidates in RandomCharAug #125
- Fix spectrogram tutorial notebook #98
- Fix RandomWordAug missing aug_max parameter #100
- Fix KeyboardAug model loading problem #101
- Fix performance issue when sampling candidate in ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug #107
- ContextualWordEmbsAug supports bert-base-multilingual-uncased (for non-English inputs)
- Fix missing library dependency #74
- Fix single token error when using RandomWordAug #76
- Fix replacing character in RandomCharAug error #77
- Enhance word's augmenter to support regular expression stopwords #81
- Enhance char's augmenter to support regular expression stopwords #86
- KeyboardAug supports Thai language #92
- Fix word casing issue #82
- Support color noise (pink, blue, red and violet noise) in audio's NoiseAug
- Support given background noise in audio's NoiseAug
- Support injecting noise into only a portion of the audio in audio's NoiseAug
- Introduce zone and coverage to all audio augmenters. Support augmenting only a portion of the audio input
- Add VTLP augmentation methods (Audio's augmenter)
- Adopt latest transformer's interface #59
- Support RoBERTa (including DistilRoBERTa) and DistilBERT (ContextualWordEmbsAug)
- Support DistilGPT2 (ContextualWordEmbsForSentenceAug)
- Fix librosa hard dependency #62
- Introduce optimize attribute to ContextualWordEmbsForSentenceAug #63
- Optimize word selection for ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug (Speed up around 30%)
- Add retry mechanism into ContextualWordEmbsAug insert action #68
- Add aug_max to control maximum number of augmented item
- Fix ContextualWordEmbsAug (for BERT) error when input is longer than max sequence length
- Add RandomWordAug Substitute action
- Fix ContextualWordEmbsAug error when no augmented data
- Support multi thread processing (for CPU only) to speed up the augmentation
- Fix KeyboardAug error #55
- Added Swap Mode (adjacent, middle and random) for RandomAug (character level)
- Added SynonymAug (WordNet/ PPDB) and AntonymAug (WordNet)
- WordNetAug is deprecated. Use SynonymAug instead
- Introduce parameter n to return more than one augmented output. The output format changes from text (or numpy) to a list of text (or numpy) if n > 1
- Introduce parameter temperature in ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug to control the randomness
- aug_n parameter is deprecated. This parameter will be replaced by top_k parameter
- Fixed tokenization issue #48
- Upgraded transformers dependency (or pytorch_transformer) to 2.0.0
- Upgraded PyTorch dependency to 1.2.0
- Added SplitAug
- BertAug is replaced by ContextualWordEmbsAug
- Support GPU (for ContextualWordEmbsAug and ContextualWordEmbsForSentenceAug only) #26
- Upgraded pytorch_transformer to 1.1.0 version #33
- ContextualWordEmbsAug supports both BERT and XLNet models
- Removed librosa dependency
- Add ContextualWordEmbsForSentenceAug for generating next sentence
- Fix sampling issue #38
- Add new augmenters (CropAug, LoudnessAug, MaskAug)
- QwertyAug is deprecated. It will be replaced by KeyboardAug
- Remove StopWordsAug. It will be replaced by RandomWordAug
- Code refactoring
- Added model download function for word2vec, GloVe and fasttext
- Added TF-IDF based word replacement augmenter (TfIdfAug)
- Added spelling mistake simulation augmenter (SpellingAug)
- Added stopword dropout augmenter (StopWordsAug)
- Fixed #14
- Added stopwords feature in character and word augmenter.
- Added character's swap augmenter.
- Added word's swap augmenter.
- Added validation rule for #1.
- Fixed BERT reverse tokenization for #2.
- Added Speed, Noise, Shift and Pitch augmenters for Audio
- Added Frequency Masking and Time Masking for Speech Recognition (Spectrogram).
- Added librosa library dependency for converting wav to spectrogram.