Initial release

@davidberenstein1957 released this 10 May 15:45
· 4 commits to main since this release
c75a779

Introduction

Welcome to Adept Augmentations, which can be used to create additional data in few-shot Named Entity Recognition (NER) settings!

Adept Augmentations is a Python package that provides data augmentation functionality for NER training data using the spacy and datasets packages. Currently, we support one augmenter, EntitySwapAugmenter, but we plan to add more.

EntitySwapAugmenter takes either a datasets.Dataset or a spacy.tokens.DocBin. You can optionally provide a set of labels. It first builds a knowledge base of the entities that belong to each label. When running augmenter.augment() for N runs, it then creates N new sentences in which the original entities are randomly swapped for knowledge-base entities with the same label.

For example, assume that we have a knowledge base for PERSONS, LOCATIONS and PRODUCTS. We can then create additional data for the sentence "Momofuku Ando created instant noodles in Osaka." using augmenter.augment(N=2), resulting in "David created instant noodles in Madrid." and "Tom created Adept Augmentations in the Netherlands."
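The swap described above can be illustrated with a small, self-contained sketch. This is not the package's implementation and does not use its API: the helper names (build_knowledge_base, swap_entities, augment), the (start, end, label) span format and the seed parameter are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical data format: (tokens, spans), where each span is
# (start, end, label) with an exclusive end index.


def build_knowledge_base(examples):
    """Collect every entity's tokens, grouped by label."""
    kb = defaultdict(set)
    for tokens, spans in examples:
        for start, end, label in spans:
            kb[label].add(tuple(tokens[start:end]))
    # Sort so that seeded random choices are reproducible.
    return {label: sorted(entities) for label, entities in kb.items()}


def swap_entities(tokens, spans, kb, rng):
    """Replace each entity with a random same-label entity from the knowledge base."""
    new_tokens, new_spans = [], []
    cursor = 0
    for start, end, label in sorted(spans):
        new_tokens.extend(tokens[cursor:start])  # copy the text before the entity
        replacement = rng.choice(kb[label])
        new_start = len(new_tokens)
        new_tokens.extend(replacement)
        new_spans.append((new_start, len(new_tokens), label))
        cursor = end
    new_tokens.extend(tokens[cursor:])  # copy the text after the last entity
    return new_tokens, new_spans


def augment(examples, n, seed=0):
    """Create n new sentences by swapping entities in randomly chosen examples."""
    rng = random.Random(seed)
    kb = build_knowledge_base(examples)
    return [swap_entities(*rng.choice(examples), kb, rng) for _ in range(n)]


examples = [
    (["Momofuku", "Ando", "created", "instant", "noodles", "in", "Osaka", "."],
     [(0, 2, "PERSON"), (3, 5, "PRODUCT"), (6, 7, "LOCATION")]),
    (["David", "moved", "to", "Madrid", "."],
     [(0, 1, "PERSON"), (3, 4, "LOCATION")]),
]

augmented = augment(examples, n=2)
for tokens, spans in augmented:
    print(" ".join(tokens), spans)
```

Because replacement entities can differ in length from the originals, the sketch recomputes span offsets while rebuilding the sentence, so the labels stay aligned with the new tokens.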

Adept Augmentations works for NER labels using the IOB, IOB2, BIOES and BILUO tagging schemes, as well as labels not following any tagging scheme.
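To show what supporting several tagging schemes involves, here is a minimal sketch that converts a per-token tag sequence into entity spans. It is an assumption about one possible approach, not the package's code: the function name tags_to_spans is hypothetical, and for tags without a scheme it assumes that consecutive tokens with the same label form one entity.

```python
def tags_to_spans(tags):
    """Convert per-token tags into (start, end, label) spans with exclusive end.

    Handles the prefixes B, I, L/E and U/S used by IOB, IOB2, BILUO and BIOES,
    plus bare labels such as "PER" that follow no scheme.
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, label = "O", None
        elif "-" in tag and tag.split("-", 1)[0] in {"B", "I", "L", "E", "U", "S"}:
            prefix, label = tag.split("-", 1)
        else:
            # No scheme: treat the bare label as a continuation tag.
            prefix, label = "I", tag

        # Close the open span if this token cannot continue it.
        starts_new = prefix in {"B", "U", "S"} or label != current
        if current is not None and (label is None or starts_new):
            spans.append((start, i, current))
            start, current = None, None

        # Open a new span when an entity token appears and none is open.
        if label is not None and current is None:
            start, current = i, label

        # L/E and U/S mark the last token of an entity explicitly.
        if prefix in {"L", "E", "U", "S"} and current is not None:
            spans.append((start, i + 1, current))
            start, current = None, None

    if current is not None:
        spans.append((start, len(tags), current))
    return spans


iob2 = ["B-PER", "I-PER", "O", "O", "O", "O", "B-LOC", "O"]
biluo = ["B-PER", "L-PER", "O", "O", "O", "O", "U-LOC", "O"]
plain = ["PER", "PER", "O", "O", "O", "O", "LOC", "O"]

for tags in (iob2, biluo, plain):
    print(tags_to_spans(tags))
```

Once tags are normalized to spans like this, an entity swap can work on spans alone and re-emit tags in whichever scheme the input used.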

Changes

  • Introduced the EntitySwapAugmenter
  • Added support for the IOB, IOB2, BIOES and BILUO tagging schemes, as well as labels not following any tagging scheme