Release 0.14.0 · flairNLP/flair

This release adds major new support for biomedical text analytics! It adds improved biomedical NER and a state-of-the-art model for biomedical entity linking. Other new features include (1) support for parameter-efficient fine-tuning and (2) various new datasets, bug fixes and enhancements! We also removed a few dependencies, so Flair should install faster and take up less space!

Biomedical NER and Entity Linking

With Flair 0.14.0, you can now detect and normalize biomedical entities in text.

For example, to analyze the sentence "We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome", use this code snippet:

from flair.models import EntityMentionLinker
from flair.nn import Classifier
from flair.data import Sentence

# A sentence from biomedical literature
sentence = Sentence("We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome.")

# Tag named entities in the text
ner_tagger = Classifier.load("hunflair2")
ner_tagger.predict(sentence)

# Normalize gene names
gene_linker = EntityMentionLinker.load("gene-linker")
gene_linker.predict(sentence)

# Normalize disease names
disease_linker = EntityMentionLinker.load("disease-linker")
disease_linker.predict(sentence)

# Iterate over predicted entities and print
for label in sentence.get_labels():
    print(label)

This should print out:

Span[5:6]: "IFNAR2" → Gene (1.0)
Span[5:6]: "IFNAR2" → 3455/name=IFNAR2 

Span[7:8]: "POLG" → Gene (1.0)
Span[7:8]: "POLG" → 5428/name=POLG 

Span[9:11]: "long-COVID syndrome" → Disease (1.0)
Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome
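Each linker label in this printout pairs a database identifier with a canonical name, separated by `/name=`. If you need the two parts separately, a minimal sketch follows. It assumes linker values always have the `<identifier>/name=<canonical name>` shape seen here, and `split_link_value` and `LinkedConcept` are hypothetical helpers for illustration, not part of Flair.

```python
from typing import NamedTuple, Optional


class LinkedConcept(NamedTuple):
    identifier: str        # e.g. "3455" or "MESH:D000094024"
    name: Optional[str]    # canonical name, or None if the value has none


def split_link_value(value: str) -> LinkedConcept:
    """Split a linker value such as '3455/name=IFNAR2' (format assumed)."""
    identifier, sep, name = value.strip().partition("/name=")
    return LinkedConcept(identifier, name if sep else None)


gene = split_link_value("3455/name=IFNAR2")
disease = split_link_value("MESH:D000094024/name=Post-Acute COVID-19 Syndrome")
print(gene.identifier, gene.name)
print(disease.identifier, disease.name)
```

Keeping the identifier apart from the name makes it easier to join predictions against other resources keyed by the same database IDs.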

The printout shows that:

"IFNAR2" is a gene. Further, it is recognized as gene 3455 ("interferon alpha and beta receptor subunit 2") in the NCBI Gene database.
"POLG" is a gene. Further, it is recognized as gene 5428 ("DNA polymerase gamma, catalytic subunit") in the NCBI Gene database.
"long-COVID syndrome" is a disease. Further, it is linked to the concept "Post-Acute COVID-19 Syndrome" (D000094024) in the MeSH database.

Big thanks to @sg-wbi @WangXII @mariosaenger @helpmefindaname for all their work:

Entity Mention Linker by @helpmefindaname in #3388
Support for biomedical datasets with multiple entity types by @WangXII in #3387
Update documentation for Hunflair2 release by @mariosaenger in #3410
Improve nel tutorial by @helpmefindaname in #3369
Incorporate hunflair2 docs to docpage by @helpmefindaname in #3442

Parameter-Efficient Fine-Tuning

Flair 0.14.0 also adds support for parameter-efficient fine-tuning (PEFT) through the peft library.

For instance, to fine-tune a BERT model on the TREC question classification task using LoRA, use the following snippet:

from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# Note: you need to install peft to use this feature!
from peft import LoraConfig, TaskType

# Get corpus and make label dictionary
corpus: Corpus = TREC_6()
label_type = "question_class"
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Define embeddings with LoRA fine-tuning
document_embeddings = TransformerDocumentEmbeddings(
    "bert-base-uncased",
    fine_tune=True,
    # set LoRA config
    peft_config=LoraConfig(
        task_type=TaskType.FEATURE_EXTRACTION,
        inference_mode=False,
    ),
)

# define model
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# train model
trainer = ModelTrainer(classifier, corpus)
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    learning_rate=5.0e-4,
    mini_batch_size=4,
    max_epochs=1,
)
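To see why LoRA counts as parameter-efficient, it helps to count what is actually trained. LoRA freezes a weight matrix of shape d×k and learns two low-rank factors of shapes d×r and r×k, so each adapted matrix adds r·(d+k) trainable parameters. The sketch below does this arithmetic for a BERT-base-sized encoder. The settings are assumptions for illustration and are not stated in these release notes: hidden size 768, 12 layers, rank 8, and adapters on the query and value projections of each layer.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds for one d x k weight matrix at rank r."""
    return r * (d + k)


# Assumed, illustrative settings (not taken from the release notes).
hidden_size = 768        # BERT-base hidden dimension
num_layers = 12          # BERT-base encoder layers
adapted_per_layer = 2    # query and value projections
rank = 8

per_matrix = lora_params(hidden_size, hidden_size, rank)
lora_total = per_matrix * adapted_per_layer * num_layers
frozen_total = hidden_size * hidden_size * adapted_per_layer * num_layers

print(per_matrix)   # 12288
print(lora_total)   # 294912
print(f"{lora_total / frozen_total:.2%} of the size of the adapted matrices")
```

Under these assumptions the adapters add roughly 0.3 million trainable parameters, about 2% of the size of the matrices they adapt, while the original weights stay frozen.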

Big thanks to @janpf for this new feature!

Add PEFT training and explicit kwarg passthrough by @janpf in #3480

Smaller Library

We've removed dependencies such as gensim from the core package, since they increased the size of the Flair library and caused some compatibility and maintenance issues. As a result, the core package is now smaller and faster to install.

Install as always with:

pip install flair

Some features still need gensim, such as training a model that uses classic word embeddings. For this use case, install with:

pip install flair[word-embeddings]

Or just install gensim separately.
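Because gensim is now optional, code that relies on classic word embeddings may want to check for it up front rather than fail midway through a run. A minimal sketch using only the standard library follows; `has_module` is a hypothetical helper, not a Flair function.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if not has_module("gensim"):
    print("gensim not found; install it with: pip install flair[word-embeddings]")
```

Using `find_spec` avoids importing the package just to test for it, which keeps the check cheap.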

Big thanks to @helpmefindaname for this new feature!

Make gensim optional by @helpmefindaname in #3493
Update models for v0.14.0 by @alanakbik in #3505
Relax version constraint for konoha by @himkt in #3394
Dependencies maintenance updates by @helpmefindaname in #3402
Make janome optional by @himkt in #3405
Bump min. version of bpemb by @stefan-it in #3468

Other Improvements

New Features and Improvements

Speed up euclidean distance calculation by @sheldon-roberts in #3485
Add DataTriples which act just like DataPairs by @janpf in #3481
Add random seed parameter to dataset splitting and downsampling for better reproducibility by @MattGPT-ai in #3475
Allow cpu device even if gpu available by @drbh in #3417
Add prediction label type for span classifier by @helpmefindaname in #3432
Character embeddings store their embedding name too by @helpmefindaname in #3477

Bug Fixes

TextPairRegressor: Fix data point iteration by @ya0guang in #3413
TextPairRegressor: Fix GPU memory leak by @MattGPT-ai in #3490
TextRegressor: Fix label_name bug by @sheldon-roberts in #3491
SequenceTagger: Fix _all_scores_for_token in ViterbiDecoder by @mauryaland in #3455
SentenceSplitter: Fix linking of sentences by @mariosaenger in #3397
SentenceSplitter: Fix case where split was performed on special characters by @helpmefindaname in #3404
Classifier: Fix loading by moving error message to main load function by @alanakbik in #3504
Trainer: Fix edge case by loading best model at end, even when there is no final evaluation by @helpmefindaname in #3470
TransformerEmbeddings: Fix special tokens by not replacing replace_additional_special_tokens by @helpmefindaname in #3451
Unit tests: Fix double data_folder in unit test by @ya0guang in #3412

New Datasets

Add revision support for all Universal Dependencies datasets by @stefan-it in #3420
NER_ESTONIAN_NOISY: Support for Estonian NER dataset with noise by @teresaloeffelhardt in #3463
MASAKHA_POS: Support for two new languages by @stefan-it in #3421
UD_BAVARIAN_MAIBAAM: Add support for new Bavarian MaiBaam UD by @stefan-it in #3426

Documentation

Minor readme fixes by @stefan-it in #3424
Fix typo transformer-embeddings.md by @abhisheklomsh in #3500
Fix typo in how-model-training-works.md by @abhisheklomsh in #3499

Build Management

Fix black and ruff by @stefan-it in #3423
Remove zappr yaml by @helpmefindaname in #3435
Fix tests package being incorrectly included in builds by @asumagic in #3440

New Contributors

@ya0guang made their first contribution in #3413
@drbh made their first contribution in #3417
@asumagic made their first contribution in #3440
@MattGPT-ai made their first contribution in #3475
@janpf made their first contribution in #3481
@sheldon-roberts made their first contribution in #3485
@abhisheklomsh made their first contribution in #3500
@teresaloeffelhardt made their first contribution in #3463

Full Changelog: v0.13.1...v0.14.0
