
Conversation

@amstu2 (Contributor) commented Sep 24, 2025

Description

Running lighteval vllm "model_name=meta-llama/Llama-3.2-3B-Instruct" "helm|summarization:xsum|0" --max-samples=5 encounters OverflowError: int too big to convert when attempting to calculate BERTScore metrics.

This appears to be caused by a problem with the tokenizer configuration file (https://huggingface.co/microsoft/deberta-large-mnli/discussions/1): because the config does not specify a maximum length, the tokenizer's model_max_length attribute defaults to 1e30 (huggingface/transformers#14561).
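The misconfigured default is easy to observe directly (assuming only that transformers is installed):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-large-mnli")
# With no model_max_length in the tokenizer config, transformers falls back
# to its VERY_LARGE_INTEGER sentinel, int(1e30).
print(tokenizer.model_max_length)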

Changes

  1. I've added an extra optional argument to the __init__ method of the BERTScorer class, which allows the user to override the tokenizer's model_max_length attribute.
  2. In the new function validate_tokenizer_length(), the tokenizer's model_max_length is set to the override value if one is given. If no override is set, the function checks whether the length is the misconfigured value of 1e30 and, if so, defaults to 512 with a warning to the user; otherwise the original length is used. (A sketch of this logic follows the list.)
  3. Added an override value of 512 for deberta-large-mnli, which is the default BERTScore model.
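A minimal sketch of the validation logic in item 2 (the exact signature in the PR may differ; 1e30 mirrors transformers' sentinel for a missing model_max_length):

import logging

logger = logging.getLogger(__name__)

MISCONFIGURED_MAX_LENGTH = int(1e30)
DEFAULT_MAX_LENGTH = 512

def validate_tokenizer_length(model_max_length: int, override: int | None = None) -> int:
    # An explicit override always wins.
    if override is not None:
        return override
    # Detect the sentinel left by a missing tokenizer config and fall back.
    if model_max_length >= MISCONFIGURED_MAX_LENGTH:
        logger.warning(
            "Tokenizer reports model_max_length=%s; defaulting to %s.",
            model_max_length,
            DEFAULT_MAX_LENGTH,
        )
        return DEFAULT_MAX_LENGTH
    return model_max_length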

@HuggingFaceDocBuilderDev (Collaborator) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NathanHB requested a review from Copilot, September 24, 2025 11:35
@Copilot (Contributor) left a comment:

Pull Request Overview

This PR fixes an overflow error that occurs when calculating BERTScore metrics with certain tokenizers, specifically addressing an issue where the microsoft/deberta-large-mnli tokenizer has a misconfigured model_max_length value of 1e30.

  • Added a new tokenizer_max_len parameter to the BERTScorer class to allow overriding the tokenizer's maximum model length
  • Implemented validation logic to detect the problematic 1e30 value and default to 512 when not explicitly overridden
  • Applied the fix to the existing BERTScore usage by setting tokenizer_max_len=512 for the deberta model

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

src/lighteval/metrics/imports/bert_scorer.py: Added a tokenizer validation function and a new parameter to the BERTScorer class
src/lighteval/metrics/metrics_sample.py: Applied the tokenizer length override fix to the existing BERTScore usage
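For illustration, the call site in metrics_sample.py presumably ends up looking something like this (a hypothetical sketch; parameter values mirror the test script below, and the actual code may differ):

from lighteval.metrics.imports.bert_scorer import BERTScorer

# Hypothetical call site: the default deberta-large-mnli scorer now receives
# the explicit 512 override introduced by this PR.
scorer = BERTScorer(
    model_type="microsoft/deberta-large-mnli",
    lang="en",
    num_layers=9,
    tokenizer_max_len=512,
)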


Review comment on src/lighteval/metrics/imports/bert_scorer.py:

if self._model is None:
    logger.info(f"Loading BERTScorer model `{self._model_type}`")
    self._tokenizer = AutoTokenizer.from_pretrained(self._model_type)
    self._tokenizer.max_model_length = validate_tokenizer_length(
Member: shouldn't this be model_max_length as well?

@NathanHB (Member) left a comment:
great! can you actually run the command and check for the correct tokenizer length?

@NathanHB added the bug label Sep 25, 2025
@amstu2 (Contributor, Author) commented Sep 27, 2025

Tested with the following script:

from lighteval.metrics.imports.bert_scorer import BERTScorer

SCORE_THRESHOLD = 0.5
CANDIDATE = ["This is an example text."]
REFERENCE = ["This text contains an example sentence."]

print("####### Default BERTScorer model with length override #######")
scorer = BERTScorer(
    model_type="microsoft/deberta-large-mnli",
    lang="en",
    rescale_with_baseline=False,
    num_layers=9,
    tokenizer_max_len=512,
    device="cpu"
)

scores = scorer.score(cands=CANDIDATE, refs=REFERENCE)  # (P, R, F) tensors

assert all(i.item() > SCORE_THRESHOLD for i in scores)
print("####### Test passed! #######")

print("####### Default BERTScorer model without length override #######")
scorer = BERTScorer(
    model_type="microsoft/deberta-large-mnli",
    lang="en",
    rescale_with_baseline=False,
    num_layers=9,
    device="cpu"
)

scores = scorer.score(cands=CANDIDATE, refs=REFERENCE)

assert all(i.item() > SCORE_THRESHOLD for i in scores)
print("####### Test passed! #######")

print("####### BERTScorer model with correct tokenizer config and without override #######")

scorer = BERTScorer(
    model_type="FacebookAI/roberta-large",
    lang="en",
    rescale_with_baseline=False,
    num_layers=9,
    device="cpu"
)

scores = scorer.score(cands=CANDIDATE, refs=REFERENCE)

assert all(i.item() > SCORE_THRESHOLD for i in scores)
print("####### Test passed! #######")

I haven't run the unit test, but it looks like bert_score could probably be removed from the list of skipped metrics in testing.

@amstu2 requested a review from NathanHB, September 27, 2025 01:10