Fix deberta overflow error #990
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull Request Overview
This PR fixes an overflow error that occurs when calculating BERTScore metrics with certain tokenizers, specifically addressing an issue where the `microsoft/deberta-large-mnli` model's tokenizer has a misconfigured `max_model_length` value of 1e30.
- Added a new `tokenizer_max_len` parameter to the BERTScorer class to allow overriding the tokenizer's maximum model length
- Implemented validation logic to detect the problematic 1e30 value and default to 512 when not explicitly overridden
- Applied the fix to the existing BERTScore usage by setting `tokenizer_max_len=512` for the deberta model
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description
---|---
src/lighteval/metrics/imports/bert_scorer.py | Added tokenizer validation function and new parameter to BERTScorer class
src/lighteval/metrics/metrics_sample.py | Applied the tokenizer length override fix to existing BERTScore usage
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```python
if self._model is None:
    logger.info(f"Loading BERTScorer model `{self._model_type}`")
    self._tokenizer = AutoTokenizer.from_pretrained(self._model_type)
    self._tokenizer.max_model_length = validate_tokenizer_length(
```
shouldn't this be `model_max_length` as well?
Great! Can you actually run the command and check for the correct tokenizer length?
Tested with the following script:

```python
from lighteval.metrics.imports.bert_scorer import BERTScorer

SCORE_THRESHOLD = 0.5
CANDIDATE = ["This is an example text."]
REFERENCE = ["This text contains an example sentence."]

print("####### Default BERTScorer model with length override #######")
scorer = BERTScorer(
    model_type="microsoft/deberta-large-mnli",
    lang="en",
    rescale_with_baseline=False,
    num_layers=9,
    tokenizer_max_len=512,
    device="cpu",
)
scores = scorer.score(cands=CANDIDATE, refs=REFERENCE)
assert all(i.item() > SCORE_THRESHOLD for i in scores)
print("####### Test passed! #######")

print("####### Default BERTScorer model without length override #######")
scorer = BERTScorer(
    model_type="microsoft/deberta-large-mnli",
    lang="en",
    rescale_with_baseline=False,
    num_layers=9,
    device="cpu",
)
scores = scorer.score(cands=CANDIDATE, refs=REFERENCE)
assert all(i.item() > SCORE_THRESHOLD for i in scores)
print("####### Test passed! #######")

print("####### BERTScorer model with correct tokenizer config and without override #######")
scorer = BERTScorer(
    model_type="FacebookAI/roberta-large",
    lang="en",
    rescale_with_baseline=False,
    num_layers=9,
    device="cpu",
)
scores = scorer.score(cands=CANDIDATE, refs=REFERENCE)
assert all(i.item() > SCORE_THRESHOLD for i in scores)
print("####### Test passed! #######")
```

I haven't run the unit test, but it looks like
Description

Running

```
lighteval vllm "model_name=meta-llama/Llama-3.2-3B-Instruct" "helm|summarization:xsum|0" --max-samples=5
```

encounters `OverflowError: int too big to convert` when attempting to calculate BERTScore metrics. This appears to be an issue with the tokenizer configuration file (https://huggingface.co/microsoft/deberta-large-mnli/discussions/1), which causes the tokenizer's `max_model_length` attribute to default to a value of 1e30 (huggingface/transformers#14561).

Changes

- Added a new `tokenizer_max_len` parameter to the `__init__` method of the BERTScorer class, which allows the user to override the tokenizer's `max_model_length` attribute.
- In `validate_tokenizer_length()`, the tokenizer's `max_model_length` will be set to the overriding value if one is provided. If an override value is not set, it will check whether the model length is the misconfigured value of 1e30 and, if so, default to 512 with a warning to the user. Otherwise, the original length is used.
- Applied the fix to the existing BERTScore usage by setting `tokenizer_max_len=512` for `deberta-large-mnli`, which is the default BERTScore model.
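For context on why a 1e30 length produces `OverflowError: int too big to convert`: that value far exceeds the signed 64-bit integer range that native code (e.g. tensor libraries) expects for length arguments, so converting it fails. A quick illustrative check (not part of the PR):

```python
# The misconfigured tokenizer length is 1e30. Anything above 2**63 - 1
# cannot be represented as a signed 64-bit integer, which is why handing
# it to native code raises "OverflowError: int too big to convert".
INT64_MAX = 2**63 - 1

misconfigured_len = int(1e30)
fallback_len = 512

print(misconfigured_len > INT64_MAX)  # the misconfigured value overflows int64
print(fallback_len <= INT64_MAX)      # the 512 fallback fits comfortably
```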