-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Fix deberta tokenizer bug caused by bug in normalizer #117189
Conversation
…fesets to be negative
Pinging @elastic/ml-core (Team:ML) |
Hi @maxhniebergall, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -194,7 +194,7 @@ Reader normalize(CharSequence str) { | |||
if (charDelta < 0) { | |||
// normalised form is shorter | |||
int lastDiff = getLastCumulativeDiff(); | |||
addOffCorrectMap(normalizedCharPos, lastDiff + charDelta); | |||
addOffCorrectMap(normalizedCharPos, lastDiff - charDelta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subtle, nice find! Can we add any tests here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I added a test which I confirmed fails prior to this fix, and works with this fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running |
…17189) * Fix deberta tokenizer bug caused by bug in normalizer which caused offesets to be negative * Update docs/changelog/117189.yaml
…17189) * Fix deberta tokenizer bug caused by bug in normalizer which caused offesets to be negative * Update docs/changelog/117189.yaml
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
... which caused offsets to be negative, and caused exceptions for some very rare input combinations.