Fix diffWords treating numbers and underscores as not being word characters #554

ExplodingCabbage · 2024-09-05T15:22:26Z

Fixes #553 (the original issue, not the orthogonal bug posted in a comment).

The bug was introduced in #494. Previously we were splitting into words on \b, which per the linked MDN docs considers the following things to be word characters:

Letters (A–Z, a–z), numbers (0–9), and underscore (_).

Then I swapped to using the extendedWordChars regex (previously only used for stitching together some adjacent tokens into single tokens after the original splitting), but didn't ensure it actually contained all the things that \b considers to be word characters.

This should fix it!
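A minimal sketch of the behaviour difference, using the two character-class strings from this PR's diff. The `wordRuns` helper is hypothetical and is not jsdiff's actual tokenizer; it only pulls out runs of characters that each class treats as word characters, next to what a plain `\b` split gives.

```javascript
// Character-class strings copied from the diff in this PR.
const before = 'a-zA-Z\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';
const after = 'a-zA-Z0-9_\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';

// Hypothetical helper (an assumption for illustration, not jsdiff code):
// return every maximal run of characters matched by the given class.
function wordRuns(text, wordChars) {
  return text.match(new RegExp(`[${wordChars}]+`, 'ug')) || [];
}

const sample = 'foo_bar42 baz';

// Splitting on \b keeps letters, digits and underscores together.
console.log(sample.split(/\b/));       // ['foo_bar42', ' ', 'baz']

// The old class has no digits or underscore, so the word is broken apart.
console.log(wordRuns(sample, before)); // ['foo', 'bar', 'baz']

// With 0-9 and _ added, the runs agree with \b again.
console.log(wordRuns(sample, after));  // ['foo_bar42', 'baz']
```

The sketch only compares which characters count as part of a word; how jsdiff handles whitespace and punctuation tokens is outside its scope.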

ExplodingCabbage · 2024-09-05T15:27:09Z

src/diff/word.js

@@ -19,7 +19,7 @@ import { longestCommonPrefix, longestCommonSuffix, replacePrefix, replaceSuffix,
 //  - U+02DC  ˜ &#732;  Small Tilde
 //  - U+02DD  ˝ &#733;  Double Acute Accent
 // Latin Extended Additional, 1E00–1EFF
-const extendedWordChars = 'a-zA-Z\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';
+const extendedWordChars = 'a-zA-Z0-9_\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';


TBH I am confused about the (pre-existing) ranges given here and by the comment explaining them above. I am gonna play around and maybe add more tests. But that can happen subsequent to just getting this PR merged.
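One way to probe the ranges, sketched under the assumption that the gaps between the later ranges are meant to exclude specific code points: test individual characters against the class from the diff. The leading `\u{C0}-\u{FF}` range already covers `\u{D8}-\u{F6}`, and it also covers U+00D7 and U+00F7, which the following two ranges appear to skip.

```javascript
// Character class copied from the "+" line of the diff above.
const extendedWordChars = 'a-zA-Z0-9_\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';

// Matches exactly one character belonging to the class.
const isWordChar = new RegExp(`^[${extendedWordChars}]$`, 'u');

// U+00D7 and U+00F7 fall in the gaps around \u{D8}-\u{F6} and \u{F8}-...,
// but \u{C0}-\u{FF} includes them, so they still match.
console.log(isWordChar.test('\u{D7}'));  // true
console.log(isWordChar.test('\u{F7}'));  // true

// U+02C7 and U+02DC are not covered by any range, so they do not match.
console.log(isWordChar.test('\u{2C7}')); // false
console.log(isWordChar.test('\u{2DC}')); // false
```

Whether the first two results are intended cannot be told from the diff alone; the check only shows what the class matches as written.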

ExplodingCabbage added 4 commits September 5, 2024 16:14

Add test for bug #553

bed093c

Fix bug

764d8bc

Merge remote-tracking branch 'origin/master' into fix-diffWords

bcd54dc

Add release notes

bb261d4

ExplodingCabbage mentioned this pull request Sep 5, 2024

Combinations of numbers and letters no longer considered a single token in v6 #553

Closed

ExplodingCabbage merged commit 7d113b6 into master Sep 5, 2024

ExplodingCabbage deleted the fix-diffWords branch September 5, 2024 16:31

kuttyhub mentioned this pull request Dec 7, 2024

Synced Diff package version DefinitelyTyped/DefinitelyTyped#71366

Merged

8 tasks

kibanamachine mentioned this pull request Jan 6, 2025

[8.x] Update @elastic/kibana-data-discovery dependencies (main) (#202622) elastic/kibana#205647

Merged