Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix diffWords treating numbers and underscores as not being word characters #554

Merged
merged 4 commits into from
Sep 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions release-notes.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# Release Notes

## Future 7.0.0 release

- [#554](https://github.com/kpdecker/jsdiff/pull/554) **`diffWords` treats numbers and underscores as word characters again.** This behaviour was broken in v6.0.0.

## 6.0.0

This is a release containing many, *many* breaking changes. The objective of this release was to carry out a mass fix, in one go, of all the open bugs and design problems that required breaking changes to fix. A substantial, but exhaustive, changelog is below.
Expand Down
2 changes: 1 addition & 1 deletion src/diff/word.js
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ import { longestCommonPrefix, longestCommonSuffix, replacePrefix, replaceSuffix,
// - U+02DC ˜ ˜ Small Tilde
// - U+02DD ˝ ˝ Double Acute Accent
// Latin Extended Additional, 1E00–1EFF
const extendedWordChars = 'a-zA-Z\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';
const extendedWordChars = 'a-zA-Z0-9_\\u{C0}-\\u{FF}\\u{D8}-\\u{F6}\\u{F8}-\\u{2C6}\\u{2C8}-\\u{2D7}\\u{2DE}-\\u{2FF}\\u{1E00}-\\u{1EFF}';
Copy link
Collaborator Author

@ExplodingCabbage ExplodingCabbage Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I am confused about the (pre-existing) ranges given here and by the comment explaining them above. I am gonna play around and maybe add more tests. But that can happen subsequent to just getting this PR merged.


// Each token is one of the following:
// - A punctuation mark plus the surrounding whitespace
Expand Down
28 changes: 28 additions & 0 deletions test/diff/word.js
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,34 @@ describe('WordDiff', function() {
'.'
]);
});

// Test for bug reported at https://github.com/kpdecker/jsdiff/issues/553
it('should treat numbers as part of a word if not separated by whitespace or punctuation', () => {
expect(
wordDiff.tokenize(
'Tea Too, also known as T2, had revenue of 57m AUD in 2012-13.'
)
).to.deep.equal([
'Tea ',
' Too',
', ',
' also ',
' known ',
' as ',
' T2',
', ',
' had ',
' revenue ',
' of ',
' 57m ',
' AUD ',
' in ',
' 2012',
'-',
'13',
'.'
]);
});
});

describe('#diffWords', function() {
Expand Down