Combinations of numbers and letters no longer considered a single token in v6 #553
Comments
I believe this also breaks emojis in react-native TextInput (or in general). const changes = diffChars(originalText, changedText) as CharactersDiffChange[]; gives a broken result; after downgrading to 5.2.0 everything is fine. |
It wasn't deliberate, and I agree with you. Damn, I was really hoping I could pull off that 6.0.0 release without introducing any new bugs. :( |
@efstathiosntonas what you describe (with |
@ExplodingCabbage will create an expo repro! |
Looks like the first bug here - the one reported by @greysteil - was introduced by #494. Looks like I carelessly assumed that the pre-existing I'll try to get a new release out today with both these bugs fixed. |
You're the best @ExplodingCabbage ❤️ |
@ExplodingCabbage do you still need a repro for |
Would be appreciated - even just posting what two strings to diff against each other to see a bug! Otherwise will try to figure it out myself. |
I think the fastest would be to read or implement it on an expo project: https://gist.github.com/efstathiosntonas/01142fd9243573d649caddd952f81829 Just try to enter emojis on the textinput from the native keyboard of the device/simulator, they will turn out as ?? instead of the emojis. |
Original issue fixed by #554. I'll have a play around with your Gist and try to figure out the
By contrast, v5.2.0 gets the
So it doesn't appear to me at first glance like I have broken anything emoji-related and I am rather hoping I am going to find that whatever the problem is is not in jsdiff but in your code... but I will see if I can repro now and then get to work on establishing whether that's true. |
Okay, I just got round to glancing at the Gist, @efstathiosntonas. Didn't try to get it running since I haven't used React Native for eons and don't remember how, but did take a quick look at the Replacing all use of the Note the change in Unicode handling in
and the README says:
If after doing some more debugging you can give me a snippet of code that just invokes jsdiff and gets a result you think is wrong, let me know, but I'm gonna assume |
thank you Mark for taking the time to look into it. Will try your suggestions tomorrow and I’ll update the issue! |
Interesting - worth noting that even
Array.from('👩‍👩‍👦‍👦').length
// 7
For constructs like that you need
const seg = new Intl.Segmenter('en', { granularity: "grapheme" });
Array.from(seg.segment('👩‍👩‍👦‍👦')).length
// 1
See e.g. here for more info. |
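To make the three ways of counting concrete, here is a minimal sketch that measures the same family emoji in UTF-16 code units, code points, and grapheme clusters. The emoji is written with escapes so the zero-width joiners (U+200D) are visible; the variable names are only illustrative, and the sketch assumes a runtime that ships Intl.Segmenter.

```javascript
// Four "person" code points joined by three zero-width joiners (U+200D).
const family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}";

// UTF-16 code units: each person is a surrogate pair (2 units), plus 3 joiners.
const codeUnits = family.length;

// Code points: Array.from iterates by code point, so 4 people + 3 joiners.
const codePoints = Array.from(family).length;

// Grapheme clusters: what a reader perceives as a single character.
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = Array.from(graphemeSegmenter.segment(family)).length;

console.log({ codeUnits, codePoints, graphemes });
// { codeUnits: 11, codePoints: 7, graphemes: 1 }
```

Which of these counts is the right one depends on what the caller does with the result: string slicing works in code units, while anything shown to a user is usually better measured in graphemes.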
@ehoogeveen-medweb, yep! Possibly relevant reading - I've got a (now somewhat-outdated, as it was written before It would be absolutely reasonable for a user of the library to shun |
Okay, I'll probably actually release the fix for the original issue here tomorrow. Before I release, I want to try to rewrite the (I don't understand the UCD data very well and basically none of Unicode's data releases I've ever looked at have been well-documented, but https://www.unicode.org/Public/15.0.0/ucd/PropList.txt gives me a list of characters with the |
@ExplodingCabbage can confirm, converting all |
Okay, trying to generate a new
... but I end up with 761 different character ranges for word characters. Definitely not at all consistent with the existing regex and I cannot realistically audit all the differences. Since I've fixed the specific issue raised here, I'm gonna leave the pre-existing regex alone and just ship the fix. |
Makes sense! Using unicode categories might make sense as a future change, but just the tweak to the regex fixes my problem. Thanks for your work on this @ExplodingCabbage. |
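For anyone curious what a category-based tokenizer might look like, here is a rough sketch using Unicode property escapes. It is not the regex jsdiff uses, and TOKEN_RE and tokenize are made-up names for illustration; it only shows how letters, marks, and numbers can be grouped so that a mixed run such as "abc123" stays a single token.

```javascript
// Illustrative only: NOT jsdiff's actual tokenizer.
// A token is a run of letters/marks/numbers/underscores, a run of whitespace,
// or any other single code point (punctuation, symbols, emoji).
const TOKEN_RE = /[\p{L}\p{M}\p{N}_]+|\s+|[^\p{L}\p{M}\p{N}_\s]/gu;

function tokenize(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(TOKEN_RE) ?? [];
}

console.log(tokenize("abc123 d\u00E9j\u00E0-vu"));
// [ 'abc123', ' ', 'déjà', '-', 'vu' ]
```

The trade-off mentioned above still applies: a class built from \p{L} and \p{N} covers far more characters than a hand-written range list, so results could differ from the existing regex in ways that are hard to audit.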
This was useful for me: I just replaced the length of value with the Segmenter's length, and then I got the correct traditional position. I still need to use value.length to cut the string, though. |
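One way to bridge the two kinds of position is to map a grapheme index back to a UTF-16 offset before slicing. The helper below is a hypothetical sketch (graphemeIndexToOffset is not part of jsdiff or of any library); it relies on the index property that Intl.Segmenter reports for each segment.

```javascript
// Hypothetical helper: returns the UTF-16 offset at which the grapheme with
// the given index starts, or value.length if the index is past the end.
function graphemeIndexToOffset(value, graphemeIndex, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let i = 0;
  for (const { index } of segmenter.segment(value)) {
    if (i === graphemeIndex) return index;
    i += 1;
  }
  return value.length;
}

// "a", a grinning face (one grapheme, two UTF-16 code units), then "b".
const sample = "a\u{1F600}b";

// The third grapheme ("b") starts at code unit 3, not 2.
const cutAt = graphemeIndexToOffset(sample, 2);
const head = sample.slice(0, cutAt);

console.log(cutAt, head.length); // 3 3
```

With this, positions shown to the user can be counted in graphemes while the actual cut still happens on code-unit offsets, so an emoji is never split in the middle.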
First up, thanks for all your work on jsdiff, and for the v6 release.
We have a bunch of tests around diffs, and I noticed an unexpected change:
On v5:
On v6:
Not sure if this is a deliberate change or not? I think where words and numbers are combined it's probably best to treat them as a single Word.