Skip to content

Newlines are not properly compared when using diffWords #140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sdickson opened this issue Sep 22, 2016 · 2 comments
Closed

Newlines are not properly compared when using diffWords #140

sdickson opened this issue Sep 22, 2016 · 2 comments
Labels

Comments

@sdickson
Copy link

sdickson commented Sep 22, 2016

Passing the following 2 strings (in order) into the diffWords function results in some strange results. Some newlines that are added in the second string come back in the array with no "added" indication. This was producing some very strange behavior for our users where newlines would appear 1 word early in the string when we would try to discard all added strings (rather than not appearing at all which is how I would expect it to behave).

Please excuse the bizarre strings - you do what you have to do to stay sane when trying to fix bugs :)

Lastly when testing this please replace the ↵ characters with actual new lines. Probably obvious, but clarifying can't hurt too much.

Board games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard↵↵games are better than sportsBoard↵↵games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sports
Board games are better than sports↵  ↵Board games are better than sportsBoard games are better than sportsBoard games are better than sports↵  ↵Board games are better than sports↵  ↵Board games are better than sports↵  ↵Board games are better than sportsBoard games are better than sportsBoard games are better than sports↵  ↵It's the truth.↵  ↵Except sports board games, those aren't that great.↵  ↵More and more and more pushes↵  ↵4 times is too many
@ExplodingCabbage
Copy link
Collaborator

Yeah, this certainly looks like a bug to me:

> old = `Board games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard
... 
... games are better than sportsBoard
... 
... games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sports
... `
'Board games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard\n' +
  '\n' +
  'games are better than sportsBoard\n' +
  '\n' +
  'games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sports\n'
> nyoo = `Board games are better than sports
...   
... Board games are better than sportsBoard games are better than sportsBoard games are better than sports
...   
... Board games are better than sports
...   
... Board games are better than sports
...   
... Board games are better than sportsBoard games are better than sportsBoard games are better than sports
...   
... It's the truth.
...   
... Except sports board games, those aren't that great.
...   
... More and more and more pushes
...   
... 4 times is too many`
'Board games are better than sports\n' +
  '  \n' +
  'Board games are better than sportsBoard games are better than sportsBoard games are better than sports\n' +
  '  \n' +
  'Board games are better than sports\n' +
  '  \n' +
  'Board games are better than sports\n' +
  '  \n' +
  'Board games are better than sportsBoard games are better than sportsBoard games are better than sports\n' +
  '  \n' +
  "It's the truth.\n" +
  '  \n' +
  "Except sports board games, those aren't that great.\n" +
  '  \n' +
  'More and more and more pushes\n' +
  '  \n' +
  '4 times is too many'
> 
> diff.diffWords(old, nyoo)
[
  { count: 10, value: 'Board games are better than ' },
  { count: 1, added: undefined, removed: true, value: 'sportsBoard' },
  { count: 1, added: true, removed: undefined, value: 'sports' },
  { count: 1, value: '\n' },
  { count: 4, added: true, removed: undefined, value: '  \nBoard ' },
  {
    count: 28,
    value: 'games are better than sportsBoard games are better than sportsBoard games are better than '
  },
  { count: 1, added: undefined, removed: true, value: 'sportsBoard' },
  { count: 1, added: true, removed: undefined, value: 'sports' },
  { count: 2, value: '\n  ' },
  { count: 3, added: true, removed: undefined, value: '\nBoard ' },
  { count: 8, value: 'games are better than ' },
  { count: 1, added: undefined, removed: true, value: 'sportsBoard' },
  { count: 1, added: true, removed: undefined, value: 'sports' },
  { count: 2, value: '\n  ' },
  { count: 3, added: true, removed: undefined, value: '\nBoard ' },
  { count: 8, value: 'games are better than ' },
  { count: 1, added: undefined, removed: true, value: 'sportsBoard' },
  { count: 1, added: true, removed: undefined, value: 'sports' },
  { count: 1, value: '\n' },
  { count: 4, added: true, removed: undefined, value: '  \nBoard ' },
  {
    count: 30,
    value: 'games are better than sportsBoard games are better than sportsBoard games are better than sports\n'
  },
  {
    count: 58,
    added: true,
    removed: undefined,
    value: '  \n' +
      "It's the truth.\n" +
      '  \n' +
      "Except sports board games, those aren't that great.\n" +
      '  \n' +
      'More and more and more pushes\n' +
      '  \n' +
      '4 times is too many'
  }
]

The fourth change object - { count: 1, value: '\n' } - corresponds to a newline being preserved from the first text, but comes waaaay before we've reached the first newline in the first text.

I'll dig into it tomorrow and see if I can make sense of what's causing this (and whether it has implications beyond diffWords, for e.g. diffChars or diffArrays).

@ExplodingCabbage
Copy link
Collaborator

I'm dumb - this isn't a bug, but rather an intentional feature of diffWords (for better or worse!). Per the README...

Diff.diffWords(oldStr, newStr[, options]) - diffs two blocks of text, comparing word by word, ignoring whitespace.

Diff.diffWordsWithSpace(oldStr, newStr[, options]) - diffs two blocks of text, comparing word by word, treating whitespace as significant.

There might be better ways of showing the result when we've "ignored" a whitespace difference (see my prior commentary at #219 (comment)), but the current output is intentional.

diffWordsWithSpace doesn't exhibit the "bug" and correctly reports the first newline as added, not preserved:

> txt1 = "Board games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard↵↵games are better than sportsBoard↵↵games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sports".replace('↵', '\n')
'Board games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard\n' +
  '↵games are better than sportsBoard↵↵games are better than sportsBoard games are better than sportsBoard games are better than sportsBoard games are better than sports'
> txt2 = "Board games are better than sports↵  ↵Board games are better than sportsBoard games are better than sportsBoard games are better than sports↵  ↵Board games are better than sports↵  ↵Board games are better than sports↵  ↵Board games are better than sportsBoard games are better than sportsBoard games are better than sports↵  ↵It's the truth.↵  ↵Except sports board games, those aren't that great.↵  ↵More and more and more pushes↵  ↵4 times is too many".replace('↵', '\n')
'Board games are better than sports\n' +
  "  ↵Board games are better than sportsBoard games are better than sportsBoard games are better than sports↵  ↵Board games are better than sports↵  ↵Board games are better than sports↵  ↵Board games are better than sportsBoard games are better than sportsBoard games are better than sports↵  ↵It's the truth.↵  ↵Except sports board games, those aren't that great.↵  ↵More and more and more pushes↵  ↵4 times is too many"
> diff.diffWordsWithSpace(txt1, txt2)
[
  { count: 10, value: 'Board games are better than ' },
  { count: 1, added: undefined, removed: true, value: 'sportsBoard' },
  {
    count: 5,
    added: true,
    removed: undefined,
    value: 'sports\n  ↵Board'
  },
  {
    count: 29,
    value: ' games are better than sportsBoard games are better than sportsBoard games are better than '
  },
  { count: 2, added: undefined, removed: true, value: 'sportsBoard\n' },
  { count: 1, added: true, removed: undefined, value: 'sports' },
  { count: 1, value: '↵' },
  { count: 4, added: true, removed: undefined, value: '  ↵Board ' },
  { count: 8, value: 'games are better than ' },
  { count: 2, added: undefined, removed: true, value: 'sportsBoard↵↵' },
  {
    count: 6,
    added: true,
    removed: undefined,
    value: 'sports↵  ↵Board '
  },
  { count: 8, value: 'games are better than ' },
  { count: 1, added: undefined, removed: true, value: 'sportsBoard' },
  {
    count: 5,
    added: true,
    removed: undefined,
    value: 'sports↵  ↵Board'
  },
  {
    count: 30,
    value: ' games are better than sportsBoard games are better than sportsBoard games are better than sports'
  },
  {
    count: 57,
    added: true,
    removed: undefined,
    value: "↵  ↵It's the truth.↵  ↵Except sports board games, those aren't that great.↵  ↵More and more and more pushes↵  ↵4 times is too many"
  }
]

@ExplodingCabbage ExplodingCabbage closed this as not planned Won't fix, can't repro, duplicate, stale Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants