Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix diffTrimmedLines not keep whitespaces in the output #219

Closed
wants to merge 1 commit into from

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Apr 21, 2018

This fix a bug with use of ignoreWhitespace option. Before this fix you get back array with trimmed versions of the lines in the input, that not the expected behavior

@Mingun
Copy link
Contributor Author

Mingun commented Oct 1, 2018

@kpdecker could you please look at this (and other my PRs)?

@Mingun Mingun force-pushed the diffTrim-use-payload branch from d163f66 to 1b186fd Compare January 10, 2019 19:45
@ghedwards
Copy link

I found what seems to be an easier solution , remove the trim from the tokenize and move it to the extractCommon , something like this ?

if ( this.options.ignoreWhitespace ) {
				while (newPos + 1 < newLen && oldPos + 1 < oldLen && this.equals(newString[newPos + 1].trim(), oldString[oldPos + 1].trim())) {
					newPos++;
					oldPos++;
					commonCount++;
				}
			} else {
				while (newPos + 1 < newLen && oldPos + 1 < oldLen && this.equals(newString[newPos + 1], oldString[oldPos + 1])) {
					newPos++;
					oldPos++;
					commonCount++;
				}
      }

@danowensdev
Copy link

@kpdecker Did you take a look at this?

@ExplodingCabbage
Copy link
Collaborator

This fix a bug with use of ignoreWhitespace option. Before this fix you get back array with trimmed versions of the lines in the input, that not the expected behavior

Hmmm... isn't it? After reading the docs on ignoreWhitespace and diffTrimmedLines for the first time just now, I was honestly pretty unsure what to expect:

  1. diffTrimmedLines trims leading and trailing whitespace from each line, then runs diffLines, returning a diff as if none of the lines in the texts being differed ever had leading or trailing whitespace
  2. When two lines differ only by whitespace, diffTrimmedLines treats them as equal and returns a diff showing the line from the old string being preserved (as if the line in the new string had always had the old string's leading and trailing whitespace)
  3. When two lines differ only by whitespace, diffTrimmedLines treats them as equal and returns a diff showing the line from the new string being preserved (as if the line in the old string had always had the new string's leading and trailing whitespace)
  4. When two lines differ only by whitespace, diffTrimmedLines returns some kind of special component in the result indicating a line was preserved but with whitespace-only changes

From some quick experimentation, I gather that what's actually happening under the hood is option 1, but you were intuitively expecting option 3. Maybe that's reasonable but it's really nonobvious to me that your expectation will be universal, and this is a breaking change, which makes me against merging it. (Also, mightn't some people expect option 2 - i.e. that the value properties they see in the result will reflect the whitespace the line had in the old text, rather than in the new text?)

The existing API and names IMO add confusion here that we can never escape from without a backwards compatibility break. To me, the name diffTrimmedLines suggests behaviour 1 above, but the name ignoreWhitespace suggests behaviour 2, 3, or 4, yet in fact they're the same thing. (Why do we have two ways to invoke the same behaviour anyway?)

One way to give the people expecting behaviour 3 what they want without breaking backwards compatibility would be to have diff components returned by diffTrimmedLines contain value, oldValue, and newValue properties. Here's a mocked up example of what I'm imagining the output being:

> old = 'There was an old man named Michael Finnegan    \n' +
... 'He had whiskers on his chinnegan    \n' +
... 'The wind came along and blew them in again    \n' +
... 'Poor old Michael Finnegan    \n'
> nyoo = '    There was an old man named Michael Finnegan\n' +
... '    He went fishing with a pin again\n' +
... '    Caught a fish but it flopped back in again\n' +
... '    Poor old Michael Finnegan\n'
> diff.diffTrimmedLines(old, nyoo)
[
  {
    count: 1,
    oldValue: 'There was an old man named Michael Finnegan    \n',
    value: 'There was an old man named Michael Finnegan\n',
    newValue: '    There was an old man named Michael Finnegan\n'
  },
  {
    count: 2,
    added: undefined,
    removed: true,
    oldValue: 'He had whiskers on his chinnegan    \n' +
      'The wind came along and blew them in again    \n',
    value: 'He had whiskers on his chinnegan\n' +
      'The wind came along and blew them in again\n'
  },
  {
    count: 2,
    added: true,
    removed: undefined,
    value: 'He went fishing with a pin again\n' +
      'Caught a fish but it flopped back in again\n',
    newValue: '    He went fishing with a pin again\n' +
      '    Caught a fish but it flopped back in again\n'
  },
  {
    count: 1,
    oldValue: 'Poor old Michael Finnegan    \n',
    value: 'Poor old Michael Finnegan\n',
    newValue: '    Poor old Michael Finnegan\n'
  }
]

Ugh, but... I notice that jsdiff already has an ignoreCase option that you'd expect to behave analogously to ignoreWhitespace, but ignoreCase behaves in the way this PR suggests implementing - i.e. it puts the value from the new text into the value property:

> diff.diffChars("hello world", "Hello, World!", {ignoreCase: true})
[
  { count: 5, value: 'Hello' },
  { count: 1, added: true, removed: undefined, value: ',' },
  { count: 6, value: ' World' },
  { count: 1, added: true, removed: undefined, value: '!' }
]

So my proposed compability-preserving compromise idea has the downside of leaving the behaviours of ignoreCase and ignoreWhitespace inconsistent with each other, which this PR resolves.

A kinda horrible situation. I'll think about this some more and come back to it.

@ExplodingCabbage
Copy link
Collaborator

I've just realised that the ignoreWhitespace option of diffWords also puts the whitespace from the new text into value, unlike the ignoreWhitespace option of diffLines - so the behaviour isn't even consistent between the two, currently! An argument in favour of this PR...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants