-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to strip trailing CR #344
Conversation
src/diff/line.js
Outdated
// remove all CR (\r) characters from the input string | ||
value = value.replace(/\r+\n/g, '\n'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment, code, and analogy to GNU diff
's --strip-trailing-cr
option have a three-way disagreement about what the behaviour here is meant to be:
--strip-trailing-cr
only removes at most ONE\r
before an\n
, and so by choosing that name we imply we've got the same behaviour- the actual code here removes a sequence of ANY number of consecutive
\r
characters before a\n
- the comment says that all
\r
characters will be removed from ANYWHERE in the string, even if they don't appear before a\n
We should tweak the behaviour to be consistent with the GNU diff option whose name we're copying, and we should make the comment accurately reflect the behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! Have pushed an update!
README.md
Outdated
@@ -37,6 +37,8 @@ npm install diff --save | |||
|
|||
Options | |||
* `ignoreWhitespace`: `true` to ignore leading and trailing whitespace. This is the same as `diffTrimmedLines` | |||
* `stripTrailingCr`: `true` to remove all trailing CR (`\r`) characters before perfoming the diff. | |||
This helps to get a useful diff when comparing files from UNIX/Windows respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
English nit: "respectively" here doesn't make sense.
Suggested wording:
This helps to get a useful diff when diffing UNIX text files against Windows text files.
I am broadly in favour of this change; it's useful and consistent with the precedent provided by |
// remove all CR (\r) characters from the input string | ||
value = value.replace(/\r+\n/g, '\n'); | ||
} | ||
|
||
let retLines = [], | ||
linesAndNewlines = value.split(/(\n|\r\n)/); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably just be value.split("\n")
after this change, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well only if stripTrailingCr===true
, to not be breaking. But even in that case, I'm not sure I understand the benefit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was getting confused by #275 (comment). After reading that issue, but not testing anything to confirm, I had wrongly believed that the \n
appearing first in the pipe-separated list of alternatives \n|\r\n/
meant that it would always match in preference to \r\n
, making the entire regex equivalent to just /(\n)/
, and that the logic using that regex had thus always been fundamentally broken. I thus thought that with my comment above I was just proposing cleaning up some misleading code that never really worked.
But some quick experimentation suggests that I'm wrong, at least in Node, Chrome, and Firefox!
> "foo\r\nbar\r\nbaz\r\n".split(/(\n|\r\n)/)
[
'foo', '\r\n',
'bar', '\r\n',
'baz', '\r\n',
''
]
I'm not sure, then, why @cctakaso thought, in #275, that reordering the \r\n
and \n
in the regex would fix anything. Is it possible there is some JavaScript environment out there, that @cctakaso was using, with a regex engine where the order of alternatives really does affect the result in the way that issue suggests? I would've thought this would be standardised and all implementations would follow the standard, but tbh I don't want to spend hours unpicking the meaning of the ECMAScript 3 standard where regexes got added to JavaScript to confirm that the behaviour here has always been unambiguously specified!
My new suggestion, then, is that I'd be in favour of reversing the order as @cctakaso suggested, to /(\r\n|\n)/
, just to make absolutely sure there's no ambiguity and avoid any need to research this - even though I'm now like 85% sure the order doesn't matter and this won't actually have any effect!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the problem #275 presents will be fixed by the option, but the solution they suggested wouldn't change much since the array that the regex produces would still contain all new lines as separate elements, and '\n' !== '\r\n'
.
I think there's just a misunderstanding and we shouldn't change the code for it.
d8d1617
to
9b1bd6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for the contribution and for coming back to tweak & comment & set me straight through multiple rounds of review! :)
Simple idea/suggestion.
Fixes #343, #275
See #343 for in-depth explanation.