Skip to content

Prefer to order deletions before insertions when the edit cost is the same, like the Myers diff paper does #439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 27, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions release-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
[Commits](https://github.com/kpdecker/jsdiff/compare/master...v6.0.0-staging)

- [#435](https://github.com/kpdecker/jsdiff/pull/435) Fix `parsePatch` handling of control characters. `parsePatch` used to interpret various unusual control characters - namely vertical tabs, form feeds, lone carriage returns without a line feed, and EBCDIC NELs - as line breaks when parsing a patch file. This was inconsistent with the behavior of both JsDiff's own `diffLines` method and also the Unix `diff` and `patch` utils, which all simply treat those control characters as ordinary characters. The result of this discrepancy was that some well-formed patches - produced either by `diff` or by JsDiff itself and handled properly by the `patch` util - would be wrongly parsed by `parsePatch`, with the effect that it would disregard the remainder of a hunk after encountering one of these control characters.
- [#439](https://github.com/kpdecker/jsdiff/pull/439) Prefer diffs that order deletions before insertions. When faced with a choice between two diffs with an equal total edit distance, the Myers diff algorithm generally prefers one that does deletions before insertions rather than insertions before deletions. For instance, when diffing `abcd` against `acbd`, it will prefer a diff that says to delete the `b` and then insert a new `b` after the `c`, over a diff that says to insert a `c` before the `b` and then delete the existing `c`. JsDiff deviated from the published Myers algorithm in a way that led to it having the opposite preference in many cases, including that example. This is now fixed, meaning diffs output by JsDiff will more accurately reflect what the published Myers diff algorithm would output.

## Development

Expand Down
13 changes: 1 addition & 12 deletions src/diff/base.js
Original file line number Diff line number Diff line change
Expand Up @@ -71,9 +71,7 @@ Diff.prototype = {
// Select the diagonal that we want to branch from. We select the prior
// path whose position in the old string is the farthest from the origin
// and does not pass the bounds of the diff graph
// TODO: Remove the `+ 1` here to make behavior match Myers algorithm
// and prefer to order removals before insertions.
if (!canRemove || (canAdd && removePath.oldPos + 1 < addPath.oldPos)) {
if (!canRemove || (canAdd && removePath.oldPos < addPath.oldPos)) {
basePath = self.addToPath(addPath, true, undefined, 0);
} else {
basePath = self.addToPath(removePath, undefined, true, 1);
Expand Down Expand Up @@ -223,15 +221,6 @@ function buildValues(diff, lastComponent, newString, oldString, useLongestToken)
} else {
component.value = diff.join(oldString.slice(oldPos, oldPos + component.count));
oldPos += component.count;

// Reverse add and remove so removes are output first to match common convention
// The diffing algorithm is tied to add then remove output and this is the simplest
// route to get the desired output with minimal overhead.
if (componentPos && components[componentPos - 1].added) {
let tmp = components[componentPos - 1];
components[componentPos - 1] = components[componentPos];
components[componentPos] = tmp;
}
}
}

Expand Down
18 changes: 9 additions & 9 deletions test/diff/array.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,27 +10,27 @@ describe('diff/array', function() {
console.log(diffResult);
expect(diffResult).to.deep.equals([
{count: 1, value: [a]},
{count: 1, value: [c], removed: undefined, added: true},
{count: 1, value: [b]},
{count: 1, value: [c], removed: true, added: undefined}
{count: 1, value: [b], removed: true, added: undefined},
{count: 1, value: [c]},
{count: 1, value: [b], removed: undefined, added: true}
]);
});
it('should diff falsey values', function() {
const a = false;
const b = 0;
const c = '';
// Example sequences from Myers 1986
const arrayA = [c, b, a, b, a, c];
const arrayB = [a, b, c, a, b, b, a];
const arrayA = [a, b, c, a, b, b, a];
const arrayB = [c, b, a, b, a, c];
const diffResult = diffArrays(arrayA, arrayB);
expect(diffResult).to.deep.equals([
{count: 2, value: [a, b], removed: undefined, added: true},
{count: 2, value: [a, b], removed: true, added: undefined},
{count: 1, value: [c]},
{count: 1, value: [b], removed: true, added: undefined},
{count: 2, value: [a, b]},
{count: 1, value: [b], removed: undefined, added: true},
{count: 2, value: [a, b]},
{count: 1, value: [b], removed: true, added: undefined},
{count: 1, value: [a]},
{count: 1, value: [c], removed: true, added: undefined}
{count: 1, value: [c], removed: undefined, added: true}
]);
});
describe('anti-aliasing', function() {
Expand Down
15 changes: 15 additions & 0 deletions test/diff/line.js
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,21 @@ describe('diff/line', function() {
expect(convertChangesToXML(diffResult)).to.equal('<del>line\n\nold value \n\nline</del>');
});

it('Should prefer to do deletions before insertions, like Unix diff does', function() {
const diffResult = diffLines('a\nb\nc\nd\n', 'a\nc\nb\nd\n');

// There are two possible diffs with equal edit distance here; either we
// can delete the "b" and insert it again later, or we can insert a "c"
// before the "b" and then delete the original "c" later.
// For consistency with the convention of other diff tools, we want to
// prefer the diff where we delete and then later insert over the one
// where we insert and then later delete.
expect(convertChangesToXML(diffResult)).to.equal('a\n<del>b\n</del>c\n<ins>b\n</ins>d\n');

const diffResult2 = diffLines('a\nc\nb\nd\n', 'a\nb\nc\nd\n');
expect(convertChangesToXML(diffResult2)).to.equal('a\n<del>c\n</del>b\n<ins>c\n</ins>d\n');
});

describe('given options.maxEditLength', function() {
it('terminates early', function() {
const diffResult = diffLines(
Expand Down
4 changes: 2 additions & 2 deletions test/diff/word.js
Original file line number Diff line number Diff line change
Expand Up @@ -181,7 +181,7 @@ describe('WordDiff', function() {

it('should diff multiple whitespace values', function() {
const diffResult = diffWordsWithSpace('New Value ', 'New ValueMoreData ');
expect(convertChangesToXML(diffResult)).to.equal('New<ins> ValueMoreData</ins> <del>Value </del>');
expect(convertChangesToXML(diffResult)).to.equal('New<del> Value</del> <ins>ValueMoreData </ins>');
});

it('should inserts values in parenthesis', function() {
Expand Down Expand Up @@ -220,7 +220,7 @@ describe('WordDiff', function() {

it('should perform async operations', function(done) {
diffWordsWithSpace('New Value ', 'New ValueMoreData ', function(err, diffResult) {
expect(convertChangesToXML(diffResult)).to.equal('New<ins> ValueMoreData</ins> <del>Value </del>');
expect(convertChangesToXML(diffResult)).to.equal('New<del> Value</del> <ins>ValueMoreData </ins>');
done();
});
});
Expand Down