Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite applyPatch #533

Merged
merged 33 commits into from
Jul 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
ff45a4c
Add stubs of three new tests
ExplodingCabbage Jul 4, 2024
835eb75
Flesh out one test
ExplodingCabbage Jul 4, 2024
2d637b0
Add more stubs
ExplodingCabbage Jul 4, 2024
d0b5408
First draft of new applyPatch; seems to be fucked currently
ExplodingCabbage Jul 12, 2024
a25ea84
Fix some bugs in my work
ExplodingCabbage Jul 15, 2024
b6f0bf1
Fix more bugs
ExplodingCabbage Jul 15, 2024
28a8d33
Fix another bug
ExplodingCabbage Jul 15, 2024
50ec72e
Fix another bug
ExplodingCabbage Jul 15, 2024
8bdb035
Fix a bad test
ExplodingCabbage Jul 15, 2024
1fb2337
Fix another bug
ExplodingCabbage Jul 15, 2024
16fd7ba
Fix a test bug
ExplodingCabbage Jul 19, 2024
9aa7ca8
Make distance-iterator behaviour match its documentation
ExplodingCabbage Jul 19, 2024
d35e724
More tests
ExplodingCabbage Jul 19, 2024
0ca0cb1
New test
ExplodingCabbage Jul 19, 2024
3a738f6
rearrange planned tests a bit
ExplodingCabbage Jul 19, 2024
7db2467
Add a test
ExplodingCabbage Jul 19, 2024
977f8a9
Add failing test - a bug, I think?
ExplodingCabbage Jul 19, 2024
ab8eff0
Fix bug and fix test
ExplodingCabbage Jul 19, 2024
db083ec
Add another test case
ExplodingCabbage Jul 19, 2024
e6784aa
Add more tests
ExplodingCabbage Jul 19, 2024
5e4ac46
Delete redundant test
ExplodingCabbage Jul 19, 2024
0c0b99f
Add another test
ExplodingCabbage Jul 19, 2024
997126b
Add another test
ExplodingCabbage Jul 19, 2024
104555a
Placate eslint
ExplodingCabbage Jul 19, 2024
8308179
Add a silly additional test to placate istanbul
ExplodingCabbage Jul 19, 2024
92d48d1
Fix something silly I did, and further improve coverage metric in doi…
ExplodingCabbage Jul 19, 2024
826de6c
Add more tests to pump up coverage metric further
ExplodingCabbage Jul 19, 2024
431ca68
Eliminate for...of loops to placate the coverage checker
ExplodingCabbage Jul 19, 2024
533692e
Add yet more tests to reach 100% coverage
ExplodingCabbage Jul 19, 2024
d8b5d11
Add docs
ExplodingCabbage Jul 19, 2024
bd0954b
Tweak docs
ExplodingCabbage Jul 19, 2024
7bbf583
Tweak docs
ExplodingCabbage Jul 19, 2024
4d41449
Add release notes
ExplodingCabbage Jul 19, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,13 +119,21 @@ Broadly, jsdiff's diff functions all take an old text and a new text and perform

* `Diff.applyPatch(source, patch[, options])` - attempts to apply a unified diff patch.

If the patch was applied successfully, returns a string containing the patched text. If the patch could not be applied (because some hunks in the patch couldn't be fitted to the text in `source`), returns false.
Hunks are applied first to last. `applyPatch` first tries to apply the first hunk at the line number specified in the hunk header, and with all context lines matching exactly. If that fails, it tries scanning backwards and forwards, one line at a time, to find a place to apply the hunk where the context lines match exactly. If that still fails, and `fuzzFactor` is greater than zero, it increments the maximum number of mismatches (missing, extra, or changed context lines) that there can be between the hunk context and a region where we are trying to apply the patch such that the hunk will still be considered to match. Regardless of `fuzzFactor`, lines to be deleted in the hunk *must* be present for a hunk to match, and the context lines *immediately* before and after an insertion must match exactly.

Once a hunk is successfully fitted, the process begins again with the next hunk. Regardless of `fuzzFactor`, later hunks must be applied later in the file than earlier hunks.

If a hunk cannot be successfully fitted *anywhere* with fewer than `fuzzFactor` mismatches, `applyPatch` fails and returns `false`.

If a hunk is successfully fitted but not at the line number specified by the hunk header, all subsequent hunks have their target line number adjusted accordingly. (e.g. if the first hunk is applied 10 lines below where the hunk header said it should fit, `applyPatch` will *start* looking for somewhere to apply the second hunk 10 lines below where its hunk header says it goes.)

If the patch was applied successfully, returns a string containing the patched text. If the patch could not be applied (because some hunks in the patch couldn't be fitted to the text in `source`), `applyPatch` returns false.

`patch` may be a string diff or the output from the `parsePatch` or `structuredPatch` methods.

The optional `options` object may have the following keys:

- `fuzzFactor`: Number of lines that are allowed to differ before rejecting a patch. Defaults to 0.
- `fuzzFactor`: Maximum Levenshtein distance (in lines deleted, added, or subtituted) between the context shown in a patch hunk and the lines found in the file. Defaults to 0.
- `autoConvertLineEndings`: If `true`, and if the file to be patched consistently uses different line endings to the patch (i.e. either the file always uses Unix line endings while the patch uses Windows ones, or vice versa), then `applyPatch` will behave as if the line endings in the patch were the same as those in the source file. (If `false`, the patch will usually fail to apply in such circumstances since lines deleted in the patch won't be considered to match those in the source file.) Defaults to `true`.
- `compareLine(lineNumber, line, operation, patchContent)`: Callback used to compare to given lines to determine if they should be considered equal when patching. Defaults to strict equality but may be overridden to provide fuzzier comparison. Should return false if the lines should be rejected.

Expand Down
4 changes: 4 additions & 0 deletions release-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@
- [#521](https://github.com/kpdecker/jsdiff/pull/521) **the `callback` option is now supported by `structuredPatch`, `createPatch
- [#529](https://github.com/kpdecker/jsdiff/pull/529) **`parsePatch` can now parse patches where lines starting with `--` or `++` are deleted/inserted**; previously, there were edge cases where the parser would choke on valid patches or give wrong results.
- [#530](https://github.com/kpdecker/jsdiff/pull/530) **Added `ignoreNewlineAtEof` option` to `diffLines`**
- [#533](https://github.com/kpdecker/jsdiff/pull/533) **`applyPatch` uses an entirely new algorithm for fuzzy matching.** Differences between the old and new algorithm are as follows:
* The `fuzzFactor` now indicates the maximum [*Levenshtein* distance](https://en.wikipedia.org/wiki/Levenshtein_distance) that there can be between the context shown in a hunk and the actual file content at a location where we try to apply the hunk. (Previously, it represented a maximum [*Hamming* distance](https://en.wikipedia.org/wiki/Hamming_distance), meaning that a single insertion or deletion in the source file could stop a hunk from applying even with a high `fuzzFactor`.)
* A hunk containing a deletion can now only be applied in a context where the line to be deleted actually appears verbatim. (Previously, as long as enough context lines in the hunk matched, `applyPatch` would apply the hunk anyway and delete a completely different line.)
* The context line immediately before and immediately after an insertion must match exactly between the hunk and the file for a hunk to apply. (Previously this was not required.)

## v5.2.0

Expand Down
266 changes: 195 additions & 71 deletions src/patch/apply.js
Original file line number Diff line number Diff line change
Expand Up @@ -29,104 +29,228 @@ export function applyPatch(source, uniDiff, options = {}) {
hunks = uniDiff.hunks,

compareLine = options.compareLine || ((lineNumber, line, operation, patchContent) => line === patchContent),
errorCount = 0,
fuzzFactor = options.fuzzFactor || 0,
minLine = 0,
offset = 0,
minLine = 0;

removeEOFNL,
addEOFNL;
if (fuzzFactor < 0 || !Number.isInteger(fuzzFactor)) {
throw new Error('fuzzFactor must be a non-negative integer');
}

// Special case for empty patch.
if (!hunks.length) {
return source;
}

// Before anything else, handle EOFNL insertion/removal. If the patch tells us to make a change
// to the EOFNL that is redundant/impossible - i.e. to remove a newline that's not there, or add a
// newline that already exists - then we either return false and fail to apply the patch (if
// fuzzFactor is 0) or simply ignore the problem and do nothing (if fuzzFactor is >0).
// If we do need to remove/add a newline at EOF, this will always be in the final hunk:
let prevLine = '',
removeEOFNL = false,
addEOFNL = false;
for (let i = 0; i < hunks[hunks.length - 1].lines.length; i++) {
const line = hunks[hunks.length - 1].lines[i];
if (line[0] == '\\') {
if (prevLine[0] == '+') {
removeEOFNL = true;
} else if (prevLine[0] == '-') {
addEOFNL = true;
}
break;
}
prevLine = line;
}
if (removeEOFNL) {
if (lines[lines.length - 1] == '') {
lines.pop();
} else if (!fuzzFactor) {
return false;
}
} else if (addEOFNL) {
if (lines[lines.length - 1] != '') {
lines.push('');
} else if (!fuzzFactor) {
return false;
}
}

/**
* Checks if the hunk exactly fits on the provided location
* Checks if the hunk can be made to fit at the provided location with at most `maxErrors`
* insertions, substitutions, or deletions, while ensuring also that:
* - lines deleted in the hunk match exactly, and
* - wherever an insertion operation or block of insertion operations appears in the hunk, the
* immediately preceding and following lines of context match exactly
*
* `toPos` should be set such that lines[toPos] is meant to match hunkLines[0].
*
* If the hunk can be applied, returns an object with properties `oldLineLastI` and
* `replacementLines`. Otherwise, returns null.
*/
function hunkFits(hunk, toPos) {
for (let j = 0; j < hunk.lines.length; j++) {
let line = hunk.lines[j],
operation = (line.length > 0 ? line[0] : ' '),
content = (line.length > 0 ? line.substr(1) : line);

if (operation === ' ' || operation === '-') {
// Context sanity check
if (!compareLine(toPos + 1, lines[toPos], operation, content)) {
errorCount++;

if (errorCount > fuzzFactor) {
return false;
function applyHunk(
hunkLines,
toPos,
maxErrors,
hunkLinesI = 0,
lastContextLineMatched = true,
patchedLines = [],
patchedLinesLength = 0,
) {
let nConsecutiveOldContextLines = 0;
let nextContextLineMustMatch = false;
for (; hunkLinesI < hunkLines.length; hunkLinesI++) {
let hunkLine = hunkLines[hunkLinesI],
operation = (hunkLine.length > 0 ? hunkLine[0] : ' '),
content = (hunkLine.length > 0 ? hunkLine.substr(1) : hunkLine);

if (operation === '-') {
if (compareLine(toPos + 1, lines[toPos], operation, content)) {
toPos++;
nConsecutiveOldContextLines = 0;
} else {
if (!maxErrors || lines[toPos] == null) {
return null;
}
patchedLines[patchedLinesLength] = lines[toPos];
return applyHunk(
hunkLines,
toPos + 1,
maxErrors - 1,
hunkLinesI,
false,
patchedLines,
patchedLinesLength + 1,
);
}
}

if (operation === '+') {
if (!lastContextLineMatched) {
return null;
}
patchedLines[patchedLinesLength] = content;
patchedLinesLength++;
nConsecutiveOldContextLines = 0;
nextContextLineMustMatch = true;
}

if (operation === ' ') {
nConsecutiveOldContextLines++;
patchedLines[patchedLinesLength] = lines[toPos];
if (compareLine(toPos + 1, lines[toPos], operation, content)) {
patchedLinesLength++;
lastContextLineMatched = true;
nextContextLineMustMatch = false;
toPos++;
} else {
if (nextContextLineMustMatch || !maxErrors) {
return null;
}

// Consider 3 possibilities in sequence:
// 1. lines contains a *substitution* not included in the patch context, or
// 2. lines contains an *insertion* not included in the patch context, or
// 3. lines contains a *deletion* not included in the patch context
// The first two options are of course only possible if the line from lines is non-null -
// i.e. only option 3 is possible if we've overrun the end of the old file.
return (
lines[toPos] && (
applyHunk(
hunkLines,
toPos + 1,
maxErrors - 1,
hunkLinesI + 1,
false,
patchedLines,
patchedLinesLength + 1
) || applyHunk(
hunkLines,
toPos + 1,
maxErrors - 1,
hunkLinesI,
false,
patchedLines,
patchedLinesLength + 1
)
) || applyHunk(
hunkLines,
toPos,
maxErrors - 1,
hunkLinesI + 1,
false,
patchedLines,
patchedLinesLength
)
);
}
toPos++;
}
}

return true;
// Before returning, trim any unmodified context lines off the end of patchedLines and reduce
// toPos (and thus oldLineLastI) accordingly. This allows later hunks to be applied to a region
// that starts in this hunk's trailing context.
patchedLinesLength -= nConsecutiveOldContextLines;
toPos -= nConsecutiveOldContextLines;
patchedLines.length = patchedLinesLength;
return {
patchedLines,
oldLineLastI: toPos - 1
};
}

const resultLines = [];

// Search best fit offsets for each hunk based on the previous ones
let prevHunkOffset = 0;
for (let i = 0; i < hunks.length; i++) {
let hunk = hunks[i],
maxLine = lines.length - hunk.oldLines,
localOffset = 0,
toPos = offset + hunk.oldStart - 1;

let iterator = distanceIterator(toPos, minLine, maxLine);

for (; localOffset !== undefined; localOffset = iterator()) {
if (hunkFits(hunk, toPos + localOffset)) {
hunk.offset = offset += localOffset;
const hunk = hunks[i];
let hunkResult;
let maxLine = lines.length - hunk.oldLines + fuzzFactor;
let toPos;
for (let maxErrors = 0; maxErrors <= fuzzFactor; maxErrors++) {
toPos = hunk.oldStart + prevHunkOffset - 1;
let iterator = distanceIterator(toPos, minLine, maxLine);
for (; toPos !== undefined; toPos = iterator()) {
hunkResult = applyHunk(hunk.lines, toPos, maxErrors);
if (hunkResult) {
break;
}
}
if (hunkResult) {
break;
}
}

if (localOffset === undefined) {
if (!hunkResult) {
return false;
}

// Set lower text limit to end of the current hunk, so next ones don't try
// to fit over already patched text
minLine = hunk.offset + hunk.oldStart + hunk.oldLines;
}
// Copy everything from the end of where we applied the last hunk to the start of this hunk
for (let i = minLine; i < toPos; i++) {
resultLines.push(lines[i]);
}

// Apply patch hunks
let diffOffset = 0;
for (let i = 0; i < hunks.length; i++) {
let hunk = hunks[i],
toPos = hunk.oldStart + hunk.offset + diffOffset - 1;
diffOffset += hunk.newLines - hunk.oldLines;
// Add the lines produced by applying the hunk:
for (let i = 0; i < hunkResult.patchedLines.length; i++) {
const line = hunkResult.patchedLines[i];
resultLines.push(line);
}

for (let j = 0; j < hunk.lines.length; j++) {
let line = hunk.lines[j],
operation = (line.length > 0 ? line[0] : ' '),
content = (line.length > 0 ? line.substr(1) : line);
// Set lower text limit to end of the current hunk, so next ones don't try
// to fit over already patched text
minLine = hunkResult.oldLineLastI + 1;

if (operation === ' ') {
toPos++;
} else if (operation === '-') {
lines.splice(toPos, 1);
/* istanbul ignore else */
} else if (operation === '+') {
lines.splice(toPos, 0, content);
toPos++;
} else if (operation === '\\') {
let previousOperation = hunk.lines[j - 1] ? hunk.lines[j - 1][0] : null;
if (previousOperation === '+') {
removeEOFNL = true;
} else if (previousOperation === '-') {
addEOFNL = true;
}
}
}
// Note the offset between where the patch said the hunk should've applied and where we
// applied it, so we can adjust future hunks accordingly:
prevHunkOffset = toPos + 1 - hunk.oldStart;
}

// Handle EOFNL insertion/removal
if (removeEOFNL) {
while (!lines[lines.length - 1]) {
lines.pop();
}
} else if (addEOFNL) {
lines.push('');
// Copy over the rest of the lines from the old text
for (let i = minLine; i < lines.length; i++) {
resultLines.push(lines[i]);
}
return lines.join('\n');

return resultLines.join('\n');
}

// Wrapper that supports multiple file patches via callbacks.
Expand Down
4 changes: 2 additions & 2 deletions src/util/distance-iterator.js
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ export default function(start, minLine, maxLine) {
// Check if trying to fit beyond text length, and if not, check it fits
// after offset location (or desired location on first iteration)
if (start + localOffset <= maxLine) {
return localOffset;
return start + localOffset;
}

forwardExhausted = true;
Expand All @@ -32,7 +32,7 @@ export default function(start, minLine, maxLine) {
// Check if trying to fit before text beginning, and if not, check it fits
// before offset location
if (minLine <= start - localOffset) {
return -localOffset++;
return start - localOffset++;
}

backwardExhausted = true;
Expand Down
Loading