-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
path: remove unreachable branch in normalizeString #22273
path: remove unreachable branch in normalizeString #22273
Conversation
|
The change to the code is beneficial because it helps readers understand The fact that the branch is not reachable is far from obvious simply The assertion itself also makes the code more robust to changes. If
Understood; it is rather long. The proof itself is approximately as short as I can make it without (The actual commit message preceding the proof itself is just a I’m open to suggestions. |
@wchargin thanks a lot for your contribution but adding these cases is very uncommon in Node core so far. Even if the else path is not taken, there are hundreds if not thousands of like wise cases.
Each error should have a test on it's own. Since this path can not be tested it is actually already proof that this code should not exist. At least that is how it is handled so far. For these reasons I am -1 on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making it explicit. I also don't believe this one particular instance is necessary, especially if we don't have a test for it.
Thanks for your input.
It bears noting the proposed patch doesn’t add any new untested paths. Here is a way to resolve the underlying problem while also addressing diff --git a/lib/path.js b/lib/path.js
index 90129f1f52..c8d92f14af 100644
--- a/lib/path.js
+++ b/lib/path.js
@@ -77,18 +77,16 @@ function normalizeString(path, allowAboveRoot, separator, isPathSeparator) {
res.charCodeAt(res.length - 2) !== CHAR_DOT) {
if (res.length > 2) {
const lastSlashIndex = res.lastIndexOf(separator);
- if (lastSlashIndex !== res.length - 1) {
- if (lastSlashIndex === -1) {
- res = '';
- lastSegmentLength = 0;
- } else {
- res = res.slice(0, lastSlashIndex);
- lastSegmentLength = res.length - 1 - res.lastIndexOf(separator);
- }
- lastSlash = i;
- dots = 0;
- continue;
+ if (lastSlashIndex === -1) {
+ res = '';
+ lastSegmentLength = 0;
+ } else {
+ res = res.slice(0, lastSlashIndex);
+ lastSegmentLength = res.length - 1 - res.lastIndexOf(separator);
}
+ lastSlash = i;
+ dots = 0;
+ continue;
} else if (res.length === 2 || res.length === 1) {
res = '';
lastSegmentLength = 0; This removes the opportunity for confusion about whether the branch can Tests continue to pass with this change, of course. What do you think? |
44ca206
to
f6438c6
Compare
As for the commit message length—I’ll remove the long proof from the Proof that the branch in question is not reachableThe function
We will therefore take these as preconditions on the function, which is Below is the source code of the function
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@wchargin thanks a lot. Removing seems adequate. There could be more cases like this, so please feel encouraged to add tests for uncovered cases / remove these cases if they can not be reached :-) |
The current changes are much better than the original changes, provided all tests still pass everywhere.
@BridgeAR, @mscdex: Great; thanks. I agree that this is a better patch Travis check from earlier seemed potentially spurious (“no output has Jenkins failure seems spurious; it looks like Jenkins has a dirty For reference, Anything that you need from me? I’m assuming, by the way, that the two commits in question will be |
This PR needs a rebase against master to avoid the git failure in the CI. |
There is an `if`-statement in `normalizeString` (a helper function for `path.normalize`) whose `else`-branch is never taken. This patch adds a runtime assertion to document this fact to others reading the code. The remainder of this commit message contains a proof that the branch is in fact never taken. The function `normalizeString` is called in four places (on lines 284, 377, 1110, and 1137). At each call site, we may observe that: - the `separator` argument is either "/" or "\\", and - the `isPathSeparator` argument is a function that returns `true` when given the code of `separator` and returns `true` when given `CHAR_FORWARD_SLASH`. We will therefore take these as preconditions on the function, which is valid because the function is module-private. Below is the source code of the function `normalizeString`, annotated with proof terms. Of particular import is the line marked "CLAIM(*)". ``` > // Resolves . and .. elements in a path with directory names PRECONDITION(A): `separator` must be "/" or "\\". PRECONDITION(B): `isPathSeparator` must return `true` given `separator.charCodeAt(0)`. PRECONDITION(C): `isPathSeparator` must return `true` given `CHAR_FORWARD_SLASH`. > function normalizeString(path, allowAboveRoot, separator, isPathSeparator) { INVARIANT(D): `res` does not end with `separator`. Proof: By induction, at initialization and at every assignment to `res`. The base case holds because `res` is empty and `separator` is not, by PRECONDITION(A). Assignments will be justified inline. INVARIANT(E): `res` does not contain two consecutive separators. Proof: By induction, at initialization and at every assignment to `res`. The base case is immediate. Assignments will be justified inline. > var res = ''; var lastSegmentLength = 0; INVARIANT(F): `lastSlash` is always an integer, and `i` is always an integer. Proof: By induction. The initial values of each are integers. The only assignment to `i` is to increment it (`++i` in the declaration), which preserves integrality. The only reassignment to `lastSlash` is to assign it the value of `i`, which is known by induction to be an integer. INVARIANT(G): Once the loop index `i` is initialized, it holds that `lastSlash <= i`. Proof: By induction, at initialization of `i` and at every assignment to `i` or `lastSlash`. The base case is clear: `i` is initialized to `0`, at which point `lastSlash` is `-1`. The only assignment to `i` is `++i`, which preserves the invariant. The only assignments to `lastSlash` are to set its value to `i`, which also preserve the invariant. > var lastSlash = -1; > var dots = 0; > var code; INVARIANT(H): Loop invariant: `path.slice(lastSlash + 1, i)` does not contain a `separator` (once `i` has been initialized). We refer to this expression as "the slice". Proof: By induction: at initialization of `i`, and at every assignment to `lastSlash`, `i`, or `path`. The base case is clear: initially, the slice has domain `(0, 0)`, so is empty. Assignments will be justified inline. LEMMA(I): If `lastSlash` is assigned the value `i` and neither `lastSlash` nor `i` nor `path` is modified before the next iteration of the loop, then INVARIANT(H) is preserved both (a) at the assignment and (b) at the iteration boundary. Proof: At the assignment, the slice has domain `(i + 1, i)`, so is empty. After `++i`, the slice has domain `(i + 1, i + 1)`, which is still empty. The empty string does not contain a `separator`, because `separator` is non-empty by PRECONDITION(A). This is sufficient to maintain the INVARIANT(H). INVARIANT(J): At the top of the loop, `lastSlash < i`. Proof: By cases on the iteration of the loop. For the first iteration of the loop, `lastSlash === -1` and `i === 0`. For subsequent iterations, note that INVARIANT(G) held at the bottom of the previous iteration of the loop, before `i` was incremented: that is, the previous value of `lastSlash` was less than or equal to the previous value of `i`. Since then, `lastSlash` has not been reassigned, and `i` has been incremented, so it follows that `lastSlash <= i - 1`, and therefore `lastSlash < i`. > for (var i = 0; i <= path.length; ++i) { > if (i < path.length) > code = path.charCodeAt(i); > else if (isPathSeparator(code)) > break; > else > code = CHAR_FORWARD_SLASH; > > if (isPathSeparator(code)) { > if (lastSlash === i - 1 || dots === 1) { > // NOOP > } else if (lastSlash !== i - 1 && dots === 2) { > if (res.length < 2 || lastSegmentLength !== 2 || > res.charCodeAt(res.length - 1) !== CHAR_DOT || > res.charCodeAt(res.length - 2) !== CHAR_DOT) { > if (res.length > 2) { > const lastSlashIndex = res.lastIndexOf(separator); > if (lastSlashIndex !== res.length - 1) { > if (lastSlashIndex === -1) { JUSTIFICATION: This assignment trivially preserves INVARIANT(D). JUSTIFICATION: This assignment trivially preserves INVARIANT(E). > res = ''; > lastSegmentLength = 0; > } else { JUSTIFICATION: This assignment preserves INVARIANT(D): - By control flow, we know that `lastSlashIndex !== -1`. - By definition of `lastIndexOf`, this means that `res` contains a `separator` at index `lastSlashIndex`. - By INVARIANT(E), `res` does not contain two consecutive `separator`s. Therefore, `res` does not contain a `separator` at index `lastSlashIndex - 1`. - Therefore, the new value for `res` does not contain a `separator` at index `lastSlashIndex - 1`, so it does not end with a `separator`. JUSTIFICATION: This assignment preserves INVARIANT(E). By INVARIANT(E), we know that `res` does not contain two consecutive `separator`s. It is immediate that no slice of `res` contains two consecutive `separator`s. > res = res.slice(0, lastSlashIndex); > lastSegmentLength = res.length - 1 - res.lastIndexOf(separator); > } JUSTIFICATION: This assignment preserves INVARIANT(H), by LEMMA(I). > lastSlash = i; > dots = 0; JUSTIFICATION: This loop boundary preserves INVARIANT(H), by LEMMA(I). > continue; > } else { CLAIM(*): This branch is not reachable. Proof: INVARIANT(D) indicates that `res` does not end in a `separator`, which means that `lastSlashIndex !== res.length - 1`, which is the guard for the enclosing `if`-statement. > throw new ERR_ASSERTION( > 'invariant violation: unreachable: ' + > JSON.stringify({ path, allowAboveRoot, separator }) > ); > } > } else if (res.length === 2 || res.length === 1) { JUSTIFICATION: This assignment trivially preserves INVARIANT(D). JUSTIFICATION: This assignment trivially preserves INVARIANT(E). > res = ''; > lastSegmentLength = 0; JUSTIFICATION: This assignment preserves INVARIANT(H), by LEMMA(I). > lastSlash = i; > dots = 0; JUSTIFICATION: This loop boundary preserves INVARIANT(H), by LEMMA(I). > continue; > } > } > if (allowAboveRoot) { > if (res.length > 0) JUSTIFICATION: This assignment preserves INVARIANT(D) because `separator` is either "/" or "\\" by PRECONDITION(A), and so the new value of `res`, which ends with ".", does not end with `separator`. JUSTIFICATION: This assignment preserves INVARIANT(E). We know by INVARIANT(D) that `res` does not end with a separator. Therefore, appending a `separator` does not introduce two consecutive `separator`s, and appending two copies of "." does not introduce two consecutive separators because, by PRECONDITION(A), `separator` is either "/" or "\\" and so does not contain ".". > res += `${separator}..`; > else JUSTIFICATION: This assignment preserves INVARIANT(D) and INVARIANT(E) because `separator` is either "/" or "\\" by PRECONDITION(A), and so does not appear in the new value for `res` at all (either at the end or twice consecutively). > res = '..'; > lastSegmentLength = 2; > } > } else { > if (res.length > 0) JUSTIFICATION: This assignment preserves INVARIANT(D) and INVARIANT(E): - By INVARIANT(J), `lastSlash` was less than `i` at the top of the loop body. By control flow, neither `lastSlash` nor `i` has since been reassigned, so it still holds that `lastSlash < i`. - At this point in the loop body, we have not assigned to `lastSlash`. - By control flow, we also have `lastSlash !== i - 1`. - By INVARIANT(F), both `lastSlash` and `i` are integers. - Therefore, `lastSlash < i - 1`, so `lastSlash + 1 < i`. - By the loop guard, `i <= path.length`. - The previous two facts imply that `lastSlash + 1 < path.length`. - Therefore, `path.slice(lastSlash + 1, i)` is nonempty. - By INVARIANT(H), this slice does not contain a `separator`. - Because the slice is nonempty, the new value of `res` will end in the last character of the slice, which is not a `separator`, so INVARIANT(D) is preserved. - Because `res` does not end with a separator, appending a separator to `res` does not introduce two consecutive separators. Because the slice does not contain a separator, subsequently appending the slice also does not introduce two consecutive separators, so INVARIANT(E) is preserved. > res += separator + path.slice(lastSlash + 1, i); > else JUSTIFICATION: This assignment preserves INVARIANT(D) and INVARIANT(E), because we know from INVARIANT(H) that the slice does not contain a separator at all, so the new value of `res` neither ends in a separator nor contains two consecutive separators. > res = path.slice(lastSlash + 1, i); > lastSegmentLength = i - lastSlash - 1; > } JUSTIFICATION: This assignment preserves INVARIANT(H), by LEMMA(I). > lastSlash = i; > dots = 0; JUSTIFICATION: This loop boundary preserves INVARIANT(H), by LEMMA(I). > } else if (code === CHAR_DOT && dots !== -1) { > ++dots; JUSTIFICATION: This loop boundary preserves INVARIANT(H). We know by induction that `path.slice(lastSlash + 1, i)` does not contain a separator. We know from control flow that `code` is `CHAR_DOT`, so `path[i]` is not a separator. Thus, `path.slice(lastSlash + 1, i + 1)` does not contain a separator, so INVARIANT(H) holds. > } else { > dots = -1; JUSTIFICATION: This loop boundary preserves INVARIANT(H). We know by induction that `path.slice(lastSlash + 1, i)` does not contain a separator. We know from control flow that `!isPathSeparator(code)`. We also know from control flow that `code` is either `path.charCodeAt(i)` or `CHAR_FORWARD_SLASH`. PRECONDITION(C) shows that `code` cannot be `CHAR_FORWARD_SLASH`, because `!isPathSeparator(code)`, so `code` must be `path.charCodeAt(i)`. PRECONDITION(B) shows that `code` cannot be `separator.charCodeAt(0)`. This implies that `path[i]` is not `separator`. Thus, `path.slice(lastSlash + 1, i + 1)` does not contain a separator, so INVARIANT(H) holds. > } > } > return res; > } ``` Test Plan: Existing unit tests pass. No additional tests are required. wchargin-branch: normalizeString-dead-branch
See review comments on nodejs#22273 for rationale. wchargin-branch: normalizeString-dead-branch
f6438c6
to
013a527
Compare
Rebased; |
Failure is a segfault in |
Coverage link prior to the patch (noting that «else path is not taken»): |
@ChALkeR: That’s good to see; thanks for posting. |
Landed in aecfea4 |
There is an `if`-statement in `normalizeString` (a helper function for `path.normalize`) whose `else`-branch is never taken. This patch removes it. PR-URL: #22273 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
There is an `if`-statement in `normalizeString` (a helper function for `path.normalize`) whose `else`-branch is never taken. This patch removes it. PR-URL: #22273 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
There is an `if`-statement in `normalizeString` (a helper function for `path.normalize`) whose `else`-branch is never taken. This patch removes it. PR-URL: #22273 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
There is an `if`-statement in `normalizeString` (a helper function for `path.normalize`) whose `else`-branch is never taken. This patch removes it. PR-URL: #22273 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>
There is an
if
-statement innormalizeString
(a helper function forpath.normalize
) whoseelse
-branch is never taken. This patch removesthe branch entirely to make the code easier to understand, which
incidentally improves test coverage.
For a proof that the branch is in fact not reachable, see:
#22273 (comment)
(Summary of proof: the key invariants, proven by simultaneous induction,
are (a) that
res
does not end in a separator and (b) thatres
doesnot contain two consecutive separators.)
Test Plan:
Existing unit tests pass. No additional tests are required.
wchargin-branch: normalizeString-dead-branch
Checklist
make -j4 test
(UNIX) passes