-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't ignore newlines when fuzzing #332
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Update test fixtures for PowerShell on Windows ... to include examples of whitespace and how it should be escaped. The examples were obtained through fuzzing. Existing examples with relevant characters have been updated as well. The test macro used when running the fixtures has been updated to render the various whitespace characters being added here. This is to ensure unique test names. * Escape necessary whitespace characters for PowerShell For simplicity, this escapes all whitespace by converting it into spaces (U+0020). The alternative, escaping whitespace with a backtick (`) seems to be possible, but is fragile. Based on fuzzing, it doesn't always work as expected, leading to unexpected changes in the input string. It is deemed preferable to just replace all spaces as it's simpler to explain/ understand than seemingly random changes to the input. In the future more granular escaping for (certain) whitespace characters can be considered.
* Update test fixtures for Bash, Dash, and Zsh on Unix ... to include examples of whitespace and how it should be escaped. The examples were obtained based on the tests for PowerShell, as well as the definition of the `\s` group for Regular Expression in JavaScript (the former only includes `\u0085` that is not included in the latter). All characters were manually testing using the script: import { execSync } from "node:child_process"; import assert from "node:assert"; import * as shescape from "shescape"; const options = { interpolation: true, shell: "zsh", encoding: "utf8", }; const whitespaceChars = [ // Excluded because they will fail: // "\u0009", <- Converted to a normal space by Bash, Dash, and Zsh // "\u000A", <- Errors on Bash, Dash, and Zsh "\u000B", "\u000C", "\u000D", " ", "\u0085", "\u00A0", "\u1680", "\u2000", "\u2001", "\u2002", "\u2003", "\u2004", "\u2005", "\u2006", "\u2007", "\u2008", "\u2009", "\u200A", "\u2028", "\u2029", "\u202F", "\u205F", "\u3000", "\uFEFF", ]; for (const char of whitespaceChars) { const userInput = `foo${char}bar`; const escapeOutput = execSync( `node test/_echo.js ${shescape.escape(userInput, options)}`, options ); const quoteOutput = execSync( `node test/_echo.js ${shescape.quote(userInput, options)}`, options ); assert.strictEqual(escapeOutput, quoteOutput); } The test macro used when running the fixtures has been updated further to render all whitespace characters correctly (to ensure unique test names). This was updated to 1) output `\u{hhhh}` for readability (more clearly demarking unicode characters from "normal" characters) and 2) use unique character indices for all characters to more clearly communicate the chosen order. * Escape necessary whitespace characters for Bash, Dash, and Zsh Only the newline character is necessary to escape for each shell following the experiment outlined above. In the future more granular escaping for whitespace characters can be considered.
Omit many characters from being escaped by PowerShell and instead rely on PowerShell's own conversion from many whitespace characters to a "normal" space character (' '). Both LF and CR characters are still escaped to space characters as both cause whatever follows them to be interpreted as a new command. A notable exception is \uFEFF, which is not converted into a space by PowerShell but also doesn't cause any issues. Additionally, the whitespace fixtures for PowerShell were updated to follow the same order as whitespace fixtures for Unix and the replacing logic for the macro title (`test/unit/_macros.js`). The script used to verify the above claims: import { execSync } from "node:child_process"; import assert from "node:assert"; import * as shescape from "shescape"; const options = { interpolation: true, shell: "powershell.exe", encoding: "utf8", }; const whitespaceChars = [ "\u0009", // converted to space by PowerShell "\u000A", // escaped to " " by Shescape "\u000B", // converted to space by PowerShell "\u000C", // converted to space by PowerShell "\u000D", // escaped to " " by Shescape " ", // normal whitespace "\u0085", // converted to space by PowerShell "\u00A0", // converted to space by PowerShell "\u1680", // converted to space by PowerShell "\u2000", // converted to space by PowerShell "\u2001", // converted to space by PowerShell "\u2002", // converted to space by PowerShell "\u2003", // converted to space by PowerShell "\u2004", // converted to space by PowerShell "\u2005", // converted to space by PowerShell "\u2006", // converted to space by PowerShell "\u2007", // converted to space by PowerShell "\u2008", // converted to space by PowerShell "\u2009", // converted to space by PowerShell "\u200A", // converted to space by PowerShell "\u2028", // converted to space by PowerShell "\u2029", // converted to space by PowerShell "\u202F", // converted to space by PowerShell "\u205F", // converted to space by PowerShell "\u3000", // converted to space by PowerShell // "\uFEFF", // see end of script ]; for (const char of whitespaceChars) { const userInput = `foo${char}bar`; const escapeOutput = execSync( `node test/_echo.js ${shescape.escape(userInput, options)}`, options ); assert.strictEqual(escapeOutput, "foo bar\n"); } for (const char of ["\uFEFF"]) { const userInput = `foo${char}bar`; const escapeOutput = execSync( `node test/_echo.js ${shescape.escape(userInput, options)}`, options ); const quoteOutput = execSync( `node test/_echo.js ${shescape.quote(userInput, options)}`, options ); assert.strictEqual(escapeOutput, quoteOutput); }
* Don't ignore newlines when fuzzing CMD Update the `_common.js` fuzz logic to not strip all newline characters from the argument if the shell being fuzzed is CMD, and update the `getExpectedOutput` logic to omit/change whitespace following how the shell does it. * Update test fixtures for CMD on Windows ... to include examples of whitespace and how it should be escaped. The examples were obtained based on the tests for Bash, Dash, PowerShell, and Zsh. All characters were manually testing using the script: import { execSync } from "node:child_process"; import assert from "node:assert"; import * as shescape from "shescape"; const options = { interpolation: true, shell: "cmd.exe", encoding: "utf8", }; const whitespaceChars = [ // "\u0009", <- Converted to a normal space by cmd.exe // "\u000A", <- Trailing content is ignored on cmd.exe // "\u000D", <- Removed by cmd.exe "\u000B", "\u000C", " ", "\u0085", "\u00A0", "\u1680", "\u2000", "\u2001", "\u2002", "\u2003", "\u2004", "\u2005", "\u2006", "\u2007", "\u2008", "\u2009", "\u200A", "\u2028", "\u2029", "\u202F", "\u205F", "\u3000", "\uFEFF", ]; for (const char of whitespaceChars) { const userInput = `foo${char}bar`; const escapeOutput = execSync( `node test/_echo.js ${shescape.escape(userInput, options)}`, options ); const quoteOutput = execSync( `node test/_echo.js ${shescape.quote(userInput, options)}`, options ); assert.strictEqual(escapeOutput, quoteOutput); } const escapeOutputTab = execSync( `node test/_echo.js ${shescape.escape("foo\u0009bar", options)}`, options ); assert.strictEqual(escapeOutputTab, "foo bar\n"); const escapeOutputLF = execSync( `node test/_echo.js ${shescape.escape("foo\u000Abar", options)}`, options ); assert.strictEqual(escapeOutputLF, "foo\n"); const escapeOutputCR = execSync( `node test/_echo.js ${shescape.escape("foo\u000Dbar", options)}`, options ); assert.strictEqual(escapeOutputCR, "foobar\n"); * Escape necessary whitespace characters for CMD Only the newline character is necessary to escape for each shell following the experiment outlined above. In contrast to Bash, Dash, PowerShell, and Zsh this is also required when the argument is being quoted (based on fuzzing). In the future more granular escaping for whitespace characters can be considered. The `getExpectedOutput` logic is also updated to make the same transformation as Shescape does.
* Add items to fuzz corpus Add two item to fuzz corpus that trigger the bug on Bash where the tilde in combination with line terminating characters isn't escaped correctly. - `f28febc`: contains a carriage return character, this is the item that uncovered the bug in the continuous integration system. - `dece2a6`: contains a newline character, this was added separately as this currently does not require escaping the tilde (due to line 49 of `unix.js`). By having this case in the corpus, changes to line 49 that will require changes to the escaping of the tilde in the future will be detectable by fuzzing. * Update test fixtures for Bash on Unix ...to include examples of the line terminating characters where the tilde should be escaped but currently isn't. This list of characters was obtained using the following script: import { execSync } from "node:child_process"; import assert from "node:assert"; import * as shescape from "shescape"; const options = { interpolation: true, shell: "bash", encoding: "utf8", }; const safeWhitespaceChars = [ // "\u0009", "\u000A", "\u000B", "\u000C", " ", "\u0085", // "\u00A0", "\u1680", "\u2000", "\u2001", "\u2002", "\u2003", // "\u2004", "\u2005", "\u2006", "\u2007", "\u2008", "\u2009", // "\u200A", "\u202F", "\u205F", "\u3000", "\uFEFF", ]; for (const char of safeWhitespaceChars) { const userInput = `foo=${char}:~`; const escapeOutput = execSync( `node test/_echo.js ${shescape.escape(userInput, options)}`, options ); assert.match(escapeOutput, /foo=[\s\u0085]:~\n/); } const unsafeWhitespaceChars = [ "\u000D", "\u2028", "\u2029", ]; for (const char of unsafeWhitespaceChars) { const userInput = `foo=${char}:~`; const escapeOutput = execSync( `node test/_echo.js ${shescape.escape(userInput, options)}`, options ); assert.doesNotMatch(escapeOutput, /foo=[\u000D\u2028\u2029]:~\n/); } * Update escaping of tilde for Bash Update the escaping of the tilde character for bash to explicitly allow line terminating characters. It turns out that the `.` wildcard in regular expression in JavaScript do not include line terminating characters. Note that the newline character is absant from this list, this is because it's omitted entirely (on line 49).
* Add item to fuzz corpus Add an item to fuzz corpus that triggers the bug on CMD where a trailing forward slash in combination with a carriage return character results in a double quote being outputted (similar to PowerShell's behaviour). * Update test fixtures for CMD on Windows So that it is now expected that carriage return characters are replaced by a single space character. Only escaping carriage return characters when `interpolation:true` was manually tested through fuzzing, this showed that carriage return characters always need to be escaped. The following script was verified whether or not other line terminating characters needed to be escaped as well: import { execSync } from "node:child_process"; import assert from "node:assert"; import fs from "node:fs"; import * as shescape from "shescape"; const tmp = fs.readFileSync( "./crash-" + "31ed7643aba69fe2d776af3aee587bb7899165af5ed3846c6f70327f2eec4713" ).toString(); const newlineChars = [ "\u000A", // "\u000D", <- '\r', does not work as we know "\u2028", "\u2029", ]; for (const char of newlineChars) { const userInput = tmp.replace(/\r/, char); const escapeOutput = execSync( `node test/_echo.js ${shescape.quote(userInput)}` ).toString(); assert.strictEqual(escapeOutput.trim(), tmp.trim()); } * Update escaping of whitespace for CMD on Windows Note: the `_common.js` fuzz logic was updated accordingly to get the correct expected output.
Based on [1]: > Note that the m multiline flag doesn't change the dot behavior. So to > match a pattern across multiple lines, the character class `[^]` can > be used - it will match any character including newlines. -- 1. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes
* Add item to fuzz corpus Add an item to the fuzz corpus that triggers the bug on Bash/Dash where brace expansion happens if there's a line terminating character within the braces. * Update test fixtures for Bash/Dash ...to include exmaples of the line terminating characters where the curly brakcet should be escaped but currently isn't. The list of characters is based on the experiments carried out for the tilde. * Update escaping of curly bracket for Bash/Dash Like the escaping of tildes for bash, use `[^]` to match _all_ characters, including newlines. Note that Zsh does not (currently) need to be changed as it always has curly braces escaped. * Update the CHANGELOG
Include "newlines" (line feeds) in the item regarding tidles and curly braces for bash as prior to release containing these changes the problem also existed for line feeds (just not now because line feeds are removed entirely). As a result, the language can be simplified as well.
ericcornelissen
commented
Jul 15, 2022
.replace(/(^|\s)(~|#)/g, "$1\\$2") | ||
.replace(/(\*|\?)/g, "\\$1") | ||
.replace(/(\$|\;|\&|\|)/g, "\\$1") | ||
.replace(/(\(|\)|\<|\>)/g, "\\$1") | ||
.replace(/("|'|`)/g, "\\$1") | ||
.replace(/\{(?=(.*?(?:\,|\.).*?)\})/g, "\\{"); | ||
.replace(/\{(?=([^]*?(?:\,|\.)[^]*?)\})/g, "\\{"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Dash, this change isn't actually relevant - see #336
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.