Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't ignore newlines when fuzzing #332

Merged
merged 14 commits into from
Jul 15, 2022

Conversation

ericcornelissen
Copy link
Owner

No description provided.

* Update test fixtures for PowerShell on Windows

... to include examples of whitespace and how it should be escaped. The
examples were obtained through fuzzing. Existing examples with relevant
characters have been updated as well.

The test macro used when running the fixtures has been updated to render
the various whitespace characters being added here. This is to ensure
unique test names.

* Escape necessary whitespace characters for PowerShell

For simplicity, this escapes all whitespace by converting it into spaces
(U+0020). The alternative, escaping whitespace with a backtick (`) seems
to be possible, but is fragile. Based on fuzzing, it doesn't always work
as expected, leading to unexpected changes in the input string. It is
deemed preferable to just replace all spaces as it's simpler to explain/
understand than seemingly random changes to the input.

In the future more granular escaping for (certain) whitespace characters
can be considered.
* Update test fixtures for Bash, Dash, and Zsh on Unix

... to include examples of whitespace and how it should be escaped. The
examples were obtained based on the tests for PowerShell, as well as the
definition of the `\s` group for Regular Expression in JavaScript (the
former only includes `\u0085` that is not included in the latter).

All characters were manually testing using the script:

    import { execSync } from "node:child_process";
    import assert from "node:assert";
    import * as shescape from "shescape";

    const options = {
      interpolation: true,
      shell: "zsh",
      encoding: "utf8",
    };

    const whitespaceChars = [
      // Excluded because they will fail:
      // "\u0009", <- Converted to a normal space by Bash, Dash, and Zsh
      // "\u000A", <- Errors on Bash, Dash, and Zsh
      "\u000B", "\u000C", "\u000D", " ", "\u0085", "\u00A0", "\u1680",
      "\u2000", "\u2001", "\u2002", "\u2003", "\u2004", "\u2005",
      "\u2006", "\u2007", "\u2008", "\u2009", "\u200A", "\u2028",
      "\u2029", "\u202F", "\u205F", "\u3000", "\uFEFF",
    ];

    for (const char of whitespaceChars) {
      const userInput = `foo${char}bar`;

      const escapeOutput = execSync(
        `node test/_echo.js ${shescape.escape(userInput, options)}`,
        options
      );

      const quoteOutput = execSync(
        `node test/_echo.js ${shescape.quote(userInput, options)}`,
        options
      );

      assert.strictEqual(escapeOutput, quoteOutput);
    }

The test macro used when running the fixtures has been updated further
to render all whitespace characters correctly (to ensure unique test
names). This was updated to 1) output `\u{hhhh}` for readability (more
clearly demarking unicode characters from "normal" characters) and 2)
use unique character indices for all characters to more clearly
communicate the chosen order.

* Escape necessary whitespace characters for Bash, Dash, and Zsh

Only the newline character is necessary to escape for each shell
following the experiment outlined above.

In the future more granular escaping for whitespace characters can be
considered.
Omit many characters from being escaped by PowerShell and instead rely
on PowerShell's own conversion from many whitespace characters to a
"normal" space character (' '). Both LF and CR characters are still
escaped to space characters as both cause whatever follows them to be
interpreted as a new command. A notable exception is \uFEFF, which is
not converted into a space by PowerShell but also doesn't cause any
issues.

Additionally, the whitespace fixtures for PowerShell were updated to
follow the same order as whitespace fixtures for Unix and the replacing
logic for the macro title (`test/unit/_macros.js`).

The script used to verify the above claims:

    import { execSync } from "node:child_process";
    import assert from "node:assert";
    import * as shescape from "shescape";

    const options = {
      interpolation: true,
      shell: "powershell.exe",
      encoding: "utf8",
    };

    const whitespaceChars = [
      "\u0009", // converted to space by PowerShell
      "\u000A", // escaped to " " by Shescape
      "\u000B", // converted to space by PowerShell
      "\u000C", // converted to space by PowerShell
      "\u000D", // escaped to " " by Shescape
      " ", // normal whitespace
      "\u0085", // converted to space by PowerShell
      "\u00A0", // converted to space by PowerShell
      "\u1680", // converted to space by PowerShell
      "\u2000", // converted to space by PowerShell
      "\u2001", // converted to space by PowerShell
      "\u2002", // converted to space by PowerShell
      "\u2003", // converted to space by PowerShell
      "\u2004", // converted to space by PowerShell
      "\u2005", // converted to space by PowerShell
      "\u2006", // converted to space by PowerShell
      "\u2007", // converted to space by PowerShell
      "\u2008", // converted to space by PowerShell
      "\u2009", // converted to space by PowerShell
      "\u200A", // converted to space by PowerShell
      "\u2028", // converted to space by PowerShell
      "\u2029", // converted to space by PowerShell
      "\u202F", // converted to space by PowerShell
      "\u205F", // converted to space by PowerShell
      "\u3000", // converted to space by PowerShell
      // "\uFEFF", // see end of script
    ];

    for (const char of whitespaceChars) {
      const userInput = `foo${char}bar`;

      const escapeOutput = execSync(
        `node test/_echo.js ${shescape.escape(userInput, options)}`,
        options
      );
      assert.strictEqual(escapeOutput, "foo bar\n");
    }

    for (const char of ["\uFEFF"]) {
      const userInput = `foo${char}bar`;

      const escapeOutput = execSync(
        `node test/_echo.js ${shescape.escape(userInput, options)}`,
        options
      );

      const quoteOutput = execSync(
        `node test/_echo.js ${shescape.quote(userInput, options)}`,
        options
      );

      assert.strictEqual(escapeOutput, quoteOutput);
    }
@ericcornelissen ericcornelissen added bug Something isn't working test Relates to testing labels Jul 13, 2022
* Don't ignore newlines when fuzzing CMD

Update the `_common.js` fuzz logic to not strip all newline characters
from the argument if the shell being fuzzed is CMD, and update the
`getExpectedOutput` logic to omit/change whitespace following how the
shell does it.

* Update test fixtures for CMD on Windows

... to include examples of whitespace and how it should be escaped. The
examples were obtained based on the tests for Bash, Dash, PowerShell,
and Zsh.

All characters were manually testing using the script:

    import { execSync } from "node:child_process";
    import assert from "node:assert";
    import * as shescape from "shescape";

    const options = {
      interpolation: true,
      shell: "cmd.exe",
      encoding: "utf8",
    };

    const whitespaceChars = [
      // "\u0009", <- Converted to a normal space by cmd.exe
      // "\u000A", <- Trailing content is ignored on cmd.exe
      // "\u000D", <- Removed by cmd.exe
      "\u000B", "\u000C", " ", "\u0085", "\u00A0", "\u1680", "\u2000",
      "\u2001", "\u2002", "\u2003", "\u2004", "\u2005", "\u2006",
      "\u2007", "\u2008", "\u2009", "\u200A", "\u2028", "\u2029",
      "\u202F", "\u205F", "\u3000", "\uFEFF",
    ];
    for (const char of whitespaceChars) {
      const userInput = `foo${char}bar`;

      const escapeOutput = execSync(
        `node test/_echo.js ${shescape.escape(userInput, options)}`,
        options
      );

      const quoteOutput = execSync(
        `node test/_echo.js ${shescape.quote(userInput, options)}`,
        options
      );

      assert.strictEqual(escapeOutput, quoteOutput);
    }

    const escapeOutputTab = execSync(
      `node test/_echo.js ${shescape.escape("foo\u0009bar", options)}`,
      options
    );
    assert.strictEqual(escapeOutputTab, "foo bar\n");

    const escapeOutputLF = execSync(
      `node test/_echo.js ${shescape.escape("foo\u000Abar", options)}`,
      options
    );
    assert.strictEqual(escapeOutputLF, "foo\n");

    const escapeOutputCR = execSync(
      `node test/_echo.js ${shescape.escape("foo\u000Dbar", options)}`,
      options
    );
    assert.strictEqual(escapeOutputCR, "foobar\n");

* Escape necessary whitespace characters for CMD

Only the newline character is necessary to escape for each shell
following the experiment outlined above. In contrast to Bash, Dash,
PowerShell, and Zsh this is also required when the argument is being
quoted (based on fuzzing).

In the future more granular escaping for whitespace characters can be
considered.

The `getExpectedOutput` logic is also updated to make the same
transformation as Shescape does.
@ericcornelissen ericcornelissen marked this pull request as ready for review July 14, 2022 19:10
* Add items to fuzz corpus

Add two item to fuzz corpus that trigger the bug on Bash where the tilde
in combination with line terminating characters isn't escaped correctly.

- `f28febc`: contains a carriage return character, this is the item that
  uncovered the bug in the continuous integration system.
- `dece2a6`: contains a newline character, this was added separately as
  this currently does not require escaping the tilde (due to line 49 of
  `unix.js`). By having this case in the corpus, changes to line 49 that
  will require changes to the escaping of the tilde in the future will
  be detectable by fuzzing.

* Update test fixtures for Bash on Unix

...to include examples of the line terminating characters where the
tilde should be escaped but currently isn't. This list of characters was
obtained using the following script:

    import { execSync } from "node:child_process";
    import assert from "node:assert";
    import * as shescape from "shescape";

    const options = {
      interpolation: true,
      shell: "bash",
      encoding: "utf8",
    };

    const safeWhitespaceChars = [
      // "\u0009", "\u000A", "\u000B", "\u000C", " ", "\u0085",
      // "\u00A0", "\u1680", "\u2000", "\u2001", "\u2002", "\u2003",
      // "\u2004", "\u2005", "\u2006", "\u2007", "\u2008", "\u2009",
      // "\u200A", "\u202F", "\u205F", "\u3000", "\uFEFF",
    ];

    for (const char of safeWhitespaceChars) {
      const userInput = `foo=${char}:~`;

      const escapeOutput = execSync(
        `node test/_echo.js ${shescape.escape(userInput, options)}`,
        options
      );

      assert.match(escapeOutput, /foo=[\s\u0085]:~\n/);
    }

    const unsafeWhitespaceChars = [
      "\u000D",
      "\u2028",
      "\u2029",
    ];

    for (const char of unsafeWhitespaceChars) {
      const userInput = `foo=${char}:~`;

      const escapeOutput = execSync(
        `node test/_echo.js ${shescape.escape(userInput, options)}`,
        options
      );

      assert.doesNotMatch(escapeOutput, /foo=[\u000D\u2028\u2029]:~\n/);
    }

* Update escaping of tilde for Bash

Update the escaping of the tilde character for bash to explicitly allow
line terminating characters. It turns out that the `.` wildcard in
regular expression in JavaScript do not include line terminating
characters. Note that the newline character is absant from this list,
this is because it's omitted entirely (on line 49).
src/unix.js Fixed Show resolved Hide resolved
* Add item to fuzz corpus

Add an item to fuzz corpus that triggers the bug on CMD where a trailing
forward slash in combination with a carriage return character results in
a double quote being outputted (similar to PowerShell's behaviour).

* Update test fixtures for CMD on Windows

So that it is now expected that carriage return characters are replaced
by a single space character.

Only escaping carriage return characters when `interpolation:true` was
manually tested through fuzzing, this showed that carriage return
characters always need to be escaped.

The following script was verified whether or not other line terminating
characters needed to be escaped as well:

    import { execSync } from "node:child_process";
    import assert from "node:assert";
    import fs from "node:fs";
    import * as shescape from "shescape";

    const tmp = fs.readFileSync(
      "./crash-" +
      "31ed7643aba69fe2d776af3aee587bb7899165af5ed3846c6f70327f2eec4713"
    ).toString();

    const newlineChars = [
      "\u000A",
      // "\u000D", <- '\r', does not work as we know
      "\u2028",
      "\u2029",
    ];

    for (const char of newlineChars) {
      const userInput = tmp.replace(/\r/, char);

      const escapeOutput = execSync(
        `node test/_echo.js ${shescape.quote(userInput)}`
      ).toString();

      assert.strictEqual(escapeOutput.trim(), tmp.trim());
    }

* Update escaping of whitespace for CMD on Windows

Note: the `_common.js` fuzz logic was updated accordingly to get the
correct expected output.
Based on [1]:

> Note that the m multiline flag doesn't change the dot behavior. So to
> match a pattern across multiple lines, the character class `[^]` can
> be used - it will match any character including newlines.

--
1. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes
* Add item to fuzz corpus

Add an item to the fuzz corpus that triggers the bug on Bash/Dash where
brace expansion happens if there's a line terminating character within
the braces.

* Update test fixtures for Bash/Dash

...to include exmaples of the line terminating characters where the
curly brakcet should be escaped but currently isn't. The list of
characters is based on the experiments carried out for the tilde.

* Update escaping of curly bracket for Bash/Dash

Like the escaping of tildes for bash, use `[^]` to match _all_
characters, including newlines.

Note that Zsh does not (currently) need to be changed as it always has
curly braces escaped.

* Update the CHANGELOG
Include "newlines" (line feeds) in the item regarding tidles and curly
braces for bash as prior to release containing these changes the problem
also existed for line feeds (just not now because line feeds are
removed entirely). As a result, the language can be simplified as well.
@ericcornelissen ericcornelissen merged commit aceea73 into main Jul 15, 2022
@ericcornelissen ericcornelissen deleted the test/fuzz-interpolation-w-newlines branch July 15, 2022 07:37
.replace(/(^|\s)(~|#)/g, "$1\\$2")
.replace(/(\*|\?)/g, "\\$1")
.replace(/(\$|\;|\&|\|)/g, "\\$1")
.replace(/(\(|\)|\<|\>)/g, "\\$1")
.replace(/("|'|`)/g, "\\$1")
.replace(/\{(?=(.*?(?:\,|\.).*?)\})/g, "\\{");
.replace(/\{(?=([^]*?(?:\,|\.)[^]*?)\})/g, "\\{");
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Dash, this change isn't actually relevant - see #336

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fuzz Relates to fuzzing test Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant