Skip to content
This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

[Security] Make formatted expression free from CVE-2021-42574 (Trojan Source) #276

Open
ShamrockLee opened this issue Nov 25, 2021 · 5 comments
Labels
formatting Issue with the output format needs triage

Comments

@ShamrockLee
Copy link

ShamrockLee commented Nov 25, 2021

Proposal

Close the Unicode RTL sections, remove trailing unclosed RTL characters in strings and comments, and warn the user about their existence to prevent CVE-2021-42574.

CVE-2021-42574, a.k.a. the Trojan Source vulnerability, is resulted from the unclosed Unicode RTL control characters. It allows the attacker to alter the order of sections of words in a line, and to make a commented section / a section inside a string to appear at the right side of the comment/string.

https://github.com/nickboucher/trojan-source
https://www.trojansource.codes/
Boucher and Anderson (2021). Trojan Source: Invisible Vulnerabilities. https://trojansource.codes/trojan-source.pdf

Input

{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one‮⁦src = srcSecure;⁩⁦*/
})

Output

{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one‮⁦src = srcSecure;⁩⁦*/
  })

Desired output

{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in
  {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one‮⁦src = srcSecure;⁩‬*/
  })

The trick

The execution result of the input is equivalent to that of the desired output, which would be surprising to reviewers. This is due to the explicit formatting support of the Unicode BIDI (bi-direction) algorithm with control characters:

<RLE>...<PDF>
<LRE>...<PDF>
<RLO>...<PDF>
<LRO>...<PDF>
<RLI>...<PDI>
<LRI>...<PDI>
<FSI>...<PDI>

When a explicitly-directional-changing section is not closed, it may infect the following characters in the same line.
See https://www.unicode.org/reports/tr9/tr9-42.html for more detail.
When a explicitly-directional-changing section is not closed, it may infect the following characters in the same line, and that's what happen to the line 8 of the Output:

/*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><LRI>*/

The solution is to:

  1. Remove the <RLE>, <LRE>, <RLO>, <LRO>, <RLI>, <LRI>, <FSI> followed by ", '', */ or the end of line.
  2. Otherwise, add the corresponding <PDF> or <PDI> before ", '', */ or the end of line.
  3. Show warnings about the existence of the RTL control characters, especially when they are not closed in the unformatted file.

After that, the line should become

/*Replace the source with a secure one<RLO><LRI>src = srcSecure;<PDI><PDF>*/

(with the trailing <RLI> removed and <RLO> closed with <PDF>)

I'm not a Unicode expert. It would need someone who are more familiar with this topic to refine the proposal.

Reproduction of the example Nix expressions

  1. Prepare the following python scripts:

gen_code.py

#!/usr/bin/env python3

# Note the unclosed RTL section
code_string = '''
{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one\N{RLO}\N{LRI}src = srcSecure;\N{PDI}\N{LRI}*/
})
''';

if __name__ == '__main__':
    print(code_string)

gen_code_clean.py

#!/usr/bin/env python3

# The RTL sections are now closed
code_string = '''
{ lib, hello }:
hello.overrideAttrs (oldAttrs:
  let
    scrSecure = builtins.trace "Using the secure source" oldAttrs.src;
  in {
    pname = oldAttrs.pname + "-secure";
    /*Replace the source with a secure one\N{RLO}\N{LRI}src = srcSecure;\N{PDI}\N{PDF}*/
})
''';

if __name__ == '__main__':
    print(code_string)
  1. Run
python3 ./gen_code.py > reproduce.nix
cp reproduce{,_formatted}.nix
nixpkgs-fmt reproduce_formatted.nix
python3 ./gen_code_clean.py > reproduce_clean.py
cp reproduce_clean{,_formatted}.nix
nixpkgs-fmt reproduce_clean_formatted.nix
  1. Get the content:
    • Input: reproduce.nix
    • Output: reproduce_formatted.nix
    • Desired output: reproduce_clean_formatted.nix
@ShamrockLee ShamrockLee added formatting Issue with the output format needs triage labels Nov 25, 2021
@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/protect-nix-codebases-against-trojan-source-cve-2021-42574/16316/1

@erikarvstedt
Copy link

Related:
https://research.swtch.com/trojan
ziglang/zig#10074

Summary:
This has to be addressed in editors/code review tools, not at lower levels of the tooling stack.

@ShamrockLee
Copy link
Author

Related: https://research.swtch.com/trojan ziglang/zig#10074

Summary: This has to be addressed in editors/code review tools, not at lower levels of the tooling stack.

The main point in the two links is that the fix shouldn't be done in compilers (interpreters in the case of Nix). This formatter may be used as part of the editors or review tools (e.g. the jnoortheen.nix-ide VSCode extension), and the implementation may benefits developers using the related tools.

@zimbatm
Copy link
Member

zimbatm commented Dec 23, 2021

I'm fine having this in nixpkgs-fmt, we just need somebody to send in a PR. It would also be nice to have that as part of the Nix language as well, and the nixpkgs CI. It doesn't have to be an either/or situtation.

@ShamrockLee
Copy link
Author

I'm currently busy working on the finals, and my knowledge to Rust is limited. It would take several weeks before trying to implement it.

I'll open a PR containing the test data in case someone wants to take a look.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
formatting Issue with the output format needs triage
Projects
None yet
Development

No branches or pull requests

4 participants