This tool checks your files for existence of Unicode BIDI characters which can be misused for supply chain attacks to mitigate CVE-2021-42574. This tool was written in Rust and is distributed as a small (< 3MB) docker compatible container to allow fast and easy usage.
For an explanation of the attack, have a look at GitHub's blog entry or the original paper where the attack was published.
This package is mostly intended to be used via it's docker container. But local installation is possible of course.
Compilation requires at least Rust 1.56.0 because it uses the Rust 2021 Edition.
Clone this repository and run cargo install --path .
to install the binary for your current user. After that, you can invoke the bidi_detector
command.
Running the tool via it's official docker container is probably the easiest way to get started. To run it via docker, the following command should work to scan all files inside your current working directory:
docker run --rm -it -v $(pwd):/data ghcr.io/maweil/bidi_char_detector:latest
Depending on your system you may have to adapt this command slightly. If you use e.g. podman and have SELinux enabled, try the following command instead:
podman run --rm -it -v $(pwd):/data:Z ghcr.io/maweil/bidi_char_detector:latest
By default, all files will be checked. If you have binary files inside the current directory, the command will fail because it can't decode a non-UTF8 encoded file.
To adapt the command to your needs, place a file called bidi_config.toml
inside the root of your project.
You can find an example for it in this repository, see an example below. The options will be described in more detail below the example:
[general]
includes = [
"src/**/*",
"**/*.patch",
"**/*.json",
"**/Dockerfile",
"test/*.js"
]
excludes = [
".git/*",
"target/*"
]
[display]
show_details = true
This section includes two arrays (includes
and excludes
) where you can specify patterns of files to be scanned (or to be excluded from the scan).
Please make sure your patterns actually match the files inside the directory, not the directory name itself, otherwise your files will not be scanned.
If you want to scan all files and only exclude e.g. your .git
directory, the following configuration would do the trick:
[general]
includes = [
"**/*"
]
excludes = [
".git/*",
]
[display]
show_details = true
If you want to intead explicitly define which folder contains your source files, the following configuration example would scan all files in the src directory (without ignoring anything):
[general]
includes = [
"src/**/*"
]
excludes = [
]
[display]
show_details = true
The following settings are available in this section:
Setting | Description | Optional | Default | Introduced in |
---|---|---|---|---|
show_details |
Decides whether to print out line/pos and type of the detected BIDI character if found (see examples below) | No | - | v0.1.1 |
ignore_invalid_data |
Whether to print an error message when binary/non-UTF8 files are detected | Yes | true | v0.1.2 |
verbose |
Whether to print the filename even if no suspicious characters were found | Yes | true | v0.1.2 |
Example: show_details = true
, verbose = true
src/lib.rs - 0 BIDI characters
src/main.rs - 0 BIDI characters
test/example-commenting-out.js - 6 BIDI characters
Found character RLO (Right-to-Left Override), test/example-commenting-out.js:4:3
Found character LRI (Left-to-Right Isolate), test/example-commenting-out.js:4:7
Found character PDI (Pop Directional Isolate), test/example-commenting-out.js:4:20
Found character LRI (Left-to-Right Isolate), test/example-commenting-out.js:4:22
Found character RLO (Right-to-Left Override), test/example-commenting-out.js:6:20
Found character LRI (Left-to-Right Isolate), test/example-commenting-out.js:6:24
Found 6 potentially dangerous Unicode BIDI characters!
Example: show_details = false
, verbose = true
src/lib.rs - 0 BIDI characters
src/main.rs - 0 BIDI characters
test/example-commenting-out.js - 6 BIDI characters
Found 6 potentially dangerous Unicode BIDI characters!
Example: show_details = false
, verbose = false
test/example-commenting-out.js - 6 BIDI characters
Found 6 potentially dangerous Unicode BIDI characters!
All credits for detecting the attack including the list of relevant BIDI characters go to the original authors of the corresponding paper. Please cite their original paper when building on their work.
The file test/example-commenting-out.js
in this repository is a copy of commenting-out.js in their original repository. It's licensing follows the original repository (MIT License)
It is used for test purposes only here.
@article{boucher_trojansource_2021,
title = {Trojan {Source}: {Invisible} {Vulnerabilities}},
author = {Nicholas Boucher and Ross Anderson},
year = {2021},
journal = {Preprint},
eprint = {2111.00169},
archivePrefix = {arXiv},
primaryClass = {cs.CR},
url = {https://arxiv.org/abs/2111.00169}
}