Skip to content

Conversation

@enthropy7
Copy link

@enthropy7 enthropy7 commented Dec 31, 2025

What does this PR try to resolve?

This PR implements the text_direction_codepoint lint for Cargo.toml files:
Fixes #16373 and #16374

lint detects Unicode BiDi (bidirectional) control codepoints that can be used in "Trojan Source" attacks. These invisible characters can alter the visual display of text without changing its underlying representation, potentially making malicious manifest content appear benign. All like descritped in rustc's text_direction_codepoint_in_comment/literal lint but for Cargo manifests.

Detected codepoints:

U+202A LEFT-TO-RIGHT EMBEDDING
U+202B RIGHT-TO-LEFT EMBEDDING
U+202C POP DIRECTIONAL FORMATTING
U+202D LEFT-TO-RIGHT OVERRIDE
U+202E RIGHT-TO-LEFT OVERRIDE
U+2066 LEFT-TO-RIGHT ISOLATE
U+2067 RIGHT-TO-LEFT ISOLATE
U+2068 FIRST STRONG ISOLATE
U+2069 POP DIRECTIONAL ISOLATE

Default level: deny

How to test and review this PR?

cargo test -p cargo --test testsuite -- lints::text_direction_codepoint

Manual testing:

Create a manifest with BiDi codepoint (U+202E)

printf '[package]\nname = "foo"\nversion = "0.1.0"\nedition = "2024"\ndescription = "A \xE2\x80\xAEtest"\n' > Cargo.toml
mkdir -p src && echo 'fn main(){}' > src/main.rs
cargo +nightly check -Zcargo-lints

The lint can be configured via [lints.cargo]:

[lints.cargo]
text_direction_codepoint = "allow"  # or "warn"

@rustbot rustbot added A-workspaces Area: workspaces S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 31, 2025
@rustbot
Copy link
Collaborator

rustbot commented Dec 31, 2025

r? @weihanglo

rustbot has assigned @weihanglo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@epage
Copy link
Contributor

epage commented Dec 31, 2025

Feel free to clean up the commits for how they should be reviewed and merged, rather than reflecting the implementation.

On top of that, our contrib guide suggests adding tests in a commit before the feature, showing the current behavior. The feature commit would then update the tests to show the new behavior. When reviewing the feature commit, this helps remove all of the boilerplate from the view and see exactly what changed in behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes #16373 and #16374

I had forgotten that we had separate issues. The question is whether we should mirror rustc or lint them together. I had overlooked this conversation in #16374, focusing on the other parts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good starting point would be to check the history of rustc as to why they made them separate lints.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it’s easier to keep lints together cause we parse cargo in raw format so, but i will check and come back later after new year :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustc separated the lints because in Rust code, BiDi in string literals might be legitimate (RTL language support) while in comments it's always suspicious. In TOML there's no such distinction - BiDi anywhere in a manifest is suspicious (no legitimate RTL use case for dependency names/versions). Single lint makes sense for Cargo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable to me. Thanks for the investigation!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package.description, package.metadata, and workspace.metadata do up some possibilities.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those cases users can set text_direction_codepoint = "warn" or "allow" in [lints.cargo]. Deny-by-default is still appropriate since legitimate RTL in manifests is rare

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are likely valid reasons for package.description to have text direction code points, e.g. if they are written in a non-english language that uses a different direction. This becomes even bigger of a deal for *.metadata.

My main concern is actually around performance. I was going to suggest we start this as one lint and if there is enough of a motivating case, we split this into two lints with the current lint becoming a lint group. However, in thinking that through, I realized we can easily implement this lint without much of a performance impact.

The way I see this working is we have one function for checking both lints. This function would do a string search for the different code points in question in the Cargo.toml, collecting their spans. If they are not present, we bail out. If they are present, we use toml_parser to create Events. We then iterate through all of the Events to see if they include the span in question and based on EventKind, we choose which lint to fire.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i already started to work on your suggestion, forgot to answer. thanks, i like your idea, will push new method soon

@enthropy7
Copy link
Author

Feel free to clean up the commits for how they should be reviewed and merged, rather than reflecting the implementation.

On top of that, our contrib guide suggests adding tests in a commit before the feature, showing the current behavior. The feature commit would then update the tests to show the new behavior. When reviewing the feature commit, this helps remove all of the boilerplate from the view and see exactly what changed in behavior.

I can restructure commits if needed, but the tests wouldn't show meaningful info because before there was no lint to trigger. sorry btw, for next PR's i would take that info

@epage
Copy link
Contributor

epage commented Dec 31, 2025

I can restructure commits if needed, but the tests wouldn't show meaningful info because before there was no lint to trigger. sorry btw, for next PR's i would take that info

"before there was no lint to trigger": that is exactly what we want to see with this.

@enthropy7
Copy link
Author

enthropy7 commented Dec 31, 2025

I can restructure commits if needed, but the tests wouldn't show meaningful info because before there was no lint to trigger. sorry btw, for next PR's i would take that info

"before there was no lint to trigger": that is exactly what we want to see with this.

k, when i get to my PC in an hour will do it correctly 💯

@enthropy7
Copy link
Author

I can restructure commits if needed, but the tests wouldn't show meaningful info because before there was no lint to trigger. sorry btw, for next PR's i would take that info

"before there was no lint to trigger": that is exactly what we want to see with this.

Done, restructured commits

@epage
Copy link
Contributor

epage commented Dec 31, 2025

Thanks!

@enthropy7
Copy link
Author

Thanks!

np, for future PR's - will keep it in mind. thanks for marking this

@enthropy7 enthropy7 requested a review from epage January 1, 2026 12:53
@enthropy7
Copy link
Author

if anything is need - i’m here 24/7 :3 just wanna merge it and take the next task

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to fix CI job failures (rustfmt).

View changes since this review

Comment on lines 1381 to 1383
// Only check for BiDi codepoints in virtual workspace manifests.
// For package roots, the lint is run in emit_pkg_lints.
if matches!(self.root_maybe(), MaybePackage::Virtual(_)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this logic into the lint function itself?

In the future we might want to make the lint infra a LintContext, and register lints all at once, instead of the current manual calling functions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected it, thanks, i haven't thought about it (my bad)

.with_stderr_data(str![[r#"
[CHECKING] foo v0.0.1 ([ROOT]/foo)
[FINISHED] `dev` profile [unoptimized + debuginfo] target(s) in [ELAPSED]s
[ERROR] unicode codepoint changing visible direction of text present in manifest: `/u{202E}` (RIGHT-TO-LEFT OVERRIDE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[ERROR] unicode codepoint changing visible direction of text present in manifest: `/u{202E}` (RIGHT-TO-LEFT OVERRIDE)
[ERROR] unicode codepoint changing visible direction of text present in manifest

Can we instead putting "/u{202E} (RIGHT-TO-LEFT OVERRIDE)" as a label of the annotated span, like what rustc lint displays?

https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html#text-direction-codepoint-in-comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, done

Comment on lines 57 to 72
[ERROR] unicode codepoint changing visible direction of text present in manifest: `/u{202E}` (RIGHT-TO-LEFT OVERRIDE)
--> Cargo.toml:6:18
|
6 | description = "A �test� package"
| ^ this invisible unicode codepoint changes text flow direction
|
= [NOTE] `cargo::text_direction_codepoint` is set to `deny` by default
= [HELP] if their presence wasn't intentional, you can remove them
[ERROR] unicode codepoint changing visible direction of text present in manifest: `/u{202D}` (LEFT-TO-RIGHT OVERRIDE)
--> Cargo.toml:6:23
|
6 | description = "A �test� package"
| ^ this invisible unicode codepoint changes text flow direction
|
= [HELP] if their presence wasn't intentional, you can remove them
[ERROR] encountered 2 errors while running lints
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not a blocker)

Do we want to have one error per finding, or one per line, or whathever rustc does if making sense to cargo? I found rustc's approach looks nicer btw

error: unicode codepoint changing visible direction of text present in comment
  --> $DIR/unicode-control-codepoints.rs:40:1
   |
LL | //"/*� } �if isAdmin� � begin admins only */"
   | ^^^^^-^^^-^^^^^^^^^^-^-^^^^^^^^^^^^^^^^^^^^^^
   | |    |   |          | |
   | |    |   |          | '\u{2066}'
   | |    |   |          '\u{2069}'
   | |    |   '\u{2066}'
   | |    '\u{202e}'
   | this comment contains invisible unicode text flow control codepoints
   |
   = note: these kind of unicode codepoints change the way text flows on applications that support them, but can cause confusion because they change the order of characters on the screen
   = help: if their presence wasn't intentional, you can remove them

https://github.com/rust-lang/rust/blob/fcd630976c460c819c4bbcaf107d0c94501205d8/tests/ui/parser/unicode-control-codepoints.stderr#L212-L225

6 | description = "A �test package"
| ^ this invisible unicode codepoint changes text flow direction
|
= [NOTE] `cargo::text_direction_codepoint` is set to `deny` by default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustc has this note, which I found useful explaining why the lint.

   = note: these kind of unicode codepoints change the way text flows on applications that support them, but can cause confusion because they change the order of characters on the screen

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added too

@enthropy7
Copy link
Author

@weihanglo thanks for your suggestions! it was very helpful. small diffs make final work more beautiful and user-friendly.

@weihanglo
Copy link
Member

Nice. Feel free to force-push. The commit history is expected to be what it should be merged and reviewed, not how it was developed.

@enthropy7
Copy link
Author

done, i think , thanks for all the help

@enthropy7
Copy link
Author

enthropy7 commented Jan 2, 2026

@epage hi, please take a look if you have some time, waiting for your opinion :)

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the question, looks good to me.

View changes since this review

let mut line_map = Vec::new();
line_map.push(0);
for (idx, ch) in contents.char_indices() {
if ch == '\n' {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this match how annotate-snippets detect new lines?

cc @Muscraft

@enthropy7 enthropy7 force-pushed the master branch 2 times, most recently from 34744de to 209c2cc Compare January 6, 2026 23:38
@rustbot

This comment has been minimized.

Tests for detecting Unicode BiDi control codepoints in Cargo.toml.
Currently these pass as there is no lint implementation yet.
@rustbot
Copy link
Collaborator

rustbot commented Jan 6, 2026

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

group = group.element(
Level::HELP.message("if their presence wasn't intentional, you can remove them"),
);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust suggests escaping the codepoints in literals, and we should do the same.

Note: Basic strings are the only strings that support Unicode literal escaping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to provide suggestions for literal strings and bare keys, you can decode the string and then re-encode it using the equivalent basic string type using

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have been more specific in the style of suggestion I was referring to, as I didn't intend for this to get added to the "presence wasn't intentional" help message. I intended for it to be its own "group" in annotate-snippets terms, utilizing Patch, like we do here or here.

Comment on lines 229 to 241
let mut findings_by_line: std::collections::BTreeMap<usize, Vec<_>> =
std::collections::BTreeMap::new();
for finding in disallowed_findings {
let line_num = line_map
.iter()
.rposition(|&line_start| line_start <= finding.byte_idx)
.map(|i| i + 1)
.unwrap_or(1);
findings_by_line
.entry(line_num)
.or_insert_with(Vec::new)
.push(finding);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust groups codepoints by comment/literal, and we should match that. In our case, we would match by toml_parser::Event.

@rustbot rustbot added A-credential-provider Area: credential provider for storing and retreiving credentials A-testing-cargo-itself Area: cargo's tests labels Jan 7, 2026
@enthropy7 enthropy7 force-pushed the master branch 3 times, most recently from a026552 to 0c21d4f Compare January 7, 2026 16:21
@enthropy7
Copy link
Author

enthropy7 commented Jan 8, 2026

i think now it's closed all of the suggestions. looks pretty nice to me, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-credential-provider Area: credential provider for storing and retreiving credentials A-testing-cargo-itself Area: cargo's tests A-workspaces Area: workspaces S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add text-direction-codepoint-in-literal lint

5 participants