Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disallow unbalanced bidirectional formatting in strings and comments #42918

Merged
merged 1 commit into from
Nov 11, 2021

Conversation

JeffBezanson
Copy link
Sponsor Member

See https://www.trojansource.codes/, rust-lang/rust#90461 for reference.

I don't believe anything needs to be done for single-line comments, since line (paragraph) endings also terminate bidi formatting directives.

@JeffBezanson JeffBezanson added domain:unicode Related to unicode characters and encodings parser Language parsing and surface syntax needs tests Unit tests are required for this change needs news A NEWS entry is required for this change security System security concerns and vulnerabilities kind:minor change Marginal behavior change acceptable for a minor release labels Nov 3, 2021
@JeffBezanson
Copy link
Sponsor Member Author

Since it's so easy, might as well check to see if this occurs in any packages :)

@nanosoldier runtests(ALL, vs = ":master")

@DilumAluthge
Copy link
Member

Would be good to backport this to 1.6 and 1.7.

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@Seelengrab
Copy link
Contributor

Looks good!

Are there other places where bidi overrides are allowed, other than strings and comments? A quick cursory test with Meta.parse suggests no, but I'm not intimately familiar with which characters are allowed where and the only mention about which characters are allowed that I could find are from the docs of isidentifier (should these be updated as well? This fix prevents Symbol("foo\u202a") by lieu of a parser error during string construction after all).

Would be good to backport this to 1.6 and 1.7.

Agreed - having this in the upcoming LTS is definitely a good thing.

possible new issues were detected.

A number of these seem to be due to DNS failures 🤔 Is there a way to check whether any of the failed runs threw the new parse error?

@Seelengrab
Copy link
Contributor

Just checking in what the status/plan for this is - since the fix seems proper, let's make sure this doesn't get forgotten and backported to 1.6 & 1.7.

@fredrikekre fredrikekre added backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Nov 8, 2021
@JeffBezanson JeffBezanson added status:merge me PR is reviewed. Merge when all tests are passing and removed needs tests Unit tests are required for this change needs news A NEWS entry is required for this change labels Nov 10, 2021
@DilumAluthge DilumAluthge merged commit 2cfebad into master Nov 11, 2021
@DilumAluthge DilumAluthge deleted the jb/bidi branch November 11, 2021 23:03
@DilumAluthge DilumAluthge removed the status:merge me PR is reviewed. Merge when all tests are passing label Nov 11, 2021
JeffBezanson added a commit that referenced this pull request Nov 12, 2021
KristofferC pushed a commit that referenced this pull request Nov 12, 2021

@test_throws ParseError Meta.parse("""
function checkUserAccess(u::User)
if u.accessLevel != "user\u202e \u2066# users are not allowed\u2069\u2066"
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this was indented with tabs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be from the initial report/gist I sent - I just checked my local version and it seems to have a tab character there as well, though that wasn't intentional. Not sure how it got there. It certainly isn't a functional part of the vulnerability though.

KristofferC pushed a commit that referenced this pull request Nov 13, 2021
@KristofferC KristofferC removed backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Nov 15, 2021
daviehh pushed a commit to daviehh/julia that referenced this pull request Nov 16, 2021
@moustachio-belvedere
Copy link

moustachio-belvedere commented Dec 5, 2021

I'm not sure how much of an issue this is in practice, but I thought it worth mentioning that the homoglyph-function (from the trojan-source repository) approach still works in 1.6.4:

https://github.com/moustachio-belvedere/trojan-source/blob/main/Julia/homoglyph-function.jl

@DilumAluthge
Copy link
Member

Hmmm. That's unfortunate. Even if the risk is mostly theoretical, it would be nice for us to get the mitigation working in 1.6.x.

@moustachio-belvedere
Copy link

moustachio-belvedere commented Dec 5, 2021

(To mention, just incase my comment may be seen to imply that it doesn't work in 1.7, this may also work for 1.7 I just haven't checked yet as my tiny laptop doesn't have sufficient SSD space left for multiple Julia installs simultaneously.)

would be nice for us to get the mitigation working in 1.6.x.

Sounds good to me 👍 I'm happy to create an issue if you think it might be missed here - I don't think I'm knowledgeable enough about the parser to push a fix myself.

@Seelengrab
Copy link
Contributor

(To mention, just incase my comment may be seen to imply that it doesn't work in 1.7, this may also work for 1.7 I just haven't checked yet as my tiny laptop doesn't have sufficient SSD space left for multiple Julia installs simultaneously.)

Just checked, also works on 1.7.

The trouble with homoglyph attacks are that to us, that's a valid identifier right now. IIRC we basically allow some Unicode Letter categories in identifiers, which has the sideeffect of allowing those. I think those would best be mitigated with either a parser warning (which I don't think we have right now) or in a linter.

@DilumAluthge
Copy link
Member

How breaking would it be to just disallow (i.e. make it a parser error) to have any of the relevant characters?

We could run PkgEval and just fix any cases that we find.

@Seelengrab
Copy link
Contributor

Seelengrab commented Dec 6, 2021

I'm not sure I'd use "fixing" here - it's imo perfectly valid to write some piece of code in a non-western script. "Fixing" would just mean arbitrarily favoring "our" script over another, to make it easier on some people while maybe being alienating to someone who just uses the letter their keyboard produces (rightfully so!). That's why I'm in favor of a linter or parser warning and not making this a hard error, preferably only when there really is a visual collision, instead of outright forbidding some characters.

So I think making that an error would be extremely breaking.

@JeffBezanson JeffBezanson added the status:triage This should be discussed on a triage call label Feb 14, 2022
@JeffBezanson
Copy link
Sponsor Member Author

From triage: we don't think we can do anything about this, because there are a lot of confusable characters and it is always possible to write confusing or misleading code. IDEs are a good place to help with this though.

@JeffBezanson JeffBezanson removed the status:triage This should be discussed on a triage call label Feb 17, 2022
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
c42f added a commit to JuliaLang/JuliaSyntax.jl that referenced this pull request May 18, 2023
Disallow unbalanced Unicode bidirectional formatting directives within
strings and comments, to mitigate the "trojan source" vulnerability
https://www.trojansource.codes

See also JuliaLang/julia#42918
c42f added a commit to JuliaLang/JuliaSyntax.jl that referenced this pull request May 18, 2023
Disallow unbalanced Unicode bidirectional formatting directives within
strings and comments, to mitigate the "trojan source" vulnerability
https://www.trojansource.codes

See also JuliaLang/julia#42918
c42f added a commit to JuliaLang/JuliaSyntax.jl that referenced this pull request May 18, 2023
Disallow unbalanced Unicode bidirectional formatting directives within
strings and comments, to mitigate the "trojan source" vulnerability
https://www.trojansource.codes

See also JuliaLang/julia#42918
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:unicode Related to unicode characters and encodings kind:minor change Marginal behavior change acceptable for a minor release parser Language parsing and surface syntax security System security concerns and vulnerabilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants