proposal: cmd/vet: detect homograph attacks #20115
Comments
How do you suggest your tool could be made free from false positives? That should happen before inclusion in vet is considered. Edit: I've noticed the heuristics comment, but I'd be more convinced by an implementation in your tool. |
You could make this slightly less prone to false positives if you collect two sets, one of plain identifiers, the other of homograph-containing identifiers mapped to their apparent strings; if those two sets overlap, that's suspicious. Homograph attacks in strings (targeting file names or URLs, for example) are more problematic because strings can be broken up, for example "goo"+"gle" and "g00g"+"1e" (using not-a-real-homograph for clarity), to defeat the simple set-matching. Even reporting all such strings and requiring a //vet:homograph comment to note permitted cases is still vulnerable to trickier string constructions in arrays of bytes, etc. We could go down the rabbit hole of flow analysis (if the string can be copied to a filename context or a URL context, care a lot more about its provenance), and perhaps we should. |
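A minimal sketch of the two-set overlap idea described above. The `confusables` table is a tiny hand-picked assumption for illustration (a real tool would need a much fuller mapping), and the names `skeleton` and `suspicious` are hypothetical, not part of vet or any existing tool.

```go
package main

import "fmt"

// confusables maps a few Cyrillic letters to the ASCII letters they resemble.
// This table is an illustrative assumption, far from complete.
var confusables = map[rune]rune{
	'\u0430': 'a', // Cyrillic a
	'\u0435': 'e', // Cyrillic ie
	'\u043e': 'o', // Cyrillic o
	'\u0440': 'p', // Cyrillic er
	'\u0441': 'c', // Cyrillic es
	'\u0443': 'y', // Cyrillic u
	'\u0445': 'x', // Cyrillic ha
}

// skeleton returns the "apparent" ASCII form of s and reports whether
// any look-alike character had to be replaced to produce it.
func skeleton(s string) (string, bool) {
	changed := false
	out := []rune(s)
	for i, r := range out {
		if ascii, ok := confusables[r]; ok {
			out[i] = ascii
			changed = true
		}
	}
	return string(out), changed
}

// suspicious returns every identifier that contains a look-alike character
// and whose apparent form collides with a plain identifier in the same set.
func suspicious(idents []string) []string {
	plain := map[string]bool{}
	apparent := map[string]string{} // homograph-containing identifier -> apparent form
	for _, id := range idents {
		if sk, changed := skeleton(id); changed {
			apparent[id] = sk
		} else {
			plain[id] = true
		}
	}
	var hits []string
	for _, id := range idents { // input order keeps the result deterministic
		if sk, ok := apparent[id]; ok && plain[sk] {
			hits = append(hits, id)
		}
	}
	return hits
}

func main() {
	// "\u0435rr" starts with a Cyrillic letter and collides with plain "err".
	fmt.Printf("%q\n", suspicious([]string{"err", "\u0435rr", "f"}))
}
```

An identifier that contains a look-alike but has no plain counterpart is not reported, which is what keeps this less noisy than flagging every non-ASCII identifier.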
@mvdan is it a requirement that
There are also quite a few issues on the tracker about vet false positives, e.g. #11843, in which Rob says:
To me this suggests that As far as I can tell, if |
I believe there has been a push recently to make I assume you want this check to be in the group that would be run by default, and not in the group that have to be enabled explicitly due to false positives. |
For context re vet with no false positives: #18084 Rob's comment that you referenced predates this proposal and effort. |
I think a good place to start would be the overlap check that @dr2chase suggested, and a check for mixing ASCII and homographs in string literals (including import paths). This should catch the majority of viable attacks, and both checks can be circumvented by refactoring if needed. I will update |
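A rough sketch of the string-literal check, using only the standard library. It treats any Cyrillic or Greek letter as a potential homograph, which is an assumption made for brevity rather than the rule any existing tool uses; `mixed` and `flagLiterals` are hypothetical names.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strconv"
	"unicode"
)

// mixed reports whether s contains both an ASCII letter and a letter from
// a script assumed here to be a source of look-alikes (Cyrillic or Greek).
func mixed(s string) bool {
	var ascii, lookalike bool
	for _, r := range s {
		switch {
		case r <= unicode.MaxASCII && unicode.IsLetter(r):
			ascii = true
		case unicode.In(r, unicode.Cyrillic, unicode.Greek):
			lookalike = true
		}
	}
	return ascii && lookalike
}

// flagLiterals scans Go source and returns the line numbers of string
// literals (import paths included) that mix ASCII letters with look-alikes.
func flagLiterals(src []byte) []int {
	fset := token.NewFileSet()
	file := fset.AddFile("src.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0)

	var lines []int
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok != token.STRING {
			continue
		}
		// Unquote first, so look-alikes written as \u escapes are seen too.
		val, err := strconv.Unquote(lit)
		if err != nil {
			continue
		}
		if mixed(val) {
			lines = append(lines, fset.Position(pos).Line)
		}
	}
	return lines
}

func main() {
	src := []byte("package p\nimport \"g\u043e\u043egle.com/x\"\nvar s = \"hello\"\n")
	fmt.Println(flagLiterals(src))
}
```

As noted earlier in the thread, a check like this only sees whole literals, so a string assembled from pieces at run time would slip past it.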
That sounds like a good idea. To find false positives, on top of the Go source code itself, you could use @rsc's corpus of well-known Go code: https://github.com/rsc/corpus |
If I may ask a naive question, what is the threat model here, and how will vet address that threat? If the concern is import paths, it might be better for go get to reject such import paths. If the concern is variable names, where will that code come from such that vet is a relevant factor in helping to catch it? If the concern is strings containing URLs and user-provided data, maybe it would be better to have good library support for detecting these, say in net/url or a golang.org/x/text package. |
https://godoc.org/golang.org/x/text/secure/precis#Profile.Compare (or https://godoc.org/golang.org/x/net/idna for URLs) |
Using the new detection rules, the standard library contains only one string literal that gets flagged:
There are also some packages where homographs are mixed with explicit escape sequences. Example from
Most of the other false positives are foreign-language names that contain accented characters. Example from
I did not detect any variable names that contain a mix of ASCII and homographs.
These results show that most packages containing false positives contain a lot of them, so the maintainers of those packages would probably disable homograph checking. This shouldn't incur much of a security risk as long as these packages contain mostly text and little logic.
This seems reasonable to me. Since import paths are typically domain names, we could employ the same strategies used to thwart the IDN homograph attack. Non-URL import paths probably aren't a threat.
The attack scenario is code submitted via pull request to an open-source project, either by an anonymous user or a compromised team member. Running
I definitely support runtime detection of malicious strings, but it's a separate problem. In fact, it's probably a more important problem to address, since users are much more numerous than developers. |
See also #20209. TL;DR, I think we should solve this in code review tools, not the language. |
Blocked on #20209. |
Unicode® Technical Standard #39 UNICODE SECURITY MECHANISMS FYI. |
Oops, sorry, I didn't see the status is Hold. Forwarding to 20209... |
A homograph attack is an attack that exploits the visual similarity of two glyphs. Traditionally, this has been used in phishing attacks to trick a user into visiting a malicious domain that looks identical to the real domain. However, homographs can also be used to sneak malicious source code past review. This is possible in any language that supports Unicode source code.
Here is a simple example of a homograph attack in Go source code:
(Playground link)
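The snippet itself is not reproduced in this copy of the issue. The following is a minimal sketch of the same kind of bug, not the author's exact code: it assumes a package-level variable whose name starts with Cyrillic е (U+0435), so the assignment compiles while the Latin `err` that gets checked is never updated. The function name `demo` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// еrr is spelled with a leading Cyrillic е (U+0435). It is a different
// identifier from the Latin err used inside demo.
var еrr error

// demo returns what the program would print.
func demo() string {
	f, err := os.Create("test")
	if err != nil {
		return err.Error()
	}
	defer os.Remove("test")
	f.Close()

	// Writing to a closed file fails, but the error is stored in the
	// package-level Cyrillic еrr, not in the local Latin err.
	_, еrr = f.Write([]byte("hello"))

	// Latin err is still nil from os.Create, so this branch is never taken.
	if err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	fmt.Print(demo())
}
```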
The expected output is "write test: file already closed", but the actual program prints nothing. This happens because e and е are homographs, and thus err and еrr refer to different variables. This example is not very threatening, but it should demonstrate that sophisticated attacks could be constructed using this mechanism.
In my analysis, strings are the most likely vector for a homograph attack. For example, a homograph could be inserted in a switch statement over runes or strings, such that a particular case branch would never be taken. A homograph could also be used in an import path, although sites like GitHub seem to do a good job of preventing users from registering names or repos that contain homographs. Finally, as in the example above, homographs could be inserted in variable names where scoping rules make the duplication difficult to detect.
I propose that go vet make a "best effort" to detect homograph attacks. This is a bit nuanced because there are many valid reasons to include Unicode characters in source code; distinguishing malicious homographs from harmless ones is probably impossible in general. However, I believe that a few simple heuristics can catch the vast majority of viable attacks. For example, go vet could flag identifiers that mix ASCII with known homographs.
I have already developed a simple tool for this purpose here, though it is currently too strict to be used in projects that contain harmless homographs. Perhaps an external tool is sufficient, but adding it to go vet increases the chance that a security-conscious project will be protected from such attacks. At any rate, I recommend that open-source projects at Google run any publicly-submitted patches through a homograph detector.
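The identifier heuristic from the proposal (flag identifiers that mix ASCII with known homographs) could be sketched roughly as follows. This is not how vet or the author's tool works: it assumes that "known homographs" means any Cyrillic or Greek letter, and `mixedIdent` and `flagIdents` are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"unicode"
)

// mixedIdent reports whether name contains both an ASCII letter and a
// Cyrillic or Greek letter (the assumed stand-in for "known homographs").
func mixedIdent(name string) bool {
	var ascii, lookalike bool
	for _, r := range name {
		switch {
		case r <= unicode.MaxASCII && unicode.IsLetter(r):
			ascii = true
		case unicode.In(r, unicode.Cyrillic, unicode.Greek):
			lookalike = true
		}
	}
	return ascii && lookalike
}

// flagIdents parses src and returns each distinct identifier that mixes
// ASCII letters with look-alike letters, in order of first appearance.
func flagIdents(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var out []string
	ast.Inspect(f, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && !seen[id.Name] && mixedIdent(id.Name) {
			seen[id.Name] = true
			out = append(out, id.Name)
		}
		return true
	})
	return out, nil
}

func main() {
	src := "package p\nfunc f() { err := 1; \u0435rr := 2; _, _ = err, \u0435rr }\n"
	names, err := flagIdents(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", names)
}
```

Identifiers written entirely in one script pass untouched, so code with fully non-Latin names would not be flagged by this particular rule.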