Description
A homograph attack is an attack that exploits the visual similarity of two glyphs. Traditionally, this has been used in phishing attacks to trick a user into visiting a malicious domain that looks identical to the real domain. However, homographs can also be used to sneak malicious source code past review. This is possible in any language that supports Unicode source code.
Here is a simple example of a homograph attack in Go source code:
	package main

	import (
		"log"
		"os"
	)

	func main() {
		log.SetFlags(0)
		f, err := os.Create("test")
		if err != nil {
			log.Fatal(err)
		}
		f.Close()
		if _, еrr := f.Write([]byte("data")); err != nil {
			log.Fatal(еrr)
		}
	}
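The expected output is "write test: file already closed", but the actual program prints nothing. This happens because e (Latin, U+0065) and е (Cyrillic, U+0435) are homographs, and thus err and еrr refer to different variables. The failed write is assigned to the new variable еrr, while the condition still tests the outer err, which is nil, so the error is silently dropped. This example is not very threatening, but it should demonstrate that sophisticated attacks could be constructed using this mechanism.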
In my analysis, strings are the most likely vector for a homograph attack. For example, a homograph could be inserted in a switch statement over runes or strings, such that a particular case branch would never be taken. A homograph could also be used in an import path, although sites like GitHub seem to do a good job of preventing users from registering names or repos that contain homographs. Finally, as in the example above, homographs could be inserted in variable names where scoping rules make the duplication difficult to detect.
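As a rough illustration of the switch case described above, consider the sketch below. The function accessLevel and its strings are invented for this example. The case label is written with an escape so the substitution is visible; in an actual attack it would be typed as the literal Cyrillic letter and would render as "deny".

```go
package main

import "fmt"

// accessLevel is a hypothetical function used only for illustration.
func accessLevel(cmd string) string {
	switch cmd {
	// This label contains Cyrillic U+0435 in place of ASCII 'e'.
	// Typed as a literal character, it would look exactly like "deny".
	case "d\u0435ny":
		return "blocked"
	default:
		return "allowed"
	}
}

func main() {
	// The ASCII input never matches the homograph label,
	// so the "blocked" branch is effectively dead code.
	fmt.Println(accessLevel("deny"))
}
```

A reviewer reading the rendered source would see a case for "deny" and reasonably assume the branch is reachable.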
I propose that go vet make a "best effort" to detect homograph attacks. This is a bit nuanced because there are many valid reasons to include Unicode characters in source code; distinguishing malicious homographs from harmless ones is probably impossible in general. However, I believe that a few simple heuristics can catch the vast majority of viable attacks. For example, go vet could flag identifiers that mix ASCII with known homographs.
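The identifier heuristic could look roughly like the sketch below. This is not how go vet works, nor is it the tool mentioned in this issue; it is a standalone illustration built on go/scanner. The lookalikes table, suspicious, and findSuspicious are names made up here, and the table is deliberately tiny. It only inspects identifiers, so homographs inside string literals would need a separate check.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// lookalikes is a small, incomplete table (an assumption for this sketch)
// of Cyrillic letters and the ASCII letters they resemble.
var lookalikes = map[rune]rune{
	'\u0430': 'a',
	'\u0435': 'e',
	'\u043e': 'o',
	'\u0440': 'p',
	'\u0441': 'c',
	'\u0445': 'x',
}

// suspicious reports whether ident contains at least one ASCII letter
// and at least one rune from lookalikes.
func suspicious(ident string) bool {
	var ascii, homograph bool
	for _, r := range ident {
		if _, ok := lookalikes[r]; ok {
			homograph = true
		} else if ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z') {
			ascii = true
		}
	}
	return ascii && homograph
}

// findSuspicious tokenizes Go source and returns "line:col name"
// for every identifier that suspicious flags.
func findSuspicious(src []byte) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("input.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0)
	var found []string
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.IDENT && suspicious(lit) {
			p := fset.Position(pos)
			found = append(found, fmt.Sprintf("%d:%d %s", p.Line, p.Column, lit))
		}
	}
	return found
}

func main() {
	// The second variable name starts with Cyrillic U+0435.
	src := []byte("package p\n\nvar err, \u0435rr error\n")
	for _, f := range findSuspicious(src) {
		fmt.Println(f)
	}
}
```

Identifiers written entirely in one script are left alone, which is one way to avoid flagging code that legitimately uses non-ASCII names.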
I have already developed a simple tool for this purpose here, though it is currently too strict to be used in projects that contain harmless homographs. Perhaps an external tool is sufficient, but adding it to go vet increases the chance that a security-conscious project will be protected from such attacks. At any rate, I recommend that open-source projects at Google run any publicly-submitted patches through a homograph detector.