proposal: cmd/vet: detect homograph attacks #20115

A [homograph attack](https://en.wikipedia.org/wiki/IDN_homograph_attack) is an attack that exploits the visual similarity of two glyphs. Traditionally, this has been used in phishing attacks to trick a user into visiting a malicious domain that looks identical to the real domain. However, homographs can also be used to sneak malicious source code past review. This is possible in any language that supports Unicode source code.

Here is a simple example of a homograph attack in Go source code:

```go
package main

import (
	"log"
	"os"
)

func main() {
	log.SetFlags(0)
	f, err := os.Create("test")
	if err != nil {
		log.Fatal(err)
	}
	f.Close()
	if _, еrr := f.Write([]byte("data")); err != nil {
		log.Fatal(еrr)
	}
}
```
([Playground link](https://play.golang.org/p/EturtcBpds))

The expected output is `write test: file already closed`, but the program actually prints nothing. This happens because `e` (Latin, U+0065) and `е` (Cyrillic, U+0435) are homographs, so `err` and `еrr` refer to different variables: the `if` statement assigns the write error to the new variable `еrr` but tests the outer `err`, which is still `nil` after the successful `os.Create`. This example is not very threatening, but it demonstrates that sophisticated attacks could be constructed using this mechanism.

In my analysis, strings are the most likely vector for a homograph attack. For example, a homograph could be inserted in a `switch` statement over runes or strings, so that a particular `case` branch would never be taken. A homograph could also be used in an `import` path, although sites like GitHub seem to do a good job of preventing users from registering names or repos that contain homographs. Finally, as in the example above, homographs could be inserted in variable names where scoping rules make the duplication difficult to detect.
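To make the `switch` case concrete, here is a minimal sketch. The function `classify` and the role names are hypothetical and exist only for illustration. The homograph is written as the escape `\u0430` (Cyrillic `а`) so that the substitution is visible; in a real attack the raw character would be typed directly and the literal would render as an ordinary `"admin"`.

```go
package main

import "fmt"

// classify is a hypothetical access check used only for illustration.
// The case label starts with Cyrillic 'а' (U+0430), not Latin 'a' (U+0061),
// so it is a different string from the ASCII "admin".
func classify(role string) string {
	switch role {
	case "\u0430dmin": // renders like "admin" when the raw character is used
		return "privileged"
	default:
		return "unprivileged"
	}
}

func main() {
	// The ASCII string never matches the homograph case label.
	fmt.Println(classify("admin")) // prints "unprivileged"
}
```

Because the two literals are distinct strings, the compiler has nothing to complain about, and the branch is silently dead for every legitimate input.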

I propose that `go vet` make a "best effort" to detect homograph attacks. This is a bit nuanced because there are many valid reasons to include Unicode characters in source code; distinguishing malicious homographs from harmless ones is probably impossible in general. However, I believe that a few simple heuristics can catch the vast majority of viable attacks. For example, `go vet` could flag identifiers that mix ASCII with known homographs.
I have already developed a simple tool for this purpose, [glyphcheck](https://github.com/NebulousLabs/glyphcheck), though it is currently too strict to be used in projects that contain harmless homographs. Perhaps an external tool is sufficient, but adding the check to `go vet` increases the chance that a security-conscious project will be protected from such attacks. At any rate, I recommend that open-source projects at Google run any publicly submitted patches through a homograph detector.
