
proposal: cmd/vet: detect homograph attacks #20115

Open
@lukechampine


A homograph attack exploits the visual similarity of two glyphs. Traditionally, homographs have been used in phishing attacks to trick a user into visiting a malicious domain that looks identical to the real one. However, homographs can also be used to sneak malicious source code past review. This is possible in any language that supports Unicode source code.

Here is a simple example of a homograph attack in Go source code:

```go
package main

import (
	"log"
	"os"
)

func main() {
	log.SetFlags(0)
	f, err := os.Create("test")
	if err != nil {
		log.Fatal(err)
	}
	// Close the file so that the Write below is expected to fail.
	f.Close()
	if _, еrr := f.Write([]byte("data")); err != nil {
		log.Fatal(еrr)
	}
}
```


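The expected output is `write test: file already closed`, but the program actually prints nothing. This happens because `e` (U+0065, Latin small letter e) and `е` (U+0435, Cyrillic small letter ie) are homographs, so `err` and `еrr` are different identifiers: the `if` statement declares a new variable `еrr`, tests the outer `err` (which is nil), and silently discards the error returned by `Write`. This example is not very threatening, but it demonstrates that more sophisticated attacks could be built on the same mechanism.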
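
To make the difference visible, here is a small sketch that prints the code point and script of every rune in the two identifiers. The helper `scriptOf` is hypothetical, written only for this illustration, and distinguishes nothing beyond Latin and Cyrillic.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf is a hypothetical helper that names the script of r.
// It only distinguishes Latin and Cyrillic letters in this sketch.
func scriptOf(r rune) string {
	switch {
	case unicode.Is(unicode.Latin, r):
		return "Latin"
	case unicode.Is(unicode.Cyrillic, r):
		return "Cyrillic"
	default:
		return "other"
	}
}

func main() {
	latin := "err"
	lookalike := "\u0435rr" // starts with U+0435 CYRILLIC SMALL LETTER IE

	// Identical on screen, but different strings (and different identifiers).
	fmt.Println(latin == lookalike) // false

	// Print each rune's code point and script to expose the difference.
	for _, name := range []string{latin, lookalike} {
		for _, r := range name {
			fmt.Printf("%U(%s) ", r, scriptOf(r))
		}
		fmt.Println()
	}
}
```

Only the first rune differs (U+0065 versus U+0435); the trailing `rr` is plain ASCII in both.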

In my analysis, strings are the most likely vector for a homograph attack. For example, a homograph could be inserted into a case label of a switch statement over runes or strings, so that the branch is never taken. A homograph could also be used in an import path, although sites like GitHub seem to do a good job of preventing users from registering user or repository names that contain homographs. Finally, as in the example above, homographs could be inserted into variable names where scoping rules make the duplication difficult to detect.
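
A minimal sketch of the switch-statement vector, assuming a hypothetical command filter named `allowCommand`. The lookalike is written as a `\u0435` escape so it is visible here; in a real attack the raw Cyrillic character would be used, which is what makes it hard to spot in review.

```go
package main

import "fmt"

// allowCommand is a hypothetical filter meant to reject dangerous commands.
// The second case label renders as "delete", but its second letter is
// U+0435 CYRILLIC SMALL LETTER IE, so the ASCII input "delete" never
// matches it and falls through to the default branch.
func allowCommand(cmd string) bool {
	switch cmd {
	case "shutdown", "d\u0435lete":
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(allowCommand("shutdown")) // false: rejected as intended
	fmt.Println(allowCommand("delete"))   // true: the homograph case never fires
}
```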

I propose that go vet make a "best effort" attempt to detect homograph attacks. This is nuanced, because there are many valid reasons to include Unicode characters in source code; distinguishing malicious homographs from harmless ones is probably impossible in general. However, I believe a few simple heuristics can catch the vast majority of viable attacks. For example, go vet could flag identifiers that mix ASCII letters with known homographs of ASCII letters.
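
The identifier heuristic can be sketched with the standard library's `go/scanner`. This is not the author's tool or an existing vet check; `CheckIdents`, `Finding`, and the `confusables` table are hypothetical names, and the table is a tiny hand-picked sample rather than a complete list. A production version would presumably be written as an analyzer for `golang.org/x/tools/go/analysis`; this sketch stays self-contained instead.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// confusables is a tiny, hypothetical sample of non-ASCII letters that render
// like ASCII letters. A real checker would need a much larger table.
var confusables = map[rune]rune{
	'\u0430': 'a', // CYRILLIC SMALL LETTER A
	'\u0435': 'e', // CYRILLIC SMALL LETTER IE
	'\u043E': 'o', // CYRILLIC SMALL LETTER O
	'\u0440': 'p', // CYRILLIC SMALL LETTER ER
	'\u0441': 'c', // CYRILLIC SMALL LETTER ES
	'\u0445': 'x', // CYRILLIC SMALL LETTER HA
	'\u03BF': 'o', // GREEK SMALL LETTER OMICRON
}

// Finding records one suspicious identifier occurrence.
type Finding struct {
	Pos  token.Position
	Name string
	Rune rune // first confusable rune found in Name
}

// mixesASCIIWithConfusable reports whether name contains at least one ASCII
// letter and at least one rune from the confusables table, and returns the
// first such confusable rune.
func mixesASCIIWithConfusable(name string) (rune, bool) {
	hasASCII, found := false, false
	var bad rune
	for _, r := range name {
		if ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z') {
			hasASCII = true
		}
		if _, ok := confusables[r]; ok && !found {
			bad, found = r, true
		}
	}
	return bad, hasASCII && found
}

// CheckIdents tokenizes Go source and flags every identifier that mixes
// ASCII letters with a known homograph. Scan errors are ignored here.
func CheckIdents(filename string, src []byte) []Finding {
	fset := token.NewFileSet()
	file := fset.AddFile(filename, fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0) // mode 0: comments are skipped
	var out []Finding
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok != token.IDENT {
			continue
		}
		if r, ok := mixesASCIIWithConfusable(lit); ok {
			out = append(out, Finding{Pos: fset.Position(pos), Name: lit, Rune: r})
		}
	}
	return out
}

func main() {
	src := []byte("package p\n\nfunc f() { err := 1; \u0435rr := 2; _, _ = err, \u0435rr }\n")
	for _, f := range CheckIdents("p.go", src) {
		fmt.Printf("%s: identifier %q mixes ASCII with %U\n", f.Pos, f.Name, f.Rune)
	}
}
```

Running it reports two findings, one for the declaration and one for the use of `еrr`. Because the check only fires when ASCII and lookalike letters are mixed, an identifier written entirely in Cyrillic or Greek is not flagged. That keeps false positives low for code that legitimately uses non-Latin identifiers, at the cost of missing whole-script lookalikes.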

I have already developed a simple tool for this purpose here, though it is currently too strict to be used in projects that contain harmless homographs. Perhaps an external tool is sufficient, but adding the check to go vet increases the chance that a security-conscious project will be protected from such attacks. In any case, I recommend that Google's open-source projects run publicly submitted patches through a homograph detector.

Labels: Analysis (issues related to static analysis: vet, x/tools/go/analysis), Proposal, Proposal-Hold
