Go的抽象语法树AST

拿到一个golang的工程后，如何从词法，语法的角度来分析源代码呢？golang提供了一系列的工具供我们使用：

go/scanner包提供词法分析功能，将源代码转换为一系列的token，以供go/parser使用
go/parser包提供语法分析功能，将这些token转换为AST(Abstract Syntax Tree 抽象语法树)

Scanner

任何编译器所做的第一步都是将源代码转换成token，这就是scanner的工作。
token可以是关键字，字符串值，变量名和函数名等等。
在golang中，每个token都以它所处的位置，类型和原始字面量来表示。

比如我门用如下代码扫描源代码的token

func TestScanner(t *testing.T) {
    src := []byte(`package main
import "fmt"
//comment
func main() {
  fmt.Println("Hello, world!")
}
`)

    var s scanner.Scanner
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))
    s.Init(file, src, nil, 0)

    for {
        pos, tok, lit := s.Scan()
        fmt.Printf("%-6s%-8s%q\n", fset.Position(pos), tok, lit)

        if tok == token.EOF {
            break
        }
    }
}

结果(注意没有扫描出注释，需要的话要将s.Init的最后一个参数改为scanner.ScanComments)：

1:1   package "package"
1:9   IDENT   "main"
1:13  ;       "\n"
2:1   import  "import"
2:8   STRING  "\"fmt\""
2:13  ;       "\n"
4:1   func    "func"
4:6   IDENT   "main"
4:10  (       ""
4:11  )       ""
4:13  {       ""
5:3   IDENT   "fmt"
5:6   .       ""
5:7   IDENT   "Println"
5:14  (       ""
5:15  STRING  "\"Hello, world!\""
5:30  )       ""
5:31  ;       "\n"
6:1   }       ""
6:2   ;       "\n"
6:3   EOF     ""

看下go/token/token.go的源代码可知，token就是一堆定义好的枚举类型，对于每种类型的字面值都有对应的token。

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Package token defines constants representing the lexical tokens of the Go
// programming language and basic operations on tokens (printing, predicates).
//
package token

import "strconv"

// Token is the set of lexical tokens of the Go programming language.
type Token int

// The list of tokens.
const (
    // Special tokens
    ILLEGAL Token = iota
    EOF
    COMMENT

    literal_beg
    // Identifiers and basic type literals
    // (these tokens stand for classes of literals)
    IDENT  // main
    INT    // 12345
    FLOAT  // 123.45
    IMAG   // 123.45i
    CHAR   // 'a'
    STRING // "abc"
    literal_end
        ...
)

Parser

当源代码被扫描成token以后，结果将会被传递给parser。
parser负责将token转换成抽象语法树(AST)
编译时的错误就是在这个时候报告的

什么是AST呢，简单来说，AST(Abstract Syntax Tree)是使用树状结构表示源代码的语法结构，树的每一个节点就代表源代码中的一个结构。

package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"log"
	"path/filepath"
)

func main() {
	fset := token.NewFileSet()
	// 这里取绝对路径，方便打印出来的语法树可以转跳到编辑器
	path, _ := filepath.Abs("./demo.go")
	f, err := parser.ParseFile(fset, path, nil, parser.AllErrors)
	if err != nil {
		log.Println(err)
		return
	}
	// 打印语法树
	ast.Print(fset, f)
}

上面的代码将打印demo.go文件的语法树，demo.go文件如下：

package main

import (
	"context"
)

// Foo 结构体
type Foo struct {
	i int
}

// Bar 接口
type Bar interface {
	Do(ctx context.Context) error
}

// main方法
func main() {
    a := 1
}

demo.go文件已尽量简化，但其语法树的输出内容依旧十分庞大。我们截取部分来做一些简要的说明。

首先是文件所属的包名，和其声明在文件中的位置：

 0  *ast.File {
     1  .  Package: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:1
     2  .  Name: *ast.Ident {
     3  .  .  NamePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:9
     4  .  .  Name: "main"
     5  .  }
     ...

紧接着是Decls，也就是Declarations，其包含了声明的一些变量，方法，接口等：

...
     6  .  Decls: []ast.Decl (len = 4) {
     7  .  .  0: *ast.GenDecl {
     8  .  .  .  TokPos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:3:1
     9  .  .  .  Tok: import
    10  .  .  .  Lparen: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:3:8
    11  .  .  .  Specs: []ast.Spec (len = 1) {
    12  .  .  .  .  0: *ast.ImportSpec {
    13  .  .  .  .  .  Path: *ast.BasicLit {
    14  .  .  .  .  .  .  ValuePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:4:2
    15  .  .  .  .  .  .  Kind: STRING
    16  .  .  .  .  .  .  Value: "\"context\""
    17  .  .  .  .  .  }
    18  .  .  .  .  .  EndPos: -
    19  .  .  .  .  }
    20  .  .  .  }
    21  .  .  .  Rparen: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:5:1
    22  .  .  }
 ....

可以看到该语法树包含了4条Decl记录，我们取第一条记录为例，该记录为*ast.GenDecl类型。不难看出这条记录对应的是我们的import代码段。始位置(TokPos)，左右括号的位置(Lparen,Rparen)，和import的包（Specs）等信息都能从语法树中得到。

语法树的打印信来自ast.File结构体：

// 该结构体位于标准包 go/ast/ast.go 中，有兴趣可以转跳到源码阅读更详尽的注释
type File struct {
	Doc        *CommentGroup   // associated documentation; or nil
	Package    token.Pos       // position of "package" keyword
	Name       *Ident          // package name
	Decls      []Decl          // top-level declarations; or nil
	Scope      *Scope          // package scope (this file only)
	Imports    []*ImportSpec   // imports in this file
	Unresolved []*Ident        // unresolved identifiers in this file
	Comments   []*CommentGroup // list of all comments in the source file
}

结合注释和字段名我们大概知道每个字段的含义，接下来我们详细梳理一下语法树的组成结构。

Node

整个语法树由不同的node组成，从源码注释中可以得知主要有如下三种node：

There are 3 main classes of nodes: Expressions and type nodes, statement nodes, and declaration nodes.

但实际在代码，出现了第四种node：Spec Node，每种node都有专门的接口定义：

...
// All node types implement the Node interface.
type Node interface {
	Pos() token.Pos // position of first character belonging to the node
	End() token.Pos // position of first character immediately after the node
}

// All expression nodes implement the Expr interface.
type Expr interface {
	Node
	exprNode()
}

// All statement nodes implement the Stmt interface.
type Stmt interface {
	Node
	stmtNode()
}

// All declaration nodes implement the Decl interface.
type Decl interface {
	Node
	declNode()
}
...

// A Spec node represents a single (non-parenthesized) import,
// constant, type, or variable declaration.
//
type (
	// The Spec type stands for any of *ImportSpec, *ValueSpec, and *TypeSpec.
	Spec interface {
		Node
		specNode()
	}
....
)

可以看到所有的node都继承Node接口，记录了node的开始和结束位置。上面的Decls正是declaration nodes。除去上述四种使用接口进行分类的node，还有些node没有再额外定义接口细分类别，仅实现了Node接口，为了方便描述，在本篇中我把这些节点称为common node。

Expression and Type

...
	// An Ident node represents an identifier.
	Ident struct {
		NamePos token.Pos // identifier position
		Name    string    // identifier name
		Obj     *Object   // denoted object; or nil
	}
...

Indent（identifier）表示一个标识符，比如Quick Start示例中表示包名的Name字段就是一个expression node：

 0  *ast.File {
     1  .  Package: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:1
     2  .  Name: *ast.Ident { <----
     3  .  .  NamePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:9
     4  .  .  Name: "main"
     5  .  }
     ...

type node很好理解，它包含一些复合类型，例如例子中的StructType,FuncType和InterfaceType。

...
	// A StructType node represents a struct type.
	StructType struct {
		Struct     token.Pos  // position of "struct" keyword
		Fields     *FieldList // list of field declarations
		Incomplete bool       // true if (source) fields are missing in the Fields list
	}

	// Pointer types are represented via StarExpr nodes.

	// A FuncType node represents a function type.
	FuncType struct {
		Func    token.Pos  // position of "func" keyword (token.NoPos if there is no "func")
		Params  *FieldList // (incoming) parameters; non-nil
		Results *FieldList // (outgoing) results; or nil
	}

	// An InterfaceType node represents an interface type.
	InterfaceType struct {
		Interface  token.Pos  // position of "interface" keyword
		Methods    *FieldList // list of methods
		Incomplete bool       // true if (source) methods are missing in the Methods list
	}
...

Statement

赋值语句，控制语句（if，else,for，select...）等均属于statement node。

...
	// An AssignStmt node represents an assignment or
	// a short variable declaration.
	//
	AssignStmt struct {
		Lhs    []Expr
		TokPos token.Pos   // position of Tok
		Tok    token.Token // assignment token, DEFINE
		Rhs    []Expr
	}
...

	// An IfStmt node represents an if statement.
	IfStmt struct {
		If   token.Pos // position of "if" keyword
		Init Stmt      // initialization statement; or nil
		Cond Expr      // condition
		Body *BlockStmt
		Else Stmt // else branch; or nil
	}
...

上面的例子中，在main函数中对变量a赋值的程序片段就属于AssignStmt:

...
 174  .  .  .  Body: *ast.BlockStmt {
   175  .  .  .  .  Lbrace: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:18:13
   176  .  .  .  .  List: []ast.Stmt (len = 1) {
   177  .  .  .  .  .  0: *ast.AssignStmt { <--- 这里
   178  .  .  .  .  .  .  Lhs: []ast.Expr (len = 1) {
   179  .  .  .  .  .  .  .  0: *ast.Ident {
   180  .  .  .  .  .  .  .  .  NamePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:19:2
   181  .  .  .  .  .  .  .  .  Name: "a"
...

Spec Node

Spec node只有3种，分别是ImportSpec，ValueSpec和TypeSpec：

	// An ImportSpec node represents a single package import.
	ImportSpec struct {
		Doc     *CommentGroup // associated documentation; or nil
		Name    *Ident        // local package name (including "."); or nil
		Path    *BasicLit     // import path
		Comment *CommentGroup // line comments; or nil
		EndPos  token.Pos     // end of spec (overrides Path.Pos if nonzero)
	}

	// A ValueSpec node represents a constant or variable declaration
	// (ConstSpec or VarSpec production).
	//
	ValueSpec struct {
		Doc     *CommentGroup // associated documentation; or nil
		Names   []*Ident      // value names (len(Names) > 0)
		Type    Expr          // value type; or nil
		Values  []Expr        // initial values; or nil
		Comment *CommentGroup // line comments; or nil
	}

	// A TypeSpec node represents a type declaration (TypeSpec production).
	TypeSpec struct {
		Doc     *CommentGroup // associated documentation; or nil
		Name    *Ident        // type name
		Assign  token.Pos     // position of '=', if any
		Type    Expr          // *Ident, *ParenExpr, *SelectorExpr, *StarExpr, or any of the *XxxTypes
		Comment *CommentGroup // line comments; or nil
	}

ImportSpec表示一个单独的import，ValueSpec表示一个常量或变量的声明，TypeSpec则表示一个type声明。上面的例子中，出现了ImportSpec和TypeSpec。

import (
	"context" // <--- 这里是一个ImportSpec node
)

// Foo 结构体
type Foo struct { // <--- 这里是一个TypeSpec node
	i int
}

Declaration Node

Declaration node也只有三种：BadDecl表示一个有语法错误的节点； GenDecl用于表示import, const，type或变量声明；FunDecl用于表示函数声明

...
type (
	// A BadDecl node is a placeholder for declarations containing
	// syntax errors for which no correct declaration nodes can be
	// created.
	//
	BadDecl struct {
		From, To token.Pos // position range of bad declaration
	}

	// A GenDecl node (generic declaration node) represents an import,
	// constant, type or variable declaration. A valid Lparen position
	// (Lparen.IsValid()) indicates a parenthesized declaration.
	//
	// Relationship between Tok value and Specs element type:
	//
	//	token.IMPORT  *ImportSpec
	//	token.CONST   *ValueSpec
	//	token.TYPE    *TypeSpec
	//	token.VAR     *ValueSpec
	//
	GenDecl struct {
		Doc    *CommentGroup // associated documentation; or nil
		TokPos token.Pos     // position of Tok
		Tok    token.Token   // IMPORT, CONST, TYPE, VAR
		Lparen token.Pos     // position of '(', if any
		Specs  []Spec
		Rparen token.Pos // position of ')', if any
	}

	// A FuncDecl node represents a function declaration.
	FuncDecl struct {
		Doc  *CommentGroup // associated documentation; or nil
		Recv *FieldList    // receiver (methods); or nil (functions)
		Name *Ident        // function/method name
		Type *FuncType     // function signature: parameters, results, and position of "func" keyword
		Body *BlockStmt    // function body; or nil for external (non-Go) function
	}
)
...

Common Node

除去上述四种类别划分的node,还有一些node不属于上面四种类别：

// Comment 注释节点，代表单行的 //-格式 或 /*-格式的注释.
type Comment struct {
    ...
}
...
// CommentGroup 注释块节点，包含多个连续的Comment
type CommentGroup struct {
    ...
}

// Field 字段节点, 可以代表结构体定义中的字段，接口定义中的方法列表，函数前面中的入参和返回值字段
type Field struct {
    ...
}
...
// FieldList 包含多个Field
type FieldList struct {
    ...
}

// File 表示一个文件节点
type File struct {
	...
}

// Package 表示一个包节点
type Package struct {
    ...
}

应用：为文件中所有接口方法添加context参数

实现这个功能我们需要四步：

遍历整个语法树
判断是否已经importcontext包，如果没有则import
遍历所有的接口方法，判断方法列表中是否有context.Context类型的入参，如果没有我们将其添加到方法的第一个参数
将修改过后的语法树转换成Go代码并输出

遍历语法树

语法树层级较深，嵌套关系复杂，如果不能完全掌握node之间的关系和嵌套规则，我们很难自己写出正确的遍历方法。不过好在ast包已经为我们提供了遍历方法：

func Walk(v Visitor, node Node) 

type Visitor interface {
	Visit(node Node) (w Visitor)
}

Walk方法会按照深度优先搜索方法（depth-first order）遍历整个语法树，我们只需按照我们的业务需要，实现Visitor接口即可。

Walk每遍历一个节点就会调用Visitor.Visit方法，传入当前节点。如果Visit返回nil，则停止遍历当前节点的子节点。本示例的Visitor实现如下：

// Visitor
type Visitor struct {
}
func (v *Visitor) Visit(node ast.Node) ast.Visitor {
	switch node.(type) {
	case *ast.GenDecl:
		genDecl := node.(*ast.GenDecl)
		// 查找有没有import context包
		// Notice：没有考虑没有import任何包的情况
		if genDecl.Tok == token.IMPORT {
			v.addImport(genDecl)
			// 不需要再遍历子树
			return nil
		}
	case *ast.InterfaceType:
		// 遍历所有的接口类型
		iface := node.(*ast.InterfaceType)
		addContext(iface)
		// 不需要再遍历子树
		return nil
	}
	return v
}

添加import

// addImport 引入context包
func (v *Visitor) addImport(genDecl *ast.GenDecl) {
	// 是否已经import
	hasImported := false
	for _, v := range genDecl.Specs {
		imptSpec := v.(*ast.ImportSpec)
		// 如果已经包含"context"
		if imptSpec.Path.Value == strconv.Quote("context") {
			hasImported = true
		}
	}
	// 如果没有import context，则import
	if !hasImported {
		genDecl.Specs = append(genDecl.Specs, &ast.ImportSpec{
			Path: &ast.BasicLit{
				Kind:  token.STRING,
				Value: strconv.Quote("context"),
			},
		})
	}
}

为接口方法添加参数

// addContext 添加context参数
func addContext(iface *ast.InterfaceType) {
	// 接口方法不为空时，遍历接口方法
	if iface.Methods != nil || iface.Methods.List != nil {
		for _, v := range iface.Methods.List {
			ft := v.Type.(*ast.FuncType)
			hasContext := false
			// 判断参数中是否包含context.Context类型
			for _, v := range ft.Params.List {
				if expr, ok := v.Type.(*ast.SelectorExpr); ok {
					if ident, ok := expr.X.(*ast.Ident); ok {
						if ident.Name == "context" {
							hasContext = true
						}
					}
				}
			}
			// 为没有context参数的方法添加context参数
			if !hasContext {
				ctxField := &ast.Field{
					Names: []*ast.Ident{
						ast.NewIdent("ctx"),
					},
					// Notice: 没有考虑import别名的情况
					Type: &ast.SelectorExpr{
						X:   ast.NewIdent("context"),
						Sel: ast.NewIdent("Context"),
					},
				}
				list := []*ast.Field{
					ctxField,
				}
				ft.Params.List = append(list, ft.Params.List...)
			}
		}
	}
}

将语法树转换成Go代码

format包为我们提供了转换函数，format.Node会将语法树按照gofmt的格式输出：

...
	var output []byte
	buffer := bytes.NewBuffer(output)
	err = format.Node(buffer, fset, f)
	if err != nil {
		log.Fatal(err)
	}
	// 输出Go代码
	fmt.Println(buffer.String())
...

输出结果如下：

package main

import (
        "context"
)

type Foo interface {
        FooA(ctx context.Context, i int)
        FooB(ctx context.Context, j int)
        FooC(ctx context.Context)
}

type Bar interface {
        BarA(ctx context.Context, i int)
        BarB(ctx context.Context)
        BarC(ctx context.Context)
}

可以看到我们所有的接口方的第一个参数都变成了context.Context。建议将示例中的语法树先打印出来，再对照着代码看，方便理解。

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bec7c420b84c4e848e9259480d8eb11a.md

bec7c420b84c4e848e9259480d8eb11a.md

Go的抽象语法树AST

Scanner

Parser

Node

Expression and Type

Statement

Spec Node

Declaration Node

Common Node

应用：为文件中所有接口方法添加context参数

遍历语法树

添加import

为接口方法添加参数

将语法树转换成Go代码

Files

bec7c420b84c4e848e9259480d8eb11a.md

Latest commit

History

bec7c420b84c4e848e9259480d8eb11a.md

File metadata and controls

Go的抽象语法树AST

Scanner

Parser

Node

Expression and Type

Statement

Spec Node

Declaration Node

Common Node

应用：为文件中所有接口方法添加context参数

遍历语法树

添加import

为接口方法添加参数

将语法树转换成Go代码