Architecture of Gazelle

Gazelle is a tool that generates and updates Bazel build files for Go projects that follow the conventional "go build" project layout. It is intended to simplify the maintenance of Bazel Go projects as much as possible.

This document describes how Gazelle works. It should help users understand why Gazelle behaves as it does, and it should help developers understand how to modify Gazelle and how to write similar tools.

Gazelle generates and updates build files according to the algorithm outlined below. Each of the steps here is described in more detail in the sections below.

  • Build a configuration from command line arguments and special comments in the top-level build file. See Configuration.
  • For each directory in the repository:
    • Read the build file if one is present.
    • If the build file should be updated (based on configuration):
      • Apply transformations to the build file to migrate away from deprecated APIs. See Fixing build files.
      • Scan the source files and collect metadata needed to generate rules for the directory. See Scanning source files.
      • Generate new rules from the build metadata collected earlier. See Generating rules.
      • Merge the new rules into the directory's build file. Delete any rules which are now empty. See Merging and deleting rules.
    • Add the library rules in the directory's build file to a global table, indexed by import path.
  • For each updated build file:
    • Use the library table to map import paths to Bazel labels for rules that were added or merged earlier. See Resolving dependencies.
    • Merge the resolved rules back into the file.
    • Format the file using buildifier and emit it according to the output mode: write to disk, print the whole file, or print the diff.

Configuration

Godoc: config

Gazelle stores configuration information in Config objects. These objects contain settings that affect the behavior of most packages in the program. For example:

  • The list of directories that Gazelle should update.
  • The path of the repository root directory. Bazel package names are based on paths relative to this location.
  • The current import path prefix and the directory where it was set. Gazelle uses this to infer import paths for go_library rules.
  • A list of build tags that Gazelle considers to be true on all platforms.

Config objects apply to individual directories. Each directory inherits the Config from its parent. Values in a Config may be modified within a directory using directives written in the directory's build file. A directive is a special comment formatted like this:

# gazelle:key value

Here are a few examples. See the full list of directives.

  • # gazelle:prefix - sets the Go import path prefix for the current directory.
  • # gazelle:build_tags - sets the list of build tags which Gazelle considers to be true on all platforms.
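A minimal Go sketch of directive parsing and per-directory inheritance follows. The Config fields mirror the examples above, but the type and the parseDirectives and childConfig helpers are hypothetical simplifications, not Gazelle's config package API. It also assumes build_tags takes a comma-separated list.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical, simplified stand-in for Gazelle's Config object.
type Config struct {
	Prefix    string   // Go import path prefix (# gazelle:prefix)
	PrefixDir string   // directory where the prefix was set
	BuildTags []string // tags treated as true on all platforms (# gazelle:build_tags)
}

// Directive is one "# gazelle:key value" comment.
type Directive struct {
	Key, Value string
}

// parseDirectives returns the directives found in the lines of a build file.
func parseDirectives(lines []string) []Directive {
	var ds []Directive
	for _, line := range lines {
		rest, ok := strings.CutPrefix(strings.TrimSpace(line), "# gazelle:")
		if !ok {
			continue
		}
		key, value, _ := strings.Cut(rest, " ")
		ds = append(ds, Directive{Key: key, Value: strings.TrimSpace(value)})
	}
	return ds
}

// childConfig copies the parent's Config, then applies the directives from dir.
func childConfig(parent Config, dir string, ds []Directive) Config {
	c := parent
	c.BuildTags = append([]string(nil), parent.BuildTags...) // don't alias the parent's slice
	for _, d := range ds {
		switch d.Key {
		case "prefix":
			c.Prefix, c.PrefixDir = d.Value, dir
		case "build_tags":
			c.BuildTags = strings.Split(d.Value, ",")
		}
	}
	return c
}

func main() {
	root := childConfig(Config{}, "", parseDirectives([]string{
		"# gazelle:prefix example.com/project",
		"# gazelle:build_tags integration,debug",
	}))
	sub := childConfig(root, "cmd/tool", nil) // no directives: inherits everything
	fmt.Println(sub.Prefix, sub.PrefixDir == "", sub.BuildTags)
}
```

Copying the parent and then applying local directives is what makes a setting like the prefix flow down to every subdirectory until another directive overrides it.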

There are a few directives which are not applied to the Config object but are interpreted directly in packages where they are relevant.

  • # gazelle:ignore - the build file should not be updated by Gazelle. Gazelle may still index its contents so it can resolve dependencies in other build files.
  • # gazelle:exclude path/to/file - the named file should not be read by Gazelle and should not be included in srcs lists. If this refers to a directory, Gazelle won't recurse into the directory. This directive may appear multiple times.

Fixing build files

Godoc: merger

From time to time, APIs in rules_go are changed or updated. Gazelle helps users stay up to date with these changes by automatically fixing deprecated usage.

Minor fixes are applied by Gazelle automatically every time it runs. However, some fixes may delete or rename existing rules. Users must run gazelle fix to apply these fixes. By default, Gazelle will only warn users that gazelle fix should be run.

Here are a few of the fixes Gazelle performs. See Fix command transformations for a full list.

  • Squash cgo libraries: Gazelle will remove cgo_library rules and merge their attributes into go_library rules that reference them. This is a major fix and is only applied with gazelle fix.
  • Migrate library attributes: Gazelle replaces library attributes with embed attributes. The only difference between these is that library (which is now deprecated) accepts a single label, while embed accepts a list. This is a minor fix and is always applied.
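As an illustration of the minor library-to-embed fix, here is a Go sketch over a toy rule type. Rule and migrateLibraryToEmbed are hypothetical; Gazelle itself edits buildtools syntax trees rather than maps.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified rule with untyped attribute values.
type Rule struct {
	Kind, Name string
	Attrs      map[string]any
}

// migrateLibraryToEmbed replaces a single-label "library" attribute with an
// "embed" list containing that label, keeping any labels already embedded.
func migrateLibraryToEmbed(r *Rule) {
	lib, ok := r.Attrs["library"].(string)
	if !ok {
		return // no library attribute: nothing to migrate
	}
	delete(r.Attrs, "library")
	embed, _ := r.Attrs["embed"].([]string)
	r.Attrs["embed"] = append(embed, lib)
}

func main() {
	r := &Rule{Kind: "go_test", Name: "go_default_test", Attrs: map[string]any{
		"srcs":    []string{"foo_test.go"},
		"library": ":go_default_library",
	}}
	migrateLibraryToEmbed(r)
	fmt.Println(r.Attrs["embed"]) // [:go_default_library]
}
```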

Users can prevent Gazelle from modifying rules, attributes, or individual values by writing # keep comments above them.

Scanning source files

Godoc: packages

Nearly all of the information needed to build a program with the standard Go SDK is implied by directory structure, file names, and file contents. This is why go build doesn't require any sort of build file. The go/build package in the standard library collects this information.

Unfortunately, go/build can only collect information for one platform at a time. Gazelle needs to generate build files that work on all platforms, so we have our own implementation of this logic.

Gazelle extracts build metadata from source file names and contents in much the same way that the standard go/build package does. It gets the following information from file names:

  • File extension (e.g., .go, .c, .proto). Normally, only .go, .s, and .h files are included in Go rules. If any cgo code is present, then C/C++ files are also included. .proto files are also used to build proto rules. Other files (e.g., .txt) are ignored.
  • Test suffix. For example, if a file is named foo_test.go, it will be included in a test target instead of a library or binary target.
  • OS and architecture suffixes. For example, a file named foo_linux_amd64.go will be listed in the linux_amd64 section of the target it belongs to.
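The file-name rules above can be sketched in Go roughly the way go/build handles them. The knownOS and knownArch tables are deliberately small subsets chosen for illustration, and classify is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Small subsets of the real GOOS/GOARCH lists, for illustration only.
var (
	knownOS   = map[string]bool{"darwin": true, "freebsd": true, "linux": true, "windows": true}
	knownArch = map[string]bool{"386": true, "amd64": true, "arm": true, "arm64": true}
)

// NameInfo is the metadata that can be read from a file name alone.
type NameInfo struct {
	Ext          string
	IsTest       bool
	GOOS, GOARCH string // empty when the name carries no constraint
}

func classify(name string) NameInfo {
	info := NameInfo{Ext: filepath.Ext(name)}
	stem := strings.TrimSuffix(name, info.Ext)
	info.IsTest = info.Ext == ".go" && strings.HasSuffix(stem, "_test")
	// Like go/build, ignore everything before the first underscore so that a
	// file named simply "linux.go" is not treated as Linux-only.
	i := strings.Index(stem, "_")
	if i < 0 {
		return info
	}
	parts := strings.Split(stem[i:], "_") // parts[0] is always ""
	if n := len(parts); parts[n-1] == "test" {
		parts = parts[:n-1]
	}
	n := len(parts)
	switch {
	case n >= 2 && knownOS[parts[n-2]] && knownArch[parts[n-1]]:
		info.GOOS, info.GOARCH = parts[n-2], parts[n-1]
	case knownOS[parts[n-1]]:
		info.GOOS = parts[n-1]
	case knownArch[parts[n-1]]:
		info.GOARCH = parts[n-1]
	}
	return info
}

func main() {
	for _, name := range []string{"foo_linux_amd64.go", "foo_test.go", "linux.go", "util_bsd.go", "msg.proto"} {
		fmt.Printf("%-20s %+v\n", name, classify(name))
	}
}
```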

Gazelle gets the following information from file contents:

  • Package name. This is syntactically the first part of every .go file. All files in the same directory must have the same package name (except for external test sources, which have a package name ending with _test). If there are multiple packages, Gazelle will choose one that matches the directory name (if present) or report an error.
  • Imported libraries. Go import paths are usually URLs. Imports in platform-specific source files are also platform-specific.
  • Build tags. The Go toolchain recognizes comments beginning with // +build before the package declaration (Go 1.17 introduced the equivalent //go:build form). These tags tell the build system that a file should only be built for specific platforms or configurations. See this article for more information.
  • Whether cgo code is present. This affects how packages are built and whether C/C++ files are included.
  • C/C++ compile and link options (specified in #cgo directives in cgo comments). These may be platform-specific.

In most cases, only the top of the file is parsed. For Go files, we use the standard go/parser package. For proto files, we use regular expressions that match package, go_package, and import statements.
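For the header parsing described above, the standard go/parser package can stop after the import declarations. This sketch (readHeader and fileMeta are hypothetical names) extracts the package name, the import paths, and whether import "C" is present; build tags and #cgo options are left out.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// fileMeta holds a subset of the content-derived metadata listed above.
type fileMeta struct {
	Package string
	Imports []string
	IsCgo   bool
}

// readHeader parses only the package clause and imports of a Go source file.
func readHeader(filename, src string) (fileMeta, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops after the imports; ParseComments keeps comments, which a
	// fuller version would need for build constraints and #cgo directives.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly|parser.ParseComments)
	if err != nil {
		return fileMeta{}, err
	}
	m := fileMeta{Package: f.Name.Name}
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return fileMeta{}, err
		}
		if path == "C" {
			m.IsCgo = true // import "C" marks the file as using cgo
			continue
		}
		m.Imports = append(m.Imports, path)
	}
	return m, nil
}

func main() {
	src := "package foo\n\n// #include <stdio.h>\nimport \"C\"\n\nimport \"example.com/lib\"\n"
	m, err := readHeader("foo.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```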

Gazelle stores build metadata in a Package object. Currently, we only support one Package per directory (which is also what the Go SDK supports), but this will be expanded in the future. Package objects contain some top-level metadata (like the package name and directory path), along with several target objects (GoTarget and ProtoTarget).

Target objects correspond directly to rules that will be generated later. They store lists of sources, imports, and flags in PlatformStrings objects.

PlatformStrings objects store strings in four sections: a generic list, an OS-specific dictionary, an architecture-specific dictionary, and an OS-and-architecture-specific dictionary. The keys in the dictionaries are OS names, architecture names, or OS-and-architecture pairs; the values are lists of strings. The same string may not appear more than once in a list and may not appear in more than one section. This is due to a Bazel requirement: the same label may not appear more than once in a deps list.
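The uniqueness rule can be sketched in Go as follows. To keep it short, this hypothetical version stores all four sections in one nested map, with the generic list under the empty key; Gazelle's real PlatformStrings type uses separate fields.

```go
package main

import (
	"fmt"
	"slices"
)

// PlatformStrings is a hypothetical simplification: section name -> key -> values.
// Keys are OS names ("linux"), architectures ("amd64"), or OS_arch pairs
// ("linux_amd64"); the generic section uses the empty key.
type PlatformStrings struct {
	sections map[string]map[string][]string
}

func NewPlatformStrings() *PlatformStrings {
	return &PlatformStrings{sections: map[string]map[string][]string{
		"generic": {}, "os": {}, "arch": {}, "platform": {},
	}}
}

// Add appends s to section[key]. It rejects a duplicate within the same list and
// a string already present in a different section; the same string may appear
// under several keys of one section (e.g. util.go for both darwin and linux).
func (ps *PlatformStrings) Add(section, key, s string) error {
	target, ok := ps.sections[section]
	if !ok {
		return fmt.Errorf("unknown section %q", section)
	}
	for name, lists := range ps.sections {
		for k, list := range lists {
			if !slices.Contains(list, s) {
				continue
			}
			if name != section {
				return fmt.Errorf("%q already in section %q", s, name)
			}
			if k == key {
				return fmt.Errorf("%q already in %s[%q]", s, section, key)
			}
		}
	}
	target[key] = append(target[key], s)
	return nil
}

func main() {
	ps := NewPlatformStrings()
	fmt.Println(ps.Add("os", "darwin", "util.go")) // <nil>
	fmt.Println(ps.Add("os", "linux", "util.go"))  // <nil>: same section, different key
	fmt.Println(ps.Add("generic", "", "util.go"))  // non-nil: already in section "os"
}
```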

Generating rules

Godoc: rules

Once build metadata has been extracted from the sources in a directory, Gazelle generates rules for building those sources.

Generated rules are formatted as CallExpr objects. CallExpr is defined in the buildifier build library. This is the same library used to parse and format build files. This lets us manipulate newly generated rules and existing rules with the same code.

We may generate the following rules:

  • proto_library and go_proto_library are generated if there was at least one .proto source file.
  • go_library is generated if there was at least one non-test source. This may embed the go_proto_library if there was one.
  • go_test rules are generated for internal and external tests. Internal tests embed the go_library while external tests depend on the go_library as a separate package.
  • go_binary is generated if the package name was main. It embeds the go_library.

Rules are named according to a pluggable naming policy, but there is currently only one policy: libraries are named go_default_library, tests are named go_default_test, and binaries are named after the directory. The go_default_library name is a historical artifact from before we had index-based dependency resolution. We'll need to move away from this naming scheme in the future (#5) before we support multiple packages (#7).

Sources, imports, and flags within each target are converted to expressions in a straightforward fashion. The lists within PlatformStrings are converted to list expressions, and the dictionaries are converted to select expressions (when Bazel evaluates a select expression, it chooses one of several provided lists based on config_setting rules). Lists and select expressions may be added together. For example:

go_library(
    name = "go_default_library",
    srcs = [
        "terminal.go",
    ] + select({
        "@io_bazel_rules_go//go/platform:darwin": [
            "util.go",
            "util_bsd.go",
        ],
        "@io_bazel_rules_go//go/platform:linux": [
            "util.go",
            "util_linux.go",
        ],
        "@io_bazel_rules_go//go/platform:windows": [
            "util_windows.go",
        ],
        "//conditions:default": [],
    }),
    ...
)

At this point, Gazelle does not have enough information to generate deps attributes; it only has a list of import strings extracted from source files. These imports are stored temporarily in a special _gazelle_imports attribute in each rule. Later, the imports are converted to Bazel labels (see Resolving dependencies), and this attribute is replaced with deps.

Merging and deleting rules

Godoc: merger

Merging is the process of combining generated rules with the corresponding rules in an existing build file. If no build file exists in a directory, a new file is created with generated rules, and no merging is performed.

Merging occurs in two phases: pre-resolve, and post-resolve. This is due to an interdependence with dependency resolution. Dependency resolution uses a table of merged library rules, so it can't be performed until the pre-resolve merge has occurred. After dependency resolution, we need to merge newly generated deps attributes; this is done in the post-resolve merge. The two phases use the same algorithm.

During the merge process, Gazelle attempts to match generated rules with existing rules that have the same name and same kind. Rules are only merged if both name and kind match. If an existing rule has the same name as a generated rule but a different kind, the generated rule will not be merged. If no existing rule matches a generated rule, the generated rule is simply appended to the end of the file. Existing rules that don't match any generated rule are not modified.
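Rule matching by name and kind can be sketched in Go as below. The Rule type and matchRules are hypothetical, and treating a same-name, different-kind rule as a conflict that is neither merged nor appended is an assumption (appending it would create a duplicate rule name).

```go
package main

import "fmt"

// Rule is a hypothetical, minimal rule: just its kind and name.
type Rule struct {
	Kind, Name string
}

// matchRules pairs generated rules with existing rules of the same name and kind.
// pairs maps a generated index to an existing index; appended holds generated
// rules with no same-named existing rule; conflicts holds generated rules whose
// name is taken by an existing rule of a different kind. Unmatched existing
// rules are not returned because they are left untouched.
func matchRules(existing, generated []Rule) (pairs map[int]int, appended, conflicts []Rule) {
	byName := make(map[string]int, len(existing))
	for i, r := range existing {
		byName[r.Name] = i
	}
	pairs = make(map[int]int)
	for gi, g := range generated {
		ei, found := byName[g.Name]
		switch {
		case !found:
			appended = append(appended, g)
		case existing[ei].Kind == g.Kind:
			pairs[gi] = ei
		default:
			conflicts = append(conflicts, g)
		}
	}
	return pairs, appended, conflicts
}

func main() {
	existing := []Rule{{"go_library", "go_default_library"}, {"genrule", "gen"}}
	generated := []Rule{{"go_library", "go_default_library"}, {"go_binary", "tool"}}
	pairs, appended, conflicts := matchRules(existing, generated)
	fmt.Println(pairs, appended, conflicts)
}
```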

When Gazelle identifies a matching pair of rules, it combines each attribute according to the algorithm below. If an attribute is present in the generated rule but not in the existing rule, it is copied to the merged rule verbatim. If an attribute is present in the existing rule but not the generated rule, Gazelle behaves as if the generated attribute were present but empty.

  • For each value in the existing rule's attribute:
    • If the value also appears in the generated rule's attribute or is marked with a # keep comment, preserve it. Otherwise, delete it.
  • For each value in the generated rule's attribute:
    • If the value appears in the existing rule's attribute, ignore it. Otherwise, add it to the merged rule.
  • If the merged attribute is empty, delete it.

When a value is present in both the existing and generated attributes, we use the existing value instead of the generated value, since this preserves comments.
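The attribute merge above can be sketched for a single list attribute in Go. Value, isKept, and mergeList are hypothetical names; a missing attribute is passed as nil, and an empty result tells the caller to delete the attribute. Reusing the existing Value is what lets comments survive.

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a hypothetical list element: its text plus any attached comment.
type Value struct {
	Text    string
	Comment string
}

// isKept reports whether the value carries a "# keep" comment.
func isKept(v Value) bool {
	return strings.HasPrefix(strings.TrimSpace(v.Comment), "# keep")
}

// mergeList merges one list attribute. A missing attribute is passed as nil,
// which behaves like an empty list; an empty result means "delete the attribute".
func mergeList(existing, generated []Value) []Value {
	inGenerated := make(map[string]bool, len(generated))
	for _, v := range generated {
		inGenerated[v.Text] = true
	}
	inExisting := make(map[string]bool, len(existing))
	var merged []Value
	for _, v := range existing {
		inExisting[v.Text] = true
		// Preserve values that are still generated or marked "# keep". The
		// existing Value (not the generated one) is kept, so comments survive.
		if inGenerated[v.Text] || isKept(v) {
			merged = append(merged, v)
		}
	}
	for _, v := range generated {
		if !inExisting[v.Text] { // values already in the existing rule were handled above
			merged = append(merged, v)
		}
	}
	return merged
}

func main() {
	existing := []Value{{"a.go", "# platform-neutral"}, {"old.go", ""}, {"manual.go", "# keep"}}
	generated := []Value{{"a.go", ""}, {"new.go", ""}}
	for _, v := range mergeList(existing, generated) {
		fmt.Println(v.Text, v.Comment)
	}
}
```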

Some attributes are considered unmergeable, for example, visibility and gc_goopts. Gazelle may add these attributes to existing rules if they are not already present, but existing values won't be modified or deleted.

Gazelle has several mechanisms for preserving manual modifications to build files. Some of these mechanisms work automatically; others require explicit comments.

  • Gazelle will not modify or delete rules that don't appear to have been generated by Gazelle.
  • As mentioned above, some attributes are considered unmergeable. Gazelle may set initial values for these but won't delete or replace existing values.
  • # keep comments may be attached to any rule, attribute, or value to prevent Gazelle from modifying it.
  • # gazelle:exclude <file> directives can be used to prevent Gazelle from adding files to source lists (for example, checked-in .pb.go files). They can also prevent Gazelle from recursing into directories that contain unbuildable code (e.g., testdata).
  • # gazelle:ignore directives prevent Gazelle from making any modifications to build files that contain them.

Deletion is a special case of the merging algorithm.

When Gazelle generates rules for a package (see Generating rules), it actually produces two lists of rules: a list of rules for buildable targets, and a list of empty rules that may be deleted. The empty rules have no attributes other than name.

The empty rules are merged using the same algorithm as the other generated rules. If, after merging, an empty rule has no attributes that would make the rule buildable (for example, srcs or deps), the rule is deleted.
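Deletion can be sketched in Go by merging a name-only rule into an existing one and then checking what is left. The set of buildable attributes (srcs, deps, embed) and all helper names are assumptions for illustration; unmergeable attributes such as visibility are copied unchanged, consistent with the merging rules.

```go
package main

import "fmt"

// Rule is a hypothetical rule with list-valued attributes.
type Rule struct {
	Kind, Name string
	Attrs      map[string][]string
}

// Assumed sets for illustration; Gazelle's real lists may differ.
var (
	buildable   = []string{"srcs", "deps", "embed"}
	unmergeable = map[string]bool{"visibility": true, "gc_goopts": true}
)

// mergeEmpty merges a name-only generated rule into an existing rule: every
// mergeable value is dropped unless keep contains it (standing in for "# keep"),
// and unmergeable attributes are copied unchanged.
func mergeEmpty(existing Rule, keep map[string]bool) Rule {
	out := Rule{Kind: existing.Kind, Name: existing.Name, Attrs: map[string][]string{}}
	for attr, vals := range existing.Attrs {
		if unmergeable[attr] {
			out.Attrs[attr] = vals
			continue
		}
		var kept []string
		for _, v := range vals {
			if keep[v] {
				kept = append(kept, v)
			}
		}
		if len(kept) > 0 {
			out.Attrs[attr] = kept
		}
	}
	return out
}

// shouldDelete reports whether no attribute that makes the rule buildable survived.
func shouldDelete(r Rule) bool {
	for _, attr := range buildable {
		if len(r.Attrs[attr]) > 0 {
			return false
		}
	}
	return true
}

func main() {
	stale := Rule{Kind: "go_library", Name: "go_default_library", Attrs: map[string][]string{
		"srcs":       {"removed.go"},
		"visibility": {"//visibility:public"},
	}}
	fmt.Println(shouldDelete(mergeEmpty(stale, nil))) // true: only visibility remains
}
```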

Resolving dependencies

Godoc: resolve

When Gazelle generates rules for a package (see Generating rules), it stores names of the libraries imported by each rule in a special _gazelle_imports attribute. During dependency resolution, Gazelle maps these imports to Bazel labels and replaces _gazelle_imports with deps.

Before dependency resolution starts, Gazelle builds a table of all known libraries. This includes go_library, go_proto_library, and proto_library rules. The table is populated by scanning build files after the pre-resolve merge, so existing and newly generated rules are included in the table, and deleted rules are excluded. Once all library rules have been added, Gazelle indexes the table by language-specific import path.

Gazelle resolves each import string in _gazelle_imports as follows:

  • If the import is part of the standard library, it is dropped. Standard library dependencies are implicit.
  • If the import is provided by exactly one rule in the library table, the label for that rule is used.
  • If the import is provided by multiple libraries, we attempt to resolve the ambiguity.
    • For Go, we apply the vendoring algorithm. Vendored libraries aren't visible outside of the vendor directory's parent.
    • Go libraries that are embedded by other Go libraries are not considered. Embedded libraries may be incomplete.
    • When an ambiguity can't be resolved, Gazelle logs an error and skips the dependency.
  • If the import is not provided by any rule in the import table, we attempt to resolve the dependency using heuristics:
    • If the import path starts with the current prefix (set with a # gazelle:prefix directive or on the command line), we construct a label by concatenating the prefix directory and the portion of the import path below the prefix into a package name.
    • Otherwise, the import path is considered external and is resolved according to the external mode set on the command line.
      • In external mode, Gazelle uses golang.org/x/tools/go/vcs to determine the portion of the import path that corresponds to a repository. That part of the path is converted into a repository name (for example, golang.org/x/tools becomes @org_golang_x_tools), and the rest is converted to a package name.
      • In vendored mode, Gazelle constructs a label by prepending vendor/ to the import path.

Note that visibility attributes are not considered when resolving imports. This was part of an initial prototype, but it was confusing in many situations.
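The resolution steps above can be condensed into one hypothetical Go sketch. It makes three simplifying assumptions, each noted in the code: standard library imports are detected by the absence of a dot in the first path element (rather than a real package list), the innermost visible vendor directory wins among index candidates, and in external mode the repository root is taken to be the first three path elements instead of querying golang.org/x/tools/go/vcs. Labels follow the go_default_library naming policy.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Entry is one library in a hypothetical import-path index.
type Entry struct {
	Pkg      string // Bazel package, e.g. "a/vendor/example.com/lib"
	Embedded bool   // embedded by another go_library, so possibly incomplete
}

// Resolver holds the inputs that the resolution steps rely on.
type Resolver struct {
	Index     map[string][]Entry // import path -> libraries that provide it
	Prefix    string             // set by # gazelle:prefix
	PrefixDir string             // directory where the prefix was set
	Vendored  bool               // vendored mode if true, external mode otherwise
}

func label(repo, pkg string) string { return repo + "//" + pkg + ":go_default_library" }

// vendorParent returns the parent of the innermost vendor directory in pkg.
func vendorParent(pkg string) (string, bool) {
	p := "/" + pkg + "/"
	if i := strings.LastIndex(p, "/vendor/"); i >= 0 {
		return strings.Trim(p[:i], "/"), true
	}
	return "", false
}

// Resolve maps one import to a label. An empty label with a nil error means
// the import is dropped; a non-nil error means log it and skip the dependency.
func (r *Resolver) Resolve(imp, fromPkg string) (string, error) {
	// Assumption: no dot in the first element means a standard library import.
	if first, _, _ := strings.Cut(imp, "/"); !strings.Contains(first, ".") {
		return "", nil
	}
	var best []Entry
	bestDepth := -2
	for _, e := range r.Index[imp] {
		if e.Embedded {
			continue // embedded libraries may be incomplete
		}
		depth := -1 // non-vendored libraries are visible everywhere
		if parent, ok := vendorParent(e.Pkg); ok {
			if parent != "" && fromPkg != parent && !strings.HasPrefix(fromPkg, parent+"/") {
				continue // not visible outside the vendor directory's parent
			}
			// Visible parents are ancestors of fromPkg, so longer means closer.
			depth = len(parent)
		}
		if depth > bestDepth { // assumption: the innermost vendor directory wins
			best, bestDepth = []Entry{e}, depth
		} else if depth == bestDepth {
			best = append(best, e)
		}
	}
	if len(best) == 1 {
		return label("", best[0].Pkg), nil
	} else if len(best) > 1 {
		return "", fmt.Errorf("ambiguous import %q: %d candidates", imp, len(best))
	}
	if imp == r.Prefix || strings.HasPrefix(imp, r.Prefix+"/") {
		rel := strings.TrimPrefix(strings.TrimPrefix(imp, r.Prefix), "/")
		return label("", path.Join(r.PrefixDir, rel)), nil
	}
	if r.Vendored {
		return label("", path.Join("vendor", imp)), nil
	}
	// Assumption: the repository root is the first three path elements.
	parts := strings.SplitN(imp, "/", 4)
	root, rest := parts, ""
	if len(parts) == 4 {
		root, rest = parts[:3], parts[3]
	}
	host := strings.Split(root[0], ".") // reverse the domain: golang.org -> org.golang
	for i, j := 0, len(host)-1; i < j; i, j = i+1, j-1 {
		host[i], host[j] = host[j], host[i]
	}
	repo := strings.Map(func(c rune) rune {
		if c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9' {
			return c
		}
		return '_'
	}, strings.Join(append(host, root[1:]...), "_"))
	return label("@"+repo, rest), nil
}

func main() {
	r := &Resolver{Prefix: "example.com/project"}
	fmt.Println(r.Resolve("golang.org/x/tools/go/vcs", "cmd"))
}
```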

Gazelle is a regular Go program. It can be built, installed, and run without Bazel, using the regular Go SDK.

$ go install github.com/bazelbuild/bazel-gazelle/cmd/gazelle@latest
$ gazelle -go_prefix example.com/project

We lightly discourage this method of running Gazelle. All developers on a project should use the same version of Gazelle to ensure the build files they generate are consistent. The easiest way to accomplish this is to build and run Gazelle through Bazel. Gazelle may be added to a WORKSPACE file, built as a normal go_binary, then installed or run from the bazel-bin/ directory.

$ bazel build @bazel_gazelle//cmd/gazelle
$ bazel-bin/external/bazel_gazelle/cmd/gazelle/gazelle -go_prefix example.com/project

It's usually better to invoke Gazelle through a wrapper script though. This saves typing and ensures Gazelle is run with a consistent set of arguments. We provide a Bazel rule that generates such a wrapper script. Developers may add a snippet like the one below to a build file:

load("@bazel_gazelle//:def.bzl", "gazelle")

gazelle(
    name = "gazelle",
    command = "fix",
    external = "vendored",
    prefix = "example.com/project",
)

This script may be built and executed in a single command with bazel run.

$ bazel run //:gazelle

This is the most convenient way to run Gazelle, and it's what we recommend to users. However, there are two issues with running Gazelle in this fashion. First, binaries executed by bazel run are run in the Bazel execroot, not the user's current directory. The wrapper script uses a hack (dereferencing symlinks) to jump to the top of the workspace source tree before running Gazelle. Second, bazel run holds a lock on the Bazel output directory. This means Gazelle cannot invoke Bazel without deadlocking. Commands like bazel query would be helpful for detecting generated code, but it's not safe to use them.

To avoid these limitations, the wrapper script may be copied to the workspace and optionally checked into version control. When the wrapper script is run directly (without bazel run), it rebuilds itself to check whether it is still up to date. If the rebuilt script differs from the running script, it prompts the user to copy the rebuilt script into the workspace again.

$ bazel build //:gazelle
Target //:gazelle up-to-date:
  bazel-bin/gazelle.bash
____Elapsed time: 1.326s, Critical Path: 0.00s
$ cp bazel-bin/gazelle.bash gazelle.bash
$ ./gazelle.bash

Gazelle has the following dependencies:

  • github.com/bazelbuild/bazel-skylib - Skylark (now Starlark) utility library used to generate the wrapper script in the gazelle rule.
  • github.com/bazelbuild/buildtools/build - used to parse and rewrite build files.
  • github.com/bazelbuild/rules_go - used to build and test Gazelle through Bazel. Gazelle can also be built on its own with the Go SDK.
  • golang.org/x/tools/go/vcs - used during dependency resolution to determine the repository prefix for a given import path. This uses the network.