Skip to content

Commit

Permalink
design: add go13compiler, go13linker, go15bootstrap
Browse files Browse the repository at this point in the history
These are ported from the original Google docs.
I am going to update the short links to point here,
and then perhaps I will stop getting requests for
access to the Google docs from people whose
companies block all access to Google docs outside
their domain.

Change-Id: Ic3dfb3566c7335e98865dfcc188230d295bf40d5
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/517535
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
  • Loading branch information
rsc committed Aug 9, 2023
1 parent f87e19c commit d661ed1
Show file tree
Hide file tree
Showing 3 changed files with 203 additions and 0 deletions.
81 changes: 81 additions & 0 deletions design/go13compiler.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Go 1.3+ Compiler Overhaul


Russ Cox \
December 2013 \
golang.org/s/go13compiler

## Abstract

The Go compiler today is written in C. It is time to move to Go.
[**Update, 2014**: This work was completed and presented at GopherCon. See “[Go from C to Go](https://www.youtube.com/watch?v=QIE5nV5fDwA)”.]

[**Update, 2023.** This plan was originally published as a Google document. For easier access, it was converted to Markdown in this repository in 2023. Later work has overhauled the compiler further in a number of ways, validating the conjectures about the benefits of converting to Go. This document has only minor historical value now.]

Background

The “gc” Go toolchain is derived from the Plan 9 compiler toolchain. The assemblers, C compilers, and linkers are adopted essentially unchanged, and the Go compilers (in cmd/gc, cmd/5g, cmd/6g, and cmd/8g) are new C programs that fit into the toolchain.

Writing the compiler in C had some important advantages over using Go at the start of the project, most prominent among them the fact that, at first, Go did not exist and so could not be used to write a compiler, and the fact that, once Go did exist, it often changed in significant, backwards-incompatible ways. Using C instead of Go avoided both the initial and ongoing bootstrapping problems. Today, however, Go does exist, and its definition is stable as of Go 1, so the problems of bootstrapping are greatly reduced.

As the bootstrapping problems have receded, other engineering concerns have arisen that make Go much more attractive than C for the compiler implementation. The concerns include:

- It is easier to write correct Go code than to write correct C code.
- It is easier to debug incorrect Go code than to debug incorrect C code.
- Work on a Go compiler necessarily requires a good understanding of Go. Implementing the compiler in C adds an unnecessary second requirement.
- Go makes parallel execution trivial compared to C.
- Go has better standard support than C for modularity, for automated rewriting, for unit testing, and for profiling.
- Go is much more fun to use than C.

For all these reasons, we believe it is time to switch to Go compilers written in Go.

## Proposed Plan

We plan to translate the existing compilers from C to Go by writing and then applying an automatic translator. The conversion will proceed in phases, starting in Go 1.3 but continuing into future releases.

_Phase 1_. Develop and debug the translator. This can be done in parallel with ordinary development. In particular, it is fine for people to continue making changes to the C version of the compiler during this phase. The translator is a fair amount of work, but we are confident that we can build one that works for the specific case of translating the compilers. There are many corners of C that have no direct translation into Go; macros, unions, and bit fields are probably highest on the list. Fortunately (but not coincidentally), those features are rarely used, if at all, in the code being translated. Pointer arithmetic and arrays are also some work to translate, but even those are rare in the compiler, which primarily operates on trees and linked lists. The translator will preserve the comments and structure of the original C code, so the translation should be as readable as the current compiler.

_Phase 2_. Use the translator to convert the compilers from C to Go and delete the C copies. At this point we have transitioned to Go and still have a working compiler, but the compiler is still very much a C program. This may happen for Go 1.3, but that’s pretty aggressive. It is more likely to happen for Go 1.4.

_Phase 3_. Use some tools, perhaps derived from gofix and the Go oracle to split the compiler into packages, cleaning up and documenting the code, and adding unit tests as appropriate. This phase turns the compiler into an idiomatic Go program. This is targeted for Go 1.4.

_Phase 4a_. Apply standard profiling and measurement techniques to understand and optimize the memory and CPU usage of the compiler. This may include introducing parallelization; if so, the race detector is likely to be a significant help. This is targeted for Go 1.4, but parts may slip to Go 1.5. Some basic profiling and optimization may be done earlier, in Phase 3.

_Phase 4b_. (Concurrent with Phase 4a.) With the compiler split into packages with clearly defined boundaries, it should be straightforward to introduce a new middle representation between the architecture-independent unordered tree (Node*s) and the architecture-dependent ordered list (Prog*s) used today. That representation, which should be architecture-independent but contain information about precise order of execution, can be used to introduce order-dependent but architecture-independent optimizations like elimination of redundant nil checks and bounds checks. It may be based on SSA and if so would certainly take advantage of the lessons learned from Alan Donovan’s go.tools/ssa package.

_Phase 5_. Replace the front end with the latest (perhaps new) versions of go/parser and go/types. Robert Griesemer has discussed the possibility of designing new go/parser and go/types APIs at some point, based on experience with the current ones (and under new names, to preserve Go 1 compatibility). The work of connecting them to a compiler back end may help guide design of new APIs.

## Bootstrapping

With a Go compiler written in Go, there must be a plan for bootstrapping from scratch. The rule we plan to adopt is that the Go 1.3 compiler must compile using Go 1.2, Go 1.4 must compile using Go 1.3, and so on. Then there is a clear path to generating current binaries: build the Go 1.2 toolchain (written in C), use it to build the Go 1.3 toolchain, and so on. There will be a shell script to do this; it will take CPU time but not human time. The bootstrapping only needs to be done once per machine; the Go 1.x binaries can be kept in a known location and reused each time all.bash is run during the development of Go 1.(x+1).

Obviously, this bootstrapping path scales poorly over time. Before too many releases have gone by, it may make sense to write a back end for the compiler that generates C code. The code need not be efficient or readable, just correct. That C version would be checked in, just as today we check in the y.tab.c file generated by yacc. The bootstrap sequence would invoke gcc on that C code to build a bootstrap compiler, and the bootstrap compiler would be used to build the real compiler. Like in the other scheme, the bootstrap compiler binary can be kept in a known location and reused (not rebuilt) each time all.bash is run.

## Alternatives

There are a few alternatives that would be obvious approaches to consider, and so it is worth explaining why we have decided against them.

_Write new compilers from scratch_. The current compilers do have one very important property: they compile Go correctly (or at least correctly enough for nearly all current users). Despite Go’s simplicity, there are many subtle cases in the optimizations and other rewrites performed by the compilers, and it would be foolish to throw away the 10 or so man-years of effort that have gone into them.

_Translate the compiler manually_. We have translated other, smaller C and C++ programs to Go manually. The process is tedious and therefore error-prone, and the mistakes can be very subtle and difficult to find. A mechanical translator will instead generate translations with consistent classes of errors, which should be easier to find, and it will not zone out during the tedious parts. The Go compilers are also significantly larger than anything we’ve converted: over 60,000 lines of C. Mechanical help will make the job much easier. As Dick Sites wrote in 1974, “I would rather write programs to help me write programs than write programs.” Translating the compiler mechanically also makes it easier for development on the C originals to proceed unhindered until we are ready for the switch.

_Translate just the back ends and connect to go/parser and go/types immediately_. The data structures in the compiler that convey information from the front end to the back ends look nothing like the APIs presented by go/parser and go/types. Replacing the front end by those libraries would require writing code to convert from the go/parser and go/types data structures into the ones expected by the back ends, a very broad and error-prone undertaking. We do believe that it makes sense to use these packages, but it also makes sense to wait until the compiler is structured more like a Go program, into documented sub-packages of its own with defined boundaries and unit tests.

_Discard the current compilers and use gccgo (or go/parser + go/types + LLVM, or …)_. The current compilers are a large part of Go’s flexibility. Tying development of Go to a comparatively larger code base like GCC or LLVM seems likely to hurt that flexibility. Also, GCC is a large C (now partly C++) program and LLVM a large C++ program. All the reasons listed above justifying a move away from the current compiler code apply as much or more to these code bases.

## Long Term Use of C

Carried to completion, this plan still leaves the rest of the Plan 9 toolchain written in C. In the long term it would be nice to eliminate all C from the tree. This section speculates on how that might happen. It is not guaranteed to happen in this way or at all.

_Package runtime_. Most of the runtime is written in C, for many of the same reasons that the Go compiler is written in C. However, the runtime is much smaller than the compilers and it is already written in a mix of Go and C. It is plausible to convert the C to Go one piece at a time. The major pieces are the scheduler, the garbage collector, the hash map implementation, and the channel implementation. (The fine mixing of Go and C is possible here because the C is compiled with 6c, not gcc.)

_C compilers_. The Plan 9 C compilers are themselves written in C. If we remove all the C from Go package implementations (in particular, package runtime), we can remove these compilers: “go tool 6c” and so on would be no more, and .c files in Go package directory sources would no longer be supported. We would need to announce these plans early, so that external packages written partly in C have time to remove their uses. (Cgo, which uses gcc instead of 6c, would remain as a way to write parts of a package in C.) The Go 1 compatibility document excludes changes to the toolchain; deleting the C compilers is permitted.

_Assemblers_. The Plan 9 assemblers are also written in C. However, the assembler is little more than a simple parser coupled with a serialization of the parse tree. That could easily be translated to Go, either automatically or by hand.

_Linkers_. The Plan 9 linkers are also written in C. Recent work has moved most of the linker in into the compilers, and there is already a plan to rewrite what is left as a new, much simpler Go program. The part of the linker that has moved into the Go compiler will now need to be translated along with the rest of the compiler.

_Libmach-based tools: nm, pack, addr2line, and objdump_. Nm has already been rewritten in Go. Pack and addr2line can be rewritten any day. Objdump currently depends on libmach’s disassemblers, but those should be straightforward to convert to go, whether mechanically or manually, and at that point libmach itself can be deleted.



55 changes: 55 additions & 0 deletions design/go13linker.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# Go 1.3 Linker Overhaul

Russ Cox \
November 2013 \
golang.org/s/go13linker

## Abstract

The linker is one of the slowest parts of building and running a typical Go program. To address this, we plan to split the linker into two pieces. Perhaps one can be written in Go.

[**Update, 2023.** This plan was originally published as a Google document. For easier access, it was converted to Markdown in this repository in 2023. Later work overhauled the linker a second time, greatly improving its structure, efficiency, and code quality. This document has only minor historical value now.]

## Background

The linker has always been the slowest part of the Plan 9 toolchain, and it is now the slowest part of the Go toolchain. Ken Thompson’s [overview of the toolchain](http://plan9.bell-labs.com/sys/doc/compiler.html) concludes:

> The new compilers compile quickly, load slowly, and produce medium quality object code. The compilers are relatively portable, requiring but a couple of weeks’ work to produce a compiler for a different computer. For Plan 9, where we needed several compilers with specialized features and our own object formats, this project was indispensable. It is also necessary for us to be able to freely distribute our compilers with the Plan 9 distribution.
>
> Two problems have come up in retrospect. The first has to do with the division of labor between compiler and loader. Plan 9 runs on multi-processors and as such compilations are often done in parallel. Unfortunately, all compilations must be complete before loading can begin. The load is then single-threaded. With this model, any shift of work from compile to load results in a significant increase in real time. The same is true of libraries that are compiled infrequently and loaded often. In the future, we may try to put some of the loader work back into the compiler.
That document was written in the early 1990s. The future is here.

## Proposed Plan

The current linker performs two separable tasks. First, it translates an input stream of pseudo-instructions into executable code and data blocks, along with a list of relocations. Second, it deletes dead code, merges what’s left into a single image, resolves relocations, and generates a few whole-program data structures such as the [runtime symbol table](http://golang.org/s/go12symtab).

The first part can be factored out into a library - liblink - that can be linked into the assemblers and compilers. The object files written by 6a, 6c, or 6g and so on would be written by liblink and then contain executable code and data blocks and relocations, the result of the first half of the current linker.

The second part can be handled by what’s left of the linker after extracting liblink. That remaining program which would read the new object files and complete the link. That linker is a small amount of code, the bulk of it architecture-independent. It is possible that it could be merged into a single architecture-independent program invoked as “go tool ld”. It is even possible that it could be rewritten in Go, making it easy to parallelize large links. (See the section below for how to bootstrap.)

To start, we will focus on getting the new split working with C code. The exploration of using Go will happen only once the rest of the change is done.

To avoid churn in the usage of the tools, the generated object files will keep the existing suffixes .5, .6, .8. Perhaps in Go 1.3 we will even include shim programs named 5l, 6l, and 8l that invoke the new linker. These shim programs would be retired in Go 1.4.

## Object Files

The new split requires a new object file format. The current objects contain pseudo-instruction streams, but the new objects will contain executable code and data blocks along with relocations.

A natural question is whether we should adopt an existing object file format, such as ELF. At first, we will use a custom format. A Go-specific linker is required to build runtime data structures like the symbol table, so even if we used ELF object files we could not reuse a standard ELF linker. ELF files are also considerably more general and ELF semantics considerably more complex than the Go-specific linker needs. A custom, less general object file format should be simpler to generate and simpler to consume. On the other hand, ELF can be processed by standard tools like readelf, objdump, and so on. Once the dust has settled, though, and we know exactly what we need from the format, it is worth looking at whether the use of ELF makes sense.

The details of the new object file are not yet worked out. The rest of this section lists some design considerations.

- Obviously the files should be as simple as possible. With few exceptions, anything that can be done in the library half of the linker should be. Possible surprises include the stack split code being done in the library half, which makes object files OS-specific, although they already are due to OS-specific Go code in packages, and the software floating point work being done in the library half, making ARM object files GOARM-specific (today nothing GOARM-specific is done until the linker runs).
- We should make sure that object files are usable via mmap. This would reduce copying during I/O. It may require changing the Go runtime to simply panic, not crash, on SIGSEGV on non-nil addresses.
- Pure Go packages consist of a single object file generated by invoking the Go compiler once on the complete set of Go source files. That object file is then wrapped in an archive. We should arrange that a single object file is also a valid archive file, so that in that common case there is no wrapping step needed.

## Bootstrapping

If the new Go linker is written in Go, there is a bootstrapping problem: how do you link the linker? There are two approaches.

The first approach is to maintain a bootstrap list of CLs. The first CL in the sequence would have the current linker, written in C. Each subsequent step would be a CL containing a new linker that can be linked using the previous linker. The final binaries resulting from the sequence can be made available for download. The sequence need not be too long and could be made to coincide with milestones. For example, we could arrange that the Go 1.3 linker can be compiled as a Go 1.2 program, the Go 1.4 linker can be compiled as a Go 1.3 program, and so on. The recorded sequence makes it possible to re-bootstrap if needed but also provides a way to defend against the [Trusting Trust problem](http://cm.bell-labs.com/who/ken/trust.html). Another way to bootstrap would be to compile gccgo and use it to build the Go 1.3 linker.

The second approach is to keep the C linker even after we have a better one written in Go, and to keep both mostly feature-equivalent. The version written in C only needs to keep enough features to link the one written in Go. It needs to pick up some object files, merge them, and write out an executable. There’s no need for cgo support, no need for external linking, no need for shared libraries, no need for performance. It should be a relatively modest amount of code (perhaps just a few thousand lines) and should not need to change very often. The C version would be built and used during make.bash but not installed. This approach is easier for other developers building Go from source.

It doesn’t matter much which approach we take, just that there is at least one viable approach. We can decide once things are further along.
Loading

0 comments on commit d661ed1

Please sign in to comment.