This quarter at Northwestern, I had the opportunity to take an EECS independent study. I was interested in learning about adding WASM support for native debuggers (like GDB) to make C++ code compiled to machine code via WASM as an intermediate easy to debug. This document serves to aggregate a list of awesome online resources and repos that I found helpful during the quarter as well as some of my notes.
- Intro to WASM
- How debuggers work
- DWARF
- Making WASM work with native debuggers
- Wasm LLVM Backend
- WAVM, a native Wasm embedder
Wasm is a virtual instruction set for a stack-based virtual machine. Like x86 or ARM, it is a possible compilation target for (at the moment) systems languages like C/C++/Rust. Unlike x86 and ARM, Wasm is portable, meaning that a Wasm implementation can be deployed anywhere (in browser JavaScript engines, but also in non-web embeddings.)
Wasm has near native performance. Wasm binaries can be compiled AOT or JIT, and in either case, the compiler toolchains can make optimizations. This fact, and the fact that Wasm instructions are relatively close to native instructions, make Wasm pretty fast. The most common use case for WebAssembly at the moment is to execute C/C++/Rust code in the browser - there are a rich set of APIs that allow JS and Wasm to pass data and functions back and forth. Web applications can essentially 'hand off' computationally intense work to Wasm and see performance boosts as a result of Wasm's near native speed.
Wasm has a textual representation, which is useful primarily for debugging purposes. It can also be used to write Wasm programs by hand, although Wasm is intended to be generated by a compiler. Emscripten/Binaryen was the first toolchain to compile C/C++ to Wasm - it did so in a somewhat roundabout way. Emscripten uses LLVM (the Clang frontend and Fastcomp backend) to compile C/C++ to asm.js, a low level subset of JavaScript. Then, Binaryen, a compiler and toolchain libary, is used to compile the asm.js to Wasm. Currently, you can use Clang to compile your C/C++ straight to Wasm using the stable LLVM Wasm backend by simply passing Wasm as your target when you use Clang.
As mentioned before, Wasm execution is modeled as a stack machine - this means that every instruction either/or pushes or pops values from the stack. A stack machine was chosen to describe Wasm over a SSA/virtual register or AST-based scheme,to reduce the size of the binaries (more on this later). Binary size is very important for Wasm's use on the web, as it is important to reduce the amount of data being streamed over a network for latency purposes.
Wasm is more complicated than a simple stack machine - this is because it offers structured control flow. WebAssembly does not allow jumps or gotos like x86 - instead, Wasm defines blocks (for loops) that can be broken out of. According to the Wasm spec, to break out of a loop, the br instruction must indicate how many nested structures to break out of. If this number is greater than the current nesting level, it is considered invalid. Wasm's stack must keep track not only of intermediate computation values, but also the active nested control flow state, and also the call frame of active functions. Intuitively, these three stacks are interdependent, but one could model these three stacks independently. These stacks are not instantiated at runtime (and are implicit) - they are instantiated when Wasm is compiled down to machine code in order to translate instructions that manipulate the stack to register based instructions on x86.
Here are some great resources to check out:
- Formal Wasm Spec and Semantics: https://github.com/WebAssembly/spec/blob/master/papers/pldi2017.pdf
- Simple Wasm text format tutorial: https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format
- Wasm Binary Format: http://webassembly.github.io/spec/core/binary/modules.html
The keystone to debugger implementations on Linux is a syscall called ptrace. Because ptrace is a syscall, it can read and write data, access the state of registers, and can cause the processor to execute instructions and stop. To set up debugging properly, the tracee (program being debugged) must call ptrace with PTRACE_TRACEME request. To do this, code must be written to split a child process that runs code to be traced from the parent process which serves as the debugger. After the process is forked, the child calls ptrace and then proceeds to call the exec syscall to replace the process's image with the program to be traced. The ptrace syscall stops the child program before it begins executing. Control returns to parent, which can can then invoke PTRACE to instruct the child to do one of many things (a simple example is singlestep, which instructs the child to execute an instruction and stop). It can also copy the state of the child programs registers or unwind the stack.
To set a breakpoint at a specific code location address, ptrace is used to add a trap interrupt (INT3) at a specific code location. When the tracee tries to run the process and eventually reaches the trap instruction, it will stop and send the OS a signal and pass control back to the parent process which can be used to inspect the state of the tracee.
What if you don't know the actual code address that you want to break at? In most debugging use cases, we place breakpoints at line numbers or function names not actual memory addresses. DWARF data is used to translate line numbers to memory addresses. More on that next.
Here are some great resources to check out:
- Video from Meeting Cpp: https://www.youtube.com/watch?v=Q3Rm95Mk03c
- Eli Bendersky's incredible debugger implemenation tutorial with code samples: https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1
DWARF is a binary file format used to enable source-level debugging. DWARF is typically generated by a compiler - the compiler tracks source level data during parsing and lexing and constructs the DWARF data accordingly.
For the context of debuggers, there are two main DWARF sections to focus our attention on:
- Debug Info. According to LLVM documentation, the Debug Info is a hierarchical tag+attribute format - not dissimilar from a binary XML. It compactly describles source information (functions, types, namespaces)
- Debug Line. The debug Line section maps line and column information to markers representing 'new statements', the ends of function prologues, as well as the end of translation units.
Debuggers use the Debug Line and Debug Info data and some relatively simple logic to retrieve memory addresses from function names and line numbers.
There are two main approaches to providing debugger support for C/C++/Rust code compiled to machine code via Wasm. The first way is to develop new debuggers particularly for the new code format. The second is to make modifications to the Wasm infrastructure in order to allow the use of existing debuggers (like GDB). The Wasm community has selected the second option, probably because writing a custom Wasm debugger is a lot of overhead. Yury Delendik, a Mozilla engineer, has made significant progress on option 2 by hacking on Wasmtime, a Wasm JIT built with the Cranelift Compiler Infrastructure (seems like a lightweight LLVM written in Rust). Based on reading some of his Wasmtime commits, here are the steps that I think must be taken to add GDB compatability to Wasm via Clang + Wasm backend(the C++ to Wasm compiler) and WAVM (a Wasm to machine code JIT backed by LLVM):
-
The existing Wasm LLVM backend must generate DWARF that maps C++ source line information to locations in a Wasm binary. To do so, the DWARF format must be extended, as Wasm does not have the concept of registers or explicit memory locations. Instead of tracing register data like in native C++ DWARF, Wasm DWARF must maintain the state of the stack(s). Instead of mapping functions from explicit memory locations in the Code section of an ELF binary like in the native C++ DWARF, Wasm DWARF must isntead map C++ functions to offsets from the start of the Wasm binary's function section. Furthermore, this DWARF data must somehow be packaged within the existing Wasm binary format. Thankfully, this is not too difficult to do, as the Wasm binary specifies custom sections. According to the Wasm spec, custom sections "do not contribute, or othwerwise, affect, the WebAssembly semantics" and "may be ignored by any implementation". In essence, the working proposal for Wasm debugging assigns a single custom section per every DWARF section. Next, the Wasm to machine code compiler must not only compile the Wasm to machine code, but it should also transform the Wasm DWARF to C++ DWARF - values from the stack(s) should be translated to registers, function offsets should be translated to explicit memory locations, among others. The following steps pertain to this.
-
WAVM must be modified to track Wasm source line data. To do this, the parser, lexer, and front-end must be extended. WAVM currently maps source line information to lexed tokens, but only uses the source line information to provide detailed error messages when parsing fails (ex. WAVM tries to parse malformed WASM). LLVM has a DIBuilder libary that essentially allows one to store debug info inside of an LLVM IR Node. It is unclear, however, whether or not this debug information is translated during IR optimization passes - this is something that I would have to take a deep dive into.
-
WAVM must parse all DWARF related custom sections as DWARF. Once the the DWARF data is parsed, memory representations of the DWARF must be constructed. Delendik uses an awesome Rust library called Gimli to parse the DWARF - throughout his DWARF transformation modules, he simply passes around Gimli objects. As far as I can tell, there isn't a super simple C++ Dwarf parsing library, so this may be somewhat difficult. There is an arcane C library called libdwarf, but it may be difficult to use it parse the custom sections of the WASM binary. There is, however, a DWARF parsing Python library called pyelftools with a simple API - to integrate this in WAVM, one would have to use pyelftools to parse the DWARF and write the DWARF representations to an S-expression format or JSON. WAVM would then have to parse the S-expression/JSON and build its own data structures representing the parsed data.
-
The in-memory DWARF data structures must be updated from the Wasm binary locations (function offsets, local variables, global variables, stack(s)) to memory addresses (accessed from the transformed DIBuilder data). I'd need to look more into the details of how these individual transformations would work, the specific sections that need to be translated, and how (if its possible) to extract the transformed DIBuilder data.
-
WAVM's object writer needs to be modified to write the new DWARF data memory data structures to DWARF binary format in the JIT'd object file in memory.
-
A GDB interface needs to be implemented to load the JIT'd object file into GDB, so it knows where to look. The reason for this is that GDB knows where to look - in 'standard' AOT use cases, GDB simply looks for .o files - in the JIT'd case, bytes are written directly to executable memory.
It turns out that a lot of work has been done to complete item 1. Here are the following LLVM commits that address item 1:
The third commit has not landed on Clang yet, but has landed on Rust Nightly, meaning that at the moment, the only way to access DWARF data for native code compiled via WASM is by compiling Rust. You can view the bytes of all of the custom sections using wasm-objdump, a tool that dumps the Wasm binary format in human readable form (part of the WABT toolkit), but it doesn't seem like there is a good way to view the DWARF data embedded in the custom section in the same format as dwarfdump (https://developer.ibm.com/articles/au-dwarf-debug-format/.) Once the DWARF proposal is approved by the Wasm community, adding dwarf-dump functionality to WABT would make for a great project.
Relevant resources/code:
- Working Wasm DWARF proposal: https://yurydelendik.github.io/webassembly-dwarf/#dwarf-locals
- Wasmtime repo: https://github.com/yurydelendik/wasmtime. Areas of interest are in the debug_line branch - wasmtime/lib/debug.
- GDB interface: https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html#JIT-Interface
- Pyelftools: https://github.com/eliben/pyelftools
- WABT: https://github.com/WebAssembly/wabt
When going through the three LLVM commits listed above to understand how the WASM Dwarf was generated, I got a little sidetrack and spent some time reading through the Wasm backend. This was pretty difficult to understand - there is a lot of code, not a lot of documentation to describe the different LLVM backend classes, and sparse documentation to describe the different passes at a high level. This documentation was most helpful: https://llvm.org/docs/WritingAnLLVMBackend.html
Because Wasm is a stack machine, unlike x86, ARM and other architectures, it does not have a set of registers - only an operand stack. Because of this, the Wasm backend does not have to perform explicit register allocation. Instead of register allocation, it had to assign values to the operand stack. The following passes accomplish this:
-
createWebAssemblyRegStackify.
This pass converts virtual register uses and defs into single-use expression trees. Virtual registers used in these trees are operands to Wasm instruction operators, and do not explicitly need to be named (this reduces code size and is one of the primary reasons a stack machine was used over an AST or a Register based machine encoding scheme). In a traditional, register based machine one would have to access the operands via the name of the register, which increases the size of the instructions. These registers are marked as 'stackified', and are eventually converted to push + pop instructions from the operand stack. -
createWebAssemblyRegColoring.
This pass executes a graph coloring pass on the set of virtual registers not stackified. The purpose of this is essentially reduce the number of virtual registers for a cleaner mapping between the virtual registers to physical registers in the RegisterNumbering class. Although Wasm doesn't have physical registers, the register allocation framework used in most LLVM backends requires the definition of physical registers in the WebAssemblyRegisterInfo.td file - its unclear to me what the purpose of these physical registers are with regards to the execution of Wasm. It's probably something I should look into. -
createWebAssemblyExplicitLocals.
This pass takes all registers not colored or stackified, and defines them as local variables. When these local variables are read or written, local.get and local.set instructions are inserted.
Relevant code:
- LLVM Wasm Backend: https://github.com/llvm-mirror/llvm/tree/master/lib/Target/WebAssembly
During the last few weeks of the quarter, I spent time trying to understand how a Wasm embedder worked end to end. Here are the high level steps required to instantiate a Wasm module and run a Wasm program in WAVM.
-
Parse + Lex Wasm. WAVM uses a recursive descent parser when parsing the text S-expression like format. It parses the outside body within a set of parentheses, and recursively parses internal sets of parentheses. While parsing, it validates the code to check if the grammar and semantics are followed. If grammar is not followed, WAVM provides error messages with line numbers. The parsed code is written to an internal data structure.
-
The parsed function data loaded into memory is then converted to LLVM IR using a host of helper functions in the lib/LLVMJIT. Many of these functions heavily lean on macros to reduce writing redundant code. One good example of this is when generating LLVM IR for all binary operators. When converting Wasm to LLVM IR, WAVM instantiates three different stack data structures (one for operands, control flow, and the other for stack frames) as it translates from line to line in order to convert the stack based instructions to register-based instructions. Local variables, global variables, memory reads/writes, and table reads/writes passed via imports are left as symbols in the LLVM IR, as these values can only be determined at runtime.
-
LLVM JIT is used to compile the LLVM IR to object code with the aforementioned symbols.
-
Runtime objects for memory, tables, locals, and globals are instantiated. Values are populated from IO via Emscripten like stdin/stdout. Table values are instantiated from values from the element section of Wasm binary and memory values are instantaited from values of the data section of the Wasm binary.
-
Symbols in the JIT'd code are resolved from the runtime environment data structures. Steps 2-5 serve somewhat like a linker and loader.
While reading the WAVM code I noticed a few things:
-
Several Wasm module instances can share the same table and memory. These module instances are said to be in the same compartment and are somewhat analagous to processes that share memory. At the moment, these instances cannot run on different threads.
-
WAVM heavily leans on side-effect code. In many situations, newly instantiated empty objects are passed to functions whose job it is to populate the object but also return values. This is probably to reduce the overhead of copying structs and passing structs by copy.
-
In the same vein as the previous observation, WAVM havily leans on std::move. I learned that std::move casts l-values to r-values so that the move semantics can be used instead of copy semantics.