Support debug info in Binaryen #2400

dschuff · 2019-10-20T16:21:39Z

We need to do this if binaryen-optimized binaries are to be debuggable. This currently actually includes any emscripten output at all (since everything gets run through emscripten-finalize). Currently we support source positions but nothing else. Since Binaryen can make arbitrary transformations, we will probably have to approach the problem similarly to how LLVM and other compilers do it.

yurydelendik · 2019-10-21T18:34:16Z

There is also similar discussion (limited to DWARF) in emscripten: emscripten-core/emscripten#8934 (comment)

As a first step, and so as not to over-complicate Binaryen's logic, it might be beneficial to just track the changes in instruction locations and wasm locations (memory, locals, globals, etc). Additional tools that understand a debug format can later apply this transform/delta to the original debug information, e.g. to custom DWARF sections.

kripken · 2019-10-23T23:06:48Z

Some questions: Does DWARF basically get emitted as a bunch of custom sections in the wasm (or a separate file) where they refer to instructions by offset? Are those offsets absolute in the wasm, or relative to the code section? I think wasm supports multiple code sections (that's how gc-sections works?), if so, are the DWARF offsets basically "code section index, binary offset in that section"?

yurydelendik · 2019-10-24T13:07:13Z

Does DWARF basically get emitted as a bunch of custom sections in the wasm (or a separate file) where they refer to instructions by offset?

At this moment, a bunch of custom sections that refer to instructions by offset. It's possible to move that to an external file.

Are those offsets absolute in the wasm, or relative to the code section?

Relative to code section at this moment. (But it can be changed to be file wide)

I think wasm supports multiple code sections (that's how gc-sections works?), if so, are the DWARF offsets basically "code section index, binary offset in that section"?

At the time of design, wasm supports only one code section. Object files contain relocation entries, which can originate from DWARF sections, and in a one-code-section wasm they will be adjusted by the linker. AFAIK Binaryen ignores the relocation sections, though.
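To make the "relative to the code section" convention concrete, here is a minimal sketch that finds the code section in a wasm binary and converts a code-section-relative offset into a file-wide one. It assumes that offsets are counted from the first byte of the code section's payload (the byte after the section size field); the function names are made up for illustration and are not part of Binaryen or any other tool.

```python
CODE_SECTION_ID = 10  # section id of the code section in the wasm binary format


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def code_section_start(wasm):
    """Return the file offset of the first payload byte of the code section."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(wasm):
        section_id = wasm[pos]
        size, payload = read_uleb128(wasm, pos + 1)
        if section_id == CODE_SECTION_ID:
            return payload
        pos = payload + size  # jump over this section's payload
    raise ValueError("no code section found")


def to_file_offset(wasm, code_relative_offset):
    """Convert a code-section-relative offset to an absolute file offset."""
    return code_section_start(wasm) + code_relative_offset
```

Going the other way (file-wide to code-relative) is the same subtraction, which is why switching conventions later would be a mechanical change.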

kripken · 2019-10-24T16:49:52Z

Thanks @yurydelendik! I think I get it.

What do you think of the following idea: Binaryen in general decreases wasm code size, so by padding with nops we could make almost every original instruction be at the same offset it was before (by noting its original position when reading, then when writing add padding before each instruction, which is trivial; we may also want to disable some optimization passes that reorder things).

This means binaryen wouldn't decrease code size in builds with full debug info, but that doesn't sound so bad. (edit: it would still decrease gzip size ;)

If the DWARF were say 95% accurate, with only a few things in the wrong place, would that be useful enough?
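A minimal sketch of the nop-padding idea, assuming each instruction carries the offset it was read from and that instructions are emitted in their original order. The data layout (a list of `(original_offset, encoded_bytes)` pairs) and the function name are hypothetical, not Binaryen's actual writer.

```python
NOP = 0x01  # wasm `nop` opcode


def pad_to_original_offsets(instructions):
    """Lay out instructions so each one starts at its original offset.

    `instructions` is a list of (original_offset, encoded_bytes) pairs in
    output order. Returns (code, misplaced), where `misplaced` lists the
    original offsets of instructions that could not be kept in place
    because the code before them grew instead of shrinking.
    """
    out = bytearray()
    misplaced = []
    for original_offset, encoded in instructions:
        if len(out) <= original_offset:
            # The code so far is no longer than before: fill the gap with nops.
            out.extend([NOP] * (original_offset - len(out)))
        else:
            # The code grew; this instruction lands later than DWARF expects.
            misplaced.append(original_offset)
        out.extend(encoded)
    return bytes(out), misplaced
```

The `misplaced` list corresponds to the "few things in the wrong place" case: anything reordered or enlarged by a pass shows up there.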

yurydelendik · 2019-10-24T18:59:39Z

writing add padding before each instruction, which is trivial; we may also want to disable some optimization passes that reorder things

Correct, a non-reordering and non-growing code transformation is trivial -- DWARF can be preserved by the described idea. In the general case, though, users will need more than this narrow use case. The promise of DWARF information is that it will be preserved and usable even with highly optimized code, e.g. to break and inspect the stack trace in optimized code, or to help analyze a stack dump. The bottom line is that debug information has to be created (complete or partially complete) if the user asks for it, regardless of the debug/release or optimization level choice.

kripken · 2019-10-24T19:49:30Z

Do you think it's not possible to get 95% of debug info this way in an optimized build? Or are you saying even 95% is not enough?

yurydelendik · 2019-10-24T20:43:07Z

Do you think it's not possible to get 95% of debug info this way in an optimized build?

Not at all. I'm just saying that users would like to have debug info for the binary they will redistribute (instead of a special binary for debugging). I'm afraid that, in the end, the "padding" solution may turn out to be more complicated and less desirable for users than just tracking address and locals transforms and applying them to the pre-optimization DWARF.

kripken · 2019-10-24T21:06:15Z

For redistribution I agree the extra size is not great. It would not affect gzip size, at least, but yeah, if shipping binaries built with full debug info is common, then this is not optimal.

This does sound much simpler than other options, though? It would take just a few hours to write. Are you saying the other options are also very quick to implement?

yurydelendik · 2019-10-24T21:37:37Z

It would take just a few hours to write.

It is okay to accept it as a temporary solution if it requires this amount of effort, IMO.

Are you saying the other options are also very quick to implement?

https://github.com/yurydelendik/wtmaps-utils is in a working state, though not production quality (no tests or documentation). It is designed to fit into emscripten's pipeline without any change to Binaryen. It will also require more effort to be user friendly.

dschuff · 2019-10-25T00:00:51Z

It's definitely a common use case to build a completely-optimized shippable binary with debug info and then strip it out, shipping the stripped binary and archiving the binary with debug info. This gets you the best of both worlds: shipping the smallest thing, while allowing symbolization or debugging of stack traces and memory dumps in the wild.

I'm not sure the nop-padding would work well anyway. We'd either have to be very careful about which optimizations we run (no inlining, reordering, coalescing, etc) or do enough work to keep things working, which might be as hard as either modeling the debug info in the IR or tracking the deltas from all the transformations.

If I'm interpreting it right, wdwarf-cp records all the code addresses in the input and builds a map of input address -> output address, and then rewrites the dwarf with the different addresses? And it relies on Binaryen to track all the changes from source to destination?

I don't see anything in there about locals and other non-address things. For addresses this should mostly work already because Binaryen supports location data in the IR and input/output source maps. So we don't have to do anything when moving nodes around, and when creating/replacing nodes, we just need to copy source locations from the input. This is straightforward because an address for input source map info is always an address in the output as well.

But in order to make other info work, I think it would be harder. At minimum we'd need to track e.g. input and output local variables. But what if we do something like local coalescing; several locals each turn into a live range of a single local. There won't be one single correct mapping; different DIEs will get rewritten differently. Or any optimization that moves a local to a stack slot or vice versa. I don't know of any way to reason about stack slots other than locally.

It sounds like this approach could work, in that you'd get all of the debug info (even debug info you don't understand yet) transferred to something in the output (even if it's mangled somehow). If we assume that this is O0 output, then every variable lives on the stack and has well-defined values at sequence points, so maybe we don't need locals at all. But it would be limited.
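The coalescing problem can be illustrated with a small sketch. It assumes we know each original local's live ranges as half-open `(start, end)` code offsets and the coalescing assignment chosen by the optimizer; both inputs and the function names are hypothetical. The point is that a variable's location stops being "local N" and becomes "local N, but only within these ranges", so a plain old-local to new-local table is not enough.

```python
def overlaps(a, b):
    """True if half-open ranges a and b share at least one offset."""
    return a[0] < b[1] and b[0] < a[1]


def location_lists(live_ranges, coalesced):
    """Build per-variable location lists after local coalescing.

    live_ranges: {old_local: [(start, end), ...]}
    coalesced:   {old_local: new_local}
    Returns {old_local: [(start, end, new_local), ...]}.
    Raises ValueError if two different old locals that share a new local
    are live at the same time, since no location list could describe that.
    """
    seen = {}  # new_local -> [(old_local, range), ...]
    for old, ranges in live_ranges.items():
        new = coalesced[old]
        for r in ranges:
            for other_old, other in seen.get(new, []):
                if other_old != old and overlaps(r, other):
                    raise ValueError(f"locals {other_old} and {old} overlap in local {new}")
            seen.setdefault(new, []).append((old, r))
    return {
        old: [(start, end, coalesced[old]) for start, end in sorted(ranges)]
        for old, ranges in live_ranges.items()
    }
```

Each original variable ends up with its own range-limited list, which is roughly what "different DIEs will get rewritten differently" amounts to.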

Of course the other option is building up some debug info model in the IR, and then translating more and more of the input info into that as we understand more. That has the downside that you don't emit the info at all until you understand it.

yurydelendik · 2019-10-25T13:33:18Z

If I'm interpreting it right, wdwarf-cp records all the code addresses in the input and builds a map of input address -> output address, and then rewrites the dwarf with the different addresses? And it relies on Binaryen to track all the changes from source to destination?

Somewhat correct. The first step is to generate an "identity" source map (using wtmaps), i.e. one that refers to the wasm file itself. Binaryen, which already understands the source map format, then generates an updated source map after the wasm-opt run. As a third step, wdwarf-cp uses the pre-opt DWARF and the source map to generate the final DWARF.
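The three steps can be sketched as plain dictionary operations. This is only an illustration of the idea, not how wtmaps or wdwarf-cp are implemented: the line table is modelled as `(address, file, line)` tuples, the optimizer's output is modelled as a `new address -> original address` map, and all function names are made up.

```python
def identity_map(addresses):
    """Step 1: before optimization, every instruction address maps to itself."""
    return {addr: addr for addr in addresses}


def invert(output_map):
    """Turn `new address -> original address` into `original -> new`.

    If one original address ends up at several new addresses (e.g. after
    duplication), this sketch keeps the lowest new address.
    """
    old_to_new = {}
    for new, old in sorted(output_map.items()):
        old_to_new.setdefault(old, new)
    return old_to_new


def rewrite_line_table(rows, old_to_new):
    """Step 3: rewrite pre-opt line table rows with post-opt addresses.

    Rows whose instruction no longer exists are dropped in this sketch.
    """
    rewritten = []
    for address, file, line in rows:
        new_address = old_to_new.get(address)
        if new_address is not None:
            rewritten.append((new_address, file, line))
    return sorted(rewritten)
```

Step 2 (the optimizer carrying each address tag along with its instruction) is what produces `output_map`; here it is simply taken as an input.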

I'm not sure the nop-padding would work well anyway

Since the identity map is based on the exact addresses of the instructions, the debug information can be restored with a great degree of success even after advanced optimizations.

I don't see anything in there about locals and other non-address things.

Correct. There is nothing in Binaryen to track changes to locals, and it will be tricky (but possible) to embed that transform into source maps. That work remains to be done.

kripken · 2019-10-29T20:24:31Z

For locals, sounds like we'd need to add tracking in binaryen. That's probably not too hard as only 3 or so passes do major changes to them (simplify-locals, coalesce-locals, merge-locals).

For instruction addresses, I'm not sure I've understood the plan here. Is it that Binaryen emits metadata of "the instruction that began at address 1234 in the binary is now at 5678 in the new binary", for each instruction? I don't quite understand how that relates to source maps support?

yurydelendik · 2019-10-29T21:27:35Z

My plan was not to touch Binaryen's logic and to use source maps as a container for the information. I am planning to document the approach and algorithm at https://github.com/yurydelendik/wtmaps-utils .

"the instruction that began at address 1234 in the binary is now at 5678 in the new binary".. I don't quite understand how that relates to source maps support?

It does not have to be the source map format (though even tracking the movement of locals can be done with source maps). In my case, a source map was chosen just to store the information mentioned above, so that it can be freely passed between multiple tools (wasm-opt, wasm-dis, wasm-as, etc.). It could be done with a different format or via memory/API.

kripken · 2019-10-29T22:31:55Z

I think I see now, thanks @yurydelendik !

kripken · 2019-11-07T00:44:54Z

Some investigation:

LLVM from a few months ago cannot properly read new LLVM's debug info, llvm-dwarfdump shows errors. That by itself makes me think the best option here is to just use LLVM, and not any other dwarf implementation. However, otherwise I didn't see any worrying things in other tests, so maybe that's overly pessimistic? (e.g. I got pyelfutils to scan the dwarf sections in wasm files, and it seems to show the right output)
Looking at non-LLVM codebases libdwarf seems the best in C, but is LGPL, which may be an issue for us in Binaryen as some of our builds are static (binaryen.js).
Looking at integrating wtmaps-utils in emscripten, I filed some issues, but it's probable that I'm doing something wrong...

This imports LLVM code for DWARF handling. That code has the Apache 2 license like us. It's also the same code used to emit DWARF in the common toolchain, so it seems like a safe choice.

This adds two passes: --dwarfdump which runs the same code LLVM runs for llvm-dwarfdump. This shows we can parse it ok, and will be useful for debugging. And --dwarfupdate writes out the DWARF sections (unchanged from what we read, so it just roundtrips - for updating we need #2515).

This puts LLVM in thirdparty which is added here. All the LLVM code is behind USE_LLVM_DWARF, which is on by default, but off in JS for now, as it increases code size by 20%.

This current approach imports the LLVM files directly. This is not how they are intended to be used, so it required a bunch of local changes - more than I expected actually, for the platform-specific stuff. For now this seems to work, so it may be good enough, but in the long term we may want to switch to linking against libllvm. A downside to doing that is that binaryen users would need to have an LLVM build, and even in the waterfall builds we'd have a problem - while we ship LLVM there anyhow, we constantly update it, which means that binaryen would need to be on latest llvm all the time too (which otherwise, given DWARF is quite stable, we might not need to constantly update).

An even larger issue is that as I did this work I learned about how DWARF works in LLVM, and while the reading code is easy to reuse, the writing code is trickier. The main code path is heavily integrated with the MC layer, which we don't have - we might want to create a "fake MC layer" for that, but it sounds hard. Instead, there is the YAML path which is used mostly for testing, and which can convert DWARF to and from YAML and from binary. Using the non-YAML parts there, we can convert binary DWARF to the YAML layer's nice Info data, then convert that to binary. This works, however, this is not the path LLVM uses normally, and it supports only some basic DWARF sections - I had to add ranges support, in fact. So if we need more complex things, we may end up needing to use the MC layer approach, or consider some other DWARF library. However, hopefully that should not affect the core binaryen code which just calls a library for DWARF stuff.

Helps #2400

Optionally track the binary format code section offsets, that is, when loading a binary, remember where each IR node was read from. This is necessary for DWARF debug info, as these are the offsets DWARF refers to. (Note that eventually we may want to do something else, like first read the DWARF and only then add debug info annotations into the IR in a more LLVM-like manner, but this is more straightforward and should be enough to update debug lines and ranges).

This tracking adds noticeable overhead - every single IR node adds an entry in a map - so avoid it unless actually necessary. Specifically, if the user passes in -g and there are actually DWARF sections in the binary, and we are not about to remove those sections, then we need it.

Print binary format code section offsets in text, when printing with -g. This will help debug and test dwarf support. It looks like ;; code offset: 0x7 as an annotation right before each node. Also add support for -g in wasm-opt tests (unlike a pass, it has just one - as a prefix).

Helps #2400
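A toy sketch of the offset tracking described above, assuming a reader that only knows a handful of one-byte opcodes. The `Node` class, the opcode table and all function names are illustrative and are not Binaryen's real IR or API; only the `;; code offset: 0x7` annotation format and the three-part condition for enabling tracking come from the description.

```python
from dataclasses import dataclass

# A few one-byte wasm opcodes, enough for the toy reader below.
OPCODES = {0x00: "unreachable", 0x01: "nop", 0x0F: "return", 0x1A: "drop"}


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    text: str


def needs_offset_tracking(debug_flag, has_dwarf_sections, will_strip_dwarf):
    """Track offsets only when -g is passed, DWARF exists, and it will be kept."""
    return debug_flag and has_dwarf_sections and not will_strip_dwarf


def read_nodes(code, start=0, track=True):
    """Read one node per byte; optionally record each node's code offset."""
    nodes = []
    offsets = {} if track else None
    for i, byte in enumerate(code):
        node = Node(OPCODES[byte])
        nodes.append(node)
        if track:
            offsets[node] = start + i  # one map entry per IR node
    return nodes, offsets


def print_nodes(nodes, offsets=None):
    """Print nodes as text, annotating each with its offset when known."""
    lines = []
    for node in nodes:
        if offsets is not None and node in offsets:
            lines.append(f";; code offset: {offsets[node]:#x}")
        lines.append(f"({node.text})")
    return "\n".join(lines)
```

Keeping the offsets in a side map rather than in the nodes themselves mirrors the trade-off mentioned above: the cost is only paid when tracking is switched on.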

kripken · 2019-12-20T22:31:39Z

An update: with #2545 (not yet merged at this time) we can update DWARF debug line info in binaryen.

kripken · 2019-12-21T00:29:47Z

Ok, with #2545 + emscripten-core/emscripten#10092 I can emit a wasm binary from emscripten that looks like it has valid DWARF debug line info (using -gforce_dwarf, the current temporary option for it). Reading the dwarfdump info, it looks correct to me, even after the binaryen tools in the middle!

I'd like to test this more seriously. I tried to load it in chrome and firefox, but none of the devtools appear to load it. I tried bare clang as well, trying to reproduce this blogpost from @RReverser but I must be doing something wrong - it says "source map detected" but nothing else happens? (that's on stable and dev, 79 and 81).

kripken · 2020-03-18T19:27:29Z

Closing this as in recent versions we have basically complete support for this: we read and write DWARF and it is valid as far as we know, even with optimizations.

A few optimization passes are currently disabled (things that mess with locals, mostly), and we can look into updating them depending on how important that is. But they only make a few % difference in code size, and since this is in debug builds, for most (but not all) use cases we should be good enough.

RReverser · 2020-03-19T16:19:40Z

I tried bare clang as well, trying to reproduce this blogpost from @RReverser but I must be doing something wrong - it says "source map detected" but nothing else happens? (that's on stable and dev, 79 and 81).

Ah yeah, forgot to post an update here, so in case someone else comes across this thread: AFAIK we have figured it out, and found and fixed some issues.

This was referenced Nov 7, 2019

Sourcemap support for wat2wasm? WebAssembly/wabt#1210

Open

Debug a wasm application with reasonable amount of RAM bytecodealliance/wasmtime#537

Closed

kripken mentioned this issue Dec 3, 2019

DWARF Sourcemaps emscripten-core/emscripten#8934

Closed

kripken self-assigned this Dec 10, 2019

This was referenced Dec 11, 2019

DWARF parsing and writing support using LLVM #2520

Merged

Binary format code section offset tracking #2515

Merged

kripken mentioned this issue Dec 19, 2019

Understanding DWARF+WebAssembly offsets WebAssembly/debugging#9

Open

kripken closed this as completed Mar 18, 2020

artemmukhin mentioned this issue Jun 4, 2020

What is the state of debugging? rustwasm/wasm-bindgen#1981

Closed

letmaik mentioned this issue Oct 11, 2020

Local variable names lost while debugging AssemblyScript/assemblyscript#1496

Closed

ltfschoen mentioned this issue Apr 1, 2023

Unable to debug maciejhirsz/kobold#47

Open

weichx mentioned this issue Jun 27, 2023

Creating DWARF debug info #5786

Closed
