
Do source maps represent original location or current location? #25116

@aheejin


In JS, source maps are meant to help debugging by connecting the original source locations to the (converted/minified) JS locations, so they contain the user program’s original source locations and function names.

But our source maps for Wasm files are currently in an ambiguous state. For example, for inlined functions, our source maps contain not the original function/line/column but the function/line/column that the original location was inlined into. Because we create source maps from DWARF information, we could in theory do better by parsing all the inlined callsite information. Source maps can’t store multiple inlined locations for a single location, but they can at least store the most original, pre-inlining location. But that would take more effort and require fully parsing the llvm-dwarfdump output, which would make #9580 unusable.
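To make the choice concrete, here is a minimal Python sketch. It assumes a hypothetical `Loc` record and an inline chain that has already been extracted from DWARF, ordered from the function the code ended up in (outermost) to the innermost, pre-inlining location; it does not parse DWARF itself. Since a source map segment can hold only one location, we have to pick one end of the chain.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Loc:
    """Hypothetical record for one frame of an inlining chain."""
    function: str
    line: int
    column: int


def pick_location(inline_chain: List[Loc], prefer_original: bool) -> Loc:
    """Pick the single location a source map segment can hold.

    inline_chain[0] is the frame in the function the code now lives in
    (what our maps emit today); inline_chain[-1] is the innermost,
    pre-inlining frame.
    """
    if not inline_chain:
        raise ValueError("empty inline chain")
    # Option 2: most original location. Option 1: final (post-inlining) location.
    return inline_chain[-1] if prefer_original else inline_chain[0]


# `helper` (line 3) was inlined into `main` at line 10.
chain = [Loc("main", 10, 5), Loc("helper", 3, 9)]
print(pick_location(chain, prefer_original=False))  # Loc(function='main', line=10, column=5)
print(pick_location(chain, prefer_original=True))   # Loc(function='helper', line=3, column=9)
```

The hard part in practice is building `inline_chain` from DWARF, not choosing from it.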


I’m thinking of adding support for the names field with function names (#25044), and the same question comes up again. It was suggested that I use the name section info (instead of the full llvm-dwarfdump output), which makes sense, but this means the function names in the names field, and the name indices that the mappings field refers to, will not be the original function names but the current ones. So if inlining or outlining happened, the function names will reflect the final inlined/outlined state, not the original function names.
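For reference, each segment in the mappings field is a run of Base64 VLQ numbers: generated column, source index, original line, original column, and optionally an index into names. The sketch below (the helper names are hypothetical) encodes one segment; in a real map each field is encoded relative to the previous segment, and for Wasm the generated column is a byte offset. Whichever option we pick only changes which string the name index points at, not this encoding.

```python
from typing import Optional

BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def vlq_encode(value: int) -> str:
    """Encode one integer as a source map Base64 VLQ."""
    # The sign goes in the lowest bit; the magnitude is shifted left by one.
    v = ((-value) << 1) | 1 if value < 0 else value << 1
    out = []
    while True:
        digit = v & 0x1F  # 5 payload bits, least significant group first
        v >>= 5
        if v:
            digit |= 0x20  # continuation bit: more groups follow
        out.append(BASE64[digit])
        if not v:
            return "".join(out)


def encode_segment(gen_col: int, src: int, line: int, col: int,
                   name: Optional[int] = None) -> str:
    """Encode one mappings segment from already-computed deltas."""
    fields = [gen_col, src, line, col] + ([] if name is None else [name])
    return "".join(vlq_encode(f) for f in fields)


print(encode_segment(16, 0, -1, 0, 1))  # gBADAC
```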

This has some advantages. It is a lot easier to generate, and we might even be able to replace the name section with the source map’s names field, because the two would contain the same information. But it may not be the most helpful thing for debugging, and the meaning of source maps would differ from that in JS, where source maps contain the original source information.

Currently in Binaryen we don’t update the names field, which is not the worst thing in the world because at least it preserves input function names that each Expression is associated with: https://github.com/WebAssembly/binaryen/blob/600ccd0d3892670648c274ef24be7c673b8854f5/src/wasm/wasm-binary.cpp#L1268-L1272
We can do better: we can remove the names of functions deleted during optimization, which in some cases will significantly reduce the size of the names field.
But this is actually already a weird middle ground, because the names field we read from the input file doesn’t contain the original function names anyway: it is generated from LLVM’s optimized output, unless we generate it using DWARF’s full inlining information.
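Removing the names of deleted functions means compacting the names array and remapping every name index in mappings. A minimal sketch, assuming the indices have already been decoded from VLQ deltas into absolute values (`prune_names` is a hypothetical helper, not a Binaryen API):

```python
from typing import List, Optional, Tuple


def prune_names(names: List[str], name_indices: List[Optional[int]]
                ) -> Tuple[List[str], List[Optional[int]]]:
    """Drop names no segment refers to and renumber the rest.

    name_indices holds one absolute index (or None) per mappings segment.
    """
    used = sorted({i for i in name_indices if i is not None})
    remap = {old: new for new, old in enumerate(used)}  # old index -> new index
    new_names = [names[old] for old in used]
    new_indices = [None if i is None else remap[i] for i in name_indices]
    return new_names, new_indices


# "dead" was removed by the optimizer, so no segment points at index 0 anymore.
print(prune_names(["dead", "main", "helper"], [1, 2, None, 1]))
# (['main', 'helper'], [0, 1, None, 0])
```

Sorting the surviving indices keeps the remaining names in their original relative order.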

If we want to be simpler and just reflect the final state of the Wasm functions, we don’t even need to read the names field and write it back with the output. We can just generate the names field from scratch using the final state of the optimized Wasm file.
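Generating the names field from scratch could look roughly like the sketch below. It assumes we know each final function’s name and its [start, end) byte range in the output, plus the generated offset of each mapping; `build_names` is hypothetical. Functions that no longer exist in the output never appear, because no mapping falls inside them.

```python
import bisect
from typing import List, Optional, Tuple


def build_names(functions: List[Tuple[str, int, int]], offsets: List[int]
                ) -> Tuple[List[str], List[Optional[int]]]:
    """Build a names field from the final binary only.

    functions: (name, start, end) byte ranges in the output, sorted by start.
    offsets: generated byte offset of each mappings segment.
    Returns the names list and one name index (or None) per segment.
    """
    starts = [start for _, start, _ in functions]
    names: List[str] = []
    index_of = {}
    result: List[Optional[int]] = []
    for off in offsets:
        i = bisect.bisect_right(starts, off) - 1  # last function starting <= off
        if i < 0 or off >= functions[i][2]:
            result.append(None)  # offset is outside every function body
            continue
        name = functions[i][0]
        if name not in index_of:
            index_of[name] = len(names)  # first use assigns the next index
            names.append(name)
        result.append(index_of[name])
    return names, result


print(build_names([("main", 10, 50), ("helper", 50, 80)], [12, 55, 90, 20]))
# (['main', 'helper'], [0, 1, None, 0])
```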

What do you think the source maps should contain?
To sum up,

  1. Source maps contain the final Wasm function/line/column information
  • Pros: Easy to generate, and the names field can serve as the name section
  • Cons: May not be helpful for debugging when inlining/outlining occurs
  2. Source maps contain the original source’s function/line/column information
  • Pros: Helpful for debugging. Consistent with the original meaning of source maps in JS
  • Cons: Hard to generate. Need to read the full DWARF.

If we decide to use the name section (instead of DWARF) to generate the names field in #25044, that means we choose option 1. And in Binaryen we wouldn't even need to read the names field from the input; we could just regenerate it from the output.
