Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detailed debug-info (DWARF) support in new backends (initially x64). #2565

Merged
merged 5 commits into from
Jan 23, 2021

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Jan 9, 2021

This PR propagates "value labels" all the way from CLIF to DWARF
metadata on the emitted machine code. The key idea is as follows:

  • Translate value-label metadata on the input into "value_label"
    pseudo-instructions when lowering into VCode. These
    pseudo-instructions take a register as input, denote a value label,
    and semantically are like a "move into value label" -- i.e., they
    update the current value (as seen by debugging tools) of the given
    local. These pseudo-instructions emit no machine code.

  • Perform a dataflow analysis at the machine-code level, tracking
    value-labels that propagate into registers and into [SP+constant]
    stack storage. This is a forward dataflow fixpoint analysis where each
    storage location can contain a set of value labels, and each value
    label can reside in a set of storage locations. (Meet function is
    pairwise intersection by storage location.)

    This analysis traces value labels symbolically through loads and
    stores and reg-to-reg moves, so it will naturally handle spills and
    reloads without knowing anything special about them.

  • When this analysis converges, we have, at each machine-code offset, a
    mapping from value labels to some number of storage locations; for
    each offset for each label, we choose the best location (prefer
    registers). Note that we can choose any location, as the symbolic
    dataflow analysis is sound and guarantees that the value at the
    value_label instruction propagates to all of the named locations.

  • Then we can convert this mapping into a format that the DWARF
    generation code (wasmtime's debug crate) can use.

This PR also adds the new-backend variant to the gdb tests on CI.

@cfallin
Copy link
Member Author

cfallin commented Jan 9, 2021

@yurydelendik, I think I will need your help on this -- it appears that you wrote all of the debug support initially? Two questions:

  1. Is there a reason that the gdb tests are currently marked #[ignore]? Are they known to be broken for other reasons?

  2. If not, would you be able to help me debug this? I believe the analysis is generating correct, or mostly correct, mappings of labels to registers; but the debug crate in wasmtime is generating DWARF that uses the vmctx register outside of its valid range and I don't understand enough about DWARF or the code here to say why.

Any help from anyone who is passionate about debuginfo would be appreciated -- this is one of the last pieces needed to enable the switch to the new backend by default in wasmtime, and I've dumped a bunch of time into this and need to context-switch for a little bit :-)

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Jan 9, 2021
@yurydelendik
Copy link
Contributor

Is there a reason that the gdb tests are currently marked #[ignore]? Are they known to be broken for other reasons?

They are marked as "ignored" to not run them on developer's computer: the tests require lldb/gdb/llvm-dwarfdump install. These tests are run on the CI. See

# Test debug (DWARF) related functionality.
- run: |
sudo apt-get update && sudo apt-get install -y gdb
cargo test test_debug_dwarf -- --ignored --test-threads 1
if: matrix.os == 'ubuntu-latest'
env:
RUST_BACKTRACE: 1

If not, would you be able to help me debug this?

I'll try to be useful, yes.

but the debug crate in wasmtime is generating DWARF that uses the vmctx register outside of its valid range

To keep vmctx register/value valid as long as possible is a desired functionality, as well as knowing it during stack unwinding to provide Instance information. We talked with @sunfishcode about having fixed location for vmctx some time back.

If vmctx is not available, there is no way to get a variable value stored in the instances memory and shadow stack. The variables are marked as optimized out in this case. Which is acceptable solution from debugging of optimized code point of view. Of course we have to minimize such cases to make live of developers better.

@cfallin
Copy link
Member Author

cfallin commented Jan 9, 2021

To keep vmctx register/value valid as long as possible is a desired functionality, as well as knowing it during stack unwinding to provide Instance information. We talked with @sunfishcode about having fixed location for vmctx some time back.

If vmctx is not available, there is no way to get a variable value stored in the instances memory and shadow stack. The variables are marked as optimized out in this case. Which is acceptable solution from debugging of optimized code point of view. Of course we have to minimize such cases to make live of developers better.

Makes sense; I'll think about ways to do this. Perhaps the simplest would be to just add a hack that emits a value_label psuedo-instruction for vmctx at return -- this would keep it alive and ensure it can be accessed at least from a spillslot if not a register.

A question: I was struggling to work out the invariants in the code that converts range-associated data. What do start and end denote for the ValueLabelsRanges? Are these open or closed bounds and do they apply to the start or end of instructions? I had noticed some code that skips any "empty" ranges (start == end) -- if a value is computable only after one instruction, which ends at offset X, should we encode that as start=X, end=X+1?

There's something a little more broken going on in this PR right now, as the vmctx value is dead at a certain point, but DWARF info related to its former register is being generated. So I suspect that I'm misunderstanding some of the invariants related to ranges.

Thanks for the help!

@cfallin cfallin changed the title Draft: detailed debug-info (DWARF) support in new backends (initially x64). Detailed debug-info (DWARF) support in new backends (initially x64). Jan 9, 2021
@cfallin
Copy link
Member Author

cfallin commented Jan 9, 2021

@yurydelendik with the vmctx-always-alive hack, this seems to pass the gdb DWARF test locally on the new backend now (and I'm watching CI); thanks!

Question above about ranges still stands; I want to make sure I got the conversions right.

Otherwise, though, I think this is ready for review.

@bjorn3
Copy link
Contributor

bjorn3 commented Jan 9, 2021

Perhaps the simplest would be to just add a hack that emits a value_label psuedo-instruction for vmctx at return -- this would keep it alive and ensure it can be accessed at least from a spillslot if not a register.

It would be nice to have a more general way of keeping a value alive somewhere for debuginfo purposes.

@cfallin
Copy link
Member Author

cfallin commented Jan 9, 2021

Perhaps the simplest would be to just add a hack that emits a value_label psuedo-instruction for vmctx at return -- this would keep it alive and ensure it can be accessed at least from a spillslot if not a register.

It would be nice to have a more general way of keeping a value alive somewhere for debuginfo purposes.

Yes, I agree. I actually thought of a really simple/ugly (but maybe more robust) way of doing this -- allocate slots on the stack for all ValueLabels, and emit actual stores to these slots every time a labeled value is defined (or the label is associated with a new value). Then the DWARF generation can always use SP-offset expressions.

This would horribly pessimize code performance, though maybe not too much more than non-optimized debug code already is; but it would be much simpler and more likely to be correct in all sorts of edge cases. I see this as sort of in the same spirit as the idea proposed in #2459 for reftypes tracing: in both cases we instrument the CLIF to provide needed functionality so that the core compiler remains simpler and easier to verify. It is a little unconventional, though, so perhaps for another time.

@cfallin
Copy link
Member Author

cfallin commented Jan 21, 2021

@fitzgen or @bnjbvr, would you be able to spare some time to review this? I don't want it to sit too long as it's starting to become stale (I'll rebase across the conflicts when handling any review comments). Thanks!

@fitzgen
Copy link
Member

fitzgen commented Jan 21, 2021

Sure, I can review.

This PR propagates "value labels" all the way from CLIF to DWARF
metadata on the emitted machine code. The key idea is as follows:

- Translate value-label metadata on the input into "value_label"
  pseudo-instructions when lowering into VCode. These
  pseudo-instructions take a register as input, denote a value label,
  and semantically are like a "move into value label" -- i.e., they
  update the current value (as seen by debugging tools) of the given
  local. These pseudo-instructions emit no machine code.

- Perform a dataflow analysis *at the machine-code level*, tracking
  value-labels that propagate into registers and into [SP+constant]
  stack storage. This is a forward dataflow fixpoint analysis where each
  storage location can contain a *set* of value labels, and each value
  label can reside in a *set* of storage locations. (Meet function is
  pairwise intersection by storage location.)

  This analysis traces value labels symbolically through loads and
  stores and reg-to-reg moves, so it will naturally handle spills and
  reloads without knowing anything special about them.

- When this analysis converges, we have, at each machine-code offset, a
  mapping from value labels to some number of storage locations; for
  each offset for each label, we choose the best location (prefer
  registers). Note that we can choose any location, as the symbolic
  dataflow analysis is sound and guarantees that the value at the
  value_label instruction propagates to all of the named locations.

- Then we can convert this mapping into a format that the DWARF
  generation code (wasmtime's debug crate) can use.

This PR also adds the new-backend variant to the gdb tests on CI.
@cfallin
Copy link
Member Author

cfallin commented Jan 22, 2021

Thanks! I went over this again quickly and noticed a few typos, and decided to rebase while I was here. Also note there's a ridealong change to remove an outdated doc-comment about the backend pipeline (I can split that out if you'd like).

@@ -207,6 +225,31 @@ fn append_memory_deref(
return Ok(false);
}
}
LabelValueLoc::Reg(r) => {
let reg = isa.map_regalloc_reg_to_dwarf(r)? as u8;
writer.write_u8(gimli::constants::DW_OP_breg0.0 + reg)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExpressionWriter has write_op_breg

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, and I really appreciated the top level overview comment but, as a reader, I would have appreciated some more function/method/struct level comments so I didn't have to go searching through the code to figure out how a struct member was used and things like that. A few inline comments below. r=me with them addressed and/or questions answered, etc.

Thanks!

cranelift/codegen/src/machinst/debug.rs Show resolved Hide resolved
cranelift/codegen/src/machinst/debug.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/debug.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/debug.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/debug.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/debug.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/debug.rs Outdated Show resolved Hide resolved
I had missed that the CI config didn't actually run the tests, because
(I think) `matrix.target` is not set by default (?). All of our hosts
are native x86-64, so we can just gate on OS (Ubuntu) instead.

I also discovered that while I had been testing with the gdb tests
locally, when *all* `debug::*` tests are run, there are two that do not
pass on the new backend because of specific differences in compiled
code. One is a value-lifetime issue (the value is "optimized out" at the
point the breakpoint is set) and the other has to do with basic-block
order (it is trying to match against hardcoded machine-code offsets
which have changed).
@cfallin
Copy link
Member Author

cfallin commented Jan 23, 2021

@fitzgen I addressed your comments, but while double-checking the CI results one last time, realized the debug tests weren't actually running on CI (GitHub Actions config issue). On fixing that, discovered that the tests I was running locally were a subset of all debug tests, and two of the others were failing. These failures were due to (IMHO) not-super-important differences in codegen, so I added directives to ignore; but let me know if this is still r+ or if you'd like me to dig deeper. Thanks!

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@cfallin cfallin merged commit 95822a5 into bytecodealliance:main Jan 23, 2021
@cfallin cfallin deleted the debug-value-labels branch January 23, 2021 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants