Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debuginfo: Set bitwidth appropriately in enum variant tags #136895

Merged
merged 1 commit into from
Feb 14, 2025

Conversation

maurer
Copy link
Contributor

@maurer maurer commented Feb 12, 2025

Previously, we unconditionally set the bitwidth to 128-bits, the largest an enum would possibly be. Then, LLVM would cut down the constant by chopping off leading zeroes before emitting the DWARF. LLVM only supported 64-bit enumerators, so this would also have occasionally resulted in truncated data.

LLVM added support for 128-bit enumerators in llvm/llvm-project#125578

That patchset trusts the constant to describe how wide the variant tag is, so the high 64-bits of zeros are considered potentially load-bearing.

As a result, we went from emitting tags that looked like:
DW_AT_discr_value (0xfe)

(because dwarf::BestForm selected data1)

to emitting tags that looked like:
DW_AT_discr_value (<0x10> fe ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 )

This makes the DW_AT_discr_value encode at the bitwidth of the tag, which:

  1. Is probably closer to our intentions in terms of describing the data.
  2. Doesn't invoke the 128-bit support which may not be supported by all debuggers / downstream tools.
  3. Will result in smaller debug information.

@rustbot
Copy link
Collaborator

rustbot commented Feb 12, 2025

r? @wesleywiser

rustbot has assigned @wesleywiser.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 12, 2025
@maurer
Copy link
Contributor Author

maurer commented Feb 12, 2025

cc @tromey for expertise on DWARF encoding and whether I've adjusted this correctly

@rust-log-analyzer

This comment has been minimized.

@maurer maurer force-pushed the fix-enum-discr branch 3 times, most recently from 8ceace4 to 59f491b Compare February 12, 2025 02:01
@rust-log-analyzer

This comment has been minimized.

@maurer
Copy link
Contributor Author

maurer commented Feb 12, 2025

@rustbot author

Looks like I have a few things to fix still, will poke it tomorrow, removing from review queue until then.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 12, 2025
@tromey
Copy link
Contributor

tromey commented Feb 12, 2025

cc @tromey for expertise on DWARF encoding and whether I've adjusted this correctly

I think the idea is fine. I don't remember enough about rustc to really comment on the implementation.

I stumbled over an unexpected gdb/llvm/gcc discrepancy in the handle of DW_FORM_data* recently. See this gdb bug. However, assuming the discriminants are unsigned, I think this shouldn't pose a problem -- since it only affects the situation where sign-extension is needed.

@maurer
Copy link
Contributor Author

maurer commented Feb 12, 2025

The discriminants are not always unsigned unfortunately.

#[repr(i32)]
pub enum Foo {
    Unit = -2,
    TwoInts(u32, u32) = -1,
    ThreeShorts { x: u16, y: u16, z: u16 } = 0,
}

pub fn use_foo(_foo: Foo) {}

will produce via dwarfdump DW_AT_discr_value 254 (-2), and via llvm-dwarfdump DW_AT_discr_value (0xfe) for the first variant.

This happens because LLVM uses the discriminator type to determine the signedness when encoding, but then invokes BestForm because nullopt is passed as the form.

The sign data is available to the debugger through variant_part -> member -> type -> encoding, but based on your gdb bug, it sounds like debuggers might not use this.

@tromey: Based on your gdb bug, do you think pre-selecting DW_FORM_udata and DW_FORM_sdata in addInt for bittage of 64 or lower would be an improvement? Or are you suggesting on the gdb bug that GDB has the erroneous behavior, and emitting a data1 with the assumption that the debugger will use the encoding attribute of the discriminant type to expand it is correct? The other option I could see would be to select the data-width of the form based on the bitwidth of the discriminator.

I think we still want to make a change like this Rust side, because if LLVM is conditioning on the bitwidth, we should pass the right bitwidth.

Previously, we unconditionally set the bitwidth to 128-bits, the largest
an discrimnator would possibly be. Then, LLVM would cut down the constant by
chopping off leading zeroes before emitting the DWARF. LLVM only
supported 64-bit descriminators, so this would also have occasionally
resulted in truncated data (or an assert) if more than 64-bits were
used.

LLVM added support for 128-bit enumerators in llvm/llvm-project#125578

That patchset also trusts the constant to describe how wide the variant tag is.
As a result, we went from emitting tags that looked like:
DW_AT_discr_value     (0xfe)

(`form1`)

to emitting tags that looked like:
DW_AT_discr_value	(<0x10> fe ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 )

This makes the `DW_AT_discr_value` encode at the bitwidth of the tag,
which:
1. Is probably closer to our intentions in terms of describing the data.
2. Doesn't invoke the 128-bit support which may not be supported by all
   debuggers / downstream tools.
3. Will result in smaller debug information.
@tromey
Copy link
Contributor

tromey commented Feb 12, 2025

The sign data is available to the debugger through variant_part -> member -> type -> encoding, but based on your gdb bug, it sounds like debuggers might not use this.

Well, gdb doesn't. It probably should, though the spec isn't entirely clear and the thread from ages ago was also pretty unhelpful.

@tromey: Based on your gdb bug, do you think pre-selecting DW_FORM_udata and DW_FORM_sdata in addInt for bittage of 64 or lower would be an improvement? Or are you suggesting on the gdb bug that GDB has the erroneous behavior, and emitting a data1 with the assumption that the debugger will use the encoding attribute of the discriminant type to expand it is correct? The other option I could see would be to select the data-width of the form based on the bitwidth of the discriminator.

I think it's unclear what should be done. Currently I tend to agree with the LLVM interpretation here, and think that gdb is wrong. At the same time DWARF does recommend this:

If one of the DW_FORM_dataforms is used to represent a signed or unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data.

For gdb I am hoping to get some comments on the corresponding GCC bug before taking action. And of course a fix won't appear until the next release anyway...

@maurer
Copy link
Contributor Author

maurer commented Feb 12, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 12, 2025
@nikic nikic mentioned this pull request Feb 13, 2025
7 tasks
@nikic
Copy link
Contributor

nikic commented Feb 13, 2025

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented Feb 13, 2025

📌 Commit d82219a has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 13, 2025
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Feb 14, 2025
debuginfo: Set bitwidth appropriately in enum variant tags

Previously, we unconditionally set the bitwidth to 128-bits, the largest an enum would possibly be. Then, LLVM would cut down the constant by chopping off leading zeroes before emitting the DWARF. LLVM only supported 64-bit enumerators, so this would also have occasionally resulted in truncated data.

LLVM added support for 128-bit enumerators in llvm/llvm-project#125578

That patchset trusts the constant to describe how wide the variant tag is, so the high 64-bits of zeros are considered potentially load-bearing.

As a result, we went from emitting tags that looked like:
DW_AT_discr_value     (0xfe)

(because `dwarf::BestForm` selected `data1`)

to emitting tags that looked like:
DW_AT_discr_value	(<0x10> fe ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 )

This makes the `DW_AT_discr_value` encode at the bitwidth of the tag, which:
1. Is probably closer to our intentions in terms of describing the data.
2. Doesn't invoke the 128-bit support which may not be supported by all debuggers / downstream tools.
3. Will result in smaller debug information.
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 14, 2025
…kingjubilee

Rollup of 11 pull requests

Successful merges:

 - rust-lang#136863 (rework rigid alias handling )
 - rust-lang#136869 (Fix diagnostic when using = instead of : in let binding)
 - rust-lang#136895 (debuginfo: Set bitwidth appropriately in enum variant tags)
 - rust-lang#136928 (eagerly prove WF when resolving fully qualified paths)
 - rust-lang#136941 (Move `llvm.ccache` to `build.ccache`)
 - rust-lang#136950 (rustdoc: use better, consistent SVG icons for scraped examples)
 - rust-lang#136957 (coverage: Eliminate more counters by giving them to unreachable nodes)
 - rust-lang#136960 (Compiletest should not inherit all host RUSTFLAGS)
 - rust-lang#136962 (unify LLVM version finding logic)
 - rust-lang#136970 (ci: move `x86_64-gnu-debug` job to the free runner)
 - rust-lang#136973 (Fix `x test --stage 1 ui-fulldeps` on macOS (until the next beta bump))

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 864eba9 into rust-lang:master Feb 14, 2025
6 checks passed
@rustbot rustbot added this to the 1.86.0 milestone Feb 14, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 14, 2025
Rollup merge of rust-lang#136895 - maurer:fix-enum-discr, r=nikic

debuginfo: Set bitwidth appropriately in enum variant tags

Previously, we unconditionally set the bitwidth to 128-bits, the largest an enum would possibly be. Then, LLVM would cut down the constant by chopping off leading zeroes before emitting the DWARF. LLVM only supported 64-bit enumerators, so this would also have occasionally resulted in truncated data.

LLVM added support for 128-bit enumerators in llvm/llvm-project#125578

That patchset trusts the constant to describe how wide the variant tag is, so the high 64-bits of zeros are considered potentially load-bearing.

As a result, we went from emitting tags that looked like:
DW_AT_discr_value     (0xfe)

(because `dwarf::BestForm` selected `data1`)

to emitting tags that looked like:
DW_AT_discr_value	(<0x10> fe ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 )

This makes the `DW_AT_discr_value` encode at the bitwidth of the tag, which:
1. Is probably closer to our intentions in terms of describing the data.
2. Doesn't invoke the 128-bit support which may not be supported by all debuggers / downstream tools.
3. Will result in smaller debug information.
@myxoid
Copy link

myxoid commented Feb 14, 2025

Hi. I originated this bug report within Google. I maintain an open source tool that ingests DWARF, STG, and have previously contributed to libabigail. The latter doesn't yet support variants and discriminants but would run into exactly the same problem.

Making DWARF consumers do type inspection to interpret constant values is a significant burden. In the STG case, it would require either local reference chasing within the DWARF processor (something we avoid wherever possible, for performance reasons) or a post-processing pass to fix up enumerator / discriminator values whose sign was ambiguous during DIE traversal.

This text was referred to in earlier comment:

If one of the DW_FORM_dataforms is used to represent a signed or unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data.

It was added to the DWARF standard in response to a very similar issue to this one.

https://dwarfstd.org/issues/020702.1.html

LLVM appears to br emitting enumerator constants unambiguously. I would very much to see discriminator constants emitted the same way (and parsed in STG with the same code).

Aside: I wonder if this also affects gendwarfksyms.

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 17, 2025
Update to LLVM 20

LLVM 20 GA is scheduled for March 11th. Rust 1.87 will be stable on May 15th.

* [x] rust-lang#135764
* [x] rust-lang#136134
* [x] rust-lang/compiler-builtins#752
* [x] llvm/llvm-project#125287
* [x] rust-lang#136537
* [x] rust-lang#136895
* [x] Wait for beta branch (Feb 14).

Tested: host-x86_64, host-aarch64, apple, mingw, msvc
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 17, 2025
Update to LLVM 20

LLVM 20 GA is scheduled for March 11th. Rust 1.87 will be stable on May 15th.

* [x] rust-lang#135764
* [x] rust-lang#136134
* [x] rust-lang/compiler-builtins#752
* [x] llvm/llvm-project#125287
* [x] rust-lang#136537
* [x] rust-lang#136895
* [x] Wait for beta branch (Feb 14).

Tested: host-x86_64, host-aarch64, apple, mingw, msvc
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Feb 19, 2025
Update to LLVM 20

LLVM 20 GA is scheduled for March 11th. Rust 1.87 will be stable on May 15th.

* [x] rust-lang/rust#135764
* [x] rust-lang/rust#136134
* [x] rust-lang/compiler-builtins#752
* [x] llvm/llvm-project#125287
* [x] rust-lang/rust#136537
* [x] rust-lang/rust#136895
* [x] Wait for beta branch (Feb 14).

Tested: host-x86_64, host-aarch64, apple, mingw, msvc
github-merge-queue bot pushed a commit to rust-lang/rust-analyzer that referenced this pull request Feb 24, 2025
Update to LLVM 20

LLVM 20 GA is scheduled for March 11th. Rust 1.87 will be stable on May 15th.

* [x] rust-lang/rust#135764
* [x] rust-lang/rust#136134
* [x] rust-lang/compiler-builtins#752
* [x] llvm/llvm-project#125287
* [x] rust-lang/rust#136537
* [x] rust-lang/rust#136895
* [x] Wait for beta branch (Feb 14).

Tested: host-x86_64, host-aarch64, apple, mingw, msvc
BoxyUwU pushed a commit to BoxyUwU/rustc-dev-guide that referenced this pull request Feb 25, 2025
Update to LLVM 20

LLVM 20 GA is scheduled for March 11th. Rust 1.87 will be stable on May 15th.

* [x] rust-lang/rust#135764
* [x] rust-lang/rust#136134
* [x] rust-lang/compiler-builtins#752
* [x] llvm/llvm-project#125287
* [x] rust-lang/rust#136537
* [x] rust-lang/rust#136895
* [x] Wait for beta branch (Feb 14).

Tested: host-x86_64, host-aarch64, apple, mingw, msvc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants