Improve debug symbol names to avoid ambiguity and work better with MSVC's debugger #85269

dpaoliello · 2021-05-13T19:42:18Z

There are several cases where names of types and functions in the debug info are either ambiguous or unhelpful, such as including ambiguous placeholders (e.g., {{impl}}, {{closure}} or dyn '_) or dropping qualifications (e.g., for dynamic types).

Instead, each debug symbol name should be unique and useful:

Include disambiguators for anonymous DefPathDataName (closures and generators), and unify their formatting when used as a path-qualifier vs item being qualified.
Qualify the principal trait for dynamic types.
If there is no principal trait for a dynamic type, emit all other traits instead.
Respect the qualified argument when emitting ref and pointer types.
For implementations, emit the disambiguator.
Print const generics when emitting generic parameters or arguments.

Additionally, when targeting MSVC, its debugger treats many command arguments as C++ expressions, even when the argument is defined to be a symbol name. As such, names in the debug info need to be more C++-like to be parsed correctly:

Avoid characters with special meaning (#, [, ", +).
Never start a name with < or { as this is treated as an operator.
>> is always treated as a right-shift, even when parsing generic arguments (so add a space to avoid this).
Emit function declarations using C/C++ style syntax (e.g., leading return type).
Emit arrays as a synthetic array$<type, size> type.
Include a $ in all synthetic types as this is a legal character for C++, but not Rust (thus we avoid collisions with user types).
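
The MSVC-specific rules above can be sketched as a small checker. This is only an illustration: `is_cpp_friendly` is a hypothetical helper, not a function in rustc, and it covers only the three parsing rules listed (special characters, leading `<` or `{`, and `>>`).

```rust
/// Hypothetical checker (not part of rustc): returns true when `name`
/// avoids the constructs that MSVC's debugger is said to misparse.
fn is_cpp_friendly(name: &str) -> bool {
    // Characters with special meaning in the debugger's expression syntax.
    let has_special = name.chars().any(|c| matches!(c, '#' | '[' | '"' | '+'));
    // A leading `<` or `{` is treated as an operator.
    let bad_start = name.starts_with('<') || name.starts_with('{');
    // `>>` is always read as a right shift, even inside generic arguments.
    let has_shift = name.contains(">>");
    !(has_special || bad_start || has_shift)
}

fn main() {
    let names = ["[u32; 4]", "{{closure}}", "slice<Foo<u32>>", "slice$<Foo<u32> >"];
    for name in names {
        println!("{:?} -> {}", name, is_cpp_friendly(name));
    }
}
```

Only the last name in the loop passes all three checks, since it uses the `$` marker and separates the closing angle brackets with a space.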

rust-highfive · 2021-05-13T19:42:20Z

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

rust-highfive · 2021-05-13T19:42:21Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @matthewjasper (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

dpaoliello · 2021-05-13T19:47:04Z

src/test/debuginfo/function-names.rs

@wesleywiser generator functions are still not appearing, even with #84822

Mark-Simulacrum · 2021-05-13T20:15:11Z

Many of these bits sound very much like the v0 mangling (currently not on by default). I'm wondering if we should rather make sure that works well for this use case and invest in it, rather than creating yet another variant...

dpaoliello · 2021-05-13T20:31:00Z

> Many of these bits sound very much like the v0 mangling (currently not on by default). I'm wondering if we should rather make sure that works well for this use case and invest in it, rather than creating yet another variant...

v0 mangling may help on Linux, but on Windows the debugger uses the names stored in the debug info rather than demangling the names, so I'd still like to make the changes to improve that experience.

Mark-Simulacrum · 2021-05-13T20:47:34Z

That makes sense - definitely we should improve the experience - but I'm wondering if it also makes sense to try and arrive at a consistent format, for example making sure the v0 format (perhaps when targeting msvc, ideally always) uses the replacements and suggestions you've laid out here, and then generate the debug names using it on windows.

Cc @nagisa @eddyb (not sure who else might be involved in our debuginfo / symbol names)

michaelwoerister · 2021-05-14T11:33:04Z

cc me

michaelwoerister · 2021-05-14T11:37:20Z

@Mark-Simulacrum I think the v0 mangling scheme is only marginally related to debuginfo. DWARF (and also CodeView, I think) store a plain string version of type and function names. Can you explain in more detail what you think about the connection of v0 and debuginfo?

michaelwoerister

It's great to have so many new CDB tests, thanks @dpaoliello!

I think it would be good if all "synthetic" names (like array, tuple, etc) would use a consistent naming scheme. Right now some of them have two leading underscores (e.g. __impl) and some don't; and some of them contain the special character $ and some don't. Always having a special character that is not allowed in a Rust identifier might be a good way to avoid name clashes with regular types.

dpaoliello · 2021-05-17T21:08:15Z

> I think it would be good if all "synthetic" names (like array, tuple, etc) would use a consistent naming scheme. Right now some of them have two leading underscores (e.g. __impl) and some don't; and some of them contain the special character $ and some don't. Always having a special character that is not allowed in a Rust identifier might be a good way to avoid name clashes with regular types.

I'm happy to adjust the names if we can come up with a standard.

MSVC's debugger permits identifiers of the form [a-z A-Z _][a-z A-Z 0-9 _ $]*. Comparing this to Rust's identifier rules gives us $ as a character that we can use for synthetic types to avoid collisions. Since it can't be the first character, we may want to make it trailing: array$, tuple$, impl$, etc.

@michaelwoerister Thoughts?
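
The identifier rule quoted above can be checked mechanically. A minimal sketch, assuming the pattern is exactly `[a-zA-Z_][a-zA-Z0-9_$]*`; `is_msvc_identifier` is a hypothetical name used only for illustration.

```rust
/// Hypothetical helper: does `s` match `[a-zA-Z_][a-zA-Z0-9_$]*`?
fn is_msvc_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    // The first character must be a letter or an underscore (never `$`).
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // The remaining characters may also be digits or `$`.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
}

fn main() {
    for name in ["array$", "tuple$", "impl$", "$array", ""] {
        println!("{:?} -> {}", name, is_msvc_identifier(name));
    }
}
```

The trailing-`$` names are accepted, while a leading `$` is rejected, which is why the marker is proposed as a suffix.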

vadimcn · 2021-05-17T23:07:59Z

> Since it can't be the first character, we may want to make it trailing: array$, tuple$, impl$, etc.

This looks fine.

Another option is to put them in a common "namespace": rust$::array, rust$::slice, rust$::tuple, etc.

dpaoliello · 2021-05-18T17:44:06Z

> Another option is to put them in a common "namespace": rust$::array, rust$::slice, rust$::tuple, etc.

I'd prefer the simple names over namespaces to keep the symbol names short

michaelwoerister · 2021-05-20T14:04:27Z

It looks to me like we have two options:

Make things match Rust syntax where possible (e.g. emit mut Foo* even though mut does not exist in C++), or
encode things in a way any C++ parser is very likely to be able to handle -- even if that won't look like Rust at all.

To me it looks like there are already many constructs (like dyn, tuples, fixed-size arrays, etc) that we won't be able to make Rust-like, and some (like ref mut Foo*) that use Rust keywords but are not actually valid Rust code -- so I personally would prefer if we went the second route. This would also allow us to make things quite uniform and would make it easier to encode things in a way that does not lose any information.

The main downside I see is that NATVIS does not seem to provide a way of "demangling" these type names in order to display something nicer and closer to Rust syntax. But given that the alternative would only be better in a few cases, I don't think that is an actual problem. The increased uniformity of such an encoding might even make it easier to "manually" decode these types, even though they look quite foreign.

I put together a table comparing the various encodings that should help us make further decisions (let me know if I got something wrong):

| Type | Rust syntax | Before PR | After PR (current) | After PR (proposed) |
|---|---|---|---|---|
| shared reference | `&foo::Foo` | `Foo*` | `ref foo::Foo*` | `ref$<foo::Foo>` |
| mutable reference | `&mut foo::Foo` | `mut Foo*` | `ref mut foo::Foo*` | `ref_mut$<foo::Foo>` |
| const raw pointer | `*const foo::Foo` | `const Foo*` | `const foo::Foo*` | `ptr_const$<foo::Foo>` |
| mutable raw pointer | `*mut foo::Foo` | `mut Foo*` | `mut foo::Foo*` | `ptr_mut$<foo::Foo>` |
| fixed size array | `[foo::Foo; 42]` | `[Foo; 42]` | `array<Foo, 42>` | `array$<Foo, 42>` |
| slice | `[foo::Foo<u32>]` | `slice<Foo<u32>>` | `slice<foo::Foo<u32> >` | `slice$<foo::Foo<u32> >` |
| trait object | `dyn foo::Foo` | `Foo` | `__dyn<foo::Foo>` | `dyn$<foo::Foo>` |
| impl | `<foo::Foo>::bar` | `foo::{{impl}}::bar` (?) | `foo::Foo::bar` | `impl$<foo::Foo>::bar` |
| trait impl | `<foo::Foo for u32>::bar` | `foo::{{impl}}::bar` (?) | `__impl<foo::Foo for u32>::bar` | `impl$<foo::Foo, u32>::bar` |
| closure | | `closure-0` | `foo::__closure$0` | `foo::closure$0` |
| generator | | `generator-0` | `foo::__generator$0` | `foo::generator$0` |
| function | `extern "C" fn(u32, &foo::Foo) -> foo::Foo` | `extern "C" fn(u32, foo::Foo*) -> foo::Foo` | `extern C foo::Foo (u32, ref foo::Foo*)` (?) | ? |
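
To make the "proposed" column concrete, here is a rough sketch of how such names could be produced. The `Ty` enum and the `wrap`, `path`, and `cpp_like_name` functions are hypothetical and heavily simplified; they are not rustc's actual types or the PR's implementation, and the element paths are kept fully qualified throughout.

```rust
/// Hypothetical, simplified model of a type; rustc's real representation is far richer.
enum Ty {
    Path(String), // a named type, e.g. `foo::Foo<u32>`
    Ref { mutable: bool, inner: Box<Ty> },
    RawPtr { mutable: bool, inner: Box<Ty> },
    Array(Box<Ty>, u64),
    Slice(Box<Ty>),
    Dyn(String), // principal trait path
}

/// Builds `prefix$<args>`, inserting a space so the name never contains `>>`.
fn wrap(prefix: &str, args: &str) -> String {
    let sep = if args.ends_with('>') { " " } else { "" };
    format!("{}$<{}{}>", prefix, args, sep)
}

/// Convenience constructor for a boxed named type.
fn path(p: &str) -> Box<Ty> {
    Box::new(Ty::Path(p.to_string()))
}

fn cpp_like_name(ty: &Ty) -> String {
    match ty {
        Ty::Path(p) => p.clone(),
        Ty::Ref { mutable, inner } => {
            wrap(if *mutable { "ref_mut" } else { "ref" }, &cpp_like_name(inner))
        }
        Ty::RawPtr { mutable, inner } => {
            wrap(if *mutable { "ptr_mut" } else { "ptr_const" }, &cpp_like_name(inner))
        }
        Ty::Array(elem, len) => wrap("array", &format!("{}, {}", cpp_like_name(elem), len)),
        Ty::Slice(elem) => wrap("slice", &cpp_like_name(elem)),
        Ty::Dyn(trait_path) => wrap("dyn", trait_path),
    }
}

fn main() {
    println!("{}", cpp_like_name(&Ty::Ref { mutable: true, inner: path("foo::Foo") }));
    println!("{}", cpp_like_name(&Ty::Slice(path("foo::Foo<u32>"))));
    println!("{}", cpp_like_name(&Ty::Array(path("foo::Foo"), 42)));
}
```

Routing every synthetic type through one `wrap` helper is what keeps the `$` marker and the `> >` spacing uniform across all constructs.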

I'm curious to hear what you all think about this :)

Mark-Simulacrum · 2021-05-20T14:19:37Z

> @Mark-Simulacrum I think the v0 mangling scheme is only marginally related to debuginfo. DWARF (and also CodeView, I think) store a plain string version of type and function names. Can you explain in more detail what you think about the connection of v0 and debuginfo?

I wasn't aware we had the ability to store plain string versions -- if so, then that concern is moot for sure.

In regards to the latest question/comment, I also tend to favor the proposal to go with non-Rust-like syntax if we can't match it closely anyway. That also gives us the ability to just have a table in the documentation (e.g., the rustc book) which tells people what things mean, and there's less worry about having "close but not really" meanings. This is basically what you already said :)

wesleywiser · 2021-05-20T15:09:13Z

I also think it makes more sense to go with the second option and encode type names in a C++ friendly way.

dpaoliello · 2021-05-20T18:03:03Z

> I'm curious to hear what you all think about this :)

Looks good to me.

For functions, MSVC produces the format return_type (calling_conv*)(params...) (https://godbolt.org/z/a5P4Ye7dG)

So for your example we'd have: foo::Foo (__cdecl*)(u32, ref$<foo::Foo>)

We can possibly drop the calling convention for native Rust functions, and for non-Windows platforms (e.g., Clang doesn't include the calling convention - https://godbolt.org/z/1rGerjf1a).
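
The C/C++-style layout described above can be sketched as a small formatter. `fn_ptr_name` is a hypothetical helper, and rendering a missing calling convention as `(*)` is an assumption based on the usual C++ function-pointer spelling rather than something confirmed in this thread.

```rust
/// Hypothetical formatter for a function-pointer type name in the C/C++ style
/// `return_type (calling_conv*)(params...)`.
fn fn_ptr_name(ret: &str, calling_conv: Option<&str>, params: &[&str]) -> String {
    // When no calling convention is given, only the `*` remains (assumption).
    let conv = calling_conv.unwrap_or("");
    format!("{} ({}*)({})", ret, conv, params.join(", "))
}

fn main() {
    let params = ["u32", "ref$<foo::Foo>"];
    println!("{}", fn_ptr_name("foo::Foo", Some("__cdecl"), &params));
    println!("{}", fn_ptr_name("foo::Foo", None, &params));
}
```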

michaelwoerister · 2021-05-21T07:56:45Z

> For functions, MSVC produces the format return_type (calling_conv*)(params...)

Function types are an interesting case. How well does natvis matching work for types like return_type (calling_conv*)(params...)? Do we even need it?

We could also try to encode them like the other types, as fn$<return_type, params...> but it's unclear how to handle calling convention, unsafety, and variadic functions. Maybe something like fn$unsafe$cdecl$variadic<return_type, params...>? Or a more compact version fn$uv$cdecl<return_type, params...> where uv would stand for unsafe & variadic? Not exactly a pretty sight :)

dpaoliello · 2021-05-21T17:47:02Z

> For functions, MSVC produces the format return_type (calling_conv*)(params...)

> Function types are an interesting case. How well does natvis matching work for types like return_type (calling_conv*)(params...)? Do we even need it?

> We could also try to encode them like the other types, as fn$<return_type, params...> but it's unclear how to handle calling convention, unsafety, and variadic functions. Maybe something like fn$unsafe$cdecl$variadic<return_type, params...>? Or a more compact version fn$uv$cdecl<return_type, params...> where uv would stand for unsafe & variadic? Not exactly a pretty sight :)

Using the same format as C++ should mean that NatVis, and other native debugging tools, will understand that it is a function pointer (and be able to extract the return type and params).

The important thing is to make sure that the function signature is unique, not that it necessarily encodes all the information about the function: since a function can't be overloaded by safe vs unsafe, or by the calling convention (even in C/C++), that information doesn't need to be in the debug symbol.

Personally, I'd rather stick to the return_type (calling_conv*)(params...) format (perhaps without calling convention) even if it is a bit lossy, since most tools should already handle it correctly (e.g., the APIs for MSVC's debugger include ways of extracting parameter info from a function pointer type: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/dbgmodel/nf-dbgmodel-idebughosttype-getfunctionparametertypeat).

michaelwoerister · 2021-05-25T08:34:56Z

> The important thing is to make sure that the function signature is unique, not that it necessarily encodes all the information about the function: since a function can't be overloaded by safe vs unsafe, or by the calling convention (even in C/C++), that information doesn't need to be in the debug symbol.

Yes, I agree, calling convention and unsafety aren't strictly necessary here.

> (e.g., the APIs for MSVC's debugger include ways of extracting parameter info from a function pointer type: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/dbgmodel/nf-dbgmodel-idebughosttype-getfunctionparametertypeat).

Do those APIs go through the stringified type name rather than CodeView records?

> Personally, I'd rather stick to the return_type (calling_conv*)(params...) format (perhaps without calling convention) even if it is a bit lossy, since most tools should already handle it correctly

I'm not opposed to going that route, especially if it has tangible advantages for tooling. However, if it does not make a difference for tooling (and we leave off calling convention and unsafety) then I think something like fn$<return_type, param0, param1, variadic$> is a bit nicer. Does NatVis have good support for matching on C-style function pointer types (i.e., return_type (calling_conv*)(params...))?

dpaoliello · 2021-06-02T00:28:47Z

> Personally, I'd rather stick to the return_type (calling_conv*)(params...) format (perhaps without calling convention) even if it is a bit lossy, since most tools should already handle it correctly

> I'm not opposed to going that route, especially if it has tangible advantages for tooling. However, if it does not make a difference for tooling (and we leave off calling convention and unsafety) then I think something like fn$<return_type, param0, param1, variadic$> is a bit nicer. Does NatVis have good support for matching on C-style function pointer types (i.e., return_type (calling_conv*)(params...))?

Unfortunately, using the fn$<...> synthetic type causes the Debugger Model to no longer be able to get the return type or parameter types, so we'll need to stick with the C++ style formatting.

wesleywiser · 2021-07-02T14:36:03Z

The test failed because the debuginfo type name for isize is different between 32-bit and 64-bit Windows. I've updated the test to ignore that like it was already doing for usize.

@bors r=michaelwoerister

bors · 2021-07-02T14:36:05Z

📌 Commit c1601dc has been approved by michaelwoerister

bors · 2021-07-02T17:19:37Z

⌛ Testing commit c1601dc with merge 2545459...

bors · 2021-07-02T20:00:45Z

☀️ Test successful - checks-actions
Approved by: michaelwoerister
Pushing 2545459 to master...

michaelwoerister · 2021-07-05T08:14:54Z

Thanks for your patience and all the hard work you put into this, @dpaoliello!

rylev · 2021-07-06T13:48:42Z

This change led to moderate performance regressions in many debug build benchmarks, which is unsurprising.

As part of the performance triage process, I'm marking this as a performance regression. Given the existence of #86431, I will also mark this as having a justified performance regression as hopefully that issue will resolve the performance regressions introduced here.

@rustbot label +perf-regression +perf-regression-triaged

…-type-names-fix, r=oli-obk,wesleywiser

Handle non-integer const generic parameters in debuginfo type names.

This PR fixes an ICE introduced by rust-lang#85269, which started emitting const generic arguments for debuginfo names but did not cover the case where such an argument could not be evaluated to a flat string of bits.

The fix implemented in this PR is very basic: if `try_eval_bits()` fails for the constant in question, we fall back to generating a stable hash of the constant and emit that instead. This way we get a (virtually) unique name and sidestep the problem of generating a string representation of a potentially complex value.

The downside is that the generated name will be rather opaque. E.g. the regression test adds a function `const_generic_fn_non_int<()>` which is then rendered as `const_generic_fn_non_int<{CONST#fe3cfa0214ac55c7}>`. I think it's an open question how to deal with this more gracefully. I'd be interested in ideas on how to do this better.

r? `@wesleywiser`
cc `@dpaoliello` (do you see any problems with this approach?)
cc `@Mark-Simulacrum` & `@nagisa` (who I've seen comment on debuginfo issues recently -- anyone else?)

Fixes rust-lang#86893
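
The fallback described in that follow-up can be sketched roughly as follows. Everything here is a stand-in: `ConstArg` and `const_arg_name` are hypothetical, and `std`'s `DefaultHasher` is used only for illustration. It is not the stable hasher rustc uses, so the digits will not match the `fe3cfa0214ac55c7` example, and the 16-digit padding is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a const generic argument.
enum ConstArg {
    /// The constant evaluates to plain bits (e.g. an integer).
    Bits(u128),
    /// Anything else, represented here by an opaque description string.
    Opaque(String),
}

fn const_arg_name(arg: &ConstArg) -> String {
    match arg {
        // Easy case: print the evaluated bits directly.
        ConstArg::Bits(bits) => bits.to_string(),
        // Fallback: emit a hash so the name stays (virtually) unique.
        ConstArg::Opaque(repr) => {
            // NOTE: DefaultHasher merely stands in for a stable hasher.
            let mut hasher = DefaultHasher::new();
            repr.hash(&mut hasher);
            format!("{{CONST#{:016x}}}", hasher.finish())
        }
    }
}

fn main() {
    println!("foo<{}>", const_arg_name(&ConstArg::Bits(42)));
    println!("foo<{}>", const_arg_name(&ConstArg::Opaque("()".to_string())));
}
```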

Some checks are temporarily disabled for MSVC LLDB. Pretty-printers for pointer types of string slices have not worked since Rust 1.55 because of the changes in debug info generation introduced in rust-lang/rust#85269. Since 1.55, rustc generates `ptr_const$<...>` and `ptr_mut$<...>` type names instead of `const str *` and `mut str *` when targeting MSVC, so the pretty-printers should be updated and the corresponding `lldbg-check`s added.

rust-highfive assigned matthewjasper May 13, 2021

rust-highfive added the S-waiting-on-review label May 13, 2021

michaelwoerister reviewed May 14, 2021

wesleywiser mentioned this pull request May 17, 2021

Improve debugging experience for enums on windows-msvc #85292

Merged

michaelwoerister reviewed May 21, 2021

dpaoliello commented May 25, 2021

dpaoliello force-pushed the dpaoliello/DebugSymbols branch from a2dc033 to 410e9dc Compare June 2, 2021 19:39

bors added the S-waiting-on-review label Jul 2, 2021

Fix type name difference between i686 and x86_64 for test

c1601dc

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label Jul 2, 2021

bors added the merged-by-bors label Jul 2, 2021

bors merged commit 2545459 into rust-lang:master Jul 2, 2021

rustbot added this to the 1.55.0 milestone Jul 2, 2021

bors mentioned this pull request Jul 2, 2021

[msvc] Consistently show active variant and fix visualization for single variant enums #86636

Merged

goffrie mentioned this pull request Jul 5, 2021

ICE generating debuginfo for non-primitive const generics #86893

Closed

rustbot added the perf-regression label Jul 6, 2021

rylev added the perf-regression-triaged label Jul 6, 2021

dpaoliello deleted the dpaoliello/DebugSymbols branch July 6, 2021 17:07

hellow554 mentioned this pull request Jul 12, 2021

Regression ICE with const generics when updating from 7c3872e6b 2021-06-24 to 7a16cfcff 2021-07-11 #87076

Closed

michaelwoerister mentioned this pull request Jul 12, 2021

Handle non-integer const generic parameters in debuginfo type names. #87082

Merged

This was referenced Oct 6, 2021

MSVC LLDB pretty-printers for pointer types of string slices does not work since Rust 1.55 intellij-rust/intellij-rust#7935

Closed

CI: Test LLDB pretty-printers on Windows intellij-rust/intellij-rust#7399

Merged

lcnr mentioned this pull request Dec 11, 2021

Tracking issue for const generics (RFC 2000) #44580

Closed

16 tasks
