Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve debug symbol names to avoid ambiguity and work better with MSVC's debugger #85269

Merged
merged 3 commits into from
Jul 2, 2021

Conversation

dpaoliello
Copy link
Contributor

@dpaoliello dpaoliello commented May 13, 2021

There are several cases where names of types and functions in the debug info are either ambiguous, or not helpful, such as including ambiguous placeholders (e.g., {{impl}}, {{closure}} or dyn _') or dropping qualifications (e.g., for dynamic types).

Instead, each debug symbol name should be unique and useful:

  • Include disambiguators for anonymous DefPathDataName (closures and generators), and unify their formatting when used as a path-qualifier vs item being qualified.
  • Qualify the principal trait for dynamic types.
  • If there is no principal trait for a dynamic type, emit all other traits instead.
  • Respect the qualified argument when emitting ref and pointer types.
  • For implementations, emit the disambiguator.
  • Print const generics when emitting generic parameters or arguments.

Additionally, when targeting MSVC, its debugger treats many command arguments as C++ expressions, even when the argument is defined to be a symbol name. As such names in the debug info need to be more C++-like to be parsed correctly:

  • Avoid characters with special meaning (#, [, ", +).
  • Never start a name with < or { as this is treated as an operator.
  • >> is always treated as a right-shift, even when parsing generic arguments (so add a space to avoid this).
  • Emit function declarations using C/C++ style syntax (e.g., leading return type).
  • Emit arrays as a synthetic array$<type, size> type.
  • Include a $ in all synthetic types as this is a legal character for C++, but not Rust (thus we avoid collisions with user types).

@rust-highfive
Copy link
Collaborator

Some changes occured to the CTFE / Miri engine

cc @rust-lang/miri

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @matthewjasper (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 13, 2021
// gdb-check:[...]static fn function_names::main::{{closure}}(*mut function_names::main::{closure#0});

// Generator
// Generators don't seem to appear in GDB's symbol table.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesleywiser generator functions are still not appearing, even with #84822

@rust-log-analyzer

This comment has been minimized.

@Mark-Simulacrum
Copy link
Member

Many of these bits sound very much like the v0 mangling (currently not on by default). I'm wondering if we should rather make sure that works well for this use case and invest in it, rather than creating yet another variant...

@dpaoliello
Copy link
Contributor Author

Many of these bits sound very much like the v0 mangling (currently not on by default). I'm wondering if we should rather make sure that works well for this use case and invest in it, rather than creating yet another variant...

v0 mangling may help on Linux, but on Windows the debugger uses the names stored in the debug info rather than demangling the names, so I'd still like to make the changes to improve that experience.

@Mark-Simulacrum
Copy link
Member

That makes sense - definitely we should improve the experience - but I'm wondering if it also makes sense to try and arrive at a consistent format, for example making sure the v0 format (perhaps when targeting msvc, ideally always) uses the replacements and suggestions you've laid out here, and then generate the debug names using it on windows.

Cc @nagisa @eddyb (not sure who else might be involved in our debuginfo / symbol names)

@michaelwoerister
Copy link
Member

cc me

@michaelwoerister
Copy link
Member

@Mark-Simulacrum I think the v0 mangling scheme is only marginally related to debuginfo. DWARF (and also CodeView, I think) store a plain string version of type and function names. Can you explain in more detail what you think about the connection of v0 and debuginfo?

Copy link
Member

@michaelwoerister michaelwoerister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to have so many new CDB tests, thanks @dpaoliello!

I think it would be good if all "synthetic" names (like array, tuple, etc) would use a consistent naming scheme. Right now some of them have two leading underscores (e.g. __impl) and some don't; and some of them contain the special character $ and some don't. Always having a special character that is not allowed in a Rust identifier might be a good way to avoid name clashes with regular types.

compiler/rustc_codegen_llvm/src/debuginfo/mod.rs Outdated Show resolved Hide resolved
src/test/debuginfo/function-names.rs Show resolved Hide resolved
compiler/rustc_codegen_ssa/src/debuginfo/type_names.rs Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@dpaoliello
Copy link
Contributor Author

dpaoliello commented May 17, 2021

I think it would be good if all "synthetic" names (like array, tuple, etc) would use a consistent naming scheme. Right now some of them have two leading underscores (e.g. __impl) and some don't; and some of them contain the special character $ and some don't. Always having a special character that is not allowed in a Rust identifier might be a good way to avoid name clashes with regular types.

I'm happy to adjust the names if we can come up with a standard.

MSVC's debugger permits identifiers to be [a-z A-Z _][a-z A-Z 0-9 _ $]*, comparing this to Rust gives us $ as a character that we can use for synthetic types to avoid collisions. Since it can't be the first character, we may want to make it trailing: array$, tuple$, impl$, etc.

@michaelwoerister Thoughts?

@vadimcn
Copy link
Contributor

vadimcn commented May 17, 2021

Since it can't be the first character, we may want to make it trailing: array$, tuple$, impl$, etc.

This looks fine.

Another option is to put them in a common "namespace": rust$::array, rust$::slice, rust$::tuple, etc.

@dpaoliello
Copy link
Contributor Author

Another option is to put them in a common "namespace": rust$::array, rust$::slice, rust$::tuple, etc.

I'd prefer the simple names over namespaces to keep the symbol names short

@michaelwoerister
Copy link
Member

It looks to me like we have two options:

  • Make things match Rust syntax where possible (e.g. emit mut Foo* even though mut does not exist in C++), or
  • encode things in a way any C++ parser is very likely to be able to handle -- even if that won't look like Rust at all.

To me it looks like there are already many constructs (like dyn, tuples, fixed-size arrays, etc) that we won't be able to make Rust-like, and some (like ref mut Foo*) that use Rust keywords but are not actually valid Rust code -- so I personally would prefer if we went the second route. This would also allow us to make things quite uniform and would make it easier to encode things in a way that does not lose any information.

The main downside I see is that NATVIS does not seem to provide a way of "demangling" these type names in order to display something nicer and closer to Rust syntax. But given that the alternative would only be better in a few cases, I don't think that is an actual problem. The increased uniformity of a such an encoding might even make it easier to "manually" decode these types, even though they look quite foreign.

I put together a table comparing the various encodings that should help us make further decisions (let me know if I got something wrong):

Type Rust syntax Before PR After PR (current) After PR (proposed)
shared reference &foo::Foo Foo* ref foo::Foo* ref$<foo::Foo>
mutable reference &mut foo::Foo mut Foo* ref mut foo::Foo* ref_mut$<foo::Foo>
const raw pointer *const foo::Foo const Foo* const foo::Foo* ptr_const$<foo::Foo>
mutable raw pointer *mut foo::Foo mut Foo* mut foo::Foo* ptr_mut$<foo::Foo>
fixed size array [foo::Foo; 42] [Foo; 42] array<Foo, 42> array$<Foo, 42>
slice [foo::Foo<u32>] slice<Foo<u32>> slice<foo::Foo<u32> > slice$<foo::Foo<u32> >
trait object dyn foo::Foo Foo __dyn<foo::Foo> dyn$<foo::Foo>
impl <foo::Foo>::bar foo::{{impl}}::bar (?) foo::Foo::bar impl$<foo::Foo>::bar
trait impl <foo::Foo for u32>::bar foo::{{impl}}::bar (?) __impl<foo::Foo for u32>::bar impl$<foo::Foo, u32>::bar
closure closure-0 foo::__closure$0 foo::closure$0
generator generator-0 foo::__generator$0 foo::generator$0
function extern "C" fn(u32, &foo::Foo) -> foo::Foo extern "C" fn(u32, foo::Foo*) -> foo::Foo extern C foo::Foo (u32, ref foo::Foo*) (?) ?

I'm curious to hear what you all think about this :)

@Mark-Simulacrum
Copy link
Member

@Mark-Simulacrum I think the v0 mangling scheme is only marginally related to debuginfo. DWARF (and also CodeView, I think) store a plain string version of type and function names. Can you explain in more detail what you think about the connection of v0 and debuginfo?

I wasn't aware we had the ability to store plain string versions -- if so, then that concern is moot for sure.

In regards to the latest question/comment, I also tend to favor the proposal to go with non-Rust-like syntax if we can't match it closely anyway. That also gives us the ability to just have a table in the documentation (e.g., the rustc book) which tells people what things mean, and there's less worry about having "close but not really" meanings. This is basically what you already said :)

@wesleywiser
Copy link
Member

I also think it makes more sense to go with the second option and encode type names in a C++ friendly way.

@dpaoliello
Copy link
Contributor Author

dpaoliello commented May 20, 2021

I'm curious to hear what you all think about this :)

Looks good to me.

For functions, MSVC produces the format return_type (calling_conv*)(params...) (https://godbolt.org/z/a5P4Ye7dG)

So for your example we'd have: foo::Foo (__cdecl*)(u32, ref$<foo::Foo>)

We can possibly drop the calling convention for native Rust functions, and for non-Windows platforms (e.g., Clang doesn't include the calling convention - https://godbolt.org/z/1rGerjf1a).

@michaelwoerister
Copy link
Member

For functions, MSVC produces the format return_type (calling_conv*)(params...)

Function types are an interesting case. How well does natvis matching work types like return_type (calling_conv*)(params...)? Do we even need it?

We could also try to encode them like the other types, as fn$<return_type, params...> but it's unclear how to handle calling convention, unsafety, and variadic functions. Maybe something like fn$unsafe$cdecl$variadic<return_type, params...>? Or a more compact version fn$uv$cdecl<return_type, params...> where uv would stand for unsafe & variadic? Not exactly a pretty sight :)

@dpaoliello
Copy link
Contributor Author

For functions, MSVC produces the format return_type (calling_conv*)(params...)

Function types are an interesting case. How well does natvis matching work types like return_type (calling_conv*)(params...)? Do we even need it?

We could also try to encode them like the other types, as fn$<return_type, params...> but it's unclear how to handle calling convention, unsafety, and variadic functions. Maybe something like fn$unsafe$cdecl$variadic<return_type, params...>? Or a more compact version fn$uv$cdecl<return_type, params...> where uv would stand for unsafe & variadic? Not exactly a pretty sight :)

Using the same format as C++ should mean that NatVis, and other native debugging tools, will understand that it is a function pointer (and be able to extract the return type and params).

The important thing is to make sure that the function signature is unique, not that is necessarily encodes all the information about the function: since a function can't be overloaded by safe vs unsafe, or by the calling convention (even in C/C++), then that information doesn't need to be in the debug symbol.

Personally, I'd rather stick to the return_type (calling_conv*)(params...) formation (perhaps without calling convention) even if it is a bit lossy since most tools should already handle it correctly (e.g., the APIs for MSVC's debugger include ways of extracting parameter info from a function pointer type: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/dbgmodel/nf-dbgmodel-idebughosttype-getfunctionparametertypeat).

@michaelwoerister
Copy link
Member

The important thing is to make sure that the function signature is unique, not that is necessarily encodes all the information about the function: since a function can't be overloaded by safe vs unsafe, or by the calling convention (even in C/C++), then that information doesn't need to be in the debug symbol.

Yes, I agree, calling convention and unsafety aren't strictly necessary here.

(e.g., the APIs for MSVC's debugger include ways of extracting parameter info from a function pointer type: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/dbgmodel/nf-dbgmodel-idebughosttype-getfunctionparametertypeat).

Do those APIs go through the stringified type name rather than CodeView records?

Personally, I'd rather stick to the return_type (calling_conv*)(params...) formation (perhaps without calling convention) even if it is a bit lossy since most tools should already handle it correctly

I'm not opposed to going that route, especially if it has tangible advantages for tooling. However, if it does not make a difference for tooling (and we leave off calling convention and unsafety) then I think something like fn$<return_type, param0, param1, variadic$> is a bit nicer. Does NatVis have good support for matching on C-style function pointer types (i.e return_type (calling_conv*)(params...))?

@dpaoliello
Copy link
Contributor Author

Personally, I'd rather stick to the return_type (calling_conv*)(params...) formation (perhaps without calling convention) even if it is a bit lossy since most tools should already handle it correctly

I'm not opposed to going that route, especially if it has tangible advantages for tooling. However, if it does not make a difference for tooling (and we leave off calling convention and unsafety) then I think something like fn$<return_type, param0, param1, variadic$> is a bit nicer. Does NatVis have good support for matching on C-style function pointer types (i.e return_type (calling_conv*)(params...))?

Unfortunately, using the fn$<...> synthetic type causes the Debugger Model to no longer be able to get the return type or parameter types, so we'll need to stick with the C++ style formatting.

@dpaoliello dpaoliello force-pushed the dpaoliello/DebugSymbols branch from a2dc033 to 410e9dc Compare June 2, 2021 19:39
@bors bors added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 2, 2021
@wesleywiser
Copy link
Member

The test failed because the debuginfo type name for isize is different between 32-bit and 64-bit Windows. I've updated the test to ignore that like it was already doing for usize.

@bors r=michaelwoerister

@bors
Copy link
Contributor

bors commented Jul 2, 2021

📌 Commit c1601dc has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 2, 2021
@bors
Copy link
Contributor

bors commented Jul 2, 2021

⌛ Testing commit c1601dc with merge 2545459...

@bors
Copy link
Contributor

bors commented Jul 2, 2021

☀️ Test successful - checks-actions
Approved by: michaelwoerister
Pushing 2545459 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 2, 2021
@bors bors merged commit 2545459 into rust-lang:master Jul 2, 2021
@rustbot rustbot added this to the 1.55.0 milestone Jul 2, 2021
@michaelwoerister
Copy link
Member

Thanks for your patience and all the hard work you put into this, @dpaoliello!

@rylev
Copy link
Member

rylev commented Jul 6, 2021

This change led to moderate performance regressions in many debug build benchmarks which is unsurprising.

As part of the performance triage process, I'm marking this as a performance regression. Given the existence of #86431, I will also mark this as having a justified performance regression as hopefully that issue will resolve the performance regressions introduced here.

@rustbot label +perf-regression +perf-regression-triaged

@rustbot rustbot added the perf-regression Performance regression. label Jul 6, 2021
@rylev rylev added the perf-regression-triaged The performance regression has been triaged. label Jul 6, 2021
@dpaoliello dpaoliello deleted the dpaoliello/DebugSymbols branch July 6, 2021 17:07
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jul 13, 2021
…-type-names-fix, r=oli-obk,wesleywiser

Handle non-integer const generic parameters in debuginfo type names.

This PR fixes an ICE introduced by rust-lang#85269 which started emitting const generic arguments for debuginfo names but did not cover the case where such an argument could not be evaluated to a flat string of bits.

The fix implemented in this PR is very basic: If `try_eval_bits()` fails for the constant in question, we fall back to generating a stable hash of the constant and emit that instead. This way we get a (virtually) unique name and side step the problem of generating a string representation of a potentially complex value.

The downside is that the generated name will be rather opaque. E.g. the regression test adds a function `const_generic_fn_non_int<()>` which is then rendered as `const_generic_fn_non_int<{CONST#fe3cfa0214ac55c7}>`. I think it's an open question how to deal with this more gracefully.

I'd be interested in ideas on how to do this better.

r? `@wesleywiser`

cc `@dpaoliello` (do you see any problems with this approach?)
cc `@Mark-Simulacrum` & `@nagisa` (who I've seen comment on debuginfo issues recently -- anyone else?)

Fixes rust-lang#86893
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jul 13, 2021
…-type-names-fix, r=oli-obk,wesleywiser

Handle non-integer const generic parameters in debuginfo type names.

This PR fixes an ICE introduced by rust-lang#85269 which started emitting const generic arguments for debuginfo names but did not cover the case where such an argument could not be evaluated to a flat string of bits.

The fix implemented in this PR is very basic: If `try_eval_bits()` fails for the constant in question, we fall back to generating a stable hash of the constant and emit that instead. This way we get a (virtually) unique name and side step the problem of generating a string representation of a potentially complex value.

The downside is that the generated name will be rather opaque. E.g. the regression test adds a function `const_generic_fn_non_int<()>` which is then rendered as `const_generic_fn_non_int<{CONST#fe3cfa0214ac55c7}>`. I think it's an open question how to deal with this more gracefully.

I'd be interested in ideas on how to do this better.

r? ``@wesleywiser``

cc ``@dpaoliello`` (do you see any problems with this approach?)
cc ``@Mark-Simulacrum`` & ``@nagisa`` (who I've seen comment on debuginfo issues recently -- anyone else?)

Fixes rust-lang#86893
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Jul 13, 2021
…-type-names-fix, r=oli-obk,wesleywiser

Handle non-integer const generic parameters in debuginfo type names.

This PR fixes an ICE introduced by rust-lang#85269 which started emitting const generic arguments for debuginfo names but did not cover the case where such an argument could not be evaluated to a flat string of bits.

The fix implemented in this PR is very basic: If `try_eval_bits()` fails for the constant in question, we fall back to generating a stable hash of the constant and emit that instead. This way we get a (virtually) unique name and side step the problem of generating a string representation of a potentially complex value.

The downside is that the generated name will be rather opaque. E.g. the regression test adds a function `const_generic_fn_non_int<()>` which is then rendered as `const_generic_fn_non_int<{CONST#fe3cfa0214ac55c7}>`. I think it's an open question how to deal with this more gracefully.

I'd be interested in ideas on how to do this better.

r? ```@wesleywiser```

cc ```@dpaoliello``` (do you see any problems with this approach?)
cc ```@Mark-Simulacrum``` & ```@nagisa``` (who I've seen comment on debuginfo issues recently -- anyone else?)

Fixes rust-lang#86893
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 14, 2021
…ype-names-fix, r=oli-obk,wesleywiser

Handle non-integer const generic parameters in debuginfo type names.

This PR fixes an ICE introduced by rust-lang#85269 which started emitting const generic arguments for debuginfo names but did not cover the case where such an argument could not be evaluated to a flat string of bits.

The fix implemented in this PR is very basic: If `try_eval_bits()` fails for the constant in question, we fall back to generating a stable hash of the constant and emit that instead. This way we get a (virtually) unique name and side step the problem of generating a string representation of a potentially complex value.

The downside is that the generated name will be rather opaque. E.g. the regression test adds a function `const_generic_fn_non_int<()>` which is then rendered as `const_generic_fn_non_int<{CONST#fe3cfa0214ac55c7}>`. I think it's an open question how to deal with this more gracefully.

I'd be interested in ideas on how to do this better.

r? `@wesleywiser`

cc `@dpaoliello` (do you see any problems with this approach?)
cc `@Mark-Simulacrum` & `@nagisa` (who I've seen comment on debuginfo issues recently -- anyone else?)

Fixes rust-lang#86893
artemmukhin added a commit to intellij-rust/intellij-rust that referenced this pull request Oct 6, 2021
Some checks are temporary disabled for MSVC LLDB.
Pretty-printers for pointer types of string slices does not work since Rust 1.55 because of the changes in debug info generation introduced in rust-lang/rust#85269. Since 1.55, rustc generates `ptr_const$<...>` and `ptr_mut$<...>` type names instead of `const str *` and `mut str *` when targeting MSVC. So pretty-printer should be updated and the corresponding `lldbg-check`s should be added
yopox pushed a commit to Kobzol/intellij-rust that referenced this pull request Feb 27, 2023
Some checks are temporary disabled for MSVC LLDB.
Pretty-printers for pointer types of string slices does not work since Rust 1.55 because of the changes in debug info generation introduced in rust-lang/rust#85269. Since 1.55, rustc generates `ptr_const$<...>` and `ptr_mut$<...>` type names instead of `const str *` and `mut str *` when targeting MSVC. So pretty-printer should be updated and the corresponding `lldbg-check`s should be added
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.