Fails to demangle RFC2603 v0 mangled symbols with extra bits on the end #27

Closed
jsgf opened this issue Jun 24, 2019 · 5 comments · Fixed by #30

@jsgf

jsgf commented Jun 24, 2019

I have some symbols of the form:

_RNvNtNtNtNtCs92dm3009vxr_4rand4rngs7adapter9reseeding4fork23FORK_HANDLER_REGISTERED.0.0
_RNvNvXNtCsdLK3nj9OsGJ_7rand_os13linux_androidNtB4_5OsRngNtB6_9OsRngImpl16test_initialized18OS_RNG_INITIALIZED.0.0
_RINvNtCs3a3bPlXdjxn_4core3ptr18real_drop_in_placeNtNtNtCsfJAotMnIXtG_3std2io5error5ErrorECsdLK3nj9OsGJ_7rand_os.146

which don't demangle. If I remove the trailing .N suffix (.0.0 or .146 above), then they do. I'm not sure where the suffix comes from, but this is an LTO-built executable.
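As a rough illustration of the manual workaround described above, here is a minimal sketch that splits off a trailing run of `.<digits>` components before a symbol is handed to a demangler. The function name `split_numeric_suffix` is hypothetical and is not part of rustc-demangle; it only mirrors the "remove the .N on the end" step and assumes the suffix is purely numeric, as in the three symbols shown.

```rust
/// Splits `symbol` into `(stem, suffix)`, where `suffix` is the longest
/// trailing run of `.<digits>` components (for example `.0.0` or `.146`).
/// Hypothetical helper for illustration; not part of rustc-demangle.
fn split_numeric_suffix(symbol: &str) -> (&str, &str) {
    let mut end = symbol.len();
    // Walk backwards, peeling off one `.<digits>` component at a time.
    while let Some(dot) = symbol[..end].rfind('.') {
        let digits = &symbol[dot + 1..end];
        if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
            break;
        }
        end = dot;
    }
    // `end` always sits on an ASCII '.' or at the end of the string,
    // so this is a valid char boundary.
    symbol.split_at(end)
}
```

The stem returned here is what could then be passed to the demangler, and the suffix could be re-appended to the demangled output if it should stay visible.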

(cc @eddyb)

@eddyb
Member

eddyb commented Jun 24, 2019

Yeah, LLVM's LTO will do things to symbol names. Most likely there is some special case in the legacy demangler that needs to be added for v0 as well.

@alexcrichton
Member

@eddyb I actually don't think this is something that legacy handles. Rather, I think this is indicative of duplicate symbols across crates, which LLVM automatically renames when LTO is happening. I think it happens when each CGU contains a private static symbol with the same symbol name; when they are all placed into one CGU, LLVM has to rename them.

Is this perhaps a bug in the symbol mangling where unique names should be generated in each CGU for these symbols?

@eddyb
Member

eddyb commented Jun 25, 2019

@alexcrichton If there is a bug in rustc, it's the same for both legacy and v0 since they use the same source of truth for the "instantiating crate" disambiguator (and it is the whole crate, not CGU-specific).

I believe that for the legacy mangling, this kind of symbol is handled here: https://github.com/alexcrichton/rustc-demangle/blob/de656cdd0b41e5163e2a73e51d800fea3804b8d9/src/lib.rs#L84-L94

That doesn't work for v0 as the symbol is not going to (always) end in an E.

We could have the demangler also return the "leftover input", and then have the top-level demangle function handle the \.llvm\.[A-F0-9@]* and \.[\x21-\x7e]* suffixes after demangling (instead of before, as it does today).
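A minimal sketch of the suffix check proposed above, assuming the demangler has already consumed the symbol proper and handed back the unparsed remainder as `rest`. The function names are hypothetical and only illustrate the two patterns; this is not the code that was eventually merged into rustc-demangle.

```rust
/// Matches `\.llvm\.[A-F0-9@]*`: the ThinLTO-style suffix.
/// Hypothetical helper for illustration.
fn is_llvm_suffix(rest: &str) -> bool {
    match rest.strip_prefix(".llvm.") {
        Some(tail) => tail
            .bytes()
            .all(|b| matches!(b, b'A'..=b'F' | b'0'..=b'9' | b'@')),
        None => false,
    }
}

/// Matches `\.[\x21-\x7e]*`: a dot followed by any printable,
/// non-space ASCII characters (this covers `.0.0` and `.146`).
fn is_generic_dot_suffix(rest: &str) -> bool {
    match rest.strip_prefix('.') {
        Some(tail) => tail.bytes().all(|b| (0x21..=0x7e).contains(&b)),
        None => false,
    }
}

/// Decides whether the leftover input after demangling is acceptable:
/// either nothing is left, or what is left looks like a known suffix.
fn leftover_is_acceptable(rest: &str) -> bool {
    rest.is_empty() || is_llvm_suffix(rest) || is_generic_dot_suffix(rest)
}
```

Because the check runs on the leftover input rather than on the whole symbol, it does not depend on the mangled name ending in a particular character, which is the property the v0 scheme lacks.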

@eddyb
Member

eddyb commented Jun 27, 2019

Just out of curiosity, I've passed some suffixed legacy symbols, and manually suffixed C++ ones, to c++filt (which can demangle Rust symbols, using the demangler shared by GCC, binutils and GDB).

It doesn't seem to strip any suffix, which does make sense, given that I couldn't find code to handle that.

This means that even if we fix it in rustc-demangle, other tools won't be able to handle these symbols, so we'd have to either convince them to start stripping suffixes or somehow prevent the suffixes from being added in the first place.

@alexcrichton
Member

We already have to handle symbols like the ThinLTO-generated ones with *.llvm.*, and we don't really have any control over that. I think we can have control over these symbols by ensuring that the same symbol isn't generated into two CGUs (since LTO will then rename them when they're linked into the same module).

Overall, it seems reasonable to just handle them here and expect other libraries to handle them as well.
