Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paths to source files for inlined items from extern crates not absolute in debuginfo #34179

Closed
luser opened this issue Jun 9, 2016 · 19 comments

Comments

@luser
Copy link
Contributor

luser commented Jun 9, 2016

@eddyb and I have been poking at this on IRC for a little bit. For code that's linked in from an extern crate, the source file paths that wind up in the debug info have relative paths, like:

< 4><0x000000ab>          DW_TAG_subprogram
                            DW_AT_low_pc                0x00005570
                            DW_AT_high_pc               <offset-from-lowpc>170
                            DW_AT_frame_base            len 0x0001: 56: DW_OP_reg6
                            DW_AT_linkage_name          "_ZN16rust_bin_example4iter8Iterator23count<std::env::ArgsOs>E"
                            DW_AT_name                  "count<std::env::ArgsOs>"
                            DW_AT_decl_file             0x00000003 ../src/libcore/iter.rs
                            DW_AT_decl_line             0x000001d0

For the debug symbols that we generate from Firefox binaries, we wind up interpreting these relative to the compilation dir of the rust crate we're compiling that imports them, so the paths wind up completely wrong, like /build/debug-mozilla-central/media/libstagefright/../src/libcore/option.rs (actual output from my local Firefox build).

If there were absolute paths, they'd be persisted all the way through, and then we could do something useful with them.

Longer-term it'd be great if we also persisted info about what source repository and revision they were built from, or something like that, but we'd be inventing things there, since there's no analogue in existing DWARF produced by C compilers, AFAIK.

/cc @rillian

@eddyb
Copy link
Member

eddyb commented Jun 9, 2016

cc @michaelwoerister @alexcrichton

@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

Some relevant links from our discussion:

So I think if we persisted the absolute path in the FileMap in the crate metadata we'd be able to do the right thing here. I'm not precisely sure of the right way to do this that wouldn't muck things up that expect a relative path there, though.

@eddyb
Copy link
Member

eddyb commented Jun 9, 2016

@luser I would just store both the short path and the absolute one.

@alexcrichton
Copy link
Member

The origin for this file name is actually on the bots which are producing the relevant nightlies. That is, the crux is an invocation on the nightly bots that looks like:

rustc ../src/libcore/lib.rs --out-dir path/to/out/dir

I think that means that we basically plumb through the relative path all the way down to the monomorphizations themselves, so any monomorphization of Iterator methods end up showing with the source file of ../src/libcore/iter.rs. I think this relative file is also used in panic! and assert! messages, but afaik it's not used in too many other locations.

@luser if we were to store an absolute path it'd look something like:

/bot/slave/nightly-dist-rustc-linux/build/src/libcore/iter.rs

Would that be ok? I guess gecko would automatically map these paths?

It may not be an adequate long-term solution, however, to store absolute paths in the metadata (unless there's a separate space we can store them) because we'll want to support installing the standard library source code and having debuginfo hooked up to what's installed locally. I'm not actually sure how that would work as I doubt it would involve rewriting the metadata. (cc @brson on this point)

@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

@alexcrichton Having the absolute paths like that would be something we could work with, certainly. It's comparable to what we get for e.g. C functions inlined from libc. For example, on Windows we get paths like f:/dd/vctools/crt_bld/self_x86/crt/prebuild/eh/unhandld.cpp.

Having the debuginfo hooked up to the matching source sounds pretty great long-term, whatever form that'd take. Presumably that'd somehow involve also persisting information about the source the crate was built from, which is the direction I was going at the end of my initial comment. I'd love to be able to map source paths from lib{std,core} to links to the proper source revision on github without having to maintain a separate mapping of hardcoded paths from your builders. :)

@alexcrichton
Copy link
Member

Ok cool, I do think it'd be super slick if we had enough info hook the standard library and all dependencies up as well! For now though the standard library source links could be reconstructed at least through:

  1. When building Gecko recording the rustc -vV to get the SHA git hash. I think there's a Gecko bug on this?
  2. Being able to identify source paths that come from the main distribution, not local source.
  3. Understanding those source paths enough to extract out the relevant bits.

So in that sense I think as long as you can identify and parse what source paths are standard library sources it should be pretty good to go, right? Unfortunately that identification will vary across platforms but I wonder if we could somehow annotate the source in the DWARF info as specifically coming from the standard library or something like that (as that would even work for debuggers perhaps!).

@michaelwoerister is our local debuginfo guru, so I'll see what he has to say as well!

@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

I'm not a super DWARF expert (and things are different on Windows), but I'm not aware of any way to annotate the original source location beyond pathnames. This is what I meant by "we'd be inventing things there". DWARF is extremely flexible, however, so we could certainly invent attributes and use them. (We'd have to plumb this through LLVM, I suppose.) I'm not sure what we'd do on Windows. PDB files do allow arbitrary data streams, and Microsoft has long supported something in this space with their "source server indexing". You can put a specially-formatted stream into a PDB file that maps source filenames to a way to retrieve them from VCS. We use this in Firefox to map source files to URLs to hg.mozilla.org, so when you debug a Firefox Nighly/Release on Windows the debugger can fetch the matching source automatically: https://developer.mozilla.org/en-US/docs/Mozilla/Using_the_Mozilla_source_server

@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

The bug on "get the rustc commit hash" is https://bugzilla.mozilla.org/show_bug.cgi?id=1275424.

@michaelwoerister
Copy link
Member

Longer-term it'd be great if we also persisted info about what source repository and revision they were built from, or something like that, but we'd be inventing things there, since there's no analogue in existing DWARF produced by C compilers, AFAIK.

Something like this information is usually found in the DW_TAG_compile_unit containing the relevant debuginfo as its sub-entries:

DW_TAG_compile_unit
    DW_AT_producer              "rustc version 1.11.0-dev (42d3a8e69 2016-06-07)"
    DW_AT_language              <Unknown LANG value 0x1c>
    DW_AT_name                  "./lib.rs"
    DW_AT_stmt_list             0x00000000
    DW_AT_comp_dir              "/home/mw/stuff/debuginfo-paths/src"
    DW_AT_low_pc                0x00000000
    DW_AT_ranges                0x00000000

The path given in DW_AT_comp_dir is supposed to be what the relative paths in the sub-entries are relative to. But this obviously doesn't scale to inlining things from other crates. We could think about introducing a separate DW_TAG_compile_unit for each external crate we are inlining something from, although I'm not sure this would be the cleanest solution, since each DW_TAG_compile_unit is supposed to correspond to one obj-file, as far as I know. I'll take another look at the DWARF standard, maybe there's a better solution in there.

As an intermediate fix, it should not be much of a problem to always store absolute paths for functions inlined from external crates.

@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

I've got a WIP patch.

@luser luser changed the title Paths to source files from extern crates not absolute in debuginfo Paths to source files for inlined items from extern crates not absolute in debuginfo Jun 9, 2016
@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

Resummarizing, given the comments here I think this is only an issue for inlined items or monomorphisms? (Most Rust code will likely have a lot of monomorphisms from libstd, though.)

@michaelwoerister
Copy link
Member

Resummarizing, given the comments here I think this is only an issue for inlined items or monomorphisms?

Yes, I think that's true. From the compiler's point of view, both cases are handled the same.

(Most Rust code will likely have a lot of monomorphisms from libstd, though.)

Yes, it's definitely not a niche problem.

I'm wondering though if this could not be solved by setting up GDB's source path in just the right way (e.g. adding <rust-dir>/src to it with the directory command). If there are no name clashes between source files, the GDB documentation at least reads like that should work.

@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

I encountered this problem in the context of linking Rust code into Gecko. We parse the debug symbols from the binaries we generate to convert them to a format we use for processing crash reports. Since the filenames for these DIEs are relative, we handle them as relative to the comp_dir of the CU they're linked into, so we get bogus paths.

So FWIW, the closest analogue here in C is probably just inlined functions, and if I compile this tiny example:
https://gist.github.com/luser/ea21348d5cb87feaa2dac90fdcb42756

The DWARF for the inlined function looks like:

< 1><0x0000002d>    DW_TAG_subprogram
                      DW_AT_external              yes(1)
                      DW_AT_name                  "do_write"
                      DW_AT_decl_file             0x00000001 /build/inline-test/header.h
                      DW_AT_decl_line             0x00000003
                      DW_AT_prototyped            yes(1)
                      DW_AT_type                  <0x00000054>
                      DW_AT_inline                DW_INL_declared_inlined

@michaelwoerister
Copy link
Member

And if your tool would use a similar strategy as GDB if it encountered a relative path? That is, if <comp-dir>/<rel-path> does not exist, try a number of well-known "standard library" sources?
Not that I'm really against having absolute paths for inlined items, but unless you are compiling your own standard library in a fixed location, you'd still have to have some kind of special-casing in your tool to account for things from libstd, right?

@luser
Copy link
Contributor Author

luser commented Jun 9, 2016

Well, that requires making the tool reading the DWARF smarter than it currently is. Right now it just takes the decl_file, joins it to the comp_dir, and hands that back. Also we're not likely to actually have the libstd sources installed when processing the debug info. We do post-processing with a Python script, and my thought was to have a mapping of known Rust source dirs (which isn't great, but is workable).

@michaelwoerister
Copy link
Member

Thinking more about the absolute paths approach, I'm a little concerned that we might break people's GDB setups. Right now they can get GDB to find the sources for libstd if they are using the directory command (which, by the way, would be great to do by default in rust-gdb) but with absolute paths that might not be possible anymore.

@michaelwoerister
Copy link
Member

Oh, nevermind my last comment: there is set substitute-path in GDB which should be usable to the same effect as directory but could handle absolute paths.

@eddyb
Copy link
Member

eddyb commented Jun 9, 2016

@michaelwoerister Could rust-gdb set that up somehow, maybe when rustup is around and if rustup had a way of getting the source?

@michaelwoerister
Copy link
Member

@eddyb Yes, that's what I was thinking of.

bors added a commit that referenced this issue Jun 16, 2016
Add an abs_path member to FileMap, use it when writing debug info.

Fixes #34179.

When items are inlined from extern crates, the filename in the debug info
is taken from the FileMap that's serialized in the rlib metadata.
Currently this is just FileMap.name, which is whatever path is passed to rustc.
Since libcore and libstd are built by invoking rustc with relative paths,
they wind up with relative paths in the rlib, and when linked into a binary
the debug info uses relative paths for the names, but since the compilation
directory for the final binary, tools trying to read source filenames
will wind up with bad paths. We noticed this in Firefox with source
filenames from libcore/libstd having bad paths.

This change stores an absolute path in FileMap.abs_path, and uses that
if available for writing debug info. This is not going to magically make
debuggers able to find the source, but it will at least provide sensible
paths.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants