Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Mach-O backtraces without dsymutil #377

Merged
merged 1 commit into from
Oct 29, 2020

Conversation

philipc
Copy link
Contributor

@philipc philipc commented Oct 22, 2020

No description provided.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I think that this looks great and would be happy to land it feature gated. For caching, I wonder if we could perhaps just have an infinitely-sized "cache" inside of each object? That way the LRU can take care of top-level things, but each top-level thing is allowed to be arbitrarily big (which sorta makes sense to me). Basically each Object in backtrace would have a Vec of loaded member objects, populated lazily of course.


fn split(path: &str) -> Option<(&str, &str)> {
let index = path.find('(')?;
let (archive, rest) = path.split_at(index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wow it's literally libfoo.a(foo.o)!

@philipc philipc force-pushed the macho-split branch 2 times, most recently from fd5451e to 7d5078d Compare October 24, 2020 07:38
@philipc
Copy link
Contributor Author

philipc commented Oct 24, 2020

I think that this looks great and would be happy to land it feature gated.

What do you think it would take to not need to be gated?

For caching, I wonder if we could perhaps just have an infinitely-sized "cache" inside of each object? That way the LRU can take care of top-level things, but each top-level thing is allowed to be arbitrarily big (which sorta makes sense to me). Basically each Object in backtrace would have a Vec of loaded member objects, populated lazily of course.

Yeah, that's the kind of thing I was expecting. By "infinitely-sized", were you expecting that we'd do a linear scan for the object file name? Instead, since we can already identify each object file by an index, I've used that to index into a presized Vec. It doesn't try to share mmaps for archives though. I was actually wondering if the OS would be smart enough to make that cheap.

@alexcrichton
Copy link
Member

What do you think it would take to not need to be gated?

Oh not much probably! If you'd prefer to land it ungated I don't mind reviewing accordingly. I think though it's nothing really out of the ordinary, just a thorough review, tests, CI, etc.

Yeah, that's the kind of thing I was expecting. By "infinitely-sized", were you expecting that we'd do a linear scan for the object file name?

Would it be possible to have some sort of indexed-by-address map of some sort? I'm not really sure how this debuginfo stuff works, but can we always say that everything within a range of addresses will lie in the same object file? Ideally we could have a Vec of objects which is sorted by the addresses they show up at, and then we can just lookup an address in this list first to find the dwarf info for a symbol. Failing that we could try the main file or loading the object lazily.

@philipc
Copy link
Contributor Author

philipc commented Oct 25, 2020

Oh not much probably! If you'd prefer to land it ungated I don't mind reviewing accordingly. I think though it's nothing really out of the ordinary, just a thorough review, tests, CI, etc.

I think that should be doable, but hold off until I get CI passing (this needs at least a new object release). I assume the existing tests with RUSTFLAGS="-Z run-dsymutil=no" will be enough?

Would it be possible to have some sort of indexed-by-address map of some sort? I'm not really sure how this debuginfo stuff works, but can we always say that everything within a range of addresses will lie in the same object file? Ideally we could have a Vec of objects which is sorted by the addresses they show up at, and then we can just lookup an address in this list first to find the dwarf info for a symbol. Failing that we could try the main file or loading the object lazily.

I don't think there is any guarantee that each object will be a contiguous address range, so we can't exactly do that. The next best thing is a Vec of symbols sorted by addresses, and we already have that in object::ObjectMap. So we do a binary search for the address which gives us an object index, then use the object index to directly get the dwarf info for that object, then look up the address in the dwarf info.

Another reason we need it to be a Vec of symbols is that the addresses in the dwarf haven't been relocated, so object::ObjectMap also contains the symbol names, which lets us look up the original symbol in the object file to "unrelocate" the address (currently this is a linear scan, but could be changed to a binary search or a hash map).

@alexcrichton
Copy link
Member

For testing I think yeah -Zrun-dsymutil=no should probably be good enough. I at least don't know how to test much more myself!

Given what you're saying, could perhaps each object file internally have a list of dwarf debug sections, and we have a mapping from the object index to the position in the vector? That way if it's in the original file itself it'll be at index 0, and otherwise later indexes are populated by lazily loading the object indices.

I'm not sure if that makes sense, but I can try pushing up a commit with what I'm thinking if it doesn't.

@alexcrichton
Copy link
Member

Er carry on, what this PR does now is exactly what I was thinking 👍

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments here and there. I think #379 has some cleanup which may make some pieces nicer here eventually, but not as much as I thought so I'll wait on this merging before that.

src/symbolize/gimli/macho.rs Outdated Show resolved Hide resolved
let object = Object::parse(macho, endian, data)?;
let stash = Stash::new();
let inner = super::cx(&stash, object)?;
return Some(mk!(Mapping { map, inner, stash }));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading this I was hoping that we could deduplicate this construction of a Mapping and I ended up with #379. I don't think it'll help a ton but it might be good to add a helper somewhere in this file to encapsulate the Object::parse and find_header calls and such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already deduplicated in your other comment, right?

src/symbolize/gimli/macho.rs Outdated Show resolved Hide resolved
src/symbolize/gimli/macho.rs Show resolved Hide resolved
@@ -613,7 +613,18 @@ pub unsafe fn resolve(what: ResolveWhat<'_>, cb: &mut dyn FnMut(&super::Symbol))
});
}
}

if let Some((object_cx, object_addr)) = cx.object.search_object_map(addr as u64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with this a bit locally, ideally we'd deduplciate this and the above find_frames loop since I think they're basically identical. Unfortunately I kept running into lifetime errors. It may mean that the lazily-populated map for objects should perhaps use interior mutability, or ideally something like LazyCell, although that's not stable yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to come up with anything for this that felt like a net decrease in complexity.

@philipc
Copy link
Contributor Author

philipc commented Oct 28, 2020

I've rebased and force pushed. I think I addressed most of the comments, and I also changed the object symbol lookup to a binary search.

@philipc philipc force-pushed the macho-split branch 4 times, most recently from 4d184ed to 0466a77 Compare October 28, 2020 06:05
@philipc philipc marked this pull request as ready for review October 28, 2020 07:06
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good to me! A last thing or two but otherwise looks good-to-go to me!

src/symbolize/gimli/macho.rs Show resolved Hide resolved
src/symbolize/gimli/macho.rs Show resolved Hide resolved
@philipc philipc force-pushed the macho-split branch 5 times, most recently from 96684f9 to 36082bc Compare October 29, 2020 00:17
@philipc
Copy link
Contributor Author

philipc commented Oct 29, 2020

I added those comments and the debug asserts for the symbol sorting. Two more things

  • the macos-nightly no-dsymutil test wasn't testing anything because it was still finding the dsym it had already built. So I added a cargo clean, but maybe we should be checking timestamps or something on these?
  • I don't know why the windows-gnu32 build is failing now, but I checked that the rest of the tests still pass with it deleted.

@alexcrichton
Copy link
Member

Ah yeah I think that's anomalous, so I'm going to go ahead and merge, thanks again @philipc!

@alexcrichton alexcrichton merged commit 795d882 into rust-lang:master Oct 29, 2020
@philipc philipc deleted the macho-split branch October 30, 2020 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants