-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lldb crash parsing Google Chrome types #54761
Comments
@llvm/issue-subscribers-lldb |
Forgot to mention: $ lldb --version |
Same crash in lldb-14 pulled from https://apt.llvm.org/. The packages look mildly broken, as the python files are scattered between site and dist directories, I had to fix that in my Dockerfile:
Is it possible to get debug info for the binaries somehow, or would I have to build locally to look at what's happening?
|
Crashes also in a local ToT build:
|
Here's a backtrace for where this happens. Looks like CXXRecordDecl::getMostRecentDecl() is failing and returning nullptr. This appears to bottleneck in ... time passes ... Mkay, it looks like CXXRecordDecl::DefinitionData is only initialized at construction and never assigned thereafter. So no matter what getMostRecentDecl() returns, it'll return the nullptr from construction...
|
Perhaps the problem is that the CmapSubtableFormat14 type has been stripped from the symbols at link time? Would this be considered a clang, or a linker bug? It looks like the symbols only contain declarations of the type:
|
This one type is certainly an outlier among its peers, it's the only type that has no definition. All the other types look more like this example:
|
Mkay, this diff will avoid the crash:
|
I'd recommend making a reduced repro case that consists of:
It's hard to understand what exactly is going on when the repro is "build chrome". |
@nico I'm making lldb robust to this, but arguably the problem here is that the Google Chrome binary contains a method CmapSubtableFormat14::closure_glyphs in the symbols, but the symbols don't have a full type definition for the CmapSubtableFormat14. This is merely the first type that tripped this crasher, there are others, although I didn't do a full assay so I can't tell you how common the problem is. |
@nico suggests this might be due to "Constructor type homing for debug info" which sounds likely enough to me. In that case this isn't due to a clang or linker bug, but rather a clang feature. |
If this is due to type homing that would indicate that using |
@shafik I don't presently have access to any hardware that can build Chrome, and as I'm no longer on the team, I can't build Google Chrome at all. I any case, I think that's a tangent to the issue here, which is the lldb crash? |
I should note that there's a one-line fix for this uploaded here: https://reviews.llvm.org/D123415. |
I wouldn't agree with that. I doubt the issue is as simple as "the type is missing" -- lldb already has code to handle those cases. There must be either be something special about that type (e.g., some unusual combination of C++ features perhaps), or the way you're using lldb (e.g. SBModule::GetTypes might force the parsing of types that would not be parsed normally, or in the order they would not be parsed normally).
Yes, but without understanding the problem, it's hard to say whether this is the right fix right now, much less to ensure that the problem stays fixed in the future. |
Mkay, I've spent the morning trying to write a minimized repro without luck.
But I can't seem to trip the crash and I never get a call to AddMethodToCXXRecordType, even though my Foo type has methods. I'm looking one level up, through DWARFASTParserClang::ParseSubroutine, to see whether it may have made the wrong decision, but the code is ... detail rich ... I've also noticed that this is happening in DWARFUnit 2323 (of 61287), but I can't provoke the crash by calling GetTypes(...) on that specific "CompileUnit" using SBModule::GetCompileUnitAtIndex(). Possibly someone who's familiar with the code in DWARFASTParserClang::ParseSubroutine would spot the problem right away. It's decided to add_method, on a type that has no definition. Here are the vars in the context immediately before the errant dispatch to AddMethodToCXXRecordType:
|
Mkay, I've been able to debug into the DWARFASTParserClang::ParseSubroutine call as the " |
So it looks like the second to last call to
Can you dump the DIE for that call and the |
(I've managed to reproduce this on my end and reduce the testcase into something that can be hopefully fed into cvise. The automatic reduction is running now...) |
@shafik thanks for taking a look! Here are the local variables at the outer call:
and here's the state at the inner invocation just before it calls AddMethodToCXXRecordType:
|
@labath thanks for taking a look. I'm way out of my depth :). |
It took some manual prodding -- cvise couldn't make sense of the heavily templated code (I barely made sense as well), but here's a small(ish) reproducer:
to run do:
If anyone wants to take a stab, be my guest. It'll take me a while to get around to looking at this. |
Does not repro for me locally with clang 12. Maybe this is clang version dependent?
|
I can repro the crash when building with clang-14:
|
I am unfortunately not able to reproduce this w/ ToT clang and lldb on macOS. FYI, I did have to modify the the last clang invocation as follows:
We have seem similar crashes on macOS, I was hoping this would lead to solution. |
I think clang-12 did not have constructor type homing yet. It works for me on clang-13, and ToT.
I can't reproduce it on a mac either. I suspect this because |
Is it not helpful to repro the crash on a mac against an ELF binary produced on Linux? This repros readily for me. I can also call SBCompileUnit::GetTypes() on the lldb.target.module[0].compile_units in order, and the crash reproduces on the second GetTypes() call.
|
I was able to reproduce this on macOS by generating the I believe the key to this is the DWARF for
So If we dig into So I am wondering, should we be generating member functions for classes that are just forward declarations? Actually looking at the DWARF 5 spec it looks like this is valid, so I guess we need to make |
I think perhaps we need to use This seems to fix this case and I am going to run |
I can confirm that the crash goes away if I |
Is the right approach related PRs would https://reviews.llvm.org/D86216 and https://reviews.llvm.org/D85968 I still can't come up with a way to reproduce this locally w/o using the library generated on Linux. Maybe @labath might have a better test strategy. |
The difference between this situation, and the ones in those two patches is that adding a nested type (which is done in those patches) definitely does not affect the layout of a class. It is still not legal in in regular C++, but it seems clang lets us get away with it. The situation with member functions is more complicated. Technically, the presence of a regular member function should not affect the class layout, but I suspect (I haven't tried) we could get in trouble if the method we're adding is a virtual function. Nonetheless, As for testing, given that this does not require running a binary, one could recreate the linux binary on any platform using just clang and lld (no docker) -- slap a --target=x86_64-pc-linux on the two compiler commands, and replace the last one with a direct ld.lld invocation. However, given the complexity of the input, I think the result would still be a fairly brittle test (not in the sense that it can break/fail, but that it can stop testing what it is supposed to test). Given the gap between the problem description and the complexity of the test case, I think we (I?) still haven't fully understood the problem. If the problem is really as simple as "adding a method to a forward-declared type" then it should be possible to reproduce this bug with a much simpler test case. Either we are missing some important aspect of the bug or (and I think this is more likely) we need to find a simpler way to tickle this bug. I can try to do that, but probably not today (and maybe not this week). |
I am looking into why
So we don't set I am not sure how to trigger this code path though. |
If you're looking for a case where Clang will emit a type declaration with a subprogram definition, here's a fairly simple example that should do the trick:
|
Here's a slightly better reproducer:
Instead of using |
I can repro the crash on my M1 Mini with the above.
Interestingly(?) I can call |
That's weird/unfortunate. Since standalone-debug is the default on Darwin anyway, it doesn't seem necessary for no-standalone to have different defaults there compared to elsewhere... Hmm, doesn't /look/ like that's the case at least here:
|
Yes, that doesn't seem to be the case for ToT clang, but it is how system clang works. Maybe that changed recently (?) |
Ah, yeah - ctor homing was enabled by default relatively recently. |
Fixed by D124370 |
@labath the ToT release works like charm for me with your fix in. Thanks. |
To repro, download https://edgedl.me.gvt1.com/chrome/linux/symbols/google-chrome-debug-info-linux64-100.0.4896.60.zip and unzip. Then run:
$ echo 'script print(lldb.target.module[0].GetTypes(lldb.eTypeClassClass | lldb.eTypeClassStruct))' | lldb debug-info/chrome.debug
It appears this is bailing on this type: https://source.chromium.org/chromium/chromium/src/+/main:third_party/harfbuzz-ng/src/src/hb-ot-cmap-table.hh;l=1141 for some reason.
The text was updated successfully, but these errors were encountered: