-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for built-in ORC information #460
base: main
Are you sure you want to change the base?
Conversation
This will allow orc_version_from_header() to be reused for upcoming ORC integration that does not use libelf. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
This allows users to get the object which the module was created from. The primary use case is for Linux kernel modules, to return the "struct module" associated with the drgn module object. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
So, the error here was that the 4.9 kernel does not support ORC, and I was passing through the lookup error for "num_orcs" in The interesting thing is that after correcting that, the test passes, because on 4.9, unwind can be done via frame pointers rather than ORC. So really, my stack tracing test isn't (necessarily) testing the ability to unwind with ORC, it's testing the ability to unwind with anything other than DWARF. I don't really have a knob to test this in the Python API. But unfortunately, this isn't really something easy to test in a C unit test either. |
Since drgn typically assumes the presence of debug files, ORC has always been loaded from the ELF file, rather than the program. However, ORC is present in kernel core dumps, so it can be used when the debug file is unavailable. Implement the ability to load built-in ORC for vmlinux and kernel modules. We still prefer to load ORC from the debug file wherever possible, because this is almost certainly faster. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
When looking up CFI rules using ORC, we use module->debug_file_bias unconditionally. This makes sense when the ORC is always loaded from an ELF debug file. However, now that built-in ORC can be loaded, it is possible that: 1. ORC is loaded from the built-in source, prior to loading the debug file. The module->debug_file_bias == 0, so the ORC is interpreted correctly. 2. Later, a debug file is loaded, updating debug_file_bias. However, the ORC hasn't been loaded from the debug file, so the bias is not applicable. 3. Future CFI lookups using ORC fail due to the extra bias. To avoid this, apply the debug_file_bias once to module->orc.pc_base, at the time we load the ORC sections out of the debug file. This ensures that the bias is only applied to the ORC data when we know we need it. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
I made a couple changes:
|
Interestingly, my test above that tested log messages failed on Python 3.6. I guess there is a difference in how the logging got initialized. I've added a context manager to explicitly set drgn's log level to DEBUG for the duration of the test, which I believe resolves the issue. Finally, I've gone ahead and added one more test, which actually does test vmlinux ORC. It creates a Program with no debuginfo, and then copies a With that, I do feel happy about the testing now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I perused the ORC changes and they looked sane overall, but I'll have to give them a closer look after the holidays. I looked over the Module.object
change more closely, so some comments there.
The Python 3.6 logging issues are probably due to our very sketchy integration with Python logging:
Lines 144 to 155 in a1869f9
// This is slightly heinous. We need to sync the Python logging configuration | |
// with libdrgn, but the Python log level and handlers can change at any time, | |
// and there are no APIs to be notified of this. | |
// | |
// To sync the log level, we monkey patch logger._cache.clear() to update the | |
// libdrgn log level on every live program. This only works since CPython commit | |
// 78c18a9b9a14 ("bpo-30962: Added caching to Logger.isEnabledFor() (GH-2752)") | |
// (in v3.7), though. Before that, the best we can do is sync the level at the | |
// time that the program is created. | |
// | |
// We also check handlers in that monkey patch, which isn't the right place to | |
// hook but should work in practice in most cases. |
For Linux kernel loadable modules, this is a reference to the ``struct | ||
module`` this was created from. For other kinds, this is currently an absent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pointer (i.e. struct module *
) feels slightly more natural to me (following the precedent of drgn.Thread.object
). I'd be fine restricting drgn.Program.linux_kernel_loadable_module()
to only accept struct module *
and not struct module
to make that simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about that as well. I get that value pointers vs reference objects are always confusing to users, so I agree it would probably be better to stick with the convention that's already established.
I'd actually love to try to keep linux_kernel_loadable_module()
accepting struct module
. If we can't create a pointer to the object, then we can just leave the Module.object
field absent as before. If you're ok with that.
The reason being that I really enjoyed the possibility of somebody constructing a struct module
value Object, and then passing it in. I don't actually know how feasible that would be, but it's kinda fun :)
struct drgn_program *prog = drgn_module_program(self->module); | ||
Program *prog_obj = container_of(prog, Program, prog); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are enough places that do this that we probably need a Program *Module_prog(Module *module)
akin to DrgnObject_prog()
and DrgnType_prog()
.
* @param[out] ret Initialized object where the module object is placed | ||
*/ | ||
struct drgn_error * | ||
drgn_module_object(struct drgn_module *module, struct drgn_object *ret); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add a drgn_module_set_object()
(and the corresponding Python binding setter)? The module API in general lets the user say it knows better than drgn, so we should have a way to override this (for any kind of module).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It strikes me that this opens up the possibility for a user to assign a completely irrelevant object to the module, thus making it much more likely for an error to occur when we're using it. I don't have a problem with that, of course: good error handling is important regardless.
But in my experience, sometimes errors that come from too deep within the bowels of drgn are no longer helpful, because they don't come with a stack trace like Python errors. For example here: if a user ran prog.thread(1).stack_trace()
, and got an error related to the ORC unwinder, it may not be obvious where it came from.
I know we can catch errors that we expect to be likely, and replace them with more informative ones. But I wonder if it wouldn't be worthwhile to add some sort of error-chaining functionality for drgn? That way, we could implement it so that any error returned by drgn_read_builtin_orc()
might get wrapped with the context "encountered error while loading built-in ORC for module foo", without eliminating the specifics of the error that is wrapped.
That's just an oddball idea which doesn't have to be part of this PR, in any case. And I could make the case against it as well: maybe we should rely on logging to give us the context for the error! Just an idea to ponder over the holidays.
Ahh, thank you! I knew about the loglevel monkey-patching, but I hadn't read the full comment, so I missed the bit about syncing the log level when the Program is created. That makes sense, and I can easily fix it. And yeah, I should have made more clear on this that I wasn't hoping for quick action before the holidays :) All the quick updates have been because I've been nerd-sniped by the tests, which are really fun to make work. |
At your leisure: How do you feel about the API choice of |
Stick with the absent object, I prefer that over |
Loading built-in ORC is a difficult functionality to test: it is best tested when there is no debuginfo file. Thus, we add two tests: one simpler test in which the kernel has debuginfo, but a module does not, and we must unwind a stack with functions from the module. The second test is more complex, where we create a program with no debuginfo at all, and provide it just enough data to initialize the module API and unwind with built-in ORC. In both cases, to verify that drgn is actually using ORC, we capture its log messages. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Thanks to your work today with 3ce0fee ("tests: don't clobber file in use by libelf") and a1869f9 ("Make StackFrame.name fall back to symbol/PC and add StackFrame.function_name"), this branch is fully unblocked!
Now that the module API has landed, we can usually rely on having a
struct drgn_module
available for a kernel module, even if the module debuginfo is not loaded (either because you have only loaded debuginfo for the kernel, or because you are using CTF). My previous ORC support required essentially re-implementing a small portion of the module API, specifically a tree that mapped address ranges to ORC data. Now, I can drop all that complexity and share this branch which I think is ready for consideration.This branch's main goal is to enable using built-in ORC information for stack unwinding, for the cases where a module does not have debuginfo. I think the commit messages here describe everything that's important. I think the testing is okay -- I wish I could test the vmlinux ORC, but I think I would need a type finder to allow the module API to even get initialized.