Module API #332

osandov · 2023-07-05T19:12:40Z

drgn currently provides limited control over how debugging information is found: drgn.Program.load_debug_info() allows specifying a list of files that drgn will try to use, but that's it. drgn has built-in logic for where to search for debugging information by default; this is a custom implementation for the Linux kernel, a partial implementation for userspace core dumps, and libdwfl for live userspace processes. All three have issues, and they really need to be unified and made more flexible.

The solution to this is an API that exposes the main executable and every shared library, loadable kernel module, etc. as a "module". We can then allow providing debugging information per module, and even allow the user to create modules in case drgn gets it wrong. The existing load_debug_info() API will then be re-implemented on top of this API.
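
To make the idea concrete, here is a minimal pure-Python sketch of the data model described above. It is not drgn's API: `Module`, `ModuleRegistry`, `set_debug_file()`, `missing_debug_info()`, the addresses, and the example path are all hypothetical names and values chosen for illustration. It only shows the shape of "one entry per binary, debugging information attached per entry, entries can be created by hand".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Module:
    """Hypothetical stand-in for one binary used by the program."""

    name: str
    start: int  # first address covered by the module
    end: int  # one past the last address covered by the module
    debug_file: Optional[str] = None  # path supplied per module, if any


class ModuleRegistry:
    """Hypothetical container for the modules of one program."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def add(self, name: str, start: int, end: int) -> Module:
        # Used both for automatically discovered modules and for modules the
        # user creates by hand when discovery gets it wrong.
        module = Module(name, start, end)
        self._modules[name] = module
        return module

    def set_debug_file(self, name: str, path: str) -> None:
        # Debugging information is attached to one specific module.
        self._modules[name].debug_file = path

    def missing_debug_info(self) -> List[str]:
        # Names of modules that still have no debugging information.
        return sorted(
            m.name for m in self._modules.values() if m.debug_file is None
        )


registry = ModuleRegistry()
registry.add("main", 0x400000, 0x401000)
registry.add("libc.so.6", 0x7F0000000000, 0x7F0000200000)
registry.set_debug_file("main", "/tmp/example/main.debug")  # illustrative path
```

In this toy model, a list-of-files interface like load_debug_info() could be layered on top by matching each file to a module and calling `set_debug_file()` for it.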

This will also solve, or add the flexibility needed to solve, several related issues: #16, #17, #25.

I'm working on this in the modules branch.



In my branch for the module API (#332), I want to log an error without any additional context. Passing an empty format string causes a "zero-length gnu_printf format string" warning from GCC, and passing NULL crashes in vsnprintf(). Empty format strings are totally valid, but NULL clearly isn't, so annotate the format parameter as non-NULL and disable -Wformat-zero-length. Signed-off-by: Omar Sandoval <osandov@osandov.com>

osandov · 2023-11-04T17:06:31Z

One other feature to consider, which doesn't exactly fit in with the API as it is currently implemented in my branch, is supporting plugins to get debug info. I.e., some configuration file that defines some way to get debug info on a particular system/distro.

brenns10 · 2023-11-05T16:32:54Z

I've done a little bit of thinking about debuginfo in drgn-tools, and one thing I've found useful is splitting the concept into "finding" and "fetching" debuginfo. For "finding", the assumption is that the files exist on the filesystem if you know where to look. Drgn does this well. But for example on our analysis systems, we have an NFS mount that contains a bunch of vmlinux/ko files. That's a nonstandard location so it's nice to have a "finder" for that.

The "fetching" falls into the same category as debuginfod: the files are either in a remote location, or require a lengthy extraction process to find. For instance, I have two fetcher implementations now: one which can find kernel RPM debuginfo packages, and download and extract them, and another for internal analysis systems which finds the RPM on a (different) NFS share and does the same. The important thing about fetching is to place the newly created files into a location that the finder will find next time :)
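
A rough sketch of that split, assuming nothing about drgn-tools itself: `find_debug_file()`, `fetch_debug_file()`, `fake_download()`, and the directory names are hypothetical, and the "download" is a stand-in for the slow package download and extraction. The point it illustrates is that the fetcher writes into a directory the finder already searches.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def find_debug_file(name: str, search_dirs: List[Path]) -> Optional[Path]:
    """Finder: cheap lookup that only checks directories already on disk."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def fetch_debug_file(
    name: str, dest_dir: Path, download: Callable[[str], bytes]
) -> Path:
    """Fetcher: slow step that materializes the file where the finder looks."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    path = dest_dir / name
    path.write_bytes(download(name))
    return path


def fake_download(name: str) -> bytes:
    # Stand-in for downloading and extracting a debuginfo package.
    return b"debug info for " + name.encode()


work_dir = Path(tempfile.mkdtemp())
search_dirs = [work_dir / "nfs", work_dir / "fetched"]

first_lookup = find_debug_file("vmlinux", search_dirs)  # nothing on disk yet
fetched_path = fetch_debug_file("vmlinux", search_dirs[1], fake_download)
second_lookup = find_debug_file("vmlinux", search_dirs)  # finder sees the file

shutil.rmtree(work_dir)  # clean up the temporary directories
```

Because the fetcher's destination is one of the finder's search directories, the next run never needs to fetch again.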

I find the separation useful, but it could be a bit tied to our use cases!

osandov · 2024-03-21T21:56:25Z

I'm working on hammering out the remaining bits of this now.

Re: "finding" vs "fetching", at least for debuginfod, the API is that you call debuginfod_find_executable() or debuginfod_find_debuginfo(), which first checks the debuginfod client cache, and if that misses, then it downloads it and stores the result in the cache. I.e., it's a single entrypoint that "finds" locally and "fetches" if that fails, and that's how I've been picturing the drgn equivalents. For your examples, I think a similar approach would work?

We currently only have one test resource file, sample.coredump.zst, but the tests for #332 will add more. Create a package, tests.resources, to contain test resources and a function, get_resource(), to decompress them. It can also be used on the command line:

    python3 -m tests.resources $resource_name

Signed-off-by: Omar Sandoval <osandov@osandov.com>

brenns10 · 2024-03-29T16:33:18Z

I only really meant that it's nice to be able to check whether a request for debuginfo can be satisfied quickly, before committing to doing a long, blocking call. If debuginfod provides a way to check for cached debuginfo only, then it would be nice to have that option. But obviously whatever is easiest to implement, and if there's something I'd like to see, I could probably take a look at adding it too :)
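
One way to picture that option is a single lookup function with a cache-only switch. This is only a sketch under stated assumptions: `lookup_debug_info()`, its `cache_only` parameter, the cache layout, and `fake_download()` are hypothetical and are not part of the debuginfod client library or of drgn.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def lookup_debug_info(
    build_id: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
    cache_only: bool = False,
) -> Optional[Path]:
    """Return the cached file for build_id, downloading it on a miss.

    With cache_only=True, a miss returns None instead of blocking.
    """
    cached = cache_dir / build_id / "debuginfo"  # hypothetical cache layout
    if cached.is_file():
        return cached
    if cache_only:
        return None
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(download(build_id))
    return cached


downloads: List[str] = []


def fake_download(build_id: str) -> bytes:
    # Stand-in for a slow network transfer; records each call.
    downloads.append(build_id)
    return b"\x7fELF"


cache_dir = Path(tempfile.mkdtemp())
build_id = "0123abcd"  # made-up build ID

quick_miss = lookup_debug_info(build_id, cache_dir, fake_download, cache_only=True)
slow_hit = lookup_debug_info(build_id, cache_dir, fake_download)
quick_hit = lookup_debug_info(build_id, cache_dir, fake_download, cache_only=True)
download_count = len(downloads)

shutil.rmtree(cache_dir)  # clean up the temporary cache
```

A caller could run the cache-only pass over everything first, report what is missing, and only then decide whether to pay for the blocking downloads.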

drgn currently provides limited control over how debugging information is found. drgn has hardcoded logic for where to search for debugging information. The most the user can do is provide a list of files for drgn to try in addition to the default locations (with the -s CLI option or the drgn.Program.load_debug_info() method).

The implementation is also a mess. We use libdwfl, but its data model is slightly different from what we want, so we have to work around it or reimplement its functionality in several places: see commits e5874ad ("libdrgn: use libdwfl"), e6abfea ("libdrgn: debug_info: report userspace core dump debug info ourselves"), and 1d4854a ("libdrgn: implement optimized x86-64 ELF relocations") for some examples. The mismatched combination of libdwfl and our own code is difficult to maintain, and the lack of control over the whole debug info pipeline has made it difficult to fix several longstanding issues.

The solution is a major rework removing our libdwfl dependency and replacing it with our own model. This (huge) commit is that rework comprising the following components:

- drgn.Module/struct drgn_module, a representation of a binary used by a program.
- Automatic discovery of the modules loaded in a program.
- Interfaces for manually creating and overriding modules.
- Automatic discovery of debugging information from the standard locations and debuginfod.
- Interfaces for custom debug info finders and for manually overriding debugging information.
- Tons of test cases.

A lot of care was taken to make these interfaces extremely flexible yet cohesive. The existing interfaces are also reimplemented on top of the new functionality to maintain backwards compatibility, with one exception: drgn.Program.load_debug_info()/-s would previously accept files that it didn't find loaded in the program. This turned out to be a big footgun for users, so now this must be done explicitly (with drgn.ExtraModule/--extra-symbols).

The API and implementation both owe a lot to libdwfl:

- The concepts of modules, module address ranges/section addresses, and file biases are heavily inspired by the libdwfl interfaces.
- Ideas for determining modules in userspace processes and core dumps were taken from libdwfl.
- Our implementation of ELF symbol table address lookups is based on dwfl_module_addrinfo().

drgn has taken these concepts and fine-tuned them based on lessons learned.

Credit is also due to Stephen Brennan for early testing and feedback.

Closes #16, closes #25, closes #332.

Signed-off-by: Omar Sandoval <osandov@osandov.com>

osandov · 2024-12-17T21:21:18Z

I just pushed what I expect to be the final version of this branch (there will of course be lots of followups enabled by the new API). My plan is to cut a new release tomorrow, then merge this branch to kick off the next release cycle.


osandov added the enhancement New feature or request label Jul 5, 2023

osandov self-assigned this Jul 5, 2023

osandov added this to drgn Roadmap Jul 5, 2023

osandov moved this to In Progress in drgn Roadmap Jul 5, 2023

osandov added the debuginfo Support for debugging information formats label Jul 5, 2023

osandov added a commit that referenced this issue Oct 2, 2023

libdrgn: embed drgn_debug_info in drgn_program

84c3adc

This will simplify the implementation of the module API (#332). Signed-off-by: Omar Sandoval <osandov@osandov.com>

osandov added a commit that referenced this issue Oct 2, 2023

libdrgn: embed drgn_debug_info in drgn_program

c85dd74

This will simplify the implementation of the module API (#332). Signed-off-by: Omar Sandoval <osandov@osandov.com>

Asphaltt pushed a commit to Asphaltt/drgn-bpf that referenced this issue Oct 4, 2023

libdrgn: embed drgn_debug_info in drgn_program

eaa9936

This will simplify the implementation of the module API (osandov#332). Signed-off-by: Omar Sandoval <osandov@osandov.com>

osandov mentioned this issue May 22, 2024

Support attaching to QEMU, kgdb, and other gdbstub targets #172

Open

brenns10 linked a pull request Jul 3, 2024 that will close this issue

Add linux kernel module helpers #411

Merged

osandov removed a link to a pull request Jul 11, 2024

Add linux kernel module helpers #411

Merged

osandov mentioned this issue Oct 18, 2024

pylint is confused about optional prog? #436

Closed

osandov closed this as completed in 4e83130 Dec 19, 2024

github-project-automation bot moved this from In Progress to Done in drgn Roadmap Dec 19, 2024
