Module API #332

Closed
osandov opened this issue Jul 5, 2023 · 5 comments

@osandov
Owner

osandov commented Jul 5, 2023

drgn currently provides limited control over how debugging information is found: drgn.Program.load_debug_info() allows specifying a list of files that drgn will try to use, but that's it. drgn has built-in logic for where to search for debugging information by default: a custom implementation for the Linux kernel, a partial implementation for userspace core dumps, and libdwfl for live userspace processes. These all have issues, and they really need to be unified and made more flexible.

The solution to this is an API that exposes the main executable and every shared library, loadable kernel module, etc. as a "module". We can then allow providing debugging information per module, and even allow the user to create modules in case drgn gets it wrong. The existing load_debug_info() API will then be re-implemented on top of this API.

This will also solve, or add the flexibility needed to solve, a bunch of related issues: #16, #17, #25.

I'm working on this in the modules branch.
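
To make the proposal concrete, here is a minimal sketch of the "everything is a module" idea in plain Python. It is a toy model, not drgn's API: the `Module` and `Program` classes, `needs_debug_file()`, `add_module()`, and the build-ID-to-path mapping are all hypothetical names made up for illustration. It only shows how a bulk call like load_debug_info() could be layered on per-module state, and how a user-created module sits alongside discovered ones.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Module:
    """Toy stand-in for one binary in the debugged program (hypothetical)."""

    name: str
    build_id: Optional[str] = None
    debug_file: Optional[str] = None  # debug info is tracked per module

    def needs_debug_file(self) -> bool:
        return self.debug_file is None


@dataclass
class Program:
    """Toy program holding a registry of modules keyed by name."""

    modules: Dict[str, Module] = field(default_factory=dict)

    def add_module(self, name: str, build_id: Optional[str] = None) -> Module:
        # Return the existing module if there is one, so automatic discovery
        # and manual creation by the user can coexist.
        return self.modules.setdefault(name, Module(name, build_id))

    def load_debug_info(self, files: Dict[str, str]) -> List[str]:
        """Bulk API expressed on top of per-module state.

        `files` maps a build ID to a path (an assumption of this sketch).
        Returns the names of modules that are still missing debug info.
        """
        for module in self.modules.values():
            if module.needs_debug_file() and module.build_id in files:
                module.debug_file = files[module.build_id]
        return [m.name for m in self.modules.values() if m.needs_debug_file()]


prog = Program()
prog.add_module("main", build_id="aa11")
prog.add_module("libc.so.6", build_id="bb22")
prog.add_module("my_driver", build_id="cc33")  # created manually by the user

missing = prog.load_debug_info(
    {"aa11": "/dbg/main.debug", "cc33": "/dbg/my_driver.ko.debug"}
)
print(missing)
```

The point of the sketch is only the layering: once debug info is a property of each module, the list-of-files interface becomes a loop over modules rather than a separate code path.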

@osandov osandov added the enhancement New feature or request label Jul 5, 2023
@osandov osandov self-assigned this Jul 5, 2023
@osandov osandov moved this to In Progress in drgn Roadmap Jul 5, 2023
@osandov osandov added the debuginfo Support for debugging information formats label Jul 5, 2023
osandov added a commit that referenced this issue Oct 2, 2023
This will simplify the implementation of the module API (#332).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Asphaltt pushed a commit to Asphaltt/drgn-bpf that referenced this issue Oct 4, 2023
This will simplify the implementation of the module API (osandov#332).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
osandov added a commit that referenced this issue Oct 9, 2023
In my branch for the module API (#332), I want to log an error without
any additional context. Passing an empty format string causes a
"zero-length gnu_printf format string" warning from GCC, and passing
NULL crashes in vsnprintf().

Empty format strings are totally valid, but NULL clearly isn't, so
annotate the format parameter as non-NULL and disable
-Wformat-zero-length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
@osandov
Owner Author

osandov commented Nov 4, 2023

One other feature to consider, which doesn't exactly fit in with the API as it is currently implemented in my branch, is supporting plugins to get debug info. I.e., some configuration file that defines some way to get debug info on a particular system/distro.

@brenns10
Contributor

brenns10 commented Nov 5, 2023

I've done a little bit of thinking about debuginfo in drgn-tools, and one thing I've found useful is splitting the concept into "finding" and "fetching" debuginfo. For "finding", the assumption is that the files exist on the filesystem if you know where to look. Drgn does this well. But for example on our analysis systems, we have an NFS mount that contains a bunch of vmlinux/ko files. That's a nonstandard location so it's nice to have a "finder" for that.

The "fetching" falls into the same category as debuginfod: the files are either in a remote location, or require a lengthy extraction process to find. For instance, I have two fetcher implementations now: one which can find kernel RPM debuginfo packages, and download and extract them, and another for internal analysis systems which finds the RPM on a (different) NFS share and does the same. The important thing about fetching is to place the newly created files into a location that the finder will find next time :)

I find the separation useful, but it could be a bit tied to our use cases!
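
A rough sketch of the finder/fetcher split described above, in plain Python. None of this is drgn or drgn-tools code: the `Finder` and `Fetcher` signatures, the `<build_id>.debug` file naming, and both factory functions are assumptions for illustration. The fetcher here just copies a file from a "remote" directory, standing in for a slow download or RPM extraction.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional

# Hypothetical signatures, for illustration only.
Finder = Callable[[str], Optional[Path]]  # build ID -> existing local file
Fetcher = Callable[[str, Path], bool]  # build ID, destination dir -> success


def make_dir_finder(roots: Iterable[Path]) -> Finder:
    """Finder that looks for <root>/<build_id>.debug under each root."""
    root_list = list(roots)

    def find(build_id: str) -> Optional[Path]:
        for root in root_list:
            candidate = root / f"{build_id}.debug"
            if candidate.is_file():
                return candidate
        return None

    return find


def make_copy_fetcher(remote: Path) -> Fetcher:
    """Fetcher standing in for a slow download or package extraction."""

    def fetch(build_id: str, dest: Path) -> bool:
        src = remote / f"{build_id}.debug"
        if not src.is_file():
            return False
        dest.mkdir(parents=True, exist_ok=True)
        (dest / src.name).write_bytes(src.read_bytes())
        return True

    return fetch


with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote"
    local = Path(tmp) / "local"
    remote.mkdir()
    (remote / "abc123.debug").write_bytes(b"fake debuginfo")

    find = make_dir_finder([local])
    fetch = make_copy_fetcher(remote)

    first = find("abc123")  # nothing local yet
    fetched = fetch("abc123", local)  # put the file where the finder looks
    second = find("abc123")  # the finder now succeeds
    found_name = second.name if second else None
```

The last three calls show the property mentioned above: the fetcher writes into a location the finder already searches, so the next lookup is a plain local find.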

@osandov
Owner Author

osandov commented Mar 21, 2024

I'm working on hammering out the remaining bits of this now.

Re: "finding" vs "fetching", at least for debuginfod, the API is that you call debuginfod_find_executable() or debuginfod_find_debuginfo(), which first checks the debuginfod client cache, and if that misses, then it downloads it and stores the result in the cache. I.e., it's a single entrypoint that "finds" locally and "fetches" if that fails, and that's how I've been picturing the drgn equivalents. For your examples, I think a similar approach would work?
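
For comparison, a minimal sketch of the single-entrypoint shape described above: look in a local cache, and only download on a miss. This is plain Python rather than the debuginfod client library; `find_debuginfo()`, the `download` callback, and the cache layout are assumptions of the sketch.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def find_debuginfo(
    build_id: str, cache_dir: Path, download: Callable[[str], bytes]
) -> Path:
    """Return a local path for build_id, downloading it on a cache miss."""
    cached = cache_dir / build_id / "debuginfo"  # layout is an assumption
    if cached.is_file():
        return cached  # cache hit: no network traffic
    data = download(build_id)  # potentially slow, blocking call
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)  # store the result so the next call hits
    return cached


calls: List[str] = []


def fake_download(build_id: str) -> bytes:
    """Stand-in for a real download; records each call."""
    calls.append(build_id)
    return b"fake debuginfo for " + build_id.encode()


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    p1 = find_debuginfo("abc123", cache, fake_download)  # miss, downloads
    p2 = find_debuginfo("abc123", cache, fake_download)  # hit, no download
    same_path = p1 == p2
```

With this shape the caller never sees the find/fetch distinction; the second call returns the cached file without invoking the download callback again.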

osandov added a commit that referenced this issue Mar 22, 2024
We currently only have one test resource file, sample.coredump.zst, but
the tests for #332 will add more. Create a package, tests.resources, to
contain test resources and a function, get_resource(), to decompress
them. It can also be used on the command line:

  python3 -m tests.resources $resource_name

Signed-off-by: Omar Sandoval <osandov@osandov.com>
@brenns10
Contributor

I only really meant that it's nice to be able to check whether a request for debuginfo can be satisfied quickly, before committing to doing a long, blocking call. If debuginfod provides a way to check for cached debuginfo only, then it would be nice to have that option. But obviously whatever is easiest to implement, and if there's something I'd like to see, I could probably take a look at adding it too :)
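
One way to picture the "check quickly before blocking" idea is a two-pass resolver: a cheap cache-only pass over every module, then an optional slow pass for whatever is left. The sketch below is plain Python with hypothetical names (`resolve_all()`, `lookup_cached`, `fetch`, `allow_fetch`); it does not claim that debuginfod or drgn expose such an option.

```python
from typing import Callable, Dict, Iterable, List, Optional

Lookup = Callable[[str], Optional[str]]  # build ID -> path, or None


def resolve_all(
    build_ids: Iterable[str],
    lookup_cached: Lookup,
    fetch: Lookup,
    allow_fetch: bool = True,
) -> Dict[str, str]:
    """Resolve debug info paths, trying the cheap lookup for everything first."""
    resolved: Dict[str, str] = {}
    pending: List[str] = []

    # Pass 1: cache-only, expected to return quickly.
    for build_id in build_ids:
        path = lookup_cached(build_id)
        if path is not None:
            resolved[build_id] = path
        else:
            pending.append(build_id)

    # Pass 2: slow, blocking fetches, only if the caller opted in.
    if allow_fetch:
        for build_id in pending:
            path = fetch(build_id)
            if path is not None:
                resolved[build_id] = path

    return resolved


# Dictionaries stand in for a local cache and a remote server.
cache = {"aa11": "/cache/aa11/debuginfo"}
remote = {"bb22": "/cache/bb22/debuginfo"}
ids = ["aa11", "bb22", "cc33"]

quick = resolve_all(ids, cache.get, remote.get, allow_fetch=False)
full = resolve_all(ids, cache.get, remote.get)
```

With `allow_fetch=False` only the cached module is resolved, so an interactive tool could show what is immediately available and defer the blocking pass until the user asks for it.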

@brenns10 brenns10 linked a pull request Jul 3, 2024 that will close this issue
@osandov osandov removed a link to a pull request Jul 11, 2024
osandov added a commit that referenced this issue Dec 17, 2024
drgn currently provides limited control over how debugging information
is found. drgn has hardcoded logic for where to search for debugging
information. The most the user can do is provide a list of files for
drgn to try in addition to the default locations (with the -s CLI option
or the drgn.Program.load_debug_info() method).

The implementation is also a mess. We use libdwfl, but its data model is
slightly different from what we want, so we have to work around it or
reimplement its functionality in several places: see commits
e5874ad ("libdrgn: use libdwfl"), e6abfea ("libdrgn:
debug_info: report userspace core dump debug info ourselves"), and
1d4854a ("libdrgn: implement optimized x86-64 ELF relocations") for
some examples. The mismatched combination of libdwfl and our own code is
difficult to maintain, and the lack of control over the whole debug info
pipeline has made it difficult to fix several longstanding issues.

The solution is a major rework removing our libdwfl dependency and
replacing it with our own model. This (huge) commit is that rework,
comprising the following components:

- drgn.Module/struct drgn_module, a representation of a binary used by a
  program.
- Automatic discovery of the modules loaded in a program.
- Interfaces for manually creating and overriding modules.
- Automatic discovery of debugging information from the standard
  locations and debuginfod.
- Interfaces for custom debug info finders and for manually overriding
  debugging information.
- Tons of test cases.

A lot of care was taken to make these interfaces extremely flexible yet
cohesive. The existing interfaces are also reimplemented on top of the
new functionality to maintain backwards compatibility, with one
exception: drgn.Program.load_debug_info()/-s would previously accept
files that it didn't find loaded in the program. This turned out to be a
big footgun for users, so now this must be done explicitly (with
drgn.ExtraModule/--extra-symbols).

The API and implementation both owe a lot to libdwfl:

- The concepts of modules, module address ranges/section addresses, and
  file biases are heavily inspired by the libdwfl interfaces.
- Ideas for determining modules in userspace processes and core dumps
  were taken from libdwfl.
- Our implementation of ELF symbol table address lookups is based on
  dwfl_module_addrinfo().

drgn has taken these concepts and fine-tuned them based on lessons
learned.

Credit is also due to Stephen Brennan for early testing and feedback.

Closes #16, closes #25, closes #332.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
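
The backwards-compatibility exception in the commit message above can be illustrated with a toy model in plain Python. This is not drgn's implementation: the `load_debug_info()` function here, its `extra` parameter, and matching files to loaded modules by build ID are assumptions of the sketch. It only shows the difference between silently accepting every file and requiring an explicit opt-in for files that match nothing loaded.

```python
from typing import Dict, Iterable, List, Set, Tuple


def load_debug_info(
    loaded_build_ids: Set[str],
    files: Dict[str, str],
    extra: Iterable[str] = (),
) -> Tuple[List[str], List[str]]:
    """Toy model of the new behavior.

    `files` maps a path to the build ID found in that file. A file is used
    only if it matches a loaded module or was explicitly listed in `extra`.
    Returns (used, ignored) lists of paths.
    """
    extra_paths = set(extra)
    used: List[str] = []
    ignored: List[str] = []
    for path, build_id in files.items():
        if build_id in loaded_build_ids or path in extra_paths:
            used.append(path)
        else:
            ignored.append(path)  # previously this would have been accepted
    return used, ignored


loaded = {"aa11", "bb22"}
files = {"/dbg/vmlinux": "aa11", "/dbg/unrelated.ko": "zz99"}

# Default: the file that matches no loaded module is not used.
used, ignored = load_debug_info(loaded, files)

# Explicit opt-in, analogous in spirit to an extra module.
used2, ignored2 = load_debug_info(loaded, files, extra=["/dbg/unrelated.ko"])
```

In the first call the unrelated file ends up in `ignored`; in the second it is used because the caller asked for it by name.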
@osandov
Owner Author

osandov commented Dec 17, 2024

I just pushed what I expect to be the final version of this branch (there will of course be lots of followups enabled by the new API). My plan is to cut a new release tomorrow, then merge this branch to kick off the next release cycle.

osandov added a commit that referenced this issue Dec 18, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in drgn Roadmap Dec 19, 2024