Module API #332
Comments
This will simplify the implementation of the module API (#332). Signed-off-by: Omar Sandoval <osandov@osandov.com>
In my branch for the module API (#332), I want to log an error without any additional context. Passing an empty format string causes a "zero-length gnu_printf format string" warning from GCC, and passing NULL crashes in vsnprintf(). Empty format strings are totally valid, but NULL clearly isn't, so annotate the format parameter as non-NULL and disable -Wformat-zero-length. Signed-off-by: Omar Sandoval <osandov@osandov.com>
One other feature to consider, which doesn't exactly fit in with the API as it is currently implemented in my branch, is supporting plugins to get debug info. I.e., some configuration file that defines some way to get debug info on a particular system/distro.
I've done a little bit of thinking about debuginfo in drgn-tools, and one thing I've found useful is splitting the concept into "finding" and "fetching" debuginfo. For "finding", the assumption is that the files exist on the filesystem if you know where to look. Drgn does this well. But for example on our analysis systems, we have an NFS mount that contains a bunch of vmlinux/ko files. That's a nonstandard location so it's nice to have a "finder" for that. The "fetching" falls into the same category as debuginfod: the files are either in a remote location, or require a lengthy extraction process to find. For instance, I have two fetcher implementations now: one which can find kernel RPM debuginfo packages, and download and extract them, and another for internal analysis systems which finds the RPM on a (different) NFS share and does the same. The important thing about fetching is to place the newly created files into a location that the finder will find next time :) I find the separation useful, but it could be a bit tied to our use cases!
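To make the finding/fetching split concrete, here is a rough sketch of how the two roles could fit together. None of these names come from drgn or drgn-tools: `DirectoryFinder`, `CopyingFetcher`, `get_debuginfo()`, the `<build_id>.debug` layout, and the `download` callable (a stand-in for "download and extract an RPM") are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


class DirectoryFinder:
    """Finder: only looks at files that already exist on the filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def path_for(self, build_id: str) -> Path:
        # Hypothetical layout: one file per build ID under the root.
        return self.root / f"{build_id}.debug"

    def find(self, build_id: str) -> Optional[Path]:
        path = self.path_for(build_id)
        return path if path.is_file() else None


class CopyingFetcher:
    """Fetcher: gets a file the slow way and stores it where the finder looks."""

    def __init__(
        self,
        finder: DirectoryFinder,
        download: Callable[[str], Optional[Path]],
    ) -> None:
        self.finder = finder
        self.download = download  # stand-in for a slow download/extract step

    def fetch(self, build_id: str) -> Optional[Path]:
        source = self.download(build_id)
        if source is None:
            return None
        dest = self.finder.path_for(build_id)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
        return dest


def get_debuginfo(
    build_id: str,
    finder: DirectoryFinder,
    fetcher: Optional[CopyingFetcher] = None,
) -> Optional[Path]:
    """Try the cheap finder first; fall back to the fetcher only if given one."""
    path = finder.find(build_id)
    if path is None and fetcher is not None:
        path = fetcher.fetch(build_id)
    return path
```

Calling `get_debuginfo()` without a fetcher gives a quick "is it already here?" check, and because the fetcher writes into the finder's directory, a second lookup for the same build ID never reaches the slow path.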
I'm working on hammering out the remaining bits of this now. Re: "finding" vs "fetching", at least for debuginfod, the API is that you call |
We currently only have one test resource file, sample.coredump.zst, but the tests for #332 will add more. Create a package, tests.resources, to contain test resources and a function, get_resource(), to decompress them. It can also be used on the command line: python3 -m tests.resources $resource_name Signed-off-by: Omar Sandoval <osandov@osandov.com>
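For readers who want a feel for what a helper like get_resource() might do, here is a minimal sketch. It is not the code from this commit: the signature, the `resources_dir`/`cache_dir` parameters, and the caching behavior are assumptions, and it uses gzip from the standard library as a stand-in for the zstd compression that the real resource (sample.coredump.zst) uses.

```python
import gzip
import shutil
from pathlib import Path


def get_resource(name: str, resources_dir: Path, cache_dir: Path) -> Path:
    """Return the path of the decompressed resource, decompressing on first use.

    Hypothetical sketch: expects resources_dir/<name>.gz (gzip standing in
    for zstd) and writes the decompressed copy to cache_dir/<name>.
    """
    compressed = resources_dir / f"{name}.gz"
    decompressed = cache_dir / name
    if not decompressed.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so that an interrupted run never
        # leaves a truncated file under the final name.
        tmp = decompressed.with_name(decompressed.name + ".tmp")
        with gzip.open(compressed, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
        tmp.replace(decompressed)
    return decompressed
```

A test would then call something like `get_resource("sample.coredump", resources_dir, cache_dir)` and open the returned path.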
I only really meant that it's nice to be able to check whether a request for debuginfo can be satisfied quickly, before committing to doing a long, blocking call. If debuginfod provides a way to check for cached debuginfo only, then it would be nice to have that option. But obviously whatever is easiest to implement, and if there's something I'd like to see, I could probably take a look at adding it too :)
drgn currently provides limited control over how debugging information is found. drgn has hardcoded logic for where to search for debugging information. The most the user can do is provide a list of files for drgn to try in addition to the default locations (with the -s CLI option or the drgn.Program.load_debug_info() method).

The implementation is also a mess. We use libdwfl, but its data model is slightly different from what we want, so we have to work around it or reimplement its functionality in several places: see commits e5874ad ("libdrgn: use libdwfl"), e6abfea ("libdrgn: debug_info: report userspace core dump debug info ourselves"), and 1d4854a ("libdrgn: implement optimized x86-64 ELF relocations") for some examples. The mismatched combination of libdwfl and our own code is difficult to maintain, and the lack of control over the whole debug info pipeline has made it difficult to fix several longstanding issues.

The solution is a major rework removing our libdwfl dependency and replacing it with our own model. This (huge) commit is that rework comprising the following components:

- drgn.Module/struct drgn_module, a representation of a binary used by a program.
- Automatic discovery of the modules loaded in a program.
- Interfaces for manually creating and overriding modules.
- Automatic discovery of debugging information from the standard locations and debuginfod.
- Interfaces for custom debug info finders and for manually overriding debugging information.
- Tons of test cases.

A lot of care was taken to make these interfaces extremely flexible yet cohesive. The existing interfaces are also reimplemented on top of the new functionality to maintain backwards compatibility, with one exception: drgn.Program.load_debug_info()/-s would previously accept files that it didn't find loaded in the program. This turned out to be a big footgun for users, so now this must be done explicitly (with drgn.ExtraModule/--extra-symbols).

The API and implementation both owe a lot to libdwfl:

- The concepts of modules, module address ranges/section addresses, and file biases are heavily inspired by the libdwfl interfaces.
- Ideas for determining modules in userspace processes and core dumps were taken from libdwfl.
- Our implementation of ELF symbol table address lookups is based on dwfl_module_addrinfo().

drgn has taken these concepts and fine-tuned them based on lessons learned.

Credit is also due to Stephen Brennan for early testing and feedback.

Closes #16, closes #25, closes #332.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
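As a rough illustration of the module, address range, and file bias concepts named above, here is a toy model. It is not drgn's implementation or API: the `Module` and `ModuleTable` classes and their fields are hypothetical, and the bias is assumed to be the loaded address minus the address recorded in the file, as in libdwfl.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Module:
    """Toy stand-in for a binary (executable, shared library, ...) in a program."""

    name: str
    start: int          # first loaded address (inclusive)
    end: int            # end of the loaded range (exclusive)
    file_bias: int = 0  # assumed: loaded address minus address in the file
    debug_file: Optional[str] = None  # debug info can be set per module

    def file_address(self, loaded_address: int) -> int:
        # Translate an address in the program to an address in the file,
        # e.g. before looking it up in the file's ELF symbol table.
        return loaded_address - self.file_bias


class ModuleTable:
    """Look up the module containing an address (ranges must not overlap)."""

    def __init__(self, modules: Iterable[Module]) -> None:
        self._modules: List[Module] = sorted(modules, key=lambda m: m.start)
        self._starts = [m.start for m in self._modules]

    def by_address(self, address: int) -> Optional[Module]:
        # Find the last module starting at or before the address, then
        # check that the address is actually inside its range.
        i = bisect_right(self._starts, address) - 1
        if i >= 0 and address < self._modules[i].end:
            return self._modules[i]
        return None
```

With a table like this, a symbol lookup for a loaded address first finds the containing module with `by_address()` and then converts the address with `file_address()`.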
I just pushed what I expect to be the final version of this branch (there will of course be lots of followups enabled by the new API). My plan is to cut a new release tomorrow, then merge this branch to kick off the next release cycle.
drgn currently provides limited control over how debugging information is found: drgn.Program.load_debug_info() allows specifying a list of files that drgn will try to use, but that's it. drgn has built-in logic for where to search for debugging information by default; this is a custom implementation for the Linux kernel, a partial implementation for userspace core dumps, and libdwfl for live userspace processes. These all have issues, and really need to be unified and made more flexible.
The solution to this is an API that exposes the main executable and every shared library, loadable kernel module, etc. as a "module". We can then allow providing debugging information per module, and even allow the user to create modules in case drgn gets it wrong. The existing load_debug_info() API will then be re-implemented on top of this API.
This will also solve, or add the flexibility to solve, a bunch of related issues: #16, #17, #25.
I'm working on this in the modules branch.