[Draft] Kallsyms Symbol Finder #351

Closed
wants to merge 10 commits into from


brenns10
Contributor

@brenns10 brenns10 commented Aug 22, 2023

Now that #241 is no longer a draft, I'm putting the next branch that builds upon it here for easy review. Unfortunately I can't set the base branch to be my own symbol_finder branch, so the PR currently includes the changes from #241 as well.

This branch allows the kernel's built-in kallsyms information to be used as a symbol table. For best results, the kernel should be built with CONFIG_KALLSYMS_ALL. It only provides symbols for the kernel itself, not for modules. The data can be obtained in two ways:

  1. For live systems with root permissions, we can directly parse the text contents of /proc/kallsyms. This works on practically any kernel version!
  2. For vmcores (or live systems where /proc/kcore is unavailable, maybe due to permissions; see #347, "Add the ability to run drgn against the live kernel as non-root user"), we can parse the in-memory data structures that contain the kallsyms info. This requires upstream changes, merged in v6.0, which add symbol information to the vmcoreinfo note. In particular, if f09bddbd8661 ("vmcoreinfo: add kallsyms_num_syms symbol") is present, then this should work.
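
A minimal sketch of the text-based approach, assuming the usual `address type name [module]` line layout of /proc/kallsyms. `KallsymsEntry`, `parse_kallsyms`, and `addresses_hidden` are hypothetical names used for illustration, not part of drgn, and the sample addresses are made up. The zero check reflects the fact that readers without sufficient privileges typically see every address printed as zero.

```python
from typing import Iterable, List, NamedTuple, Optional


class KallsymsEntry(NamedTuple):
    address: int
    kind: str               # single-letter type, as printed by nm(1)
    name: str
    module: Optional[str]   # None for symbols built into the kernel


def parse_kallsyms(lines: Iterable[str]) -> List[KallsymsEntry]:
    """Parse lines in the /proc/kallsyms text format."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address = int(fields[0], 16)
        # Module symbols carry a trailing "[module_name]" field.
        module = fields[3].strip("[]") if len(fields) > 3 else None
        entries.append(KallsymsEntry(address, fields[1], fields[2], module))
    return entries


def addresses_hidden(entries: List[KallsymsEntry]) -> bool:
    """True if every address is zero, i.e. the table is useless for lookups."""
    return bool(entries) and all(e.address == 0 for e in entries)


# Made-up sample input; on a live system this would be open("/proc/kallsyms").
SAMPLE = """\
ffffffff81000000 T _stext
ffffffff82a3c0a0 D slab_caches
ffffffffc0123000 t helper_fn\t[some_module]
"""

entries = parse_kallsyms(SAMPLE.splitlines())
vmlinux_only = [e for e in entries if e.module is None]
```

Filtering on `module is None` mirrors the "kernel only, no modules" scope described above.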

The API I chose represents the kallsyms finder as a Python object, which can be registered via add_symbol_finder(). I didn't want to hook into any of the add_debug_info() logic because I wanted maximum flexibility: most people won't want kallsyms, at least not initially. This also avoids breaking any existing logic.

This can be used on Oracle Linux 7-9 with UEK 5-7, but it can also be used on the vmtest kernels, which are a good way to explore:

$ python3 -m vmtest.vm -k 6.4*  # any kernel 6.0 or later will do
[... boot output...]
# umount /lib/modules/$(uname -r)
# python -m drgn
>>> finder = make_kallsyms_vmlinux_finder(prog)
>>> finder("slab_caches", None, True)
[Symbol(name='slab_caches', address=0xffffffffa58e20a0, size=0x20, binding=<SymbolBinding.GLOBAL: 2>, kind=<SymbolKind.OBJECT: 1>)]
>>> prog.add_symbol_finder(finder)
>>> prog.symbol("slab_caches")
Symbol(name='slab_caches', address=0xffffffffa58e20a0, size=0x20, binding=<SymbolBinding.GLOBAL: 2>, kind=<SymbolKind.OBJECT: 1>)

No automated tests yet (waiting for the Symbol Finder API to be stabilized and merged). Testing will be interesting, though, since ideally we would test both the text-based and the vmcore-based parsing methods. I may want to add a toggle to allow bypassing /proc/kcore so that we can test the other method.


Some notes on fixes / To-dos for this branch:

  • Wait for #241 ("Pluggable Symbol finder API, with Python support") to be merged.
  • If #347 ("Add the ability to run drgn against the live kernel as non-root user") is merged, I need to be careful to detect a /proc/kallsyms where all the addresses are zero, and bail out of that code path, since non-root users can still read the file, just without memory addresses.
  • In hindsight, I think the KallsymsFinder() constructor is bad. I wanted to move additional parsing of the vmcoreinfo note out into the Python code. Now, I think libdrgn/kallsyms.c should be able to find the necessary information from the vmcoreinfo without the Python helper code.
  • There are some compatibility issues with the new "long symbol" support for Rust. The kallsyms data structure format changes with no indication. Currently I am detecting this via the kernel major version, but I really ought to send a patch with a Fixes: tag to specify a NUMBER(kallsyms_version)=2 in the vmcoreinfo, so that we could detect it without version number hacks.
  • Address lookup is reasonably efficient using bsearch(). However, name lookup is currently linear. I need to add a hash table.
  • Note: module support is not planned as part of this branch. Module kallsyms requires type & object finders for vmlinux. I have Python helper code stuffed at the end of my ctf branch which implements a module kallsyms finder. That might be worth porting to C later on.
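
The lookup to-do above (binary search by address, hash table by name) can be sketched in Python as follows. This is an illustration only: `Sym` and `SymbolIndex` are hypothetical stand-ins rather than drgn types, the `(name, address, one)` calling convention is inferred from the transcript earlier in this description, and address lookup is simplified to "nearest symbol at or below the address" without any size check.

```python
import bisect
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Sym(NamedTuple):
    name: str
    address: int


class SymbolIndex:
    def __init__(self, symbols):
        # Sorted once so address lookups can use binary search.
        self._by_addr = sorted(symbols, key=lambda s: s.address)
        self._addrs = [s.address for s in self._by_addr]
        # Hash table keyed by name; a name may map to several symbols.
        self._by_name = defaultdict(list)
        for sym in self._by_addr:
            self._by_name[sym.name].append(sym)

    def by_address(self, address: int) -> Optional[Sym]:
        """Nearest symbol starting at or below `address`, or None."""
        i = bisect.bisect_right(self._addrs, address) - 1
        return self._by_addr[i] if i >= 0 else None

    def by_name(self, name: str) -> List[Sym]:
        return list(self._by_name.get(name, ()))

    def __call__(self, name, address, one) -> List[Sym]:
        """Finder-style entry point: None means "don't filter on this"."""
        if address is not None:
            hit = self.by_address(address)
            matches = [hit] if hit is not None else []
            if name is not None:
                matches = [s for s in matches if s.name == name]
        elif name is not None:
            matches = self.by_name(name)
        else:
            matches = list(self._by_addr)
        return matches[:1] if one else matches


index = SymbolIndex([Sym("b", 0x2000), Sym("a", 0x1000), Sym("a", 0x3000)])
```

Building both indexes costs one sort plus one pass over the symbols, after which name lookups no longer scan the whole table.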

@brenns10 brenns10 mentioned this pull request Aug 22, 2023
6 tasks
@brenns10 brenns10 force-pushed the kallsyms_finder branch 2 times, most recently from cd84d55 to 73f6a5c Compare October 21, 2023 06:34
@brenns10 brenns10 force-pushed the kallsyms_finder branch 3 times, most recently from bc52003 to 78cec7a Compare March 1, 2024 23:55
brenns10 added 10 commits March 1, 2024 16:46
By using __attribute__((__packed__)), we shrink each enum from the
default integer size of four bytes, down to the minimum size of one.

This reduces the size of drgn_symbol from 32 bytes down to 26, with 6
bytes of padding. It doesn't have a practical benefit yet, but adding
fields to struct drgn_symbol in the future may not increase the size.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
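
The size arithmetic in this commit message can be checked with a small ctypes sketch. The field layout below (a name pointer, two 64-bit integers, and two enums) is an assumption chosen to match the 32-byte and 26-byte figures quoted above, not a copy of the real struct drgn_symbol definition; the pointer is modeled as a 64-bit integer, and the result assumes a typical 64-bit ABI with 8-byte alignment.

```python
import ctypes


class SymbolIntEnums(ctypes.Structure):
    """Assumed layout with int-sized enums."""
    _fields_ = [
        ("name", ctypes.c_uint64),     # stands in for a 64-bit `const char *`
        ("address", ctypes.c_uint64),
        ("size", ctypes.c_uint64),
        ("binding", ctypes.c_int),     # default enum size: four bytes
        ("kind", ctypes.c_int),
    ]


class SymbolPackedEnums(ctypes.Structure):
    """Same assumed layout with one-byte (packed) enums."""
    _fields_ = [
        ("name", ctypes.c_uint64),
        ("address", ctypes.c_uint64),
        ("size", ctypes.c_uint64),
        ("binding", ctypes.c_uint8),   # packed enum: one byte
        ("kind", ctypes.c_uint8),
    ]


def payload_size(struct_type) -> int:
    """Sum of the field sizes, ignoring alignment padding."""
    return sum(ctypes.sizeof(ftype) for _name, ftype in struct_type._fields_)


for struct_type in (SymbolIntEnums, SymbolPackedEnums):
    total = ctypes.sizeof(struct_type)
    used = payload_size(struct_type)
    print(f"{struct_type.__name__}: sizeof={total} payload={used} padding={total - used}")
```

Under those assumptions, the padding in the packed variant is the room into which future small fields could fit without growing the struct.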
Symbol lookup is not yet modular, like type or object lookup. However,
making it modular would enable easier development and prototyping of
alternative Symbol providers, such as Linux kernel module symbol tables,
vmlinux kallsyms tables, and BPF function symbols. To begin with, create
a modular Symbol API within libdrgn, and refactor the ELF symbol search
to use it.

For now, we leave drgn_program_find_symbol_by_address_internal() alone.
Its conversion will require some surgery, since the new API can return
errors, whereas this function cannot.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The following commit will modify it to use
drgn_program_symbols_search(), a static function declared below. Move it
underneath in preparation. No changes to the function.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The drgn_program_find_symbol_by_address_internal() function is used when
libdrgn itself may want to look up a symbol: in particular, when
formatting stack traces or objects. It does less work by possibly
already having a Dwfl_Module looked up, and by avoiding memory
allocation of a symbol, and it's more convenient because it doesn't
return any errors, including on lookup failure.

Unfortunately, the new symbol finder API breaks all of these properties:
the returned symbol is now allocated via malloc() which needs cleanup on
error, and errors can be returned by any finder via the lookup API.
What's more, the finder API doesn't allow specifying an already-known
module. Thankfully, error handling can be improved using the cleanup
API, and looking up a module for an address is usually a reasonably
cheap binary tree operation.

Switch the internal method over to the new finder API. The major
difference now is that lookup failures don't result in an error:
they simply result in a NULL symbol.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
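
To illustrate the shape of a modular finder API and the internal lookup behavior described above, here is a rough Python sketch. `FinderChain`, `Sym`, `symbol_or_none`, and `fake_finder` are hypothetical; the real implementation lives in C inside libdrgn, and the rule of stopping at the first finder that returns a result is an assumption of this sketch.

```python
from typing import Callable, List, NamedTuple, Optional


class Sym(NamedTuple):
    name: str
    address: int


Finder = Callable[[Optional[str], Optional[int], bool], List[Sym]]


class FinderChain:
    def __init__(self) -> None:
        self._finders: List[Finder] = []

    def add_symbol_finder(self, finder: Finder) -> None:
        self._finders.append(finder)

    def search(self, name, address, one) -> List[Sym]:
        results: List[Sym] = []
        for finder in self._finders:
            # Any exception raised by a finder propagates to the caller.
            results.extend(finder(name, address, one))
            if one and results:
                return results[:1]
        return results

    def symbol(self, name_or_address) -> Sym:
        """Public-style lookup: a miss is reported as an error."""
        if isinstance(name_or_address, str):
            found = self.search(name_or_address, None, True)
        else:
            found = self.search(None, name_or_address, True)
        if not found:
            raise LookupError(f"no symbol for {name_or_address!r}")
        return found[0]

    def symbol_or_none(self, address: int) -> Optional[Sym]:
        """Internal-style lookup: a miss is None rather than an error."""
        found = self.search(None, address, True)
        return found[0] if found else None


def fake_finder(name, address, one) -> List[Sym]:
    """Contrived finder with one hard-coded symbol, handy for plumbing tests."""
    table = [Sym("fake_symbol", 0x1000)]
    matches = [
        s for s in table
        if (name is None or s.name == name)
        and (address is None or s.address == address)
    ]
    return matches[:1] if one else matches


chain = FinderChain()
chain.add_symbol_finder(fake_finder)
```

Callers such as stack trace formatting would use the `symbol_or_none` style, since a missing symbol there is an expected outcome rather than a failure.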
Now that the symbol finder API is created, we can move the ELF symbol
implementation into the debug_info.c file, where it more logically
belongs. The only change to these functions in the move is to declare
elf_symbols_search as static.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Previously, Symbol objects could not be constructed in Python. However,
in order to allow Python Symbol finders, this needs to be changed.
Unfortunately, Symbol name lifetimes are tricky to manage. We introduce
a lifetime enumeration to handle this. The lifetime may be "static",
i.e. longer than the life of the program; "external", i.e. longer than
the life of the symbol, but no guarantees beyond that; or "owned", i.e.
owned by the Symbol itself.

Symbol objects constructed in Python are "external". The Symbol struct
owns the pointer to the drgn_symbol, and it holds a reference to the
Python object keeping the name valid (either the program, or a PyUnicode
object).

The added complexity is justified by the fact that most symbols are from
the ELF file, and thus share a lifetime with the Program. It would be a
waste to constantly strdup() these strings, just to support a small
number of Symbols created by Python code.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Expose the Symbol finder API so that Python code can be used to look up
additional symbols by name or address.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Specify a "fake" symbol finder and then test that its results are
plumbed through the API successfully. While this is a contrived test, it
helps build confidence in the plumbing of the API.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The Linux kernel can be configured to include kallsyms, a built-in
compressed symbol table which is also exposed at /proc/kallsyms. The
symbol table contains most (but not all) of the ELF symbol table
information. It can be used as a Symbol finder.

The kallsyms information can be extracted in two ways: for live systems
where we have root access, the simplest approach is to read
/proc/kallsyms. For vmcores, or live systems where we are not root, we
must parse the data from the vmcore, which is significantly more
involved.

To avoid tying the kallsyms system too deeply into the drgn internals,
the finder is exposed as a Python class, which must be created using
symbol information from the vmcoreinfo. Attaching the KallsymsFinder to
the program will attach the underlying C function, so we can avoid some
of the inefficiencies of the Python API.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
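
For readers unfamiliar with the compressed table, the sketch below shows the general idea of expanding kallsyms names from a token table, as I understand the upstream encoding: each symbol is a length followed by that many one-byte token indices, each token expands to a NUL-terminated string, and the first expanded character is the symbol type. With long-symbol support, a length byte with its high bit set is continued in a second byte. The function names and toy data are hypothetical, and real parsing would also have to read these arrays out of kernel memory.

```python
from typing import List, Sequence, Tuple


def build_token_table(tokens: Sequence[bytes]) -> Tuple[bytes, List[int]]:
    """Toy helper: concatenate NUL-terminated tokens and record their offsets."""
    table = bytearray()
    index = []
    for tok in tokens:
        index.append(len(table))
        table += tok + b"\0"
    return bytes(table), index


def read_length(names: bytes, off: int, long_symbols: bool) -> Tuple[int, int]:
    """Return (token count, offset of the first token index)."""
    length = names[off]
    off += 1
    if long_symbols and length & 0x80:
        # Two-byte length: low 7 bits first, then the next byte shifted up.
        length = (length & 0x7F) | (names[off] << 7)
        off += 1
    return length, off


def expand_symbols(names: bytes, token_table: bytes, token_index: List[int],
                   num_syms: int, long_symbols: bool = True) -> List[Tuple[str, str]]:
    """Expand `num_syms` entries into (type, name) pairs."""
    out = []
    off = 0
    for _ in range(num_syms):
        length, off = read_length(names, off, long_symbols)
        text = bytearray()
        for tok in names[off:off + length]:
            start = token_index[tok]
            end = token_table.index(b"\0", start)
            text += token_table[start:end]
        off += length
        decoded = text.decode("ascii")
        out.append((decoded[0], decoded[1:]))
    return out


# Toy data: five tokens and two symbols built from them.
token_table, token_index = build_token_table(
    [b"D", b"slab_", b"caches", b"T", b"_stext"]
)
names = bytes([3, 0, 1, 2,   # 3 tokens: "D" + "slab_" + "caches"
               2, 3, 4])     # 2 tokens: "T" + "_stext"
symbols = expand_symbols(names, token_table, token_index, num_syms=2)
```

Decoding with `long_symbols=False` treats every length as a single byte, which is why a parser needs some way to tell the two formats apart.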
@brenns10
Contributor Author

Closing this because it's got a noisy history and outdated description. I will create a new pull request with the kallsyms code.

@brenns10 brenns10 closed this Mar 18, 2024