Generating multiple CUs in a single pass with libdwarfp #202

Open
Victorious3 opened this issue Nov 15, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@Victorious3

I'm trying to use libdwarfp to generate debug information for my compiler project.
The output is used to create an ELF object file. I have multiple compilation units which all end up in a single file. This seems to work fine for the most part by calling dwarf_producer_init for every compilation unit and dealing with the data accordingly.
The problem I ran into is that the DWARF header of each compilation unit contains an offset into the abbrev section. Having 0 here works just fine for the first CU, but when I add more CUs (simply concatenating the sections) the offset is no longer correct.
I could probably patch the debug_info section as a workaround, but maybe it would be a good idea to add a way of specifying the right offset through the API.
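
The patching workaround mentioned above could look roughly like the following. This is only a sketch under stated assumptions: 32-bit DWARF versions 2 to 4, where the CU header is a 4-byte unit_length, a 2-byte version, then the 4-byte debug_abbrev_offset (so the field sits at byte 6, which matches the first relocation offset shown later in this thread), and a host byte order equal to the target's. The function name and parameters are hypothetical and not part of libdwarfp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Position of debug_abbrev_offset in a 32-bit DWARF 2-4 CU header:
   4 bytes unit_length + 2 bytes version. (DWARF 5 puts it at byte 8.) */
enum { CU_ABBREV_OFFSET_POS = 6 };

/* Hypothetical helper, not a libdwarfp call.
   Adds abbrev_base to the abbrev offset of the CU header that starts at
   cu_start inside the merged .debug_info buffer.
   Returns 0 on success, -1 if the field would fall outside the buffer. */
static int patch_abbrev_offset(uint8_t *info, size_t info_len,
                               size_t cu_start, uint32_t abbrev_base)
{
    uint32_t value;

    if (cu_start > info_len ||
        info_len - cu_start < CU_ABBREV_OFFSET_POS + sizeof value)
        return -1;

    /* memcpy avoids unaligned access; assumes host and target endianness match. */
    memcpy(&value, info + cu_start + CU_ABBREV_OFFSET_POS, sizeof value);
    value += abbrev_base;
    memcpy(info + cu_start + CU_ABBREV_OFFSET_POS, &value, sizeof value);
    return 0;
}
```

Here abbrev_base would be the number of .debug_abbrev bytes already emitted by earlier CUs, and cu_start the number of .debug_info bytes already emitted.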

@davea42
Owner

davea42 commented Nov 16, 2023

As you have already guessed I had not considered the idea of creating multiple compile units
from a single libdwarfp run.

It was originally for an SGI compiler that only emitted one CU per compile.
The offsets for abbrev and line table at least will be affected.

I will take a look at this, but likely not today (sorry).

@Victorious3
Author

Another way to patch this would probably be a relocation, and in fact there's one emitted by dwarf_get_relocation_info that points at the abbrev section.
I have tried to do symbolic relocations but what I've noticed is that there seems to be no addend anywhere to be found in the structure for those (Dwarf_Relocation_Data_s). How are these handled? What I can get right now is this:

RELOCATION RECORDS FOR [.debug_info]:
OFFSET           TYPE              VALUE
0000000000000006 R_X86_64_32       .debug_abbrev
000000000000001a R_X86_64_32       .debug_str
000000000000001e R_X86_64_32       .debug_str
0000000000000023 R_X86_64_32       .debug_str
00000000000000a1 R_X86_64_32       .debug_abbrev
00000000000000b5 R_X86_64_32       .debug_str
00000000000000b9 R_X86_64_32       .debug_str

The first four come from the first CU, the rest from the second CU. I have just incremented the offset by the size of the first debug_info section.
The symbol alone isn't enough though to point it at the correct location.

@davea42
Owner

davea42 commented Nov 16, 2023

You are thinking along the right lines. It is a relocation issue.
What is missing is a way to connect things across different CUs.
A sort of global relocation set.
Maybe a new type in Dwarf_Rel_Type.

Anyway, the relocations are done in dwarf_pro_section.c via
function pointers like de_relocate_by_name_symbol()

The section names refer to the symbol table, and the value of the symtab entry
is to be added into the section-offset of the value the reloc refers to.
Of course for each section there will be a base address (default all zero).
As each CU is emitted, update the base for each section, and when relocating
add in the right base for the section involved.

For a single CU, yes the section symbol is enough. You are pushing beyond
that, so yes, you need to update a per-section base address. Kind of similar
to what a linker would do... Edit: no, exactly what a linker would do, not
just kind of.

Seems like you already have the idea, I doubt if this helps at all.
DavidA
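
The per-section base bookkeeping described in this comment can be sketched as follows. Everything here is an assumption made for illustration: the section enum, the `merge_state` and `reloc` structs, and both functions are hypothetical caller-side code, not libdwarfp types, and the CU-local addend is assumed to have been obtained separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical section indices used only by this sketch. */
enum section_id { SEC_INFO, SEC_ABBREV, SEC_STR, SEC_LINE, SEC_COUNT };

/* Running base per section: bytes already emitted by earlier CUs. */
struct merge_state {
    uint64_t base[SEC_COUNT];
};

/* Hypothetical CU-local relocation record. */
struct reloc {
    enum section_id in_section; /* section holding the field to patch */
    enum section_id target;     /* section the field points into */
    uint64_t offset;            /* field position within this CU's chunk */
    uint64_t addend;            /* target position within this CU's chunk */
};

/* Translate one relocation from CU-local to merged-section coordinates. */
static void rebase_reloc(const struct merge_state *st, struct reloc *r)
{
    r->offset += st->base[r->in_section];
    r->addend += st->base[r->target];
}

/* Call after a CU's chunks have been appended to the merged sections.
   A section the producer already accumulates across runs would pass 0 here. */
static void advance_bases(struct merge_state *st,
                          const uint64_t chunk_len[SEC_COUNT])
{
    for (int i = 0; i < SEC_COUNT; i++)
        st->base[i] += chunk_len[i];
}
```

With the relocation dump earlier in the thread, the first CU's .debug_info chunk is 0x9b bytes long (0xa1 minus 0x06), so the second CU's abbrev relocation at local offset 6 ends up at 0xa1 after rebasing.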

@Victorious3
Author

Maybe I'm missing something. The three relocations into the string table are from strp attributes. I'm using symbolic relocations because I need to patch the offsets.

Shouldn't the relocations be of the form .debug_str + x where x points at the actual string inside that section? That's what I mean with addend. Dwarf_Relocation_Data_s doesn't seem to contain that piece of information.

At least this is what I get if I compile something with gcc.

To clarify, I'm actually using libbfd to emit the final ELF file. It wants an array of relocations which I construct with the information in the relocation data struct. I give it the section symbol and offset but it also wants that addend which I have no idea where to get from.

@davea42
Owner

davea42 commented Nov 17, 2023

Since libdwarfp generates .rel relocations (not .rela), the addend is the value of the symbol involved.
See the ELF ABI generic document for its discussion of _Rel vs _Rela in the Relocation section of the document.

Entries of type _Rel store an implicit addend in the location to be modified.

You could adopt the _Rela approach by always initially inserting 0 in the location to be modified,
I suppose.
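
For a consumer that wants explicit addends (as the libbfd relocation array mentioned above does), the implicit addend can be read back out of the section bytes. The following is a minimal sketch, assuming a 4-byte field as used by R_X86_64_32 and a host byte order equal to the target's. The function name is hypothetical; it is not a libdwarfp or libbfd call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch the 32-bit implicit addend stored at `offset`
   in a section buffer, then zero the field so the value is not counted
   twice once it is carried as an explicit (RELA-style) addend.
   Returns 0 on success, -1 if the field would fall outside the buffer. */
static int take_implicit_addend32(uint8_t *section, size_t len,
                                  size_t offset, int64_t *addend_out)
{
    uint32_t stored;

    if (offset > len || len - offset < sizeof stored)
        return -1;

    /* Assumes host and target endianness match. */
    memcpy(&stored, section + offset, sizeof stored);
    *addend_out = (int64_t)stored;
    memset(section + offset, 0, sizeof stored);
    return 0;
}
```

The offset argument is the relocation's offset within the section, and the returned value is what would go into the addend slot of the relocation handed to the object writer. Whether zeroing the field is required depends on how that writer treats in-place contents.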

@Victorious3
Author

Victorious3 commented Nov 17, 2023

Ah that clarifies things! I wasn't aware of how .rel works. Hm. So I need to read at the offset and take the value that has been written there and use it as the addend?

EDIT: That seems to have worked, thanks a lot!

@Victorious3 Victorious3 changed the title Possibility to set a value for abbrev_offset for the generated CU Dealing with multiple CUs in a single pass Nov 17, 2023
@Victorious3
Author

Victorious3 commented Nov 17, 2023

Another thing I have noticed, which is related to emitting multiple CUs, is that the data for the .debug_str section grows across multiple passes and doesn't reset. So in a way this is already doing what it's supposed to? It just seems to be different from how other sections are affected. It doesn't really deduplicate strings, though, if they come from another CU.

@davea42
Owner

davea42 commented Nov 17, 2023

Well, it's doing more than it was designed to do! If one emits just one CU, adding to .debug_str deduplicates, but
I would be shocked to think that worked across CUs, because it was not designed for that.
Looks like it sets up a simple hash table for strings for a CU and when that
section is emitted it destroys the hash table. So of course it will grow...
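
Deduplicating across CUs would need a string table that outlives a single CU. The sketch below shows the idea on the caller's side; it is not how libdwarfp is implemented, it uses a linear scan instead of the hash table described above to stay short, and the `strtab` type and `strtab_intern` function are hypothetical. Any strp value already written by the producer would still have to be remapped to the offsets returned here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical merged string table: NUL-terminated strings back to back. */
struct strtab {
    char  *data;
    size_t len;
    size_t cap;
};

/* Return the offset of s in the table, appending it if absent.
   Returns (size_t)-1 on allocation failure. */
static size_t strtab_intern(struct strtab *t, const char *s)
{
    size_t n = strlen(s) + 1; /* include the terminating NUL */

    /* Linear scan over existing entries; a hash table would scale better. */
    for (size_t off = 0; off < t->len; off += strlen(t->data + off) + 1)
        if (strcmp(t->data + off, s) == 0)
            return off;

    if (t->len + n > t->cap) {
        size_t cap = t->cap ? t->cap : 64;
        while (cap < t->len + n)
            cap *= 2;
        char *p = realloc(t->data, cap);
        if (!p)
            return (size_t)-1;
        t->data = p;
        t->cap = cap;
    }
    memcpy(t->data + t->len, s, n);
    t->len += n;
    return t->len - n;
}
```

Starting from a zero-initialized `struct strtab`, interning the same string from two different CUs yields the same offset, so the merged .debug_str holds it once.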

My only real use of libdwarfp is with dwarfgen and I've built options to dwarfgen to tell it to
invent attributes (out of thin air) because I could find no compiler emitting a particular
DWARF attribute or particular FORM and it seemed difficult
to provoke a compiler to emit an attribute/form.

dwarfgen reads in DWARF from an object,
takes a particular CU N from that object (numbered from zero), reads and re-emits
that CU's DWARF changed somehow (and then stops).
So basically a fake .text .data and a few TAGs and Attributes.
This is a part of the libdwarf-regressiontests project.

It was actually used with SGI compilers emitting DWARF2 (and not 100% of DWARF2!)
in the 1990's. But it has not received any attention for many years.

@Victorious3
Author

Small update, it seems like line numbers also need some work. They stay the same between CUs and I can't seem to reset them.

@Victorious3
Author

I think a really simple way of allowing this would be having a way to reset all internal state back to the original values. It wouldn't fix the string table but at least I wouldn't get weird offsets for the rest of the run.

@davea42
Owner

davea42 commented Dec 20, 2023

I don't understand enough about what you are doing to comment further or
to actually do anything. Sorry. libdwarfp was (as I said before) not designed
to emit multiple CUs. I don't have any idea how I could help, nor what
changes would matter.
Lacking further information I will let this rest, but will eventually close it.

@davea42
Owner

davea42 commented Dec 20, 2023

Even if I implemented the output as multiple CUs I have no idea if the result
would be relevant to you...Sorry.

@Victorious3
Author

Victorious3 commented Dec 22, 2023

I'm currently experimenting with merging two object files, they do have multiple CUs.
I initially thought this could work by giving the root node multiple siblings. If you wanted to do this properly, it could work like that.

I imagine that tools based on this library would sometimes need to write multiple CUs consecutively, even if done with multiple files. In fact I think for my project specifically I only need one CU per file, but I do want to do this multiple times.
There seems to be a lot of global state going on which prevents this from working properly, so I really think that just being able to reset the whole library would already suffice as a (not ideal) solution.

EDIT: If it helps you I have the code for it here: https://github.com/Princess-org/Princess/blob/incremental/merge.pr
It's in my own programming language but it does read like C for most things regarding dealing with a C library.

@davea42 davea42 self-assigned this Mar 1, 2024
@davea42 davea42 changed the title Dealing with multiple CUs in a single pass Generating multiple CUs in a single pass with libdwarfp Mar 1, 2024
@davea42
Owner

davea42 commented Mar 1, 2024

I changed the title to reflect the actual topic of the issue: libdwarfp.

@davea42 davea42 added the enhancement New feature or request label Apr 5, 2024
2 participants