Introduce new relocation for landing pad #452
base: complex-label-lp

@@ -548,7 +548,9 @@ Description:: Additional information about the relocation
<| S - P
.2+| 65 .2+| TLSDESC_CALL .2+| Static | .2+| Annotate call to TLS descriptor resolver function, `%tlsdesc_call(address of %tlsdesc_hi)`, for relaxation purposes only
<|
.2+| 66-190 .2+| *Reserved* .2+| - | .2+| Reserved for future standard use
.2+| 66 .2+| LPAD .2+| Static | .2+| Annotates the landing pad instruction inserted at the beginning of the function. The addend indicates the label value of the landing pad, and the symbol value is the address of the mapping symbol for the function signature, which will have the same address as the function.
<|
.2+| 67-190 .2+| *Reserved* .2+| - | .2+| Reserved for future standard use
<|
.2+| 191 .2+| VENDOR .2+| Static | .2+| Paired with a vendor-specific relocation and must be placed immediately before it, indicates which vendor owns the relocation.
<|

Review comments on the `LPAD` row:

Is this relocation only for the func-sig scheme? Based on its description, it looks like so, but the following

That should also work for the unlabeled scheme as well; let me think about how to make it clear.

Does this LPAD relocation intend to serve the "real" purpose of a relocation? That is, ask linkers to fill in some value (in this case, the label of the lpad instruction) at some offset at link time. When prototyping this relocation in LLVM, it appears to me that the LLVM backend assumes places to be relocated have a placeholder value 0 encoded, so to emit the LPAD relocations, 0s would have to be encoded at the label locations in relocatable files. This can of course be changed to encode the correct label along with the emitted relocation, but I want to ask whether this change is needed, or whether I can rely on linkers to fill in the correct labels when relocating at static link time.

@@ -1210,6 +1212,7 @@ The defined processor-specific section types are listed in <<rv-section-type>>.
| Name | Value | Attributes

| SHT_RISCV_ATTRIBUTES | 0x70000003 | none
| SHT_RISCV_LANDING_PAD_INFO | 0x70000004 | none
|===

==== Special Sections

@@ -1224,12 +1227,16 @@ The defined processor-specific section types are listed in <<rv-section-type>>.
| Name | Type | Attributes

| .riscv.attributes | SHT_RISCV_ATTRIBUTES | none
| .riscv.lpadinfo | SHT_RISCV_LANDING_PAD_INFO | none
| .riscv.jvt | SHT_PROGBITS | SHF_ALLOC + SHF_EXECINSTR
| .note.gnu.property | SHT_NOTE | SHF_ALLOC
|===

+++.riscv.attributes+++ names a section that contains RISC-V ELF attributes.

+++.riscv.lpadinfo+++ names a section that contains RISC-V landing pad
information, which is used for generating PLT entries and can also be used for
debugging.

+++.riscv.jvt+++ is a linker-created section to store table jump
target addresses. The minimum alignment of this section is 64 bytes.

@@ -1568,6 +1575,51 @@ the `Zicfilp` extension. An executable or shared library with this bit set is
required to generate PLTs with the landing pad (`lpad`) instruction, and all
labels are set to a value hashed from the corresponding function signature.
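The text above says labels are derived by hashing the function signature, and `lpad` labels are 20 bits wide. The sketch below only illustrates the "hash, then truncate to 20 bits" shape of that step. It uses 32-bit FNV-1a as a stand-in hash, which is an assumption and not the hash the psABI defines, and `lpad_label_from_signature` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical label derivation for illustration only.
 * ASSUMPTION: FNV-1a stands in for whatever hash the psABI actually
 * specifies; only the truncation to 20 bits reflects the lpad format. */
static uint32_t lpad_label_from_signature(const char *sig)
{
    uint32_t h = 2166136261u;              /* FNV-1a 32-bit offset basis */
    for (const unsigned char *p = (const unsigned char *)sig; *p; ++p) {
        h ^= *p;                           /* mix in one byte */
        h *= 16777619u;                    /* FNV-1a 32-bit prime */
    }
    return h & 0xFFFFFu;                   /* lpad labels are 20 bits wide */
}
```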

=== Landing Pad Information Section (`.riscv.lpadinfo`)

Review comment:

We need to specify section header info, including the

The landing pad information section contains the information necessary for
generating PLT entries for the function-signature-based landing pad scheme. This
section may also exist when the unlabeled landing pad scheme is used.

Review comments on lines +1580 to +1582:

Change to: The landing pad information section is a section that contains the information to generate PLT with landing pads. The static linker is required to use the landing pad values provided by this section when generating This change allows us to:

Update: Exclude the unlabeled CFI scheme from adopting

This section consists of entries with the following structure:

```
typedef struct
{
  Elf32_Word lpi_name;  /* Symbol name (string tbl index) */
  Elf32_Word lpi_sig;   /* Signature for the symbol (string tbl index) */
  Elf32_Word lpi_value; /* Landing pad value for the symbol */
} Elf32_Lpadinfo;

typedef struct
{
  Elf64_Word lpi_name;  /* Symbol name (string tbl index) */
  Elf64_Word lpi_sig;   /* Signature for the symbol (string tbl index) */
  Elf64_Word lpi_value; /* Landing pad value for the symbol */
} Elf64_Lpadinfo;
```

Review comments on `lpi_name`:

Just curious: Why isn't this the index of the symbol in the symbol table? The symbol table index provides a more definite reference to the symbol, considering the case of repeated symbol names (though this may not really happen with function symbols, as it's an error to globally define a function more than once).

I've come up with a case that would cause the current format to fail:

// public_foo.c
void foo() {}
// private_foo_1.c
static int foo(int i) { return i; }
void *get_foo_1() { return foo; }
// private_foo_2.c
static char foo(char c) { return c; }
void *get_foo_2() { return foo; }

Compile them to a shared library, and you get 1 GLOBAL FUNC and 2 LOCAL FUNC To support showing

TL;DR: I will switch to the symbol index in the next update. Here is why I used the symbol name before: in the earlier version, we had a mapping symbol at the beginning of each function, so local functions could rely on that alone. Since we are going to remove the mapping symbol, local symbols may also need to rely on this mechanism to record the signature. Recording the symbol name was fine before, since global symbols have unique names (in theory); local symbols may have duplicate names, but that was not a problem then. The concern for using the symbol index is: So the conclusion is: I think we should use the symbol index here rather than the symbol name, as you suggested.

Thank you for your explanation, and sorry for ruining your original design with the mapping symbol 😥 The main reason I was against the mapping symbol approach was that its intended usage was obscure. It only had a reference from the LPAD relocation that is intended to provide the disassembling string for the lpad label, but considering that LPAD relocations would be gone after static linking, this solution looks incomplete (it only provides the disassembling string for relocatables, but not for executables and shared libraries), so it didn't convince me that we should adopt the mapping symbol approach. But if how the mapping symbols were associated with the lpad insns is clarified and unified by dropping the reference in the LPAD relocation, and solely relies on the mapping symbols having the

The only disadvantage (I can come up with) of the updated mapping symbol approach would be a bloated symbol table, but symbol tables are strippable in the final executables/shared libraries, so it does not matter much in production after all. Having said all that, would you mind clarifying how the mapping symbol approach was supposed to work, so we can reconsider it (and the format

Thank you 🙇♂️

Review comment on `lpi_value`:

Lpad labels are only 20-bit wide, so we can use

The `lpi_name` field is the index into the string table for the symbol name,
the `lpi_sig` field is the index into the string table for the function
signature (it can be 0 if the signature string is not present), and the
`lpi_value` field is the landing pad value for the symbol.
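A minimal sketch of how a tool might look up the entry recorded for a symbol, assuming the section contents are already in memory as an array of entries and that the string table the offsets refer to is known (this section does not yet say which string table that is, for example via `sh_link`; treat that as an assumption). `Lpadinfo32`, `lpadinfo_find`, and `lpadinfo_sig` are hypothetical names. Since `Elf64_Word` is also 32 bits, the same layout serves both ELF classes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Lpadinfo, with Elf32_Word spelled as uint32_t so the
 * sketch does not depend on <elf.h>. */
typedef struct {
    uint32_t lpi_name;  /* string table offset of the symbol name */
    uint32_t lpi_sig;   /* string table offset of the signature, 0 if absent */
    uint32_t lpi_value; /* landing pad label for the symbol */
} Lpadinfo32;

/* Linear scan for the first entry whose name matches `name`; NULL if none.
 * ASSUMPTION: `strtab` is the string table that lpi_name/lpi_sig index. */
static const Lpadinfo32 *lpadinfo_find(const Lpadinfo32 *entries, size_t count,
                                       const char *strtab, const char *name)
{
    for (size_t i = 0; i < count; ++i)
        if (strcmp(strtab + entries[i].lpi_name, name) == 0)
            return &entries[i];
    return NULL;
}

/* Signature string for an entry, or NULL when lpi_sig is 0 (not present). */
static const char *lpadinfo_sig(const Lpadinfo32 *e, const char *strtab)
{
    return e->lpi_sig != 0 ? strtab + e->lpi_sig : NULL;
}
```

Because entries are keyed by name, a scan like this returns the first match and cannot distinguish two local functions that share a name, which is the limitation raised in the review discussion of `lpi_name`.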

The string referenced by the `lpi_sig` field is the function signature string,
encoded in the same way as the mapping symbol for the function signature.

NOTE: Using the same encoding as the mapping symbol aims to reduce the size of
the string table.

Review comments on lines +1607 to +1611:

This means the signatures stored start with If the goal is to save bytes in the string table, I think we can always use

Drop mapping symbol so this paragraph will be removed

Every symbol with global or weak binding must have a corresponding entry in this
section, and the `lpi_name` field must be the same as the string table index of
the symbol's name.

Review comments:

I think you mean "Every function symbol with global or weak"?

Oh yeah, should just restrict to symbols with function type.

Update: No need to have the above requirement, since trimming down the size of the

This section can be discarded after the static linking stage.

Review comments:

Since you mentioned that "Every symbol with global or weak bind must has a corresponding entry", I think it implies that the lpad labels are provided by the object file that defines the function, right? If this is the case, we can't discard this section after static linking when creating a shared library, since library users would expect to find lpad labels later when linking against this shared library.

We can still know the signature/landing pad label value when we reference a symbol that is not yet defined, because we always need to declare the prototype in the source code. "Every symbol with global or weak bind must has a corresponding entry" -> we didn't exclude undefined symbols, so we can link to the shared library without that info.

If labels can be provided by the object that uses but doesn't define the function, why require labels to be there in the defining object ("Every symbol with global or weak bind must has a corresponding entry")? For the sake of checking if the use-site and define-site agree on the same lpad label?

I think it's still better to put the

Case 1: When writing assembly by hand, the pseudo

Case 2: Compiler inserts calls to builtins or instrumentations. These extra function calls are often not inserted at the C source level, but at the compiler IR level. This makes knowing the C prototype of the called target hard, as there would not be a C language data structure to represent the called target in the AST. In these cases, it's not easy to obtain the C language prototype, as the only assumption is that there would be a defined symbol to which the linker can resolve the called target.

Considering these cases, I would still prefer to have the shared libraries provide the lpad labels for functions defined by them.

I agree with you, so...I just say

Hmmm, if this is the intention, I would suggest you make it clear that it means the section is strippable, and does not mean that the section can be discarded under all circumstances after static linkage. I believe if a user strips his binary, he should know the consequence, so marking a section that is only strippable under certain situations as strippable is acceptable. My concern is that if this sentence is considered by a linker implementation to be the spec of linker behavior that uniformly allows the section to be dropped after static linkage, the spec is flawed, since when producing shared libraries, the section cannot be dropped, or otherwise the library could risk linkage failure with relocatables.

The static linker should emit an error if objects containing the same symbol
with different landing pad values are being merged; however, it may suppress the
error if the landing pad scheme relaxation is enabled.
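The merge rule above can be sketched as a small check a linker might run when a second input object supplies a landing pad value for a symbol it has already seen. This is a minimal sketch under stated assumptions: `lpadinfo_merge_ok` is a hypothetical helper, labels are compared as plain integers, and suppressing the error whenever the scheme relaxation is enabled is one option the text permits, not a requirement.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical check for two landing pad values seen for the same symbol.
 * Returns true if linking may continue, false if an error was reported. */
static bool lpadinfo_merge_ok(const char *sym, uint32_t seen_label,
                              uint32_t new_label, bool scheme_relaxation)
{
    if (seen_label == new_label)
        return true;                /* use-site and define-site agree */
    if (scheme_relaxation)
        return true;                /* text allows suppressing the error here */
    fprintf(stderr,
            "error: landing pad label mismatch for '%s': 0x%05x vs 0x%05x\n",
            sym, (unsigned)seen_label, (unsigned)new_label);
    return false;
}
```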

Review comment on lines +1619 to +1621:

Is it beneficial that we also implement this check in the dynamic linker for dynamic symbols? E.g. Abort the program if label mismatches are found at program startup or

=== Mapping Symbol

The section can have a mixture of code and data or code with different ISAs.

@@ -1582,6 +1634,7 @@ A number of symbols, named mapping symbols, describe the boundaries.
| $x.<any>
| $x<ISA> .2+| Start of a sequence of instructions with <ISA> extension.
| $x<ISA>.<any>
| $s<function-signature-string> | Marker for the landing pad instruction. This should only be used with the function-signature-based scheme and should be placed only at the beginning of the function.
|===

Review comments on the `$s<function-signature-string>` row:

I don't quite get the purpose of this mapping symbol: It looks like the only reference to these symbols come from the

It's kind of for debugging purposes only, so it's safe to strip, like all other mapping symbols.

If the purpose is to display function signatures when disassembling, this mechanism seems a bit incomplete (?) I suppose since the relocation is a static one, it would not stay in the binary after static linking, so if a user disassembles a linked ELF, it's still the label numbers instead of signatures that get displayed?

Update: Assuming it's relying on the mapping symbol having the same address as the lpad insn to associate an lpad insn with a function signature (so that the signature can be displayed when disassembling a linked binary), why do relocations refer to these symbols?

If the purpose of the mapping symbol is to provide debug/disassembling info, I think after the introduction of the

The mapping symbol should set the type to `STT_NOTYPE`, binding to `STB_LOCAL`,

@@ -2317,6 +2370,96 @@ instructions. It is recommended to initialize `jvt` CSR immediately after
csrw jvt, a0
----

==== Landing Pad Relaxation

Target Relocation::: R_RISCV_LPAD

Description:: This relaxation type allows the `lpad` instruction to be removed.
However, if `R_RISCV_RELAX` is not present, the `lpad` instruction can only be
replaced with a sequence of `nop` instructions of the same length as the
original instruction.

Description:: This relaxation type can relax the `lpad` instruction into
nothing, that is, remove the `lpad` instruction.
This relaxation type can be performed even without `R_RISCV_RELAX`,
but in that case the linker should pad with `nop` instructions to the same
length as the original instruction sequence.

Condition:: This relaxation can only be applied if the symbol is **NOT**
exported to the dynamic symbol table and is only referenced by `R_RISCV_CALL`
or `R_RISCV_CALL_PLT` relocations. If the symbol is exported or referenced by
other relocations, relaxation cannot be performed.

Relaxation::
- The `lpad` instruction associated with `R_RISCV_LPAD` can be removed.
- The `lpad` instruction associated with `R_RISCV_LPAD` can be replaced with a
`nop` instruction if the relocation isn't paired with `R_RISCV_RELAX`.

Example::
+
--
Relaxation candidate:
[,asm]
----
lpad 0x123 # R_RISCV_LPAD, R_RISCV_RELAX
----

Relaxation result:
[,asm]
----
# No instruction
----
Can be relaxed into `nop` if no `R_RISCV_RELAX` is paired with `R_RISCV_LPAD`.
[,asm]
----
nop
----
--
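A minimal sketch of the decision described by the Condition:: and Relaxation:: entries of the landing pad relaxation, assuming the linker has already worked out whether the symbol is in the dynamic symbol table and whether every reference to it is an `R_RISCV_CALL` or `R_RISCV_CALL_PLT`. `LpadSymbol`, `LpadAction`, and `relax_lpad` are hypothetical names. Actually deleting the bytes (and adjusting the relocations and symbols that follow them) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define RV_NOP 0x00000013u /* addi x0, x0, 0 */

/* Hypothetical summary of what the linker knows about the symbol whose
 * entry lpad carries an R_RISCV_LPAD relocation. */
typedef struct {
    bool in_dynsym;      /* exported to the dynamic symbol table */
    bool only_call_refs; /* referenced only via R_RISCV_CALL / R_RISCV_CALL_PLT */
} LpadSymbol;

typedef enum {
    LPAD_KEEP,   /* condition not met: leave the lpad untouched */
    LPAD_NOP,    /* no R_RISCV_RELAX: lpad overwritten with a nop */
    LPAD_DELETE  /* R_RISCV_RELAX present: caller deletes these 4 bytes */
} LpadAction;

/* Decide how to relax the 4-byte lpad at `insn` (stored little-endian, as
 * RISC-V instructions are) and apply the nop rewrite in place if needed. */
static LpadAction relax_lpad(uint8_t insn[4], const LpadSymbol *sym,
                             bool has_relax)
{
    if (sym->in_dynsym || !sym->only_call_refs)
        return LPAD_KEEP;
    if (has_relax)
        return LPAD_DELETE;
    for (int i = 0; i < 4; ++i)
        insn[i] = (uint8_t)(RV_NOP >> (8 * i)); /* same length as the lpad */
    return LPAD_NOP;
}
```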

==== Landing Pad Scheme Relaxation

Target Relocation::: R_RISCV_LPAD

Description:: This relaxation type allows an `lpad` instruction to be relaxed
into `lpad 0`, which is a universal landing pad that ignores the label value
comparison. This relaxation is intended for cases where the label value may not
have been computed correctly, for example when legacy code calls a function
without a correct prototype in scope.

Review comments:

What would be the cases where a label may be computed incorrectly?

Some legacy programs don't properly declare function prototypes before calling them. In these cases, the compiler will infer a function prototype based on the language standards, but it often ends up being incorrect. One common example is dhrystone[1]. In most versions you find online, Func_2 isn't declared before it's called, so the compiler will assume the prototype is

[1] https://github.com/sifive/benchmark-dhrystone/blob/master/dhry_1.c#L164

Another common potential issue in C is with qsort. Function pointers can be compatible but not perfectly match the expected type. For example, here's how qsort is declared:

void qsort(void* ptr, size_t count, size_t size, int (*comp)(const void*, const void*));

But in practice, you can pass in a compatible, but not exactly matching, type for the comparison function, and it works in most cases:

#include <stdlib.h>
int compare(int *a, int *b) // The signature isn't int (*)(const void*, const void*)
{
    return *(int *)a - *(int *)b;
}
void foo(int *x, size_t count, size_t size)
{
    qsort(x, count, size, compare); // But in practice, this works fine
}

But how is the linker expected to know the incorrectness so it can perform this relaxation? The Zicfilp mechanism is employed when issuing an indirect call through function pointers, and when calling functions through PLT:

In the first case (indirect calls through pointers), to know that an lpad insn needs to be relaxed to

In the second case (calls through PLT), the indirect call happens in the PLT, which is generated by linkers. The label which linkers use to generate PLT would come from the addend of the

The above is my guess and understanding of the intended usage of this relaxation. If we're not on the same page, please do let me know.

The linker never knows (or doesn't always know), and that's also not the right layer for the analysis (or guessing :P), so I expect this relaxation should only be enabled when the user passes something like

Condition:: This relaxation can be performed without `R_RISCV_RELAX`, and
should not be enabled by default. The user must explicitly enable this
relaxation. Additionally, if this relaxation is applied, it must be applied
consistently to all `R_RISCV_LPAD` relocations in the entire binary.

Relaxation::
- The `lpad` instruction associated with `R_RISCV_LPAD` will be replaced with
`lpad 0`.
- The GNU property must be adjusted to reflect the use of this relaxation.
- The format of the PLT entries must also be adjusted accordingly.

Example::
+
--
Relaxation candidate:
[,asm]
----
lpad 0x123 # R_RISCV_LPAD
----

Relaxation result:
[,asm]
----
lpad 0
----
--
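`lpad` is encoded as `AUIPC` with `rd` = `x0`, carrying the 20-bit label in bits 31:12, so the instruction rewrite in the scheme relaxation amounts to clearing those bits. The sketch below (hypothetical helper names) covers only that rewrite; the GNU property and PLT adjustments listed under Relaxation:: are not shown.

```c
#include <stdint.h>

#define RV_OPCODE_AUIPC 0x17u /* lpad is AUIPC with rd = x0 */

/* Encode `lpad label`: the 20-bit label goes in bits 31:12. */
static uint32_t encode_lpad(uint32_t label)
{
    return ((label & 0xFFFFFu) << 12) | RV_OPCODE_AUIPC;
}

/* An lpad is an AUIPC whose rd field (bits 11:7) is x0. */
static int is_lpad(uint32_t insn)
{
    return (insn & 0xFFFu) == RV_OPCODE_AUIPC;
}

/* Landing pad scheme relaxation: clear the label so the instruction becomes
 * `lpad 0`; anything that is not an lpad is returned unchanged. */
static uint32_t relax_lpad_to_universal(uint32_t insn)
{
    return is_lpad(insn) ? (insn & 0xFFFu) : insn;
}
```

For the example above, `lpad 0x123` is the word `0x00123017`, and the relaxed `lpad 0` is `0x00000017`.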

NOTE: This relaxation is designed to be compatible with legacy programs that
may not declare the function signature correctly.

NOTE: Dependent shared libraries will not undergo the corresponding
transformation. Therefore, if this Landing Pad Scheme Relaxation is used in a
dynamically linked environment, ensure that all dependent shared libraries are
rebuilt with the same relaxation applied.

[bibliography]
== References

Review comment: What is the "label value of the landing pad"?