ELF loading may not be necessary and increases complexity and vulnerabilities #23620

jackcmay · 2022-03-11T20:54:40Z

Problem

ELF loading may not be necessary and increases complexity and vulnerabilities

Solana programs are linked as ELF files which are loaded by the runtime. ELF was chosen because it is a known and common executable format and there were future intentions of runtime linking with other support libraries, etc...

ELF loading consumes compute time and expands the attack surface considerably. These factors could be significantly reduced by using a use-case focused and simplified format.

Proposed Solution

Solana has a fairly constrained and simple runtime environment and the following points lend it to a simpler executable

Fixed and known memory map
Known set of external linkable symbols
No read-write data

Because of this, a Solana executable could instead be as simple as:

Header
Position-independent code bits
- Fully resolved except for syscalls
RO data

Doing so would eliminate:

ELF verification complexity
ELF symbol and hash tables and the processing of
ELF relocation tables and the processing of
Read-only executable code modification by relocations
Dependency on 3rd party ELF parser

Changing to the new executable format could be done independently of SBFv2 as it is a change to the executable format and not the bytecode format. A suitable off-the-shelf format could be used but since the new format is so simple it should probably be a new format, maybe Solana Executable Format (SEF).

jackcmay · 2022-03-11T20:55:55Z

@Lichtso and I were discussing this, @alessandrod @dmakarov What do you guys think, if in favor can you two pick it up?

dmakarov · 2022-03-11T21:14:49Z

ELF is also a container for debug information. I guess we could still generate both ELF for debugging purposes and somewhat raw binary for loading by the runtime loader.

jackcmay · 2022-03-11T21:28:19Z

Good point, the ELF and SEF would have to align vaddr wise in order for the debug info to be useful. Maybe we can optionally add the dwarf to the SEF?

dmakarov · 2022-03-11T21:37:21Z

I worked on a system where we loaded a raw binary into a simulator or on a physical board, but used a separate ELF file to load into lldb. We had a debug server that communicated with the simulator or the board, and it worked quite well. The debug server knew how to map the symbols to addresses when it interacted with the board. I think it shouldn't be necessary to complicate our custom binary format with debug info, as long as we can still generate an elf file with the same code.

jackcmay · 2022-03-11T21:42:16Z

Sure, same code and same memory layout

CherryWorm · 2022-03-12T21:22:36Z

We're currently working on a security.txt like standard that currently uses a custom ELF section in https://github.com/neodyme-labs/solana-security-txt . It still works without that section because the actualy data is stored in .ro, so this proposal wouldn't break anything, but it would be awesome if this new format would allow for custom key-value attributes like name, project link, security policy and so on

alessandrod · 2022-03-14T05:10:09Z

Doing so would eliminate:

* ELF verification complexity

* ELF symbol and hash tables and the processing of

* ELF relocation tables and the processing of

* Read-only executable code modification by relocations

* Dependency on 3rd party ELF parser

Wouldn't we be able to achieve all this, keeping ELF but switching to emitting statically linked object files instead of shared objects like we do now? And as we've been discussing we could write a very simple ELF parser and stop using goblin.

I think the reason for keeping ELF would be ecosystem tooling. Right now you can use objdump, readelf, ghidra, r2, cargo-bloat, elfcat, etc to debug profile and optimize binaries (and I regularly use all of those). With a new format we'd break all those.

Also while starting from scratch is nice, I'm worried that we'd start simple, then add one extra thing at a time (DWARF, also see @CherryWorm's comment already) and slowly end up with frankenstein-ELF with another name.

jackcmay · 2022-03-14T05:55:18Z

Yeah, good points, I do like the tooling support for ELFs.

The question is with ELF can we do the following for executables (debug versions would have string/symbol/dwarf):

PI code with no relocations, just unresolved syscall symbols and even those we could use call x where x is a unique syscall id.
No need for hash or symbol/string tables, want just a header, code bits, and RO bits.
We could support cherryworm's suggestion by continuing to allow note sections

dmakarov · 2022-03-16T23:27:03Z

Also while starting from scratch is nice, I'm worried that we'd start simple, then add one extra thing at a time (DWARF, also see @CherryWorm's comment already) and slowly end up with frankenstein-ELF with another name.

I share these concerns. The tooling for working BPF programs isn't good now, but with the custom or raw binary format it will be next to non-existent. Adding more and more information to a raw binary we'll end up with a bad substitute for ELF.

Lichtso · 2022-03-17T07:09:31Z

I am also fine with sticking with a minimal subset of ELF.
That is we support only a very small selected subset of its features and make our own parser.
Because maintaining the dependency chain of the current ELF parser is already killing us.

Lichtso · 2022-03-22T18:05:23Z

Have been working on generating static linked executables with our BPF toolchain today.

linker_script.ld

PHDRS {
    /*header PT_PHDR FILEHDR PHDRS;*/
    prog PT_LOAD FLAGS(5);
    stack PT_GNU_STACK FLAGS(6);
}

SECTIONS {
    .text 0x100000000 : {
        *(.text*)
    } :prog
    .rodata : {
        *(.rodata*)
    } :prog
    .stack 0x200000000 (NOLOAD) : {
        . = .;
    } :stack
    .heap 0x300000000 (NOLOAD) : {
        *(.bss*)
    } :prog
    /DISCARD/ : {
        *(.data*)
        *(*.hash)
        *(.eh_frame)
        *(.dynamic)
        *(*.dyn*)
    }
}

main.rs

#![no_main]
#![no_std]

#[panic_handler]
fn panic(_panic: &core::panic::PanicInfo) -> ! {
    loop {}
}

static STRING: &'static str = "Hello World";
static mut HEAP: [u8; 64] = [0u8; 64];

#[no_mangle]
pub extern fn _start() -> *const u8 {
    STRING.as_ptr()
}

#[no_mangle]
pub extern fn get_heap() -> *const u8 {
    unsafe { HEAP.as_ptr() }
}

What works (using only rustc)

SDK=solana/sdk/bpf/dependencies/bpf-tools
$SDK/rust/bin/rustc --target bpfel-unknown-unknown --emit=obj -O -o main.o main.rs
$SDK/llvm/bin/ld.lld --discard-all --nmagic --script=linker_script.ld --format=elf -o main.elf main.o
$SDK/llvm/bin/llvm-objdump -xd main.elf
$SDK/llvm/bin/llvm-readelf --all main.elf

That results in a nice main.elf which looks pretty much as I would want it to.
However, that has the drawback that it is no-std and thus not using our BPF sysroot or any other dependencies for that matter.

What does not work (using cargo)

RUSTFLAGS="--crate-type=staticlib" rustup run bpf cargo build --target bpfel-unknown-unknown --release

The resulting staticlib.a contains all compiled objects as far as I can tell, so all symbols should be resolvable.
But, I could not get lld to link the staticlib.a into one (ELF standalone executable) so far.

@alessandrod I think you were tinkering with the linker script and build process as well, any ideas?

jackcmay · 2022-03-22T18:14:46Z

Looks like you are building the deps/stdlib separately into a static library(s), will that allow the linker to strip unused code in the final ELF? What does this look like from a developer's perspective, do they need to do anything special besides update their Cargo.toml and call cargo build-sbf to pick up new dependencies?

I was thinking that we could build/link similar to what we do now except that we force PIC without ANY dependency on relocation/hash/string/symbol. For syscalls the linker updates the call -1 with a known syscall index that either fixup at load time or translate at runtime.

alessandrod · 2022-03-22T19:43:20Z

I was thinking that we could build/link similar to what we do now except that we force PIC without ANY dependency on relocation/hash/string/symbol. For syscalls the linker updates the call -1 with a known syscall index that either fixup at load time or translate at runtime.

Yep I was thinking something like that, which would require changes to the LLVM target to avoid relocs. Then somehow link in our own custom syscalls.o file which would allow us to implement syscalls with indexes, and either change the rustc target to always go through LTO, or somehow teach lld to (effectively) delete dead code.

Except for the relocs part (kernel relocations are special), this is pretty much how the upstream rust bpf target works btw, where the equivalent of syscalls.o is provided by aya.

alessandrod · 2022-03-23T02:05:19Z

But, I could not get lld to link the staticlib.a into one (ELF standalone executable) so far.

This would require a bit of work in rustc, but I don't think it's necessarily the best approach. I'd leave the crate type as cdylib or bin and then we're pretty much allowed to output anything we want in that case.

Lichtso · 2022-03-23T08:51:59Z

This would require a bit of work in rustc, but I don't think it's necessarily the best approach. I'd leave the crate type as cdylib or bin and then we're pretty much allowed to output anything we want in that case.

Agreed, ideally it would be bin. The syscalls which are essentially unresolved symbols are going to be the problem when we want to get rid of dynamic linking and loading all together. Either, as you two already proposed, have a static mapping of syscalls and virtual addresses or leave them as dynamic relocations. The later might also be ok, because these are not up to the users choosing, we can easily validate them and bail if one does not exactly match.

So far I have only been experimenting with different ways to put the compilation and linking together.
My primary goal was just to get a statically linked BPF ELF, which I have now, so I can start working on the static ELF loader.

Lichtso · 2022-03-23T20:40:13Z

solana-labs/rbpf#290

ant4g0nist · 2022-03-28T20:49:44Z

as part of this, removing the debug headers before deploying would be good. That should get rid of this issue as well #23354

riptl · 2022-04-23T17:10:08Z

Assuming all relocations for syscalls and cross-refs are getting removed, the entrypoint symbol is still going to generate some hash table entries.

Simplifying the entrypoint definition would allow removing symbol hashing logic entirely.

Another way to define an entrypoint is using a new .init section containing executable code. If an .init section is present, the loader could just jump to the first instruction of .init.

We'd be left with only 5 4 sections. For comparison, Rust programs today have 11 sections.

.shstrtab
.init (entrypoint)
.text, .rodata (prog bits) ~~.eh_frame~~ (eh_frame can be removed by disabling unwinding)

Pseudo code of the init section object:

crt0.s:

.section .init$__00
.global _start
_start:
    j entrypoint

Lichtso · 2022-04-23T19:21:39Z

There is actually a special field in the ELF header for the entrypoint.
The only question is if the linker does fill that one out correctly.

https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.eheader.html

e_entry
This member gives the virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point, this member holds zero.

riptl · 2022-04-23T20:27:52Z

There is actually a special field in the ELF header for the entrypoint.
The only question is if the linker does fill that one out correctly.

Thanks for pointing this out. I'll see if I have any luck with the --entry flag:
https://manpages.debian.org/experimental/lld-10/ld.lld-10.1.en.html#entry

alessandrod · 2022-04-24T00:16:55Z

The linker is setting the entry field correctly (or pretty much nothing would work)

Lichtso mentioned this issue Mar 25, 2022

New ELF parser solana-labs/rbpf#290

Closed

pgarg66 added the runtime Issues related to runtime, BPF, and LLVM label Oct 26, 2022

github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Oct 27, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 3, 2023

ellttBen mentioned this issue May 8, 2024

Fix writing ELF section for sbf targets neodyme-labs/solana-security-txt#48

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ELF loading may not be necessary and increases complexity and vulnerabilities #23620

ELF loading may not be necessary and increases complexity and vulnerabilities #23620

jackcmay commented Mar 11, 2022

jackcmay commented Mar 11, 2022

dmakarov commented Mar 11, 2022

jackcmay commented Mar 11, 2022

dmakarov commented Mar 11, 2022

jackcmay commented Mar 11, 2022

CherryWorm commented Mar 12, 2022

alessandrod commented Mar 14, 2022

jackcmay commented Mar 14, 2022

dmakarov commented Mar 16, 2022

Lichtso commented Mar 17, 2022

Lichtso commented Mar 22, 2022 •

edited

Loading

jackcmay commented Mar 22, 2022

alessandrod commented Mar 22, 2022

alessandrod commented Mar 23, 2022

Lichtso commented Mar 23, 2022

Lichtso commented Mar 23, 2022

ant4g0nist commented Mar 28, 2022

riptl commented Apr 23, 2022 •

edited

Loading

Lichtso commented Apr 23, 2022

riptl commented Apr 23, 2022

alessandrod commented Apr 24, 2022

ELF loading may not be necessary and increases complexity and vulnerabilities #23620

ELF loading may not be necessary and increases complexity and vulnerabilities #23620

Comments

jackcmay commented Mar 11, 2022

Problem

Proposed Solution

jackcmay commented Mar 11, 2022

dmakarov commented Mar 11, 2022

jackcmay commented Mar 11, 2022

dmakarov commented Mar 11, 2022

jackcmay commented Mar 11, 2022

CherryWorm commented Mar 12, 2022

alessandrod commented Mar 14, 2022

jackcmay commented Mar 14, 2022

dmakarov commented Mar 16, 2022

Lichtso commented Mar 17, 2022

Lichtso commented Mar 22, 2022 • edited Loading

linker_script.ld

main.rs

What works (using only rustc)

What does not work (using cargo)

jackcmay commented Mar 22, 2022

alessandrod commented Mar 22, 2022

alessandrod commented Mar 23, 2022

Lichtso commented Mar 23, 2022

Lichtso commented Mar 23, 2022

ant4g0nist commented Mar 28, 2022

riptl commented Apr 23, 2022 • edited Loading

Lichtso commented Apr 23, 2022

riptl commented Apr 23, 2022

alessandrod commented Apr 24, 2022

Lichtso commented Mar 22, 2022 •

edited

Loading

riptl commented Apr 23, 2022 •

edited

Loading