Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ELF loading may not be necessary and increases complexity and vulnerabilities #23620

Closed
jackcmay opened this issue Mar 11, 2022 · 21 comments
Closed
Labels
runtime Issues related to runtime, BPF, and LLVM stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@jackcmay
Copy link
Contributor

Problem

ELF loading may not be necessary and increases complexity and vulnerabilities

Solana programs are linked as ELF files which are loaded by the runtime. ELF was chosen because it is a known and common executable format and there were future intentions of runtime linking with other support libraries, etc...

ELF loading consumes compute time and expands the attack surface considerably. These factors could be significantly reduced by using a use-case focused and simplified format.

Proposed Solution

Solana has a fairly constrained and simple runtime environment and the following points lend it to a simpler executable

  • Fixed and known memory map
  • Known set of external linkable symbols
  • No read-write data

Because of this, a Solana executable could instead be as simple as:

  • Header
  • Position-independent code bits
    • Fully resolved except for syscalls
  • RO data

Doing so would eliminate:

  • ELF verification complexity
  • ELF symbol and hash tables and the processing of
  • ELF relocation tables and the processing of
  • Read-only executable code modification by relocations
  • Dependency on 3rd party ELF parser

Changing to the new executable format could be done independently of SBFv2 as it is a change to the executable format and not the bytecode format. A suitable off-the-shelf format could be used but since the new format is so simple it should probably be a new format, maybe Solana Executable Format (SEF).

@jackcmay
Copy link
Contributor Author

@Lichtso and I were discussing this, @alessandrod @dmakarov What do you guys think, if in favor can you two pick it up?

@dmakarov
Copy link
Contributor

ELF is also a container for debug information. I guess we could still generate both ELF for debugging purposes and somewhat raw binary for loading by the runtime loader.

@jackcmay
Copy link
Contributor Author

Good point, the ELF and SEF would have to align vaddr wise in order for the debug info to be useful. Maybe we can optionally add the dwarf to the SEF?

@dmakarov
Copy link
Contributor

I worked on a system where we loaded a raw binary into a simulator or on a physical board, but used a separate ELF file to load into lldb. We had a debug server that communicated with the simulator or the board, and it worked quite well. The debug server knew how to map the symbols to addresses when it interacted with the board. I think it shouldn't be necessary to complicate our custom binary format with debug info, as long as we can still generate an elf file with the same code.

@jackcmay
Copy link
Contributor Author

Sure, same code and same memory layout

@CherryWorm
Copy link
Contributor

We're currently working on a security.txt like standard that currently uses a custom ELF section in https://github.com/neodyme-labs/solana-security-txt . It still works without that section because the actualy data is stored in .ro, so this proposal wouldn't break anything, but it would be awesome if this new format would allow for custom key-value attributes like name, project link, security policy and so on

@alessandrod
Copy link
Contributor

Doing so would eliminate:

* ELF verification complexity

* ELF symbol and hash tables and the processing of

* ELF relocation tables and the processing of

* Read-only executable code modification by relocations

* Dependency on 3rd party ELF parser

Wouldn't we be able to achieve all this, keeping ELF but switching to emitting statically linked object files instead of shared objects like we do now? And as we've been discussing we could write a very simple ELF parser and stop using goblin.

I think the reason for keeping ELF would be ecosystem tooling. Right now you can use objdump, readelf, ghidra, r2, cargo-bloat, elfcat, etc to debug profile and optimize binaries (and I regularly use all of those). With a new format we'd break all those.

Also while starting from scratch is nice, I'm worried that we'd start simple, then add one extra thing at a time (DWARF, also see @CherryWorm's comment already) and slowly end up with frankenstein-ELF with another name.

@jackcmay
Copy link
Contributor Author

Yeah, good points, I do like the tooling support for ELFs.

The question is with ELF can we do the following for executables (debug versions would have string/symbol/dwarf):

  • PI code with no relocations, just unresolved syscall symbols and even those we could use call x where x is a unique syscall id.
  • No need for hash or symbol/string tables, want just a header, code bits, and RO bits.
  • We could support cherryworm's suggestion by continuing to allow note sections

@dmakarov
Copy link
Contributor

Also while starting from scratch is nice, I'm worried that we'd start simple, then add one extra thing at a time (DWARF, also see @CherryWorm's comment already) and slowly end up with frankenstein-ELF with another name.

I share these concerns. The tooling for working BPF programs isn't good now, but with the custom or raw binary format it will be next to non-existent. Adding more and more information to a raw binary we'll end up with a bad substitute for ELF.

@Lichtso
Copy link
Contributor

Lichtso commented Mar 17, 2022

I am also fine with sticking with a minimal subset of ELF.
That is we support only a very small selected subset of its features and make our own parser.
Because maintaining the dependency chain of the current ELF parser is already killing us.

@Lichtso
Copy link
Contributor

Lichtso commented Mar 22, 2022

Have been working on generating static linked executables with our BPF toolchain today.

linker_script.ld

PHDRS {
    /*header PT_PHDR FILEHDR PHDRS;*/
    prog PT_LOAD FLAGS(5);
    stack PT_GNU_STACK FLAGS(6);
}

SECTIONS {
    .text 0x100000000 : {
        *(.text*)
    } :prog
    .rodata : {
        *(.rodata*)
    } :prog
    .stack 0x200000000 (NOLOAD) : {
        . = .;
    } :stack
    .heap 0x300000000 (NOLOAD) : {
        *(.bss*)
    } :prog
    /DISCARD/ : {
        *(.data*)
        *(*.hash)
        *(.eh_frame)
        *(.dynamic)
        *(*.dyn*)
    }
}

main.rs

#![no_main]
#![no_std]

#[panic_handler]
fn panic(_panic: &core::panic::PanicInfo) -> ! {
    loop {}
}

static STRING: &'static str = "Hello World";
static mut HEAP: [u8; 64] = [0u8; 64];

#[no_mangle]
pub extern fn _start() -> *const u8 {
    STRING.as_ptr()
}

#[no_mangle]
pub extern fn get_heap() -> *const u8 {
    unsafe { HEAP.as_ptr() }
}

What works (using only rustc)

SDK=solana/sdk/bpf/dependencies/bpf-tools
$SDK/rust/bin/rustc --target bpfel-unknown-unknown --emit=obj -O -o main.o main.rs
$SDK/llvm/bin/ld.lld --discard-all --nmagic --script=linker_script.ld --format=elf -o main.elf main.o
$SDK/llvm/bin/llvm-objdump -xd main.elf
$SDK/llvm/bin/llvm-readelf --all main.elf

That results in a nice main.elf which looks pretty much as I would want it to.
However, that has the drawback that it is no-std and thus not using our BPF sysroot or any other dependencies for that matter.

What does not work (using cargo)

RUSTFLAGS="--crate-type=staticlib" rustup run bpf cargo build --target bpfel-unknown-unknown --release

The resulting staticlib.a contains all compiled objects as far as I can tell, so all symbols should be resolvable.
But, I could not get lld to link the staticlib.a into one (ELF standalone executable) so far.

@alessandrod I think you were tinkering with the linker script and build process as well, any ideas?

@jackcmay
Copy link
Contributor Author

Looks like you are building the deps/stdlib separately into a static library(s), will that allow the linker to strip unused code in the final ELF? What does this look like from a developer's perspective, do they need to do anything special besides update their Cargo.toml and call cargo build-sbf to pick up new dependencies?

I was thinking that we could build/link similar to what we do now except that we force PIC without ANY dependency on relocation/hash/string/symbol. For syscalls the linker updates the call -1 with a known syscall index that either fixup at load time or translate at runtime.

@alessandrod
Copy link
Contributor

I was thinking that we could build/link similar to what we do now except that we force PIC without ANY dependency on relocation/hash/string/symbol. For syscalls the linker updates the call -1 with a known syscall index that either fixup at load time or translate at runtime.

Yep I was thinking something like that, which would require changes to the LLVM target to avoid relocs. Then somehow link in our own custom syscalls.o file which would allow us to implement syscalls with indexes, and either change the rustc target to always go through LTO, or somehow teach lld to (effectively) delete dead code.

Except for the relocs part (kernel relocations are special), this is pretty much how the upstream rust bpf target works btw, where the equivalent of syscalls.o is provided by aya.

@alessandrod
Copy link
Contributor

But, I could not get lld to link the staticlib.a into one (ELF standalone executable) so far.

This would require a bit of work in rustc, but I don't think it's necessarily the best approach. I'd leave the crate type as cdylib or bin and then we're pretty much allowed to output anything we want in that case.

@Lichtso
Copy link
Contributor

Lichtso commented Mar 23, 2022

This would require a bit of work in rustc, but I don't think it's necessarily the best approach. I'd leave the crate type as cdylib or bin and then we're pretty much allowed to output anything we want in that case.

Agreed, ideally it would be bin. The syscalls which are essentially unresolved symbols are going to be the problem when we want to get rid of dynamic linking and loading all together. Either, as you two already proposed, have a static mapping of syscalls and virtual addresses or leave them as dynamic relocations. The later might also be ok, because these are not up to the users choosing, we can easily validate them and bail if one does not exactly match.


So far I have only been experimenting with different ways to put the compilation and linking together.
My primary goal was just to get a statically linked BPF ELF, which I have now, so I can start working on the static ELF loader.

@Lichtso
Copy link
Contributor

Lichtso commented Mar 23, 2022

solana-labs/rbpf#290

@ant4g0nist
Copy link

as part of this, removing the debug headers before deploying would be good. That should get rid of this issue as well #23354

@riptl
Copy link
Contributor

riptl commented Apr 23, 2022

Assuming all relocations for syscalls and cross-refs are getting removed, the entrypoint symbol is still going to generate some hash table entries.

Simplifying the entrypoint definition would allow removing symbol hashing logic entirely.

Another way to define an entrypoint is using a new .init section containing executable code. If an .init section is present, the loader could just jump to the first instruction of .init.

We'd be left with only 5 4 sections. For comparison, Rust programs today have 11 sections.

  • .shstrtab
  • .init (entrypoint)
  • .text, .rodata (prog bits) .eh_frame (eh_frame can be removed by disabling unwinding)

Pseudo code of the init section object:

crt0.s:

.section .init$__00
.global _start
_start:
    j entrypoint

@Lichtso
Copy link
Contributor

Lichtso commented Apr 23, 2022

There is actually a special field in the ELF header for the entrypoint.
The only question is if the linker does fill that one out correctly.

https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.eheader.html

e_entry
This member gives the virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point, this member holds zero.

@riptl
Copy link
Contributor

riptl commented Apr 23, 2022

There is actually a special field in the ELF header for the entrypoint.
The only question is if the linker does fill that one out correctly.

Thanks for pointing this out. I'll see if I have any luck with the --entry flag:
https://manpages.debian.org/experimental/lld-10/ld.lld-10.1.en.html#entry

@alessandrod
Copy link
Contributor

The linker is setting the entry field correctly (or pretty much nothing would work)

@pgarg66 pgarg66 added the runtime Issues related to runtime, BPF, and LLVM label Oct 26, 2022
@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Oct 27, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
runtime Issues related to runtime, BPF, and LLVM stale [bot only] Added to stale content; results in auto-close after a week.
Projects
None yet
Development

No branches or pull requests

8 participants