Skip to content

Conversation

joshlf
Copy link

@joshlf joshlf commented Jul 11, 2025

Replace a number of lines of unsafe code with safe equivalents, some using zerocopy. In one instance, fix a soundness hole (#720).

Closes #720

@joshlf joshlf force-pushed the remove-unsafe branch 5 times, most recently from c46b513 to 745b847 Compare July 11, 2025 19:00
@ChrisDenton
Copy link
Member

Adding a dependency on zerocopy means that zerocopy becomes a dependency of std.

That might be fine! But it is something that will need to be considered.

Cargo.toml Outdated
[dependencies]
cfg-if = "1.0"
rustc-demangle = "0.1.24"
zerocopy = { version = "0.8.26", features = ["derive"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this won't work. The standard library can't depend on proc-macros without completely breaking cross-compilation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given #720 (comment), I'll need to go back to the drawing board on this anyway. I'll see how much of this I can salvage without relying on derives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the definitions are auto-generated, it is still possible to add an impl block outside of that file. It's all crate-local anyway. I dunno if that helps you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't really help with the safety - the point of the derives is the analysis they perform which validates that certain safety properties hold. While it would technically work to write the impls by hand, it wouldn't improve the safety situation.

@joshlf
Copy link
Author

joshlf commented Jul 11, 2025

Okay, I've removed all uses of derives to address #721 (comment) and #720 (comment).

@joshlf
Copy link
Author

joshlf commented Aug 27, 2025

Friendly ping on this 🙂

@joshlf
Copy link
Author

joshlf commented Aug 27, 2025

Adding a dependency on zerocopy means that zerocopy becomes a dependency of std.

That might be fine! But it is something that will need to be considered.

Oh sorry I should have mentioned this before: zerocopy may already be a dependency of std. It appears in Rust's Cargo.lock, although I haven't traced it yet to figure out where that dep comes from.

@philipc
Copy link
Contributor

philipc commented Aug 28, 2025

I think std dependencies are in the library Cargo.lock. And probably zerocopy would need the rustc-dep-of-std hack too.

@joshlf
Copy link
Author

joshlf commented Aug 28, 2025

I think std dependencies are in the library Cargo.lock. And probably zerocopy would need the rustc-dep-of-std hack too.

Ah gotcha. Well we'd be happy to add that hack if need be - if y'all decide that this PR is good to merge, then we can add it.


let hdr: *const [u32; 3] = hdr;
// SAFETY: `[u32; 3]` and `Elf_Nhdr` have the same layout.
Some(unsafe { &*hdr.cast::<Elf_Nhdr>() })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fix anything, or only replace one unsafe usage with another?

If it fixes something, how simple would it be to fix that without the dependency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fix anything

Yeah, it fixes a soundness hole in the original – in particular, that alignment isn't checked. This comment in the original is misleading:

    // Note that sice_of::<Elf_Nhdr>() is always 4-byte aligned.
    *bytes = &bytes[size_of::<Elf_Nhdr>()..];

While it's true that size_of::<Elf_Nhdr>() is a multiple of 4, there's no guarantee about the alignment of bytes itself, and so &bytes[size_of::<Elf_Nhdr>()..] is similarly not guaranteed to be aligned. As far as I can tell (from reading this git blame on this file), that's just an oversight that dates back to when this code was first written.

...or only replace one unsafe usage with another?

The unsafe usage in the new version is required so long as Elf_Nhdr doesn't implement some zerocopy traits. Implementing them would require a proc macro derive, which it sounds like isn't an option.

That said, the new unsafe usage is simpler than the old one - it only relies on [u32; 3] and Elf_Nhdr having the same layout and bit validity, while the preceding ref_from_prefix is doing the heavy lifting of making sure that the &[u8] to &[u32; 3] cast doesn't violate bit validity and performs runtime validation of size and alignment.

If it fixes something, how simple would it be to fix that without the dependency?

It wouldn't be hard in principle. The concern is more that it'd be brittle - it would be easy for a future developer who doesn't understand the subtleties we're discussing here to re-introduce a soundness bug. IMO the biggest clue to this is the "sice_of::<Elf_Nhdr>() is always 4-byte aligned" comment, which is patently incorrect and yet made it into the codebase and past code review. I won't call anyone out here by name, but if you git blame this file, you'll see that it was written by a very experienced Rust programmer (and one whom I personally hold in high regard) – if an experienced programmer could make this mistake, then presumably it's an easy and subtle mistake to make.

Copy link
Contributor

@philipc philipc Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the biggest clue to this is the "sice_of::<Elf_Nhdr>() is always 4-byte aligned" comment, which is patently incorrect and yet made it into the codebase and past code review

I don't think that comment is so badly wrong. It could be clearer, but I take that comment to mean that it preserves the 4-byte alignment required by ELF, which is what is important for that line.

The thing that is wrong is that the function doesn't check the 4-byte alignment to start with, and the comment prior to this function thinks that this requirement is optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backtrace's maintenance is not idyllic. If it was anyone's first priority, maybe I would be paid to do it, yet I'm not. Even experienced programmers do not fire on all cylinders at all times. Certainly not all maintainers are willing to be insufferable exacting pedants, with borderline-antisocial disregard for whether someone needs the code to be merged for someone's job requirements, instead caring only about the code quality and correctness. As you seem to want the code to be reviewed by that kind of demon, here I am! But I have not retrospectively audited every last line of every last platform, to be sure. Improvements to their condition are certainly welcome.

Based on the conversation so far, it seems the soundness requirements have not been properly documented. Certainly, I am not yet convinced the current version of the code as present in this PR documents them. I would like to have the requirements properly documented first, so for all comments to be updated appropriately in this file, at the very least.

cfg-if = "1.0"
rustc-demangle = "0.1.21"
libc = { version = "0.2.156", default-features = false }
zerocopy = "0.8.26"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If kept, this could be restricted to target_os = "fuchsia".

Copy link
Member

@bjorn3 bjorn3 Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you please check that zerocopy-derive doesn't end up in the lockfile? If it does end up in there, that will cause a tidy lint checking that the standard library doesn't depend on proc-macros to fail.

Edit: It does end up in the lockfile.

Replace a number of lines of unsafe code with safe equivalents, some
using `zerocopy`. In one instance, fix a soundness hole (rust-lang#720).

Closes rust-lang#720

let hdr: *const [u32; 3] = hdr;
// SAFETY: `[u32; 3]` and `Elf_Nhdr` have the same layout and bit validity.
Some(unsafe { &*hdr.cast::<Elf_Nhdr>() })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backtrace crate already indirectly depends on object, right? Maybe you could use the object::elf::NoteHeader32/64 type instead and then use object::pod::from_bytes() instead of the transmute?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems easy enough, see #725. object is already a direct dependency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool, TIL about object::pod.

@workingjubilee
Copy link
Member

The number of dependencies that we already have, due to the hacky way that backtrace is directly transcluded into std, causes significant problems. I do not intend to add more unless it is important that the implementation is encapsulated. One possible reason would be if it concerns something not directly relevant to the crate. Conversely, unsafe code in this crate, especially in key paths such as symbol resolution, must be able to justify itself in terms of itself, otherwise it interferes with auditing this code.

I arrive at the conclusion that it is contradictory to backtrace's goals to include zerocopy as a dependency, as it reduces the ability to audit the unsafe for future auditors.

@workingjubilee
Copy link
Member

workingjubilee commented Sep 23, 2025

...also I seriously don't want to deal with the multiple bugs that will almost certainly arise from internals of the standard library leaking out again due to diagnostics or trait solver quirks, nor ripping open your crate and making sure it is amenable to inclusion in the standard library by doing what rearchitecture of it is necessary.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the vector as you have described this, @joshlf, but I have not studied this in quite as much detail as you, yet. You say "soundness hole". I don't think this phrase has a standard meaning. It is not clear to me whether you mean

  • an active concern: a problem that is already hit by current private usages or in public API, i.e. the code is already UB
  • a latent concern: not reached currently due to some other code blocking the way, but may achieve UB with future changes (e.g. making the functions public API, even if they documented their apparent contracts)

Which is it? Or is it indeterminate because that requires knowing the implementation of external functions?

You also have melded other changes to Fuchsia-related internals with a change to the Windows-related internals. If anything is "pure cleanup", I would like it to not be in the same commit as a commit that deals with a change that we think is important for soundness, unless it happens to touch the very same code and so would conflict with it anyways.

Comment on lines +40 to -42
let mut map: *const LinkMap = core::ptr::null();
let map = unsafe {
let mut map: *const LinkMap = mem::zeroed();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do find this an improvement, this is pure style. All changes like this, that only improve the readability, or otherwise "simply" replace unsafe-but-sound code with safe code, should not be in the same commit, or possibly even same PR, that says "fixes a soundness hole". I want the code in the commit that says "fixes a soundness hole" to be essential for fixing a soundness hole.

Consider it a parallel to my other "TCB"-related concerns, albeit distributed throughout history. Ignorable changes should be ignorable, and non-ignorable ones should be non-ignorable, throughout.


let hdr: *const [u32; 3] = hdr;
// SAFETY: `[u32; 3]` and `Elf_Nhdr` have the same layout.
Some(unsafe { &*hdr.cast::<Elf_Nhdr>() })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backtrace's maintenance is not idyllic. If it was anyone's first priority, maybe I would be paid to do it, yet I'm not. Even experienced programmers do not fire on all cylinders at all times. Certainly not all maintainers are willing to be insufferable exacting pedants, with borderline-antisocial disregard for whether someone needs the code to be merged for someone's job requirements, instead caring only about the code quality and correctness. As you seem to want the code to be reviewed by that kind of demon, here I am! But I have not retrospectively audited every last line of every last platform, to be sure. Improvements to their condition are certainly welcome.

Based on the conversation so far, it seems the soundness requirements have not been properly documented. Certainly, I am not yet convinced the current version of the code as present in this PR documents them. I would like to have the requirements properly documented first, so for all comments to be updated appropriately in this file, at the very least.

// Note that sice_of::<Elf_Nhdr>() is always 4-byte aligned.
*bytes = &bytes[size_of::<Elf_Nhdr>()..];
Some(out)
let (hdr, suffix) = <[u32; 3]>::ref_from_prefix(*bytes).ok()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not deeply familiar with the internals of zerocopy and its API surface, so I am afraid adding its API surface does not improve the ease of verifying the code correctness for me. I know you work very hard on it, but it is very general, whereas backtrace-rs only has one task. Well, two, as symbolization and recovering a trace are different, but it tends to collapse them, for better or worse.

Fixing this soundness problem here by expanding the trusted computing base (TCB) of backtrace means now someone has to verify zerocopy. It may improve things if the zerocopy inclusion was done systematically, but here it is only as a spot-fix, and does not address other locations in the code which could instead use zerocopy's APIs. I would like to separate out discussion of using a new dependency from any such fixes for any soundness problems, and certainly to separate soundness problems from any cleanup.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am likely to accept #725 instead of the Fuchsia changes here, as it seems minimal and complete.

@workingjubilee
Copy link
Member

workingjubilee commented Sep 24, 2025

Implementing #735's cleanups brought me face-to-face with a bunch of the Windows code and added more context to the problems here. Was there a reason you rejected just creating the original data using MaybeUninit, so that there is no time the data is expected to be initialized on Rust's end, except for specific fields we write and read? It seems like a more guaranteed-correct change.

@workingjubilee
Copy link
Member

Diffs to the Cargo.lock from other PRs seem to be blocking this anyways.

@workingjubilee
Copy link
Member

workingjubilee commented Sep 25, 2025

I'm sorry, but I think that #737 is a better solution for the Windows problem.

This PR seems to assume that SymFromAddrW will initialize all fields, and also initialize the full trailing size and not deinitialize it. But we don't control the definition of SymFromAddrW or the way it writes data, and its contract is only to write data to the fields of that struct, and to write up to the lower of the MaxNameLen we give it and the NameLen it leaves behind. Except even that is not quite right, as that doesn't mean it can't, say, write uninit data to the remaining bytes.

While technically no LTO is strong enough to propagate that knowledge all the way from the depths of the Win32 API into Rust binaries we compile, I still would prefer a solution that I know is sound rather than one that is merely adequate because the compiler isn't watching. And explicitly handling each field via raw pointers is something I know to be sound. Unsafety intrudes into the rest of the code but is fairly minor, and it places the burden in the right places.

As I have already accepted #725 for the Fuchsia problem, I will be closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Soundess bug: &mut reference exposes uninitialized bytes
5 participants