Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slice fill_explicit #113

Closed
chorman0773 opened this issue Sep 28, 2022 · 18 comments
Closed

Slice fill_explicit #113

chorman0773 opened this issue Sep 28, 2022 · 18 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@chorman0773
Copy link

chorman0773 commented Sep 28, 2022

Proposal

Problem statement

This adds a fill_explicit function, which allows a slice of T to be cleared, which the guarantee or strong recommendation that implementations do not elide any store. This is useful in secure code where cleaning secrets from memory is necessary, and failing to do so by accident or by malicious compiler optimization.

Motivation, use-cases

The primary motivation is secure/cryptographic code. Specific crates that could benefit would be the zeroize crate (as it could replace use of write_volatile), and any rust implementation of cryptographic algorithms, especially ciphers.

Solution sketches

Two apis are proposed, one for mutable references to slices (mirroring the [T]::set function) and one for pointers (mirroring the core::ptr::write_bytes function). Both are suggested together, though if not both is accepted, the strong recommendation is the pointer one. The functions are not the same as write_volatile or read_volatile (in particular, it should be permissible to reorder them relative to unrelated side effects) and are not suited for use on MMIO.
The proposed APIs are

// in core::slice
impl<T> [T]{
    /// Fills the slice with copies of `T`. 
    /// The implementation shall ensure that calls to this function are not removed
    pub fn fill_explicit(&mut self, value: T) where T: Copy;
}

// in core::ptr
/// Writes `len*size_of::<T>()` bytes to p
/// The implementation shall ensure that calls to this function are not removed
pub unsafe fn write_bytes_explicit<T>(p: *mut T, b: u8, len: usize);

impl<T> *mut T{
    /// Writes `len*size_of::<T>()` bytes to p
    /// The implementation shall ensure that calls to this function are not removed
     pub unsafe fn write_bytes_explicit(self, b: u8, count: usize);
}

Implementation wise, the functions are quite flexible. A naive, pure rust, implementation could simply use a loop to write the bytes/values. write_bytes_explicit could also be vectorized in pure rust (either using existing SIMD intrinsics, or merely a large type). A more efficient implementation could call a compiler-specific intrinsic, but this proposal does not specify such an intrinsic. The most basic implementation of fill_explict is:

pub fn fill_explicit(&mut self, val: T) where T: Copy{
    for v in self{
        unsafe{(v as *mut T).write_volatile(val)}
    }
}

As an alternative solution, fill_explicit could have a signature that takes a u8 (if also declared unsafe), which acts as a slice version of write_bytes_explicit. This would be relatively less useful than the current signature, as slice.fill_bytes_explict(0u8) could be written as let len = slice.len(); core::ptr::write_bytes_explicit(slice.as_mut_ptr(),0u8,len).

The APIs above require an implementation not remove them. As a relaxation, it could be similar in nature to core::hint::black_box, where removing the function is a QoI issue, rather than a compliance issue. I would expect that there would be limited to no distinction, as, unlike black_box, it seems difficult to rely on for soundness and any implementation that removed the call would most likely have a bug filed within weeks or days if not hours or minutes. No further requirements are placed on the functions, but it should be recommended that no copies of the data that are made for w/e reason persist beyond the return of the function.

Links and related work

C paper (accepted in C23): N2897.

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

An ACP was chosen rather than an RFC because a pure rust implementation exists (and use of intrinsics)

@chorman0773 chorman0773 added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Sep 28, 2022
@chorman0773 chorman0773 changed the title Slice set_explicit Slice fill_explicit Sep 28, 2022
@the8472
Copy link
Member

the8472 commented Sep 28, 2022

I think the compiler-team should give their ok whether write_volatile provides the necessary behavior here as a baseline implementation. My understanding is that volatile only guarantees the absence of reorderings relative to other volatile operations.
If that is the case then the compiler could reorder fill_explicit before the operation we try to erase, thus rendering it unfit for its stated purpose. So some additional optimization barrier or an intrinsic with stronger guarantees is needed.

@chorman0773
Copy link
Author

chorman0773 commented Sep 28, 2022 via email

@the8472
Copy link
Member

the8472 commented Sep 28, 2022

There's an open RFC which should provide the necessary semantics. rust-lang/rfcs#3301

@programmerjake
Copy link
Member

There's an open RFC which should provide the necessary semantics. rust-lang/rfcs#3301

atomic memcpy can't be used here because it can be dead-store eliminated.

@programmerjake
Copy link
Member

it would be useful to also have a function that takes a *mut T or *mut [T] rather than just &mut [T]

@the8472
Copy link
Member

the8472 commented Sep 28, 2022

Ah my bad, I thought it was an atomic volatile memcpy.

@programmerjake
Copy link
Member

one possible implementation is to follow a memcpy with asm!("", ptr = in(reg) ptr), since the compiler has to assume the assembly has side effects and reads the results of the memcpy, preventing it from being eliminated.

@chorman0773
Copy link
Author

it would be useful to also have a function that takes a *mut T or *mut [T] rather than just &mut [T]

The former was write_bytes_explicit, though a write_explicit might be useful as well.

one possible implementation is to follow a memcpy with asm!("", ptr = in(reg) ptr), since the compiler has to assume the assembly has side effects and reads the results of the memcpy, preventing it from being eliminated.

Actually, I'm wondering if that's the case. The reason the compiler can't assume the contents of the asm is SMC, but if the compiler knows that there can't be any SMC (which it could trivially here, because the asm generates no code that could be modified), it could possibly eliminate it under as-if. I'm just speculating here, though.

@cuviper
Copy link
Member

cuviper commented Sep 28, 2022

Should we worry about BOLT defeating this? (e.g. dead store elimination on the binary)

@programmerjake
Copy link
Member

one possible implementation is to follow a memcpy with asm!("", ptr = in(reg) ptr), since the compiler has to assume the assembly has side effects and reads the results of the memcpy, preventing it from being eliminated.

Actually, I'm wondering if that's the case. The reason the compiler can't assume the contents of the asm is SMC, but if the compiler knows that there can't be any SMC (which it could trivially here, because the asm generates no code that could be modified), it could possibly eliminate it under as-if. I'm just speculating here, though.

the compiler docs specify that the compiler can't look at the assembly string and assume any behavior from that, it has to treat it as opaque. this means the compiler can't eliminate it as dead code purely because the assembly string is empty:
https://doc.rust-lang.org/reference/inline-assembly.html#rules-for-inline-assembly

  • The compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.
    • This effectively means that the compiler must treat the asm! as a black box and only take the interface specification into account, not the instructions themselves.

@chorman0773
Copy link
Author

The compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.

The compiler can surely assume the number of bytes between its own code, and thus can assume that a span of zero instructions would never become longer than zero. I certainly assume I can compute the offset between any two XIR instructions at codegen time.

@programmerjake
Copy link
Member

The compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.

The compiler can surely assume the number of bytes between its own code, and thus can assume that a span of zero instructions would never become longer than zero.

well, zero bytes doesn't necessarily mean zero instructions -- the cpu could be set to trap at that address and do something.

that said, if you don't buy the above argument, you could easily put a nop in the asm!

@thomcc
Copy link
Member

thomcc commented Sep 29, 2022

Should we worry about BOLT defeating this? (e.g. dead store elimination on the binary)

Does BOLT do that? I thought it was about binary layout... Assuming it does that kind of thing, how does anybody use it in secure code?

Also, do we invoke BOLT on the user's behalf (or rather, is this something we eventually hope to do), or is it something they have to invoke manually -- if they're invoking it manually on rustc output, then it seems fine, since it's entirely out of our hands then, but if not...

@cuviper
Copy link
Member

cuviper commented Sep 30, 2022

Does BOLT do that? I thought it was about binary layout...

It's more than that: Table 1 in the CGO'19 paper shows the optimizations at the time. DSE was not there yet, but maybe it could be done someday -- it seems possible. Maybe a hardware barrier or flush would prevent that (and might be a good idea anyway).

Also, do we invoke BOLT on the user's behalf

No, not yet, but rust-lang/rust#94381 and rust-lang/rust#102487 are trying it on the compiler itself.

@chorman0773
Copy link
Author

Depending on what it does, that seems like it might have problems with volatile as well. If it's removing write_volatiles, then that would be a violation of the specification of write_volatile.

@cuviper
Copy link
Member

cuviper commented Sep 30, 2022

Maybe stuff like write_volatile are reasons it wouldn't do this -- I admit I'm totally speculating.

@Kobzol
Copy link

Kobzol commented Oct 5, 2022

I asked on the LLVM Discord about this, this is the response from maksfb, one of the BOLT maintainers:

maksfb: @Kobzol, are the stores happening to the stack or to the global (heap) memory?
maksfb: Currently BOLT wouldn't eliminate any. In the future it may eliminate local ones, but unlikely by default. Eliminating stores to the global memory is close to impossible without some heavy assumptions about the program (unless it's tiny/simple).

@Amanieu
Copy link
Member

Amanieu commented May 19, 2023

We discussed this in the libs meeting today: we can't make these provide any hard guarantees, that's not possible with the current semantics of the Rust language. However we are happy to add these as optimizer hints in the same vein as black_box in the core::hint module. These would be free functions rather than methods on slices and pointers, to avoid confusion from users (fill_explicit is not very explicit about what it does).

@Amanieu Amanieu closed this as completed May 19, 2023
@the8472 the8472 added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

7 participants