-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: [T]::rejoin #2806
RFC: [T]::rejoin #2806
Conversation
This API has been implemented in several third party crates and discovered to be unsound repeatedly. See rust-lang/rust#62765 for reasons why and links to several past occurrences. @HeroicKatora proposed an alternative API (and implemented it in an unreleased branch of str-concat) which they believe addresses the problem, though:
|
Thanks for pointing those out. I mentioned about the soundness concerns in drawbacks, and so it's as I feared, unsound. The question I was wondering about while writing that, however is whether LLVM supports any way of fixing the situation, such as creating new pointers that would conceptually consider the joined object as one object. It seems however, that LLVM is very tightly geared for C-like memory models so they won't support anything like that? Is the main thing for forbidding all the out-of-bounds accesses the aliasing analysis? It seems unfortunate that even if we have lifetimes and sound references in Rust, we can't still have a function that takes in two references, returns one and say: "this reference may alias BOTH of the input references". It sounds like it could be possible in theory, although not maybe feasible with what LLVM supports. I found this link from the linked thread to be an informative read so I'm linking it here: |
Adding the lang team as well for now due to the soundness concern. Is the soundness issue here one of just interactions with LLVM or something deeper? If the latter is the case perhaps some formal proofs would be in order. |
This API is unimplementable in C, LLVM and Rust. In all of these languages, array/slice accesses have "inbounds" semantics, meaning the computation of the array element address must stay within the bounds of the same allocation. It is impossible to check at run-time if two neighboring pointers are in the same allocation or not. This is a property the informal specs of LLVM and C as well as every single formal model of LLVM and C that I am aware of have in common -- it's about as much consensus as we will get for these languages. The only way I know to implement this is to change array/slice indexing. In this formal model of LLVM, one can do cross-allocation arithmetic by casting the ptr to an int, doing the arithmetic there, and casting back. Unfortunately the C and LLVM specs are too vague around integer-pointer casts to tell if that is indeed intended to work, and LLVM is known to have bugs in this area that could break this in practice. Also see bluss/odds#25.
I would say it is deeper. The proposed API is, as far as I can tell, fundamentally incompatible with our requirement that arrays/slices be contained within a single allocation. |
let self_len = self.len(); | ||
let self_end = self[self_len..].as_ptr(); | ||
if core::ptr::eq(self_end, other.as_ptr()) { | ||
Some(unsafe { core::slice::from_raw_parts(self.as_ptr(), self.len() + other.len()) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the unsoundness: from_raw_parts
requires the entire range to be within a single allocation, which your check does not ensure (the two allocations could be situated right next to each other).
We should probably clarify the docs.
From the implementing PR rust-lang/rust#66088:
I agree we need better docs. As a first step, we should improve the Beyond that, I do not think we have precedent for adding an unsafe function to libstd just for the purpose of documenting that this cannot be done safely. What are you proposing the name, API and docs of that function to be? |
Basically, the same as the the |
Submitted docs PR as: rust-lang/rust#66111
That is correct. But then I wonder, what is the use-case for this? The function you proposed, and that can be found on crates.io, is useful because it is (well, claims to be) safe and does all the run-time checks. The function you are proposjng does some run-time checks ( Its main use comes from the fact that people might search for it in the libstd docs, and then we can say in clear words that it is impossible to implement a safe version of this and that they should forward this claim to any crates.io library that says otherwise. But it seems odd to add a function to libstd whose main purpose is not to be called... |
I'm mainly thinking thinking that catching some of the possible errors is better than none. Besides adjacency, which has false positives but no false negatives, there are lifetimes and slice length, all of which you can get wrong with |
To be sure, I also think that it makes sense to have the function in standard library to prevent people from reimplementing it sloppily. With the above argument I was just trying to say that not being called is not it's only purpose. |
Agreed. I think it's also plausible that some people have a bad day, don't read the docs very clearly, and then go on to use the method wrongly. |
improve from_raw_parts docs Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently. r? @Centril
improve from_raw_parts docs Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently. r? @Centril
improve from_raw_parts docs Triggered by rust-lang/rfcs#2806. Hopefully this helps clarify that joining slices across allocations is not possible in Rust currently. r? @Centril
Thanks for the insightful comments. It's clear that this RFC is unacceptable in this form. While I think that the discussion about whether to provide an unsafe API is valuable to have, let's do that once there is a proposal. I'm closing this PR at this time. |
It'd presumably be less ergonomic and expressive, but in principle could invariant lifetimes be used to prove that different references/slices came from the same allocation? |
@glaebhoerl I haven't looked into it but I believe that's what @HeroicKatora's API that @rkruppe linked, does. |
It doesn't, but thanks for the tip. (Maybe it should; I can't tell what the situation is with the dynamic checks it also adds.) |
Add method
fn rejoin(&self, other: &[T]) -> &[T]
for slice, with other related methods (*_mut
,try_*
, andrejoin
,try_rejoin
for&str
). This API allows joining two slices that are adjacent in the memory, into one.Rendered