ACP: PtrRange<T> type #423
Comments
assuming you want |
I think the following would be nice additions:
impl<T> PtrRange<T> {
    // Caller must guarantee that the `length` is a valid offset from `start`.
    pub unsafe fn from_raw_parts(start: NonNull<T>, length: usize) -> Self;
    // Totally safe to construct.
    pub fn new(slice: &[T]) -> Self;
} |
If you want this to work for ZSTs, you'll need Fun to see this, since I actually have a PR open to make it internally: https://github.com/rust-lang/rust/pull/127348/files#diff-1f555155b76a53e257b9b4ee13a9a825a7f346cf4447763b2cb98a7a8bf11dd6R414-R426 |
Some previous conversations: rust-lang/rust#91390 |
Hmm, I feel like part of this is ensuring that the pointer range is a legitimate representation of a slice, and so we'd want to make that another precondition for the range. But, that makes sense.
I had totally forgotten that we do this weird thing for ZSTs. Makes sense to me. Updated for both of these: the internal representation matches the one of the existing proposals, and the precondition now requires that the pointers be a multiple of |
5th alternative make use of the DST pointer types |
The main issue with that is that those are explicitly start + length and not start/end ranges. The main purpose of this is to represent slices as pointer ranges, which isn't how the fat pointers work. Unless you meant like, make a new special slice type whose metadata would encode the end pointer. Which isn't the worst idea but I'm not sure how well it'd be received. |
The difference between
let block = [1, 2, 3];
let [a @ .., _] = &block;
let [_, b @ ..] = &block;
let range = PtrRange::<[i32; 2]>::from_ptr_range(addr_of!(*a)..addr_of!(*b));
because the "length" of the range is going to be 0.5. But your safety requirement already mentioned that So I don't see any reason why you need to store |
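To make the 0.5 figure concrete, here is a minimal sketch that measures the byte distance between the two pointers from the snippet above and compares it with the element size. The helper names `byte_distance` and `misaligned_example` are assumptions made up for illustration; `PtrRange` itself is not used, since it is only a proposal.

```rust
use std::mem::size_of;

/// Hypothetical helper: distance in bytes between two pointers into the same array.
fn byte_distance<T>(start: *const T, end: *const T) -> usize {
    (end as usize) - (start as usize)
}

/// Returns (byte distance between the two sub-arrays, size of one element).
fn misaligned_example() -> (usize, usize) {
    let block = [1i32, 2, 3];
    let [a @ .., _] = &block; // a: &[i32; 2], covering block[0..2]
    let [_, b @ ..] = &block; // b: &[i32; 2], covering block[1..3]

    let start: *const [i32; 2] = a;
    let end: *const [i32; 2] = b;
    (byte_distance(start, end), size_of::<[i32; 2]>())
}

fn main() {
    let (bytes, elem) = misaligned_example();
    // The pointers are 4 bytes apart, but one element is 8 bytes wide,
    // so the distance is not a whole number of elements.
    assert_eq!((bytes, elem), (4, 8));
    assert_ne!(bytes % elem, 0);
}
```

A range type that stores two element pointers would need a precondition that rules out this case, which is the requirement the comment refers to.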
the whole point of this type is to store |
@programmerjake Then the type should be named like |
@kennytm Agreed that it makes sense to phrase this as an iterator type. Then it can be normal |
I don't know what the exact shape of this ought to be, but I'm personally excited by experimenting in this space. I'm strongly in favor of making |
I didn't say it is an iterator, just that it makes iteration-style stuff more efficient. I am not proposing that it implement |
I would prefer if it didn't implement (I'm in favor of this API btw, I've had (admittedly idiosyncratic) code where using a pair of |
I also would prefer this not be an iterator, since otherwise it doesn't have much benefit over, for example, just using |
Hmm, I think I see it as the opposite. If it's not an iterator, what does it do that The big reason for the two-pointers representation (instead of pointer-plus-count) is for Thus the mention of calling it |
My particular logic for not making it an iterator is that the representation is useful for iterators, but doesn't have to be an iterator by itself. For example, the use case I was thinking of most recently is accessing expressions in suffix notation. In particular, if you have a slice of nodes like so:
struct Node<T> {
    token: T,
    lhs_len: usize,
    rhs_len: usize,
}
then you can simplify the computations done to return the operands, so that you don't need to constantly compute Like, by all means, you can make them an iterator, although that incentivises making separate ones for My thought process here is that ultimately, the pointer-pair representation of a slice is useful, and while it's most useful for iterators, it has other uses too, and it's better to simply offer a type that offers the bare-minimum invariants we want and let people implement whatever wrapper around it they need. That could involve offering said iterator wrappers in the standard library as well! I'm just offering the simplest (IMHO) solution rather than requesting the kitchen-sink package. |
Sorry but this example is incomprehensible. Where did I suppose you mean to refactor something like
let val = &slice[slice.len() - len];
into
let range = slice.as_ptr_range();
{ ... }
let val = unsafe { &*range.end.sub(len) };
and want a structure rather than (This is assuming |
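The two forms in the comment above can be compared side by side. This is a minimal sketch; the function names `nth_from_end_slice` and `nth_from_end_range` are hypothetical, and the explicit bounds check in the second one is an assumption added so that the unsafe block is sound.

```rust
/// Index from the end using the start + length representation.
fn nth_from_end_slice<T>(slice: &[T], len: usize) -> &T {
    &slice[slice.len() - len]
}

/// The same lookup through the pointer range returned by `as_ptr_range`.
fn nth_from_end_range<T>(slice: &[T], len: usize) -> &T {
    assert!(len >= 1 && len <= slice.len(), "offset out of bounds");
    let range = slice.as_ptr_range();
    // SAFETY: 1 <= len <= slice.len(), so `end - len` points at a live element.
    unsafe { &*range.end.sub(len) }
}

fn main() {
    let v = [10, 20, 30, 40];
    assert_eq!(*nth_from_end_slice(&v, 1), 40);
    assert_eq!(*nth_from_end_range(&v, 1), 40);
    assert_eq!(*nth_from_end_range(&v, 3), 20);
}
```

The slice version computes `start + (len_total - len)`, while the range version only subtracts from the stored end pointer; `as_ptr_range` itself still has to compute the end pointer once.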
I was trying to be brief to avoid adding more details than necessary, but I suppose that actual code is better than theoretical code, so, here you go, with the most relevant bits:
struct Node<T> {
    token: T,
    lhs_len: usize,
    rhs_len: usize,
}
pub struct Expr<'a, T> {
    slice: &'a [Node<T>],
}
impl<'a, T> Expr<'a, T> {
    pub(crate) fn new(slice: &'a [Node<T>]) -> Option<Expr<'a, T>> {
        if slice.is_empty() {
            None
        } else {
            Some(Expr::new_unchecked(slice))
        }
    }
    pub fn operands(self) -> (Option<Expr<'a, T>>, Option<Expr<'a, T>>) {
        let (last, rest) = self
            .slice
            .split_last()
            .expect("expression should be non-empty");
        let (rest, rhs) = rest
            .split_at_checked(rest.len() - last.rhs_len)
            .expect("expression RHS should be in bounds");
        let (_, lhs) = rest
            .split_at_checked(rest.len() - last.lhs_len)
            .expect("expression LHS should be in bounds");
        (Expr::new(lhs), Expr::new(rhs))
    }
}
(Side note: allowing LHS or RHS to be empty regardless of circumstance gives a very natural way of distinguishing between prefix operators (like It's an expression in suffix notation, like The point here is that, when navigating through the tree (which can be much larger than the examples I provided), you have to perform effectively double the arithmetic operations to index from the end instead of the beginning, due to the fact that you're computing And this is one of the two primary cases where pointer-pair representations are efficient. There are effectively two of them:
The second one is the one I'm particularly concerned about, but I acknowledge that the first is most common. The only alternative that would also solve the second issue is by having some kind of "reversed slice" representation where you still encode the length, but store the end pointer instead. I've tried this and it makes the compiler very unhappy no matter what you do, and I can't imagine a situation in which we'd actually support it. So, it makes sense to pursue the pointer-pair method instead, since that has much more broad support. Also rereading and replying to this tidbit:
Yes, you can convert a slice into an iterator, but this gains no performance benefits if you do this at each step of the way, rather than keeping a pointer-pair representation throughout. Obtaining the end pointer for a pointer-pair representation requires adding the length to the start to begin with, which is the operation we're trying to avoid doing repeatedly. |
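As a rough illustration of keeping a pointer-pair representation throughout, the sketch below peels sub-ranges off the back of a range without converting back to a slice in between. Everything here is an assumption for illustration: the type `RawRange`, its lifetime parameter, and the method `split_off_back` are hypothetical and are not the API proposed in this ACP. ZSTs are rejected outright to keep the sketch short.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical pointer-pair view of a slice (non-ZST elements only).
struct RawRange<'a, T> {
    start: *const T,
    end: *const T,
    _marker: PhantomData<&'a [T]>,
}

impl<'a, T> RawRange<'a, T> {
    fn new(slice: &'a [T]) -> Self {
        assert!(size_of::<T>() != 0, "this sketch does not handle ZSTs");
        let r = slice.as_ptr_range();
        RawRange { start: r.start, end: r.end, _marker: PhantomData }
    }

    fn len(&self) -> usize {
        // SAFETY: both pointers come from the same slice and start <= end.
        unsafe { self.end.offset_from(self.start) as usize }
    }

    /// Splits off the last `n` elements, shrinking `self` to the front part.
    fn split_off_back(&mut self, n: usize) -> Option<RawRange<'a, T>> {
        if n > self.len() {
            return None;
        }
        // SAFETY: n <= len, so `mid` stays inside the original slice.
        let mid = unsafe { self.end.sub(n) };
        let back = RawRange { start: mid, end: self.end, _marker: PhantomData };
        self.end = mid;
        Some(back)
    }

    fn as_slice(&self) -> &'a [T] {
        // SAFETY: the range was derived from a live `&'a [T]`.
        unsafe { std::slice::from_raw_parts(self.start, self.len()) }
    }
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    let mut rest = RawRange::new(&data);
    let last = rest.split_off_back(1).unwrap();
    let rhs = rest.split_off_back(2).unwrap();
    assert_eq!(last.as_slice(), &[5]);
    assert_eq!(rhs.as_slice(), &[3, 4]);
    assert_eq!(rest.as_slice(), &[1, 2]);
    assert!(rest.split_off_back(3).is_none());
}
```

Note that the bounds check still needs `end - start`, so whether this saves arithmetic in practice depends on how often the check can be elided, which is part of what the thread is debating.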
This was assuming you were trying to get the nth last element since you did not specify the purpose. Turns out it is to take a subslice so this is no longer relevant. (Though AFAIK your proposed
I meant Even
(I suppose you meant to say "Obtaining the end pointer for a pointer-length representation requires adding the length to the start to begin with,") |
Thinking about this more, I remembered why I didn't try to expose the raw non-null iterator in rust-lang/rust#127348 : it can't be used in safe code. Basically, if there's a safe constructor from a slice, it can no longer use So I think the big question here is where the safety promise should be made. Is it that it only has unsafe constructors, and the promise made in the constructor is "I won't use this for too long"? That's my least favourite kind of safety precondition, because it's entirely uncheckable at the time, though maybe there's no way to avoid it. Or should it have safe constructors from things like slices, but then Alternatively, should it have a lifetime on it anyway, and rely on the |
If I had my way and didn't care at all about stability guarantees or what others think, I personally would like two new types that represent slices and can be cast between each other:
And then there's no need for extra unsafe APIs to accomplish this stuff. I guess that my main reasoning for the unsafe wrapper in this API is that it could be easily converted into a safe wrapper, but has substantially smaller API guarantees than a safe wrapper would have. But maybe this is the wrong path to go down and a safe wrapper is the way to go. My end + len representation is basically DOA since it requires something to own memory before its pointer, which seems unlikely to be ever supported. But theoretically, with compiler help, it's entirely possible to make an unsized I guess that would be best as an MCP, and not an ACP, then? Since it would be more a lang/compiler change than a libs change. |
We discussed this during today's libs-api meeting. Several people were not in favor, although for different reasons. We do understand the performance benefit of having the alternative slice representation, but the benefit of having this as a common abstraction, especially in std, seems unproven. And it's a rather low-level feature where different users might want different safety constraints. In std we also want to support ZSTs in slice iterators; someone else might just want to exclude that special case or handle it differently. Additionally, the code patterns that please the optimizers most change with time, so the exact needed API surface might change over time too. So for now we would want internal uses like rust-lang/rust#127348 or an implementation outside std. In the future when they've proven themselves useful we may reconsider. |
Proposal
Problem statement
Pointer ranges are a very useful way of working with slices, and they're currently how the standard library implements many of the slice iterator types.
However, these ranges have a problem that is only made more clear with the move toward strict provenance: there is no guarantee that the two pointers in the range are actually related to each other, or that they refer to the same buffer. While there are some efforts that have been made to rectify this within the standard library itself, it would be nice to have a dedicated pointer range type whose internals can be modified based upon whatever is most easily optimisable by the compiler.
Motivating examples or use cases
Solution sketch
The precedent from NonNull implies that the pointer range type should serve the purpose of both *const T and *mut T ranges. Additionally, NonNull should be implied, since Rust requires that allocations be non-null; this would enable niche optimisation for things like Option<PtrRange<T>>.
The internal layout for this type can initially just be the naïve:
(Note: The use of *const T for the second argument preserves covariance while also working as you'd expect for ZSTs, which use the offset to encode the length of the range.)
Further refinements can be made later, for example making end just a usize whose provenance gets taken from start. To allow these kinds of optimisations, these fields should not be made public, and instead have methods:
Presumably, there could be a few methods on these that are similar to slices, with the addition of back-indexing.
Presumably, over time, more methods could be added. I don't want to prescribe too much at this stage since I'm not sure what people would use them for.
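For readers who want something concrete, here is one possible shape for such a type, written as a sketch under stated assumptions rather than as the definition proposed here. The field names, the safe `new` constructor, and the `len`/`is_empty` methods are all hypothetical; the ZST handling follows the description above, where the offset between the two pointers encodes the length.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical layout; field names and methods are assumptions.
pub struct PtrRange<T> {
    start: NonNull<T>,
    // Non-ZSTs: one-past-the-end pointer. ZSTs: start address plus the length.
    end: *const T,
}

impl<T> PtrRange<T> {
    /// Safe to construct because this sketch never dereferences the pointers.
    pub fn new(slice: &[T]) -> Self {
        let start = NonNull::from(slice).cast::<T>();
        let end = if size_of::<T>() == 0 {
            // ZST: encode the length as a byte offset from `start`.
            start.as_ptr().cast::<u8>().wrapping_add(slice.len()).cast::<T>() as *const T
        } else {
            slice.as_ptr_range().end
        };
        PtrRange { start, end }
    }

    pub fn len(&self) -> usize {
        let bytes = (self.end as usize).wrapping_sub(self.start.as_ptr() as usize);
        if size_of::<T>() == 0 { bytes } else { bytes / size_of::<T>() }
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

fn main() {
    assert_eq!(PtrRange::new(&[1u32, 2, 3]).len(), 3);
    assert_eq!(PtrRange::new(&[(); 5]).len(), 5);
    assert!(PtrRange::<u8>::new(&[]).is_empty());
    // The NonNull field provides a niche. Current rustc uses it for Option,
    // though this is not a documented guarantee for repr(Rust) structs.
    assert_eq!(size_of::<Option<PtrRange<i32>>>(), size_of::<PtrRange<i32>>());
}
```

Keeping the fields private, as suggested above, is what would let the representation of `end` change later without breaking callers.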
Alternatives
There are a few options for this:
start and end fields public. This feels like a bad option: if we're going to optimise the provenance here, then why do so in a struct which is clearly identical to Range? We could just optimise Range instead. Plus, this would also require something like unsafe fields, since modifying the fields is unsafe.
Range, somehow. This would probably involve adding some weird trickery to the compiler to allow relating the provenance of fields on one struct, and keeping that tagged throughout the entire flow of the program. We've done similar things with the borrow checker, but that feels like a lot of work and unlikely to get done any time soon.
Range, but add methods on it similar to the slice operations. We already have len and is_empty for them (technically, via ExactSizeIterator), so we're already partially there.
Links and related work
slice::DrainRaw for internal use rust#127348
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: