Add next_array and collect_array #560

Open: wants to merge 4 commits into base: master

orlp

@orlp orlp commented Jul 15, 2021

With this pull request I add two new functions to the Itertools trait:

fn next_array<T, const N: usize>(&mut self) -> Option<[T; N]>
where Self: Sized + Iterator<Item = T>;

fn collect_array<T, const N: usize>(mut self) -> Option<[T; N]>
where Self: Sized + Iterator<Item = T>;

These behave exactly like next_tuple and collect_tuple, except that they return arrays instead. Since these functions require min_const_generics, I added a tiny build script that checks whether Rust's version is 1.51 or higher and, if so, sets the has_min_const_generics config variable. This means that Itertools does not suddenly require 1.51 or higher; only these two functions do.

To facilitate this I did have to bump the minimum required Rust version from the (documented) 1.32 to 1.34, since Rust 1.32 and 1.33 have trouble parsing the file even when the code is conditionally compiled. However, this should not result in any (new) breakage, because Itertools has actually already required Rust 1.34 for 9+ months: 83c0f04 uses saturating_pow, which wasn't stabilized until 1.34.
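
A minimal sketch of the kind of version gate described above, assuming a std-only build script rather than whatever the PR's build.rs actually does (later comments suggest it relies on the version-check crate). The helper names `rustc_minor_version` and `emit_cfg` are hypothetical; only the `has_min_const_generics` cfg name comes from the description.

```rust
use std::env;
use std::process::Command;

/// Hypothetical helper: extracts the minor version from `rustc --version`
/// output such as "rustc 1.51.0 (abcdef123 2021-03-25)".
fn rustc_minor_version(version_output: &str) -> Option<u32> {
    // The second whitespace-separated token is the version number.
    let version = version_output.split_whitespace().nth(1)?;
    let mut parts = version.split('.');
    // Only a major version of 1 is understood by this sketch.
    if parts.next()? != "1" {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Hypothetical helper: prints the cfg directive for Cargo when the
/// compiler is new enough for min_const_generics (assumed: 1.51+).
fn emit_cfg() {
    // Cargo exposes the compiler path to build scripts via `RUSTC`.
    let rustc = env::var_os("RUSTC").unwrap_or_else(|| "rustc".into());
    let output = match Command::new(rustc).arg("--version").output() {
        Ok(output) => output,
        // If the compiler cannot be queried, leave the cfg unset.
        Err(_) => return,
    };
    let text = String::from_utf8_lossy(&output.stdout);
    if rustc_minor_version(&text).map_or(false, |minor| minor >= 51) {
        println!("cargo:rustc-cfg=has_min_const_generics");
    }
}
```

In a real build.rs, `main` would simply call `emit_cfg()`, and the two new methods would be gated with `#[cfg(has_min_const_generics)]`.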


As for rationale, I think these functions are useful, especially for pattern matching and parsing. I don't think there's a high probability they get added to the standard library either, which is why I'm making a pull request here directly. When/if TryFromIterator stabilizes we can simplify the implementation, but even then I believe these functions remain a good addition, similar to how collect_vec is nice to have despite .collect::<Vec<_>>() existing.
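
As a rough illustration of the semantics proposed in this description, here is a safe, allocating sketch. It is not the PR's implementation, which avoids the intermediate Vec. The free functions `next_array_sketch` and `collect_array_sketch` are hypothetical stand-ins for the proposed trait methods, and they assume the same rules as next_tuple and collect_tuple: `next_array` fails when fewer than N items remain, and `collect_array` succeeds only when the iterator yields exactly N items.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for the proposed `next_array`: takes up to `N`
/// items and succeeds only if `N` items were actually available.
fn next_array_sketch<I: Iterator, const N: usize>(iter: &mut I) -> Option<[I::Item; N]> {
    let items: Vec<I::Item> = iter.by_ref().take(N).collect();
    // `Vec<T> -> [T; N]` fails when the length differs from `N`.
    items.try_into().ok()
}

/// Hypothetical stand-in for the proposed `collect_array`: succeeds only
/// if the iterator yields exactly `N` items.
fn collect_array_sketch<I: Iterator, const N: usize>(mut iter: I) -> Option<[I::Item; N]> {
    let array: [I::Item; N] = next_array_sketch(&mut iter)?;
    // Any leftover item means the iterator was too long.
    match iter.next() {
        None => Some(array),
        Some(_) => None,
    }
}
```

With these definitions, `collect_array_sketch::<_, 3>(1..=3)` yields `Some([1, 2, 3])`, while both a two-item and a four-item iterator yield `None`.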

@orlp
Author

orlp commented Jul 15, 2021

A possible enhancement might be to return Option<A> where A: FromArray<Self::Item, N> instead, and adding the FromArray trait, something similar to this:

trait FromArray<T, const N: usize> {
    fn from_array(array: [T; N]) -> Self;
}

impl<T, const N: usize> FromArray<T, N> for [T; N] { /* .. */ }
impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> { /* .. */ }
impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> { /* .. */ }

In fact, I think this is highly useful because it allows things like

let ints = line.split_whitespace().map(|n| n.parse());
if let Ok([x, y, z]) = ints.collect_array() {
    ...
}

This would be completely in line with FromIterator.
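
One way the elided `/* .. */` bodies could be filled in, shown as a sketch only. It assumes Rust 1.53+ for by-value array iteration and goes through a temporary Vec for simplicity, which the actual proposal would presumably avoid. The trait definition is copied from the comment above; the bodies are assumptions.

```rust
use std::convert::TryInto;

trait FromArray<T, const N: usize> {
    fn from_array(array: [T; N]) -> Self;
}

impl<T, const N: usize> FromArray<T, N> for [T; N] {
    fn from_array(array: [T; N]) -> Self {
        array
    }
}

impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> {
    fn from_array(array: [Option<T>; N]) -> Self {
        // Collecting into `Option<Vec<T>>` yields `None` at the first `None`.
        let items: Option<Vec<T>> = IntoIterator::into_iter(array).collect();
        // The Vec has exactly `N` items, so the conversion succeeds.
        items?.try_into().ok()
    }
}

impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> {
    fn from_array(array: [Result<T, E>; N]) -> Self {
        // Collecting into `Result<Vec<T>, E>` stops at the first `Err`.
        let items: Vec<T> = IntoIterator::into_iter(array).collect::<Result<_, E>>()?;
        let array: [T; N] = match items.try_into() {
            Ok(array) => array,
            Err(_) => unreachable!("the Vec was built from exactly N items"),
        };
        Ok(array)
    }
}
```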

@orlp
Author

orlp commented Jul 16, 2021

So I have a working implementation of the above idea here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9dba690b0dfc362971635e21647a4c19.

It makes this compile:

fn main() {
    let line = "32 -12 24";
    let nums = line.split_whitespace().map(|n| n.parse::<i32>());
    if let Some(Ok([x, y, z])) = nums.collect_array() {
        println!("x: {} y: {} z: {}", x, y, z);
    }
}

It would change the interface to:

trait ArrayCollectible<T>: Sized {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}

trait Itertools: Iterator {
    fn collect_array<A>(self) -> Option<A>
    where
        Self: Sized,
        A: ArrayCollectible<Self::Item>;
}

where

  • ArrayCollectible<T> is implemented for [T; N];
  • ArrayCollectible<Option<T>> is implemented for Option<[T; N]>;
  • ArrayCollectible<Result<T, E>> is implemented for Result<[T; N], E>.
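
The three impls listed above could look roughly like the following. This is a safe, allocating sketch and not the linked playground code. The trait is copied from the comment; the free function `collect_array` stands in for the proposed trait method. The short-circuit behaviour is an assumption: an inner `None` or `Err` is reported as `Some(None)` or `Some(Err(e))` regardless of the item count.

```rust
use std::convert::TryInto;

trait ArrayCollectible<T>: Sized {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}

impl<T, const N: usize> ArrayCollectible<T> for [T; N] {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self> {
        let items: Vec<T> = iterable.into_iter().collect();
        // Fails (returns `None`) unless exactly `N` items were produced.
        items.try_into().ok()
    }
}

impl<T, const N: usize> ArrayCollectible<Option<T>> for Option<[T; N]> {
    fn array_from_iter<I: IntoIterator<Item = Option<T>>>(iterable: I) -> Option<Self> {
        let items: Option<Vec<T>> = iterable.into_iter().collect();
        match items {
            // Assumption: an inner `None` short-circuits to `Some(None)`.
            None => Some(None),
            Some(items) => {
                let array: Option<[T; N]> = items.try_into().ok();
                array.map(Some)
            }
        }
    }
}

impl<T, E, const N: usize> ArrayCollectible<Result<T, E>> for Result<[T; N], E> {
    fn array_from_iter<I: IntoIterator<Item = Result<T, E>>>(iterable: I) -> Option<Self> {
        let items: Result<Vec<T>, E> = iterable.into_iter().collect();
        match items {
            // Assumption: the first inner `Err` short-circuits to `Some(Err(e))`.
            Err(error) => Some(Err(error)),
            Ok(items) => {
                let array: Option<[T; N]> = items.try_into().ok();
                array.map(Ok)
            }
        }
    }
}

/// Free-function stand-in for the proposed `Itertools::collect_array`.
fn collect_array<I, A>(iter: I) -> Option<A>
where
    I: Iterator,
    A: ArrayCollectible<I::Item>,
{
    A::array_from_iter(iter)
}
```

The outer `Option` reports a length mismatch, and the inner `Option` or `Result` carries whatever the items themselves reported.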

Member

@phimuemue phimuemue left a comment

Hi there! Thanks for this. I particularly like that you thought about a way of enabling const-generic stuff without raising the minimum required rust version (even if I would imagine something else due to having an aversion against depending on other crates too much).

There has been some discussion recently about basically supporting not only tuples, but also arrays. I just want to make sure that we do not lose input from these discussions when actually settling on your solution:

On top of that, I think there are some changes in there that are not directly related to this issue. If you'd like to have them merged, could you possibly factor them out into separate PRs/commits?

@@ -0,0 +1,80 @@
use core::mem::MaybeUninit;
Member

@phimuemue phimuemue added the const-generics Require Rust 1.51 or newer label Aug 20, 2021
@orlp
Author

orlp commented Dec 21, 2021

@phimuemue Any update on this?

@phimuemue
Member

@phimuemue Any update on this?

I appreciate your effort, but unfortunately nothing substantial from my side: I changed my mind regarding version-check (so we could use it as a dev-dependency), but I do not have enough time right now to review and merge PRs of that size.

@orlp
Author

orlp commented Dec 30, 2021

@phimuemue Just for posterity's sake, version-check would be a build-dependency, not dev-dependency.

@orlp
Author

orlp commented Oct 4, 2022

@phimuemue Just checking in what the status is, I feel very strongly about the usefulness of collect_array. I miss it very often in itertools.

@scottmcm
Contributor

scottmcm commented Oct 4, 2022

Note that if you want collect_array, you can use https://lib.rs/crates/arrayvec, as the usual way to collect into an array.

I'll also mention Iterator::next_chunk (rust-lang/rust#98326) as a nightly API that'll be next_array.

@Expurple

This is a very useful feature. Today there was a thread on Reddit where the author basically asks if there's a crate that provides collect_array(). IMO, itertools should be the crate to do it

@Philippe-Cholet
Member

@Expurple
I sure would like to use const generics, and collect_array is one of those uses.
Our MSRV is quite old (1.43.1 currently) while min-const-generics is 1.51, but I do not think that's the main blocker.
The fact is that there is not much available in recent stable Rust yet, which is sad. Iterator::next_chunk and core::array::try_from_fn would be nice to have.
Plus, we currently don't really use unsafe ourselves (only in EitherOrBoth::insert* with obviously infallible patterns). I guess we prefer that the std does the heavy work.

@phimuemue
Member

I sometimes think about adding arrayvec as a dependency - and falling back to std as soon as it's possible. I think it might also solve some other issues (e.g. ExactlyOneError having a manual two-element-arrayvec). Would require Rust 1.51.

Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users.

I personally would lean towards arrayvec. @jswrenn @Philippe-Cholet Opinions?

@Philippe-Cholet
Member

Philippe-Cholet commented Mar 28, 2024

@phimuemue

Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users.

ArrayVec<T, CAP> implements Deref<Target = [T]> so (nightly-available) slice methods are directly accessible, that seems to be it.

I sometimes think about adding arrayvec as a dependency - and falling back to std as soon as it's possible. I think it might also solve some other issues (e.g. ExactlyOneError having a manual two-element-arrayvec). Would require Rust 1.51.

I'm definitely not opposed to the idea but the ExactlyOneError use case is quite small.
I did not give thoughts before, do you have other examples in mind? (with private usage, in order to fall back to std ASAP).

EDIT: ArrayVec has a maximal capacity of u32::MAX, could it be an issue?

EDIT: Well I have some. With tail and k_smallest (and its variants), I had thoughts of extending them to const where I dreamt of unstable Iterator::next_chunk but I guess we could use arrayvec in the meantime.

(My idea would be that .k_smallest(50) could also support .k_smallest(Const/*::<50> if not inferred elsewhere*/) so that we don't multiply method names too much but merely add a new zero-sized type struct Const<const N: usize>; at places we only gave usize before. Then no allocation.
It's not a magic bullet for every usage though but I see a real usage for it, such as .combinations(Const): internal Vec buffer but would return arrays, so no repetitive slow allocations.)


@scottmcm Small discussion about temporarily adding arrayvec as dependency once we move to const-generics. I just saw a comment of yours related to this. Could you elaborate?

@jswrenn
Member

jswrenn commented Mar 28, 2024

For collect_array, I think I'd prefer just taking the time myself to write the unsafe code. We can vendor the not-yet-stabilized helper functions from the standard library that we'll need.

I can allocate some time to this next week.

@orlp
Author

orlp commented Mar 28, 2024

@jswrenn Please don't forget that we are discussing this on a PR that already has a working implementation without adding dependencies...

@jswrenn
Member

jswrenn commented Mar 28, 2024

@orlp, thanks, I had forgotten that this was a PR and not an issue when I made my reply. Still, we're talking about adding some extremely subtle unsafe code to Itertools. I'd like us to take extreme care to avoid accidentally introducing UB.

A PR adding unsafe to itertools should:

  • rigorously document the safety and panicking conditions of every unsafe function it introduces
  • prove that every invocation of an unsafe function (even invocations occurring within other unsafe functions) satisfies the safety precondition of that invocation, with citations to official Rust documentation
  • rigorously document why any potentially panicking function within an unsafe function does not create invalid state that would cause UB upon panicking unwinds
  • intensively test its API with miri

If you can update this PR to do those things, I can see a path forward to merging it.

Member

@jswrenn jswrenn left a comment

Thanks for this PR! I like the ArrayBuilder abstraction quite a bit. As I mentioned, this will need additional documentation and testing before it can be merged. See the recent safety comments in my other project, zerocopy, for a sense of the ~~paranoia~~ rigor I'd like these safety comments to take.

@orlp
Author

orlp commented Mar 28, 2024

@jswrenn I will be busy the upcoming week but I'm willing to bring this up to standards after that. If before then you could decide on whether or not to bump the MSRV to 1.51 I could include that in the rewrite.

@jswrenn
Member

jswrenn commented May 31, 2024

@orlp Think you might have time to revisit this soon? :-)

@orlp
Author

orlp commented May 31, 2024

@jswrenn Yes, I do. Have you reached a decision yet on the MSRV issue?

@Philippe-Cholet
Member

I think it's time to set the MSRV to 1.51, I'm pretty sure @jswrenn and @phimuemue will agree.

@jswrenn
Member

jswrenn commented Jun 1, 2024

Absolutely. I'd even be fine going up to 1.55, which is nearly three years old. In my other crates, I've found that to be the lowest MSRV that users actually need.

@Philippe-Cholet
Member

Philippe-Cholet commented Jun 1, 2024

After quickly going through the release notes, I may have missed something, but I only noted two things 1.55 has over 1.51 that I considered to be of potential use for us:

EDIT: 1.51 has those things over 1.43.1: const-generics 🎉, bool::then, slice::select_nth_unstable[_by[_key]] (unlocking #925), VecDeque::make_contiguous, Option::zip.

@phimuemue
Member

Out of curiosity and slightly off-topic: What's a real reason to not update to stable Rust? Does it ever remove support for some platform or raise the system requirements dramatically? Or, put alternatively: Are there situations where someone could use cutting-edge itertools but not stable Rust?

@jswrenn
Member

jswrenn commented Jun 4, 2024

Are there situations where someone could use cutting-edge itertools but not stable Rust?

Yes: Libraries that depend on itertools, but set a MSRV lower than stable. They are, of course, welcome to use an older, MSRV-compatible version of itertools, but we currently don't backport bugfixes to older versions.

What's a real reason to not update to stable Rust? Does it ever remove support for some platform or raise the system requirements dramatically?

Rust occasionally does remove support for platforms; e.g.: https://blog.rust-lang.org/2022/08/01/Increasing-glibc-kernel-requirements.html

(The above post suggests that, conservatively, we could increase our MSRV to 1.63 without causing major problems for users. Maybe that's a good target MSRV for now?)

@scottmcm
Contributor

@Philippe-Cholet

I just saw a comment of yours related to this. Could you elaborate?

Basically, when I look at something like this:

[...] where the author basically asks if there's a crate that provides collect_array(). IMO, itertools should be the crate to do it

I end up thinking that it shouldn't be itertools doing it, because there's lots of choices for how to do it, and the best answer for it is instead to use one of those other crates that already does it.

Collecting to exactly an array is a pretty narrow use case, IMHO. Collecting to an ArrayVec<T, 2> instead of a [T; 2] is both more general and easier to deal with the edge cases -- after all, what should empty().collect_array::<2>() do? And there's a fallible conversion from ArrayVec to array by checking the length if you need that.

And while of course ArrayVec could be a dependency, which ArrayVec? I know of at least two relatively-popular ones:

Or maybe one day rust-lang/rfcs#3316 will happen to have one in core too.

Plus, the types that these things should return are not really obvious. https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.next_chunk ends up returning Result<[Self::Item; N], IntoIter<Self::Item, N>> because someone might want to handle the elements from the tail, rather than just dropping them on the floor when returning None. But that IntoIter type has no stable constructors. And if we added a type like it, basically it's re-creating an ArrayVec type.

TL/DR: I think the crates that already have the logic for making arrays from iterators -- both ArrayVecs above are FromIterator -- should offer this, rather than itertools adding code to do it.


I suppose another way would be to intentionally down-scope what this would support in order to avoid complexity. Example (that has some problems, but hopefully demonstrates an idea): we could offer an impl ExactSizeIterator -> Option<[T; N]> that checks the len first, at which point the implementation is just from_fn(|_| it.next().unwrap()) and it never drops elements on the floor.
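
The down-scoped idea in the previous paragraph can be sketched in a few lines of safe code. The function name `collect_array_exact` is hypothetical, and `std::array::from_fn` requires Rust 1.63+. One of the "problems" alluded to is visible here: an `ExactSizeIterator` that misreports its length causes a panic (though not undefined behavior).

```rust
/// Hypothetical down-scoped variant: only accepts iterators that know
/// their exact length, so no element is consumed unless all `N` fit.
fn collect_array_exact<I, const N: usize>(mut iter: I) -> Option<[I::Item; N]>
where
    I: ExactSizeIterator,
{
    // Check the length up front; nothing has been consumed yet.
    if iter.len() != N {
        return None;
    }
    // `from_fn` calls the closure once per index, in order.
    Some(std::array::from_fn(|_| {
        iter.next()
            .expect("ExactSizeIterator reported an incorrect length")
    }))
}
```

Because the length is checked before the first call to `next`, a `None` result leaves every element in the iterator, which is dropped as a whole rather than partially consumed.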

@Philippe-Cholet
Member

Breaking news: the MSRV has been bumped to 1.63.0! 🎉

@scottmcm If I were dreaming of itertools at night, I would have nightmares not having Iterator::next_chunk available to us (stable and within our MSRV).
arrayvec/tinyvec was rejected (probably in the comments above).
This unsafe work sure is a temporary fix for all that. We will rely on the stdlib in the end.

@jswrenn
Member

jswrenn commented Jun 18, 2024

I'll make the necessary safety comment edits to this PR this week.

@jswrenn jswrenn force-pushed the collect_array branch 2 times, most recently from f3e8deb to 461be14 Compare June 28, 2024 18:09
@jswrenn
Member

jswrenn commented Jun 28, 2024

@Philippe-Cholet, @phimuemue and @scottmcm, could you give this PR a final review? I've:

  • revised the safety proofs
  • modified the Drop impl to avoid raw pointers and to have smaller unsafe blocks
  • modified push to avoid unspecified behavior between builds with checked and wraparound arithmetic


codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 99.03846% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.46%. Comparing base (6814180) to head (cf0a160).
Report is 128 commits behind head on master.

Files with missing lines Patch % Lines
src/next_array.rs 98.68% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #560      +/-   ##
==========================================
+ Coverage   94.38%   94.46%   +0.07%     
==========================================
  Files          48       50       +2     
  Lines        6665     6866     +201     
==========================================
+ Hits         6291     6486     +195     
- Misses        374      380       +6     


/// # Panics
///
/// This panics if `self.len >= N` or if `self.len == usize::MAX`.
pub fn push(&mut self, value: T) {

I wonder if try_push would be a useful API here


impl<T, const N: usize> Drop for ArrayBuilder<T, N> {
fn drop(&mut self) {
// Select the valid elements of `self.arr`.

I might add a comment noting that dropping Uninit does nothing (hence the reason for this function existing)

Member

// of `MaybeUninit<T>` in the initialized state.
//
// [1]: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html#layout-1
let (_, valid, _): (_, &mut [T], _) = unsafe { valid.align_to_mut::<T>() };

just curious—why not take the approach above where you use arr.map { unsafe { v.assume_init() } } then drop that?

Member

We cannot do so here because we're operating on a dynamic — not static — number of elements. Here, the only elements initialized are the ones from index 0 to self.len. By contrast, in ArrayBuilder::take, elements 0 to N (a constant) are initialized. array::map only operates on fixed-sized arrays, not slices.

Comment on lines +64 to +63
// SAFETY: Since `self.len` is 0, `self.arr` may safely contain
// uninitialized elements.
let arr = mem::replace(&mut self.arr, [(); N].map(|_| MaybeUninit::uninit()));

Could this be simplified by replacing all of self with ArrayBuilder::new()?

Member

@jswrenn jswrenn Jun 28, 2024

I don't believe so. I agree, it would be ideal if we could simply write:

let Self { arr, len } = mem::replace(self, Self::new());

instead. However, because ArrayBuilder has a non-trivial Drop implementation, we cannot move-destructure it in that manner (doing so triggers a compilation error). Instead, we need to read and write each field individually.
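
A small self-contained illustration of the restriction described above. The type `Slots` is hypothetical and has nothing to do with the PR's ArrayBuilder beyond having a Drop impl and two fields. The commented-out line is the form the compiler rejects with error E0509 ("cannot move out of type ..., which implements the `Drop` trait").

```rust
use std::mem;

/// Hypothetical type with a non-trivial `Drop` implementation.
struct Slots {
    items: Vec<u32>,
    len: usize,
}

impl Drop for Slots {
    fn drop(&mut self) {
        self.items.clear();
    }
}

impl Slots {
    fn new() -> Self {
        Slots { items: Vec::new(), len: 0 }
    }

    /// Takes the contents out, leaving `self` empty but still valid.
    fn take_parts(&mut self) -> (Vec<u32>, usize) {
        // Rejected by the compiler (E0509), because `Slots` implements `Drop`:
        // let Slots { items, len } = mem::replace(self, Slots::new());

        // Accepted: replace each field individually.
        let items = mem::take(&mut self.items);
        let len = mem::replace(&mut self.len, 0);
        (items, len)
    }
}
```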

Contributor

I was going to suggest using std::ptr::read instead of std::mem::replaceing with [uninit(); N], but in release mode they seem to be optimized to the same thing (I assume the optimizer knows that writing uninit() is a no-op), so whichever is more readable is probably better. (godbolt link)

@jswrenn jswrenn force-pushed the collect_array branch 2 times, most recently from 4af66ad to 9c33e3d Compare June 28, 2024 19:47
Comment on lines 40 to 50
// PANICS: This will panic if `self.len == usize::MAX`.
// SAFETY: By invariant on `self.arr`, all elements at indicies
// `0..self.len` are valid. Due to the above write, the element at
// `self.len` is now also valid. Consequently, all elements at indicies
// `0..(self.len + 1)` are valid, and `self.len` can be safely
// incremented without violating `self.arr`'s invariant. It is fine if
// this increment panics, as we have not created any intermediate
// invalid state.
self.len = match self.len.checked_add(1) {
Some(sum) => sum,
None => panic!("`self.len == usize::MAX`; cannot increment `len`"),
};
Contributor

Isn't this panic branch unreachable? Since self.arr[self.len] did not panic above, we therefore know that self.len < N, meaning that self.len + 1 <= N and thus self.len + 1 cannot overflow.

Suggested change
- // PANICS: This will panic if `self.len == usize::MAX`.
- // SAFETY: By invariant on `self.arr`, all elements at indicies
- // `0..self.len` are valid. Due to the above write, the element at
- // `self.len` is now also valid. Consequently, all elements at indicies
- // `0..(self.len + 1)` are valid, and `self.len` can be safely
- // incremented without violating `self.arr`'s invariant. It is fine if
- // this increment panics, as we have not created any intermediate
- // invalid state.
- self.len = match self.len.checked_add(1) {
- Some(sum) => sum,
- None => panic!("`self.len == usize::MAX`; cannot increment `len`"),
- };
+ // PANICS: This cannot panic, since `self.len < N <= usize::MAX`.
+ // SAFETY: By invariant on `self.arr`, all elements at indicies
+ // `0..self.len` are valid. Due to the above write, the element at
+ // `self.len` is now also valid. Consequently, all elements at indicies
+ // `0..(self.len + 1)` are valid, and `self.len` can be safely
+ // incremented without violating `self.arr`'s invariant.
+ self.len += 1;

@orlp
Author

orlp commented Jun 29, 2024

@jswrenn In your refactor you've introduced two bugs, one of which is undefined behavior:

  1. The new code only frees the first element and leaks the rest on drop.
  2. The new code calls drop on an uninitialized value if the ArrayBuilder is empty.

Please read my original version carefully, I intentionally used ptr::slice_from_raw_parts_mut to transform the pointer-to-first-element (*mut T) returned by [T]::as_mut_ptr into a pointer-to-slice (*mut [T]) to then drop the slice in-place. That's why the variables were called ptr_to_first and ptr_to_slice. My only regret for the original version was not explicitly marking the latter as such using type ascription as well, let ptr_to_slice: *mut [T] = ....

Your new code only makes a pointer to the first element, and forcibly only drops that element, regardless of whether it is initialized or not.

I can't help but point out that adding 60 lines of comments and various extra steps to what used to be a correct 3 LOC + 4 lines of comments implementation did not help with preventing this bug for the writer, and for me personally as a reader it made the issue harder to spot.

@jswrenn
Member

jswrenn commented Jun 29, 2024

I appreciate the catch! Given that this passed CI, it looks like you've also uncovered a gap in our test coverage. I'll go ahead and add miri to our CI.

Orson, you will not have me in agreement on the commenting issue. I wouldn't have to read your original version so carefully to guess at your intentions if you had left comments making them explicit. The comments are for the benefit of maintainers, who need to make sure we don't regress on any of these subtleties in perpetuity.

And the comments did result in a potential soundness issue being discovered by a reviewer — the original code leaves a live reference to an invalid (dropped) referent. In fact, the regression you just discovered was introduced as part of fixing this potential soundness issue that was present in your original PR.

@orlp
Author

orlp commented Jun 29, 2024

And the comments did result in a potential soundness issue being discovered by a reviewer — the original code leaves a live reference to an invalid (dropped) referent. In fact, the regression you just discovered was introduced as part of fixing this potential soundness issue that was present in your original PR.

That was never part of my original PR, but also due to your refactor. My original code never left a live reference. I implore you to look at the original code again.

Even if we take that at face value, your purported solution does nothing to resolve it. The following code compiles:

let mut x = 42;
let mut r = &mut x;
{
    // <pointer stuff>
    let _ = r; // This does not drop r! This is a no-op!
}
*r = 0; // r is still live!

But because you wrote this:

// Move `valid` out of the surrounding scope and immediately drop
// it. `ptr` is now the only pointer to `valid`'s referent.
let _ = valid;

I just nodded along, believing the comment, and the first time around didn't even catch that it doesn't drop valid at all. Only when you pointed at it again just now did I take a closer look, ignore the comments, and figure out what it's actually doing.
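
The point that `let _ = value;` neither moves nor drops `value` can be observed directly with a drop counter. The type `Noisy` and the function name are hypothetical and serve only this illustration.

```rust
use std::cell::Cell;

/// Hypothetical type that counts how often it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count observed after `let _ = value;` and after
/// an explicit `drop(value)`.
fn wildcard_binding_does_not_drop() -> (u32, u32) {
    let counter = Cell::new(0);
    let value = Noisy(&counter);

    // The wildcard pattern does not bind, so `value` is not moved here.
    let _ = value;
    let after_wildcard = counter.get();

    // `value` is still usable, which is why this line compiles at all.
    drop(value);
    let after_drop = counter.get();

    (after_wildcard, after_drop)
}
```

The first count is taken while `value` is still alive; only the explicit `drop` call runs the destructor.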

@orlp
Author

orlp commented Jun 29, 2024

@jswrenn And it gets worse.

Even if you had correctly turned valid: &mut [T] into *mut [T] and even if you had indeed used drop(valid) to drop valid... it still would be undefined behavior by Miri's stacked borrows model. The slice from which a pointer is derived must stay alive for that pointer to remain valid under stacked borrows. That is, this is UB under stacked borrows:

let mut arr = [1, 2, 3];
let r = &mut arr[..];
let p = r.as_mut_ptr();
drop(r);
unsafe { dbg!(&*p) };

| drop as intended | pointer-to-slice as intended | Outcome |
|---|---|---|
| No | No | Undefined behavior, uninitialized element dropped |
| No | Yes | Code accidentally correct, comments incorrect and perceived potential unsoundness from live &mut [T] unaddressed |
| Yes | No | Undefined behavior, invalidated pointer accessed and uninitialized element dropped |
| Yes | Yes | Undefined behavior, invalidated pointer accessed |

I kindly ask you to reflect on why the refactor, despite what I assume is your best intent and care with respect to the safety comments, did not catch either of the above issues nor that their intended combination still would be invalid.

@jswrenn
Member

jswrenn commented Jun 29, 2024

I kindly ask you to reflect on why the refactor, despite what I assume is your best intent and care with respect to the safety comments, did not catch either of the above issues nor that their intended combination still would be invalid.

Whatever the reason, I'm sure it's not because I wrote too many comments.

I appreciate your knowledgeability, but, again, this abstraction has to survive idiots like me and whoever else contributes to itertools in the future. Its soundness cannot depend on the brilliance of a lone genius. I'm delighted to take your suggestions, but I am going to make the reasoning behind those suggestions explicit with new comments.

@jswrenn
Member

jswrenn commented Jun 29, 2024

Stripping most of the comments out for your benefit, could you review this implementation (essentially undoing this):

fn drop(&mut self) {
    // Select the valid elements of `self.arr`.
    let (valid, _) = self.arr.split_at_mut(self.len);

    // Cast `valid` from `&mut [MaybeUninit<T>]` to `&mut [T]`.
    let (_, valid, _): (_, &mut [T], _) = unsafe { valid.align_to_mut::<T>() };

    // Drop `valid`.
    unsafe {
        ptr::drop_in_place(valid);
    }
}

@orlp
Author

orlp commented Jun 29, 2024

@jswrenn While I believe that is sound, I find using align_to_mut to assume the slice is initialized to be very strange, and it's certainly an intent-implementation semantic mismatch.

As an alternative, I would propose that we copy MaybeUninit::slice_assume_init_mut from the standard library with a simple safety comment deferring to the standard library:

unsafe fn maybeuninit_slice_assume_init_mut<T>(slice: &mut [MaybeUninit<T>]) -> &mut [T] {
    // SAFETY: see the standard library implementation of MaybeUninit::slice_assume_init_mut,
    // this implementation is copied from there as it is not available at our MSRV.
    unsafe { &mut *(slice as *mut [MaybeUninit<T>] as *mut [T]) }
}

Furthermore I also don't understand why you use split_at_mut and then promptly ignore one half. May I suggest instead the following, also making things more re-usable / composable:

impl<T, const N: usize> AsMut<[T]> for ArrayBuilder<T, N> {
    fn as_mut(&mut self) -> &mut [T] {
        // SAFETY: self.arr[..self.len] is valid.
        unsafe { maybeuninit_slice_assume_init_mut(&mut self.arr[..self.len]) }
    }
}

Then our drop is painfully simple:

impl<T, const N: usize> Drop for ArrayBuilder<T, N> {
    fn drop(&mut self) {
        unsafe { core::ptr::drop_in_place(self.as_mut()) }
    }
}

EDIT: instead of AsMut I meant implementing Deref<Target = [T]> as well as DerefMut. I'm a bit too lazy to update the above example also adding the immutable variants, but I think in general that would be useful and simplifies / makes more self-contained the unsafe bits.
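
Putting the pieces from this comment together, a complete builder might look roughly like the sketch below. It is an illustration assembled from the snippets in this thread, not the code in the PR: the `take` and `push` bodies, the method name `as_mut_slice`, and the free function `collect_array` are assumptions, and `array::map` requires Rust 1.55+.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Invariant: the elements `arr[..len]` are initialized.
struct ArrayBuilder<T, const N: usize> {
    arr: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> ArrayBuilder<T, N> {
    fn new() -> Self {
        Self { arr: [(); N].map(|_| MaybeUninit::uninit()), len: 0 }
    }

    /// Panics if the builder is already full.
    fn push(&mut self, value: T) {
        // Bounds-checked indexing panics when `self.len >= N`.
        self.arr[self.len] = MaybeUninit::new(value);
        // Cannot overflow: indexing succeeded, so `self.len < N`.
        self.len += 1;
    }

    /// Returns the finished array if exactly `N` items were pushed.
    fn take(&mut self) -> Option<[T; N]> {
        if self.len != N {
            return None;
        }
        // Reset `len` first so `Drop` never sees moved-out elements.
        self.len = 0;
        let arr = mem::replace(&mut self.arr, [(); N].map(|_| MaybeUninit::uninit()));
        // SAFETY: `len` was `N`, so all `N` elements are initialized.
        Some(arr.map(|item| unsafe { item.assume_init() }))
    }

    /// The initialized prefix as a slice.
    fn as_mut_slice(&mut self) -> &mut [T] {
        let valid: &mut [MaybeUninit<T>] = &mut self.arr[..self.len];
        // SAFETY: `arr[..len]` is initialized by invariant, and
        // `MaybeUninit<T>` has the same layout as `T`.
        unsafe { &mut *(valid as *mut [MaybeUninit<T>] as *mut [T]) }
    }
}

impl<T, const N: usize> Drop for ArrayBuilder<T, N> {
    fn drop(&mut self) {
        // SAFETY: the slice covers exactly the initialized elements,
        // and they are never accessed again after this call.
        unsafe { ptr::drop_in_place(self.as_mut_slice()) }
    }
}

/// Free-function stand-in for the proposed `collect_array`.
fn collect_array<I: Iterator, const N: usize>(mut iter: I) -> Option<[I::Item; N]> {
    let mut builder = ArrayBuilder::<I::Item, N>::new();
    for _ in 0..N {
        // On early exit the builder drops the items pushed so far.
        builder.push(iter.next()?);
    }
    if iter.next().is_some() {
        return None;
    }
    builder.take()
}
```

The early-return paths are where the Drop impl matters: a partially filled builder must drop exactly the first `len` elements, no more and no fewer, which is the property the two bugs discussed earlier in the thread violated.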

@jswrenn
Member

jswrenn commented Jun 29, 2024

I quite like that formulation!

@Philippe-Cholet
Member

@jswrenn About CI, I just noticed 😮 that the needs list in all-jobs-succeed does not include miri. Let's add it here.

Labels
const-generics Require Rust 1.51 or newer

8 participants