Skip to content

Added set_size_hint method #341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

peterjoel
Copy link

Sometimes there is not enough information available for adapters to correctly deduce a useful size hint. However, a developer might have more information and this new iterator method allows them to express it.

A motivating example from the code:

let data = vec![1, 2, 6, 7, 2];

let result: Vec<i32> = data.iter()
    // The `FlatMap` adapter is not able to deduce the size hint itself
    // so `size_hint()` would return `(0, None)`.
    .flat_map(|&x| {
        let repeats = if x % 2 == 0 { 1 } else { 3 };
        itertools::repeat_n(x, repeats)
    })
    // But we know the bounds on the max and min number of items in the
    // resulting iterator, so we can provide that information:
    .set_size_hint(data.len(), Some(3 * data.len()))
    // The `Vec` should not excessively re-allocate, while collecting
    // since it now knows the minimum and maximum number of items.
    .collect();

I am not 100% happy with the method name, set_size_hint, and there are alternatives in which only one of the bounds is set at a time.

Update,
WhileSome,
};
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also rearranged these into alphabetical order, while I was adding one anyway.

@jswrenn
Copy link
Member

jswrenn commented Jul 18, 2019

Thank you for your contribution! I'm convinced this is a compelling use-case, but have a few concerns:

  1. Using internal iteration methods on this iterator will be a significant performance footgun when wrapping iterators that specialize those methods (like Chain, or Intersperse). We should override the 'root' internal iteration method that other methods tend to call with one that directly calls that method on the inner iterator. Currently, that method is Iterator::try_fold. Unfortunately, it was stabilized in 1.27, and Itertools supports 1.24. Iterator::fold is the next best choice.
  2. The name set_size_hint implies the size hint of the wrapped iterator is being mutated. An alternative name. Perhaps with_size_hint? Unfortunately, in the standard library, the name with tends to imply the function consumes a function as its argument.
  3. This feature is a possible candidate for inclusion in the standard library. Per the inclusion policy, could you file a PR to rust-lang/rust?

@jswrenn jswrenn self-requested a review July 18, 2019 16:16
@jswrenn jswrenn self-assigned this Jul 18, 2019
@jswrenn jswrenn removed their request for review July 18, 2019 20:25
@Ten0
Copy link
Contributor

Ten0 commented Sep 26, 2019

This seems like a useful functionality.

  • Performance-wise, it should call many of the original iterator's implementations in order to preserve the possible optimizations (and count should have a custom one based on the size_hint and maybe skip).
  • I also feel like calls that consume part of the iterator should update the size hint accordingly. (its size changes as you consume elements).
  • Regarding the previous note about the name, I understand the point but I can't think of a legitimate iterator where the size hint would be set-able and where it wouldn't consist in wrapping it into another one so I don't find it too confusing. I have another idea though : how about overriding_size_hint()?

In any case, +1

@phimuemue
Copy link
Member

phimuemue commented Sep 26, 2019

I agree that this looks useful.

Could/should we debug_assert that the improved size_hint does not contradict (resp. worsen) the size_hint of the underlying iterator?

@bluss
Copy link
Member

bluss commented Sep 28, 2019

Just to note that try_fold is unfortunately out of bounds for itertools until it can be implemented in stable Rust (we can't name the Try trait in stable Rust yet).

@bluss
Copy link
Member

bluss commented Sep 28, 2019

@phimuemue I'd skip that check, it risks disrupting the program as much as it helps

@jswrenn
Copy link
Member

jswrenn commented Sep 30, 2019

@Ten0 I also feel like calls that consume part of the iterator should update the size hint accordingly. (its size changes as you consume elements).

I agree! @peterjoel, what do you think of this approach?

pub struct SetSizeHint<I> {
    iter: I,
    lower: usize,
    upper: Option<usize>,
}

impl<I> Iterator for SetSizeHint<I>
where
    I: Iterator,
{
    type Item = I::Item;

    #[inline(always)]
    fn next(&mut self) -> Option<Self::Item> {
        let next = self.iter.next();

        if next.is_some() {
            self.lower = self.lower.saturating_sub(1);

            if let Some(ref mut upper) = self.upper {
                *upper = upper.saturating_sub(1);
            }
        }

        next
    }

    #[inline(always)]
    fn fold<B, F>(mut self, init: B, mut f: F) -> B
    where
        Self: Sized,
        F: FnMut(B, Self::Item) -> B,
    {
        let mut consumed: usize = 0;

        let result = self.iter.fold(init, |acc, x| {
            consumed = consumed.saturating_add(1);
            f(acc, x)
        });

        self.lower = self.lower.saturating_sub(consumed);

        if let Some(ref mut upper) = self.upper {
            *upper = upper.saturating_sub(consumed);
        }

        result
    }

    #[inline(always)]
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.lower, self.upper)
    }
}

There are three notable changes:

  1. The lower and upper bounds of the set size hint are decremented with each consumed item to ensure that the user's estimate of the size bounds are adjusted accordingly as they consume elements from the iterator.
  2. fold is specialized to reduce the internal-iteration performance footgun of this adapter.
  3. The size_hint tuple is flattened into distinct lower and upper bounds fields. This gives rust a little more flexibility in how it lays out SetSizeHint (and potentially saves bytes).

@peterjoel
Copy link
Author

  1. The lower and upper bounds of the set size hint are decremented with each consumed item to ensure that the user's estimate of the size bounds are adjusted accordingly as they consume elements from the iterator.

I'd want to make sure that the decrement is mostly optimised out in cases where the size hint ends up not being used.

@Ten0
Copy link
Contributor

Ten0 commented Oct 1, 2019

I'd want to make sure that the decrement is mostly optimised out in cases where the size hint ends up not being used.

In these cases probably fold/for_each could be used, and also maybe one could avoid calling set_size_hint/override_size_hint in the first place.

@peterjoel
Copy link
Author

That's true. There is no reason to use the method if you don't need it.

@peterjoel
Copy link
Author

peterjoel commented Oct 1, 2019

@jswrenn The methods you refer to, which use with to indicate that an argument is a function, always use _with as a suffix, so I think with_size_hint is ok. It also has precedence in with_hasher and with_capacity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants