Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for adding an Iterable trait family #17

Closed
wants to merge 4 commits into from

Conversation

erickt
Copy link

@erickt erickt commented Mar 21, 2014

Add an Iterable family of traits that will allow a function to consume both
an iterator or a value that can be converted into an iterator.

@pnkfelix
Copy link
Member

Is this not the same as an Iterable trait, which I think has been proposed in the past?

A motivation for Iterable was being able to do for x in v { ... } (where v is of some type implementing Iterable), but there was some technical reason we were not able to put Iterable in at the time that we switched to external iterators.

So I am mostly curious if this proposal ends up being the same as Iterable, apart from a change in the name of the trait and also whether it would imply a (presumably backwards-compatible) change to the for macro.


Update: Ah, I remembered (slash reviewed some irc logs): the technical problem was that people wanted to have a single generic impl<I:Iterator> Iterable for I, which under the current coherence rules is too broad and ends up clashing with attempts to impl Iterable on non-iterator types.

This rfc, as written, may still be the same as the idea of Iterable in the abstract, but the concrete proposal is to have a separate impl of Iterable for each Iterator, rather than a single generic one. So that side-steps the technical issue alluded to above (but carries the ugliness that one probably wants a generic impl for all Iterator. Though there may other ways around that).

@Kimundi
Copy link
Member

Kimundi commented Mar 21, 2014

@pnkfelix: I sketched up what I think is a usable Iterable the last day: https://gist.github.com/Kimundi/9683769. It's basically the same as this RFC.

@erickt: Feel free to incoperate it into the RFC.

@erickt
Copy link
Author

erickt commented Mar 21, 2014

@pnkfelix: I had in mind that this IntoIterator and an Iterable trait would be two separate things, but @Kimundi's gist shows a nice way of getting these two work together. I'll do another pass on the RFC and merge that approach into the proposal.

We could change for x in v { ... } to use Iterable::into_iter, but I'm not sure if it'd be that big of a win. I suspect that most users of for-in cannot consume the value. For example, when iterating over a &[int]. So we'd end up changing most instances to for x in v.refs() { ... }. The one nice thing with this approach is that it's one more thing to elevate moves, which may help cut down on needless copying.

I'm quite looking forward to the day when we can do impl<I: Iterator> Iterable for I :) Yes, this is trading some ugliness to work around the coherence rules. I'll also note that we can refactor this design if that feature ever gets implemented.

@Kimundi: your list looks great. Do you see just one implementation of your PodIterable? Because I think this impl will run into the coherence problems @pnkfelix mentioned.

@erickt erickt changed the title RFC for adding an IntoIterator trait RFC for adding an Iterable trait family Mar 21, 2014
@Kimundi
Copy link
Member

Kimundi commented Mar 21, 2014

@erickt PodIterable would be implemented just once, for all T: Pod, Iterator<&T>, and I don't see how you could usefully specialize it. :)

Also, note that because something like &[T] is Pod, calling into_iter() on it wouldn't invalidate the original ref slice, so it might actually be okay in certain situations.


A few more thoughts:

  • With DST, and a Deref<[T]> impl on Vec<T>, and a for-loop changed to use Iterable, for elem in vec would move the vector, while for elem in *vec would give one references because the &[T] impl gets picked.
  • I think @nikomatsakis was talking about making by-value passing for [T] work, which could change the situation somewhat:
    • If we implement the Iterable trait for [T] directly it could work, but I'm unsure what for x in *vec would do.
    • I think you get &[T], which would not implement Iterable and would probably not allow to call the self method for [T] because of ownership?
    • Could always add a explicit impl for &[T] though.

```

Since every Iterator will have the same implementation so this could be a good
use case for a macro.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or a deriving?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this would be a perfect fit for #[deriving(Iterable)], though it might be confusing because its only for Iterators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yuck. This is a really nasty hack, and people will forget a #[deriving(Iterable)] from time to time, leaving it much less usable. The only reasonable solution is to automatically implement it, which probably requires function specialisation to be implemented (rust-lang/rust#7059). Would it be feasible for us to start looking in that direction rather than piling #[deriving] upon #[deriving], line upon line, here a little, there a little?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chris-morgan Note that it is not possible to forget the Iterable impl on iterators: Iterator<A> inherits from Iterable<A, Self>.

So all a deriving mode would enable is to easily provide the implementation as long as the coherence rules prohibit a automatic generic implementation:

#[deriving(Iterable)]
struct MyIter<A> { ... }
impl<A> Iterator<A> for MyIter<A> { ... }

instead of

struct MyIter<A> { ... }
impl<A> Iterator<A> for MyIter<A> { ... }
impl<A> Iterable<A, MyIter<A>> for MyIter<A> { fn into_iter(self) -> MyIter<A> { self } }

Because of its special status of only being for iterators themself, a different name than deriving might be more appropriate though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a custom attribute syntax extension to my example gist https://gist.github.com/Kimundi/9683769.

It implements a #[iterable] attribute that you'd need to put on every implementation of Iterator like this:

#[iterable]
impl<T> Iterator<T> for MyIter<T> {
    fn next(&mut self) -> Option<T> { ... }
}

It would expand to this trivial implementation:

impl<T> Iterator<T> for MyIter<T> {
    fn next(&mut self) -> Option<T> { ... }
}
impl<T> Iterable<T, MyIter<T>> for MyIter<T> {
    fn into_iter(self) -> MyIter<T> { self }
}

@zkamsler
Copy link

If the intention of Iterable is to eventually be used by the for loop, then there really needs to be a concrete plan as to how that would work in order to avoid unfortunate surprises later on.

For containers, one does not usually want to (or is able to) consume the container just to iterate over it. Even for iterators, it is sometimes useful to partially iterate over an iterator without consuming it. The current desugaring just mutably borrows the iterator.

@Kimundi
Copy link
Member

Kimundi commented Mar 25, 2014

@zkamsler: This is a very good point.

The current &mut desugaring works because it either creates a new stack temporary or a reference to an existing one, depending on the argument to the operator.

This duality is what allows todays for loop to either advance an existing iterator, or to iterate over a new one.

I toyed around with a version of the for loop that calls into_iter instead, and found that if

  • Every T: Iterator<A> also implements Iterable<A, Self> and MutIterable<A, ByRef<Self>>
    • Which means for every iterator you can turn self and &mut self into a iterator.
  • And there exist two generic impls
    • For T: RefIterable, impl Iterable for &T
    • For T: MutIterable, impl Iterable for &mut T
    • Which means for every reference to a T: Iterable, you can get the impls for non owning iterator to that T.

Then you can still get both behaviors by explicitly constructing a &mut iter:

let iter = ...;
for_ng x in iter { 
    // ^ Calls iter.into_iter(), consuming it. 
    // The iterator is a type_of(iter)
    ...
}
for_ng x in iter { 
    // ^ move error
    ...
}
let mut iter = ...;
for_ng x in &mut iter { 
    // ^ Calls (&mut iter).into_iter() -> calls iter.mut_iter().
    //  The iterator is a ByRef<type_of(iter)>
    ...
}
for_ng x in &mut iter { 
    // ^ Works
    ...
}

This also works if you create the reference once, like let iter_ref = &mut iter.

I'd argue that the explicitness here is better than for doing something more magical.
To elaborate, I'd say that for trying to take the iterable by value is the most sane way of doing this under Rusts ownership model, and the fact that this leads naive attempts to iterate over a data structure to consume that structure is just the logical conclusion of these rules.

However, doing it this way has two big problems:

  • To be sane, Iterator<A> would need to inherit from Iterable<A, Self> and MutIterable<'a, A, ByRef<'a, Self>>. Implementing this is not a problem, my #[iterable] attribute can simply be extended to generate both impls. But, because of MutIterables lifetime parameter, the signature of Iterator<A> would need to change to Iterator<'a, A>

    • Possible way around this might be associated types, but I'm not sure.
  • This still doesn't solve the problem of being able to do.

    let mut iter = ...;
    for x in iter { 
    ... 
    iter.next() // advance the iter being iterated over. 
    }

    Which is something that is apparently wanted by some.

Now, the possible good news is that the only solution to the second problem is to make for magic enough to not create a temporary if handed a already exiting mutable T: Iterator<A>
In other words, make it choose between Iterable and Iterator on its own.

This would eliminate the need of T: Iterator to implement MutIterable, and thus also eliminate the need for a lifetime argument.

I added my experiments with this to the gist here: https://gist.github.com/Kimundi/9683769.

@Kimundi
Copy link
Member

Kimundi commented Mar 31, 2014

Found a simpler way to accomplish the behavior from my last post:

Instead of adding implementations for

  • MutIterable returning ByRef<Self> for each individual iterator
  • Iterable for &T with T: RefIterable
  • Iterable for &mut T with T: MutIterable

I just added one implementation for

  • Iterable returning ByRef<T> for each &mut T with T: Iterator

Which would look like this in practice:

for_ng x in iterator { ... } // takes iterator by value, making either a copy or a move, and advances it
for_ng x in &mut iterator { ... } // advances and mutates the iterator through the reference
for_ng x in iterable { ... } // takes iterable by value, constructs a iterator, and advances it

Again, this can be found implemented in https://gist.github.com/Kimundi/9683769.

@erickt
Copy link
Author

erickt commented Mar 31, 2014

@Kimundi: That's a neat trick, but I worry that something like that would be surprising. Your solution of having a method on (&mut T).into_iter() that has different semantics than T.into_iter() touches on a bit on the Deref Conventions RFC #25. I'll bring this up over there.

@alexcrichton
Copy link
Member

Sorry I haven't gotten to this in awhile, but here are some thoughts of mine.

With the system proposed in #48, it looks like we're going to enable impl<T, I: Iterator<T>> Iterable<T, I> for I { .. } while still allowing implementations of Iterable for other containers. It seems like that would have a large impact on this RFC, do you think it should be re-worded relative to this ability?

It seems a little unfortunate to need all these new traits rather than just one Iterable trait. Reading the comments, it sounds like for foo in x.iter() { ... } will become for foo in x.refs() { ... }, but this seems like a bit of a step backwards (iter conveys "iteration" better than refs I think). Specifically, could you elaborate more why RefIterable, MutIterable, and PodIterable are required?

@aturon
Copy link
Member

aturon commented May 21, 2014

For the sake of discussion, specifically addressing @alexcrichton's question about multiple traits, here's one possible design using a single trait:

use std::vec::{Vec, MoveItems};
use std::slice::{Items, MutItems};

trait Iterable<T, I: Iterator<T>> {
    fn iter(self) -> I;
}

impl<T> Iterable<T, MoveItems<T>> for Vec<T> {
    fn iter(self) -> MoveItems<T> {
        self.move_iter()
    }
}

impl<'a, T> Iterable<&'a T, Items<'a, T>> for &'a Vec<T> {
    fn iter(self) -> Items<'a, T> {
        self.iter()
    }
}

impl<'a, T> Iterable<&'a mut T, MutItems<'a, T>> for &'a mut Vec<T> {
    fn iter(self) -> MutItems<'a, T> {
        self.mut_iter()
    }
}

And some example code showing it in action:

struct Foo {
    f: u8
}

fn iter_refs<'a, I: Iterator<&'a Foo>, C: Iterable<&'a Foo, I>>(container: C) {
    for foo in container.iter() {
        println!("{}", foo.f)
    }
}

fn iter_mut_refs<'a, I: Iterator<&'a mut Foo>, C: Iterable<&'a mut Foo, I>>(container: C) {
    for foo in container.iter() {
        foo.f += 1;
    }
}

fn iter_consume<'a, I: Iterator<Foo>, C: Iterable<Foo, I>>(container: C) {
    for foo in container.iter() {
        println!("Consuming {}", foo.f)
    }
}

fn main () {
    let mut v = Vec::new();
    v.push(Foo { f: 0 });
    v.push(Foo { f: 1 });

    iter_refs(&v);
    iter_mut_refs(&mut v);
    iter_refs(&v);
    iter_consume(v);
}

Notice that, at least as implemented here for vectors, the kind of iterator you get depends on how you refer to the vector:

  • as owned (Vec<T>), in which case you get a moving iterator;
  • by reference (&Vec<T>), in which case you get a by-ref iterator;
  • by mutable reference (&mut Vec<T>), in which case you get a by-mut-ref iterator.

One argument in favor of this design is: if my function wants to iterate over some elements of type T, I don't usually care how that iteration is happening, e.g., whether the provider of the iterator is being consumed in the process. I just want my items.

@aturon
Copy link
Member

aturon commented May 21, 2014

Another issue for discussion: what should the return type of default methods like map be? As proposed, these functions return iterators, just as they do for the Iterator trait.

But one might expect instead that if v is a vector, then v.map(...) would yield a new vector, and thus avoid having to deal with collect (including pesky type annotations). With HKT, we could provide such methods generically as default methods. (We need a higher-kinded version of FromIterator to do so).

I don't have a strong opinion here, but I wanted to raise the issue.

@Kimundi
Copy link
Member

Kimundi commented Jun 16, 2014

I think I tried the approach of different impls for T, &T and &mut T originally, and deemed them to be too confusing in the presence of autoref, but maybe it could work.

About map() and co returning a iterator: I think there are basically two options:

  1. Keep the RFC roughly as it is, and have them return a iterator, having better composability and performance in the complex case (For example foo.extend(bar.map(...)) cheaply works), but requiring an explicit collect() to get back to a actual collection.
  2. Modify the RFC to only have the iterator constructor on Iterable, keep all adapters on Iterator, and Introduce a HKT-based trait for Iterables that have a wrapper for each adapter to wrap something like foo.into_iter().map().collect::<Self>() as foo.map(...)

1.) has better composabilty, but is more complex in general about where methods are, while 2) Is simpler but has the problem that it doesn't compose. Or rather, it composes, but with hidden cost: foo.map().filter() would result in a temporary of type Self between the map and filter call.

@Kimundi
Copy link
Member

Kimundi commented Jul 1, 2014

@alexcrichton: Most of the design of the RFC originated in my prototype implementation further up in the thread, so I'll give a few basic justifications ;)

Three of the four traits basically just exist to support PodIterator:

  • Iterable - Being able to generically request a iterator at all.
  • RefIterable - uniform way to non-owningly get a iterator over references in a data structure. Could just be a inherent method on each data structure, but that would prevent...
  • PodIterable - extension method to allow creating a value iterator over copyable types, so that you can write my_vec.values().collect() instead of my_vec.refs().map(|x| *x).collect(), which is a common case if you deal with simple types like integers.
  • MutIterable - This is mostly just consistency to RefIterable existing, and could be removed from the equations without problem.

The names refs and mut_refs where mostly chosen to match the convention we developed on strings, like .chars(). Imo its a bit unclear that .iter() returns references just from the name, which refs() would express, given the convention that pluralized method names imply a iterator being returned.

That being said, if I where to actually implement it in the libraries, I'd reuse the existing names at first to make it easier, so I'd consider a rename of them a different RFC.
And that's mostly waiting for the trait coherency changes to happen, so that I can try a prototype without needing to add a custom syntax extension and a bazillion attributes to hack around it. ;)

@japaric
Copy link
Member

japaric commented Aug 7, 2014

This RFC should consider the new naming conventions for (methods that return) iterators.

@aturon aturon mentioned this pull request Sep 11, 2014
@aturon
Copy link
Member

aturon commented Sep 11, 2014

I've posted a collections reform RFC that encompasses some of this proposal (but also points out why we probably don't want to do the whole thing).

@aturon
Copy link
Member

aturon commented Feb 16, 2015

I'm going to close this now that IntoIterator has landed -- and covers some use cases. The full treatment here will likely want to use HKT, and in any case will need to be updated to match the current lang/library state.

Thanks @erickt

@aturon aturon closed this Feb 16, 2015
cramertj added a commit that referenced this pull request Mar 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants