proposal: x/exp/xiter: new package with iterator adapters #61898
Comments
The duplication of each function is the first thing that catches the eye. Are there thoughts on why this is acceptable? |
What about an adapter that converts an |
Some typos: EqualFunc2, Map2, Merge2, and MergeFunc2 lack the 2 suffixes on their actual names. They're all correct in the corresponding documentation. |
May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation. |
For |
I'd actually prefer Edit: I just realized that if |
This proposal has been added to the active column of the proposals project |
The more I think about it, the more that I think that API design for this should wait until after a decision is made on #49085. Multiple other languages have proven over and over that a left-to-right chained syntax is vastly superior ergonomically to simple top-level functions for iterators. For example, compare nonNegative := xiter.Filter(
xiter.Map(
bufio.Lines(r),
parseLine,
),
func(v int) bool { return v >= 0 },
) vs. nonNegative := bufio.Lines(r).
Map(parseLine).
Filter(func(v int) bool { return v >= 0 }) Go's a little weird because of the need to put the lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(intlines, func(v int) bool { return v >= 0 }) That works, but it clutters up the local namespace and it's significantly harder to edit. For example, if you decide you need to add a new step in the chain, you have to make sure that all of the variables for each iterator match up in the previous and succeeding calls. |
What type does |
You would probably have to wrap the base iterator like:
|
Sorry. I should have stuck a comment in. I was just coming up with some hypothetical function that would give an Not necessarily. The transformative and sink functions on iterators could just be defined as methods on |
I was wrong, it’s not an interface. |
Why do some functions take the names := xiter.Map(func (p Person) string {
return p.Name
}, people) // "people" gets lost
// vs
names := xiter.Map(people, func (p Person) string {
return p.Name
}) |
@DeedleFake There won't be a "decision" on #49085 anytime soon. There are good reasons not to do it yet, but we also don't want to say it never happens. The issue exists to reflect that state. What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"? |
No iterators, definitely. I've done fine without them for over a decade. I can wait a bit longer. If a bad implementation goes in, I'll never get a good version. Plus, I can just write my own implementation of whatever iterator functions I need as long as |
Neither chaining nor functional programming has ever been a defining or recommended technique in Go. Instead, iteration — specifically, procedural 'for' loops — has always been a core technique since the language's inception. The iterator proposals aim to enhance this core approach. While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment. |
I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so. |
Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for
Edit: The way this proposal is phrased does actually imply that they may be heavily reevaluated enough in That issue has only been open for 2 years. I think assuming that it'll take a decade to solve is a bit unfair. Yes, a One of my favorite things about Go is how slow and methodical it (usually) is in introducing new features. I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics. One of the purposes of that approach is to try to avoid having to fix it later. Adding those functions in the proposed manner will almost definitely necessitate that later fix, and I very much would like to avoid that if at all possible. |
Java Streams and .NET LINQ build on a standardized iterator system, but they are more than that. Both languages had a generic iterator system before. Iterators are useful without chaining or functional programming.
That would be this very proposal, and it comes with a caveat: "... or perhaps not. There are concerns about how these would affect idiomatic Go code. " This means that not everyone who has read these proposals in advance believes that this part is a good idea. |
Maybe chaining leads to too much of a good thing. It becomes more tempting to write long, hard-to-read chains of functions. You're less likely to do that if you have to nest calls. As an analogy, Go has |
Re #49085, generic methods either require (A) dynamic code generation or (B) terrible speed or (C) hiding those methods from dynamic interface checks or (D) not doing them at all. We have chosen option (D). The issue remains open like so many suggestions people have made, but I don't see a realistic path forward where we choose A, B, or C, nor do I see a fifth option. So it makes sense to assume generic methods are not going to happen and do our work accordingly. |
@DeedleFake The issue is not lack of understanding what a lack of parameterized methods means. It's just that, as @rsc said, wanting them doesn't make them feasible. The issue only being 2 years old is deceptive. The underlying problem is actually as old as Go and one of the main reasons we didn't have generics for most of that. Which you should consider, when you say
We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma. Not having parametric methods is a pretty direct consequence of that decision. |
Well, I tried. If that's the decision then that's the decision. I'm disappointed, but I guess I'll just be satisfied with what I do like about the current proposal, even if it has, in my opinion, some fairly major problems. Sorry for dragging this a bit off-topic there. |
Hope that it's not noise: I wondered if naming it the |
Those nonstandard Zip definitions look like they would occasionally be useful, but I think I'd want the ordinary zip/zipLongest definitions most of the time. Those can be recovered from the proposed ones with some postprocessing, but I'd hate to have to always do that. These should be considered along with Limit:
LimitFunc - stop iterating after a predicate matches (often called takeWhile in other languages)
Skip, SkipFunc - drop the first n items (or until the predicate matches) before yielding (opposite of Limit/LimitFunc, often called drop/dropWhile) |
Can you explain the difference? Is it just that |
zip stops after the shorter sequence. zipLongest pads out the missing values of the shorter sequence with a specified value. The provided ones are more general and can be used to build those but I can't really think of any time I've used zip where I needed to know that. I've always either known the lengths were equal by construction so it didn't matter or couldn't do anything other than drop the excess so it didn't matter. Maybe that's peculiar to me and the situations in which I reach for zip, but they've been defined like that in every language I can think I've used which has to be some kind of indicator that I'm not alone in this. I'm not arguing for them to be replaced with the less general more common versions: I want those versions here too so I can use them directly without having to write a shim to the standard definition. |
@jimmyfrasche
Very good point, which I had missed. I'll have to think about this a bit more, it seems. |
My take (playground): func Cut[E any](s iter.Seq[E]) (head E, tail iter.Seq[E], ok bool) {
for v := range s {
head, ok = v, true
break
}
tail = func(yield func(E) bool) {
if !ok {
return
}
first := true
for v := range s {
if first {
first = false
continue
}
if !yield(v) {
return
}
}
}
return head, tail, ok
} Though, for the record, I'm against including something like this, for reasons already mentioned by others. |
@DeedleFake @Merovius After thinking about this a bit more, I have to agree: I can't think of a way for functions like |
Every case I've had for Head has been to simplify a pattern I kept coming across in higher-order iterators: first, once := true, false
for v := range seq {
if first {
first = false
// prime the pump
} else {
once = true
// actual loop code
}
}
if !first && !once {
// special case for one value seq
} In terms of this thread, though, I only mentioned Head as another thing that could be implemented with Push. |
The more I ponder the design of a library that would complement For example, the program below never terminates: package main
import (
"fmt"
"iter"
)
func main() {
fmt.Println(Count(someFunction()))
}
func Count[E any](seq iter.Seq[E]) int {
var n int
for range seq {
n++
}
return n
}
func someFunction() iter.Seq[string] {
return Repeat("foo")
}
func Repeat[E any](e E) iter.Seq[E] {
return func(yield func(E) bool) {
for yield(e) {
// deliberately empty body
}
}
} IMO, sink functions that may not terminate are simply too easy to misuse. As such, they're better left out. Or alternatives guaranteed to terminate like that suggested in #61898 (comment) should be provided instead. |
@jub0bs I'll note that you don't know a priori whether an |
@Merovius Good point, but I suspect that infinite iterators will be more common than func Iterate[E any](e E, f func(E) E) iter.Seq[E] {
return func(yield func(E) bool) {
for yield(e) {
e = f(e)
}
}
} |
Or, if such sinks are included in |
I'm not sure I see the danger with Repeat or Iterate; both seem like elegant solutions to some problems. As @Merovius said, just as with io.Readers, some iterators are finite, some infinite. Just because the Sum or Count of an infinite sequence is bottom doesn't mean there is a problem with Sum or Count, or with your infinite sequence. |
@adonovan I do like all those functions. I just think that people not coming from an FP background should be warned in the documentation that some sinks expect a finite iterator. |
Another function proposal: func Drain[T any](seq iter.Seq[T]) (last T, ok bool) {
for v := range seq {
last = v
ok = true
}
return last, ok
}
This may have been proposed further up, and if so sorry for repeating it. I scanned through but didn't see anything obvious. |
Those functions would leak a pull-based iterator in some cases; and even if the bug in question were fixed, those functions couldn't work with single-use iterators; see golang/go#61898 (comment) and follow-up comments.
In my opinion, a Push converter that pairs with iter.Pull would be useful. Push would allow values to be pushed gradually to functions that take an iterator as an argument. func Push[V any](recv func(iter.Seq[V])) (push func(V) bool) {
var in V
coro := func(yieldCoro func(struct{}) bool) {
seq := func(yieldSeq func(V) bool) {
for yieldSeq(in) {
yieldCoro(struct{}{})
}
}
recv(seq)
}
next, stop := iter.Pull(coro)
return func(v V) bool {
in = v
_, more := next()
if !more {
stop()
return false
}
return true
}
} func main() {
sum := func(src iter.Seq[int]) {
sum := 0
for v := range src {
sum += v
fmt.Printf("- sum: %d\n", sum)
}
}
pushSum := Push(sum)
for {
var n int
fmt.Scanln(&n)
if !pushSum(n) {
break
}
}
}
This is necessary for handling endless data streams, such as real-time metrics or event sequences, as iterators. Additionally, while iter.Pull makes it possible to combine multiple iterators into one, as in Zip and Merge, fanning a single iterator out to multiple consumers is difficult without Push. func main() {
fizz := func(src iter.Seq[int]) {
for v := range src {
if v%3 == 0 {
fmt.Println("- fizz")
}
}
}
pushFizz := Push(fizz)
buzz := func(src iter.Seq[int]) {
for v := range src {
if v%5 == 0 {
fmt.Println("- buzz")
}
}
}
pushBuzz := Push(buzz)
sum := func(src iter.Seq[int]) {
sum := 0
for v := range src {
sum += v
fmt.Printf("- sum: %d\n", sum)
if !pushFizz(sum) || !pushBuzz(sum) {
return
}
}
}
pushSum := Push(sum)
for {
var n int
fmt.Scanln(&n)
if !pushSum(n) {
break
}
}
}
|
This will not work. |
That depends how you define it: it correctly returns the last element, which means it must have iterated over the whole sequence. If the Seq is a one-shot, then it will have been drained. "Last" seems like a clearer name for this operation because that's what it guarantees; "Drain" only makes sense for one-shot sequences (which, one hopes, are far from the norm). In any case I'm not convinced that this is something we should put in the library. Users might easily think that Last on a Seq whose representation supports random access, such as a slice, would be O(1), but in fact it is not, and indeed cannot be, efficient, because there is no way to ask a Seq if it supports a narrower random-access interface. |
I was experimenting with this API and one thing I ran into was that there is not a way to turn The use case I ran into was building up a func Split[In, KOut, VOut any](f func(In) (KOut, VOut), seq iter.Seq[In]) iter.Seq2[KOut, VOut] {
return func(yield func(KOut, VOut) bool) {
for in := range seq {
if !yield(f(in)) {
return
}
}
}
}
func Combine[KIn, VIn, Out any](f func(KIn, VIn) Out, seq iter.Seq2[KIn, VIn]) iter.Seq[Out] {
	return func(yield func(Out) bool) {
for k, v := range seq {
if !yield(f(k, v)) {
return
}
}
}
} That would allow things such as: type data struct {
id string
}
func foo(datas []*data) map[string]*data {
	return maps.Collect(xiter.Split(func(d *data) (string, *data) {
		return d.id, d
}, slices.Values(datas)))
} Edit: I see now that this has been discussed at length, sorry for the noise. |
@DeedleFake I doubt this is ever going to happen, though, since inferring the type arguments would then depend on what comes before the call. It hurts readability for those unfamiliar with languages like Elm, Haskell and so on. |
#70140 erroneously duplicates Merge but adds Intersect, Union and Subtract, which are only meaningful on sorted sequences. It's unclear whether they should be in a separate package. |
This is a breaking-change PR: - Reorder `func` in xiter package, so that it's always first. That helps with several transformations in one place. See [this](golang/go#61898 (comment)). - Add `Single` and `Single2` iterators. - Add `Find` and `Find2` iterators. - Rename `Fold` to `Reduce`. - Other small changes. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
We propose to add a new package golang.org/x/exp/xiter that defines adapters on iterators. Perhaps these would one day be moved to the iter package or perhaps not. There are concerns about how these would affect idiomatic Go code. It seems worth defining them in x/exp to help that discussion along, and then we can decide whether they move anywhere else when we have more experience with them.
The package is called xiter to avoid a collision with the standard library iter (see proposal #61897). An alternative would be to have xiter define wrappers and type aliases for all the functions and types in the standard iter package, but the type aliases would depend on #46477, which is not yet implemented.
This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.
Edit, 2024-05-15: Added some missing 2s in function names, and also changed Reduce to take the function first, instead of between sum and seq.
Edit, 2024-07-17: Updated code to match the final Go 1.23 language change. Corrected various typos.
/*
Package xiter implements basic adapters for composing iterator sequences:
*/