-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
iter: new package for iterators #61897
Comments
I think that's supposed to say |
In the "Pulling Values" example, I think |
Isn't this kind of API more preferable? Consider a case when you need to pass the next and stop functions separately to an function, struct, it might be annoying. type PullIter[T any] struct{}
func (*PullIter[T]) Stop()
func (*PullIter[T]) Next() (T, bool)
func Pull[V any](seq Seq[V]) PullIter[V] |
Based on:
the range in the |
I believe // Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
...
next, stop := iter.Pull(it)
...
} should be // Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
...
next, stop := iter.Pull(seq)
...
} |
Meta: Most of the first post is about iterators in general. It's not obviously clear on a first reading what is actually going into the iter package. Also the formatting is a little rough. |
Given that
one way to avoid having |
I know having a separate function/method to call to get the error can lead to people forgetting to do so but I really don't see how It's easy to ignore with It's not easy to not ignore without being awkward. Neither var k Key
var err error
for k, err = range f {
if err != nil {
break
}
proc(k)
}
if err != nil {
handle(err)
} nor for k, err := range f {
if err != nil {
handle(err)
break
}
proc(k)
} look especially readable to me. |
@jimmyfrasche |
I think the other way around, you should have a Seq2 → Seq[Pair[T1, T2]] mapping and make Seq the default most places. Edit: I guess you should have both mappings 1 → 2 and 2 → 1, but I also think 1 will end up being the default for things like Filter and TakeWhile etc. |
I think in most cases pull iters will be used in place, not passed around, so a pair of funcs is better than an object with methods. |
This proposal has been added to the active column of the proposals project |
xiter isn't addressing error-flavored 2-ary forms. There has also been a significant amount of prior discussion about e.g. If I thought an |
I think that method name for iterating through sequence shouldn't be |
@djordje200179 IMO |
Although the proposal is attractive, but to be honest, the Introduce tuple type for What do you think about this potential problem, @rsc. |
This comment was marked as resolved.
This comment was marked as resolved.
Re: errors, the proposal says:
In practice, the .Err() pattern has lead to a lot of bugs where .Err() is omitted. The most glaring example is at https://github.com/features/copilot/ (click on write_sql.go and note the lack of .Err()). I think the existing .Err() iterators in the std library should probably just stick around because they already exist, but we want a new pattern moving forward. Re: SQL, see #61637. |
Cosmetic suggestion: I like |
I'd like to comment on one effect of For example, I am currently experimenting with an iterator-focused API for bezier paths and transformations on them. Consider the current API:
which leads to usage like
or
An alternative API I've considered would use a named iterator type and methods, as follows:
for the following usage:
But a significant downside of this approach is that it requires explicit conversion from I'm not sure, however, if this pattern is a good idea to begin with. It works well for this specific use-case, but would fall apart the moment we'd need generic methods, e.g. for a |
I tend to agree that |
That would not be a compatible change: https://go.dev/play/p/x9MR1A1suez I too am curious as to the pros and cons of Seq being named, not merely an alias. I suppose it means we can add dozens of convenience methods to it later, but I'm not sure whether that's a pro or a con. ;-) |
@adonovan Sorry but your example seems to demonstrate the wrong thing? It seems to demonstrate that it's breaking moving from an alias to a named type, but not vice-versa, no? |
I've tried to explain why some people are "angry" over this design being too complicated in an article, but I think the short of it is that it feels like it goes against the apparent philosophy of Go that many people believe, coupled with it being a very functional way of doing things rather than imperative. And because of those two reasons, I think that is why people don't like the iterator stuff, even if I completely understand the design choices made. It doesn't "feel" like what Go original was to many people. If it was me, I would just not have allowed custom iterators into Go whatsoever, but I am not on the Go team (nor do I want to be). |
@gingerBill If you want to grasp at the value of push vs. pull iterators you should try to rewrite something like this as a pull iterator. I don't see the range-over-func statement as leaning functional, in the imperative-vs-functional paradigm (unlike the issue we are commenting on, which is not about range-over-func and is, in fact, about functional programming). If anything it enables more imperative style programming, if you wanted to call functions you'd call functions, the syntactic sugar is there to let you call break, continue and early-return, which are hardly functional programming constructs. |
My goal of asking this issue to be re-opened for comments wasn't to re-start the discussion on the merits of adding range-over-func. It was to be able to discuss the actual design issues as related to this package, like the above mentioned question about aliases. Please respect that, before the issue has to be locked again. |
Ah, I thought you were proposing to land an emergency change from named to alias before the imminent go1.23 release so that the change be reverted later if desired. My example was evidence that that later change would be breaking. But it's a breaking change either way. For example, |
The Go 1 guarantee isn't a legally binding contract, and e.g. the text/template/parse has broken it. Could we just caveat the 1.23 release notes with a big asterisk saying we reserve the right to switch to an alias in 1.24? Generics came with an asterisk, although the asterisk was never used. |
The interesting direction for backwards compatibility is to think about is Type assertions like Type switches additional could stop compiling if someone tried to do both:
These risks are not 0. They do not seem enormous either. I don't know why one would mix We could write a vet check to discourage usage that might not be backwards compatible in 1.23, but I suspect it would not help all that often or beyond 1.24. (The vet check might require an asterisk in the 1.23 spec to be justified too.) A not very ergonomic solution is to ship iter without Seq and Seq2 and to expand the types until type parameterized aliases are available. It solves backwards compatibility, but would probably make this package much harder to learn and more annoying to use. |
@timothy-king 1. Note that you don't have to do both in a type-switch for the breakage to happen: func F(x any) {
if x.(func(func(int) bool)) {
panic("b0rk")
}
}
func main() {
F(maps.Keys(map[int]int{}))
}
FWIW I'm not convinced this really need a fix. I don't, ultimately, see more reason to make So I don't really see any real argument to even make |
The advantage of making iter.Seq/2 an alias is pretty clear: if package A defines |
@earthboundkid My point is that nothing of what you say seems specific to And I'll note that I did provide a specific reason for why methods on
I'll note that 1. this argument cuts both ways: Making it an alias will prevent us from adding methods to
It is, in my opinion, okay that if you want to define a different type to give it other methods, you need to convert back-and-forth. |
There's a flaw in the What if by writing Like (Go Playground): source := slices.Values([]int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9})
for v := range xiter.Limit(source, 2) {
fmt.Println(v) // 0 1
}
i := 1
for range source {
if i == 3 {
break
}
i++
}
for v := range xiter.Limit(source, 4) {
fmt.Println(v) // expected: 5 6 7 8, got: 0 1 2 3
} Unlike iterators in almost every other programming language that has them, which are modeled as state machines, the "iterators" in this implementation with type signature In other words, if you assign a "go iterator" to a variable, each time you consume it, you're facing a brand new "iterator". Which makes the following logic hard to write: let source = &mut (0..);
let mut target: Vec<_> = source.skip(2).take(3).collect();
target.extend(source.skip(4).take(5));
println!("{:?}", target); // [2, 3, 4, 9, 10, 11, 12, 13] Even if we neglect the problem mentioned above, I think this syntactic sugar: insSeq := gen {
for i := 0; i < 10; i++ {
yield i
}
} which would be desugared into: insSeq := func (yield func(int) bool) {
for i := 0; i < 10; i++ {
if !yield(i) {
return
}
}
} could come in handy. |
One way to look at this is that |
@Sunshine40 To re-state what has been said a bunch of times now: This issue is not about re-visiting the design of iterating over functions. But FWIW the concern that some iterators are re-startable while others are not has been mentioned in the discussion. You can either use func Keep[T any](it iter.Seq[T]) iter.Seq[T] {
ch := make(chan T)
done := make(chan struct{})
var once sync.Once
go func() {
defer close(ch)
for v := range it {
select {
case ch <- v:
case <-done:
return
}
}
}()
return func(yield func(T) bool) {
defer once.Do(func() { close(done) })
for v := range ch {
if !yield(v) {
return
}
}
}
} |
(Also, this issue has been closed for quite a while, which makes it even less appropriate that off-topic comments keep being posted here) |
Where should I post it then? |
Can you give an example? From what I observed, the Also, where can I find the documentation/tutorial for existing iterator adaptors in Go? Edit:I modified func Keep[T any](it iter.Seq[T]) iter.Seq[T] {
ch := make(chan T)
go func() {
defer close(ch)
for v := range it {
ch <- v
}
}()
return func(yield func(T) bool) {
for v := range ch {
if !yield(v) {
return
}
}
}
} The key is to remove the cleaning up mechanics and actually keep the blocked goroutine around. Forever. If we don't exhaust it. |
@Sunshine40 The issue where the language feature was discussed was #61405, which also is closed, as the feature has been accepted and implemented. If you believe there is a bug with the existing implementation, the right place would be to file a bug. If you want to propose a different language change, it would be to file a language change proposal. You can do either from here. Otherwise, if you just want to talk about it, you can do that on golang-nuts, reddit, slack, your personal website or social network of choice. You are correct that my |
Hi @Sunshine40, it's worth recalling this is still an in-flight change, and all the user-facing documentation is not yet in place. Some existing WIP documentation is already published in the iter package documentation and the WIP language spec. (Based on your comments, you might be interested for example in the "Single-Use Iterators" section, which discusses the conventional behavior for restarting iterators in Go vs. single-use iterators). More doc will be written prior to the release, but if you see a specific problem in the WIP documentation, doc issues are always welcome. Also, FWIW, I am about to open a new issue with some more specific suggestions for enhancing the general iteration documentation, as a follow-up to #61897 (comment) and #61897 (comment) above. (I will CC you in case you are interested in commenting there). Separately, I think it is worth pointing out that your edited version of Keep is likely problematic and likely overkill. You might be interested in the sections on "Push functions", "Pull functions" and "Duality of push and pull functions" from the older 2022 pre-proposal discussion, which includes this for converting a pull function into a push function: func push(next func() (V, bool)) func(func(V)bool) {
return func(yield func(V) bool) {
for {
v, ok := next()
if !ok || !yield(v) {
break
}
}
}
} I updated your example to use that |
Yeah there might be extra cost in the goroutine + channel version, but I don't think the difference is that great.
The "cleanup issues" are not specific to a goroutine + channel style implementation - it's that you must expose a "cleanup handle" to be deferred manually. I mentioned this as a shortcoming of Go's push iterators because, in other languages, no one needs to manually clean up iterators since GC/RAII handles this problem. Not even in C++/Rust which are considered lower level system PL. As for enhancing the wrapper, here's a revised version: type Iter[V any] struct {
next func() (V, bool)
stop func()
gen func(yield func(V) bool)
}
type PullIterator[V any] interface {
Next() (V, bool)
}
func (it *Iter[V]) Next() (V, bool) {
return it.next()
}
type PushIterator[V any] interface {
Push(yield func(V) bool)
}
func (it *Iter[V]) Push(yield func(V) bool) {
it.gen(yield)
}
func (it *Iter[V]) Close() error {
it.stop()
return nil
}
type StatefulIterator[V any] interface {
PullIterator[V]
PushIterator[V]
io.Closer
}
func Stateful[V any](seq iter.Seq[V]) StatefulIterator[V] {
next, stop := iter.Pull(seq)
it := Iter[V]{
next: next,
stop: stop,
gen: func(yield func(V) bool) {
for {
v, ok := next()
if !ok || !yield(v) {
return
}
}
},
}
runtime.SetFinalizer(&it, (*Iter[V]).Close)
return &it
} With You could still |
Using iterators with finalizers and a |
Curious if beloved fellow gophers here had considered the elegance of Odin iterators but with a new go keyword,
Inspired from https://www.gingerbill.org/article/2024/06/17/go-iterator-design/ |
That's the same as #54245, which was rejected for good reasons: writing a "pull iterator" is almost always harder than writing a "push iterator" and leaves you with a problem of how to deal with clean up code. Having a push iterator-based design has the tradeoff that you have a problem of how to get pull iterators for the cases where it's more convenient to consume a few values at a time, but the iter package solves that with iter.Pull, which converts a push iterator in a pull iterator. I appreciate that the closure based design of 1.23 iterators is a little noodle baking at first, but this was all debated and decided a long time ago. |
We propose to add a new package
iter
that defines helpful types for iterating over sequences. We expect these types will be used by other APIs to signal that they return iterable functions.This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted.
See also:
Note regarding push vs pull iterator types: The vast majority of the time, push iterators are more convenient to implement and to use, because setup and teardown can be done around the yield calls rather than having to implement those as separate operations and then expose them to the caller. Direct use (including with a range loop) of the push iterator requires giving up storing any data in control flow, so individual clients may occasionally want a pull iterator instead. Any such code can trivially call Pull and defer stop.
I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs: instead, iterators can be defined in push form and preserved in that form by any general combination functions and then only converted to pull form as needed at call sites, once all adapters and other transformations have been applied. This avoids the need for any APIs that make cleanup (stop functions) explicit, other than Pull. Adapters that need to convert to pull form inside an iterator function can defer stop and hide that conversion from callers. See the implementation of the adapters in #TODO for examples.
Note regarding standard library: There a few important reasons to convert the standard library:
Note that os.ReadDir and filepath.Glob do not get iterator treatment, since the sorted results imply they must collect the full slice before returning any elements of the sequence. filepath.SplitList could add an iterator form, but PATH lists are short enough that it doesn’t seem worth adding new API. A few other packages, like bufio, archive/tar, and database/sql might benefit from iterators as well, but they are not used as much, so they seem okay to leave out from the first round of changes.
The iter package would be:
/*
Package iter provides basic definitions and operations related to
iterators over sequences.
Iterators
An iterator is a function that passes successive elements of a
sequence to a callback function, conventionally named yield, stopping
either when the sequence is finished or when yield breaks the sequence
by returning false. This package defines [Seq] and [Seq2]
(pronounced like seek - the first syllable of sequence)
as shorthands for iterators that pass 1 or 2 values per sequence element
to yield:
Seq2 represents a sequence of paired values, conventionally key-value,
index-value, or value-error pairs.
Yield returns true when the iterator should continue with the next
element in the sequence, false if it should stop. The iterator returns
true if it finished the sequence, false if it stopped early at yield's
request. The iterator function's result is used when composing
iterators, such as in [Concat]:
Iterator functions are most often called by a range loop, as in:
Naming
Iterator functions and methods are named for the sequence being walked:
The iterator method on a collection type is conventionally named All,
as in the second example, because it iterates a sequence of all the
values in the collection.
When there are multiple possible iteration orders, the method name may
indicate that order:
If an iterator requires additional configuration, the constructor function
can take additional configuration arguments:
Single-Use Iterators
Most iterators provide the ability to walk an entire sequence:
when called, the iterator does any setup necessary to start the
sequence, then calls yield on successive elements of the sequence,
and then cleans up before returning. Calling the iterator again
walks the sequence again.
Some iterators break that convention, providing the ability to walk a
sequence only once. These “single-use iterators” typically report values
from a data stream that cannot be rewound to start over.
Calling the iterator again after stopping early may continue the
stream, but calling it again after the sequence is finished will yield
no values at all, immediately returning true. Doc comments for
functions or methods that return single-use iterators should document
this fact:
Errors
If iteration can fail, it is conventional to iterate value, error pairs:
Pulling Values
Functions and methods that are iterators or accept or return iterators
should use the standard yield-based function signature, to ensure
compatibility with range loops and with other iterator adapters.
The standard iterators can be thought of as “push iterator”, which
push values to the yield function.
Sometimes a range loop is not the most natural way to consume values
of the sequence. In this case, [Pull] converts a standard push iterator
to a “pull iterator”, which can be called to pull one value at a time
from the sequence. [Pull] starts an iterator and returns a pair
of functions next and stop, which return the next value from the iterator
and stop it, respectively.
For example:
Clients must call stop if they do not read the sequence to completion,
so that the iterator function can be allowed to finish. As shown in
the example, the conventional way to ensure this is to use defer.
Other Packages
Many packages in the standard library provide iterator-based APIs. Here are some notable examples.
Mutation
Iterators only provide the values of the sequence, not any direct way
to modify it. If an iterator wishes to provide a mechanism for modifying
a sequence during iteration, the usual method is to define a position type
with the extra operations and then provide an iterator over positions.
For example, a tree implementation might provide:
And then a client could delete boring values from the tree using:
*/
package iter
Note: If and when generic type aliases are implemented (#46477), we might also want to add
type Yield[V any] = func(V bool)
andtype Yield2[K, V any] = func(K, V) bool
. That way, code writing a function signature to implement Seq or Seq2 can write the argument asyield iter.Yield[V]
.The text was updated successfully, but these errors were encountered: