Skip to content

proposal: iter: add Push function #72083

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ncruces opened this issue Mar 3, 2025 · 14 comments
Closed

proposal: iter: add Push function #72083

ncruces opened this issue Mar 3, 2025 · 14 comments
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Milestone

Comments

@ncruces
Copy link
Contributor

ncruces commented Mar 3, 2025

Proposal Details

This proposal stems from a thread in golang-nuts.

The idea is to add a Push function to package iter (and likely a similar Push2):

package iter

// Push takes a consumer function, and returns a yield and a stop function.
// It arranges for the consumer to be called with a Seq iterator.
// The iterator will return all the values passed to the yield function.
// The iterator will stop when the stop function is called.
func Push[V any](consumer func(seq iter.Seq[V])) (yield func(V) bool, stop func())

This is similar in concept to the Pull and Pull2 functions, except that yield and stop are used to push values into seq.

It is based on a similar proposal by @ianlancetaylor in the thread, which unfortunately would not hide goroutine creation/management (a goal of mine).

Their worry with my counter proposal (this one) is, quoting:

My concern with your alternative is that it may be fragile. It should be possible for the function to pass the iterator to another goroutine, but it's not clear to me what will happen to the coroutines in that case. With for/range the language inherently constrains what can happen with the coroutines - and the code is still really complicated with all kinds of special cases around panics and passing the yield function around.

However, ISTM that it is possible with my proposal (although potentially racy) to pass the iterator to another goroutine, you just need to ensure this goroutine terminates before consumer (perhaps with a sync.WaitGroup). The potential races don't seem worse than those caused by passing next/stop from Pull to other goroutines.

Purpose

The purpose of this function is basically the same as Pull: adapt/plug existing iteration APIs to the "standard" iter.Seq interface.

It came up when trying to use a function with this signature: func processor(seq iter.Seq[float64]) float64 (a function that receives a sequence of floats, and outputs a single float, e.g. the sum/average/etc) to implement this interface (which allows you to implement custom aggregate functions in SQLite:

type AggregateFunction interface {
	// Step is invoked to add a row to the current window.
	// The function arguments, if any, corresponding to the row being added, are passed to Step.
	// Implementations must not retain arg.
	Step(ctx Context, arg ...Value)

	// Value is invoked to return the current (or final) value of the aggregate.
	Value(ctx Context)
}

In the simple case SQLite calls the AggregateFunction.Step once for each row (arg are the columns of the row, a single column is arg[0].Float()), and then Value once to get the aggregate result (you call ctx.ResultFloat(x) to return the result).

It turns out that it's impossible to use a processor function like the above to implement the interface, without:

  • collection all values (potentially millions) in a slice
  • using goroutines and channels to simulate concurrency without parallelism

Proposed semantics and implementation based on channels

func Push[V any](consumer func(seq iter.Seq[V])) (yield func(V) bool, stop func()) {
	var (
		v          V
		done       bool
		panicValue any
		seqDone    bool // to detect Goexit
		swtch      = make(chan struct{})
	)

	go func() {
		// Recover and propagate panics from consumer.
		defer func() {
			if p := recover(); p != nil {
				panicValue = p
			} else if !seqDone {
				panicValue = goexitPanicValue
			}
			done = true
			close(swtch)
		}()

		<-swtch
		consumer(func(yield func(V) bool) {
			for !done {
				if !yield(v) {
					break
				}
				swtch <- struct{}{}
				<-swtch
			}
		})
		seqDone = true
	}()

	yield = func(v1 V) bool {
		v = v1
		// Yield the next value.
		// Panics if stop has been called.
		swtch <- struct{}{}
		<-swtch

		// Propagate panics and goexits from consumer.
		if panicValue != nil {
			if panicValue == goexitPanicValue {
				// Propagate runtime.Goexit from consumer.
				runtime.Goexit()
			} else {
				panic(panicValue)
			}
		}
		return !done
	}

	stop = func() {
		done = true
		// Finish the iteration.
		// Panics if stop has been called.
		swtch <- struct{}{}
		<-swtch

		// Propagate panics and goexits from consumer.
		if panicValue != nil {
			if panicValue == goexitPanicValue {
				// Propagate runtime.Goexit from consumer.
				runtime.Goexit()
			} else {
				panic(panicValue)
			}
		}
	}

	return yield, stop
}

There's a race (and potential panic: send on closed channel) between the close(swtch) and the swtch <- struct{}{} in consumer if consumer starts a goroutine and the seq outlives consumer. Otherwise, there is no parallelism, only concurrency.

The implementation takes liberally from iter.Pull and can translate to runtime.newcoro/coroswitch by replacing the swtch channel. In that case, the panic becomes fatal error: coroswitch on exited coro.

@gopherbot gopherbot added this to the Proposal milestone Mar 3, 2025
@gabyhelp
Copy link

gabyhelp commented Mar 3, 2025

Related Issues

Related Discussions

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@gabyhelp gabyhelp added the LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool label Mar 3, 2025
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Mar 4, 2025
@ianlancetaylor
Copy link
Member

CC @golang/runtime

@aditya270520

This comment has been minimized.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/654516 mentions this issue: iter: add Push function to create a Seq from pushed values

@shaj13
Copy link

shaj13 commented Mar 5, 2025

+1

Additionally, Push2 needs to be added to support value pairs.

Push/2 will also enable fan-out to multiple iter.Seq/2 instances.

func fanout(i iter.Seq2[any, error]) {
	y, stop := Push(func(seq iter.Seq[any]) {
		for res := range seq {
			fmt.Println("exp1", res)
		}
	})

	defer stop()

	y2, stop := Push(func(seq iter.Seq[any]) {
		for res := range seq {
			fmt.Println("exp2", res)
		}
	})

	defer stop()

	for v, _ := range i {
		y(v)
		y2(v)
	}

	return nil
}

@eihigh
Copy link

eihigh commented Mar 6, 2025

+1. I also had the same idea. #61898 (comment)
Push can be implemented based on iter.Pull (without chan) so I commented at the xiter proposal.

func Push[V any](recv func(iter.Seq[V])) (push func(V) bool) {
	var in V

	coro := func(yieldCoro func(struct{}) bool) {
		seq := func(yieldSeq func(V) bool) {
			for yieldSeq(in) {
				yieldCoro(struct{}{})
			}
		}
		recv(seq)
	}

	next, stop := iter.Pull(coro)
	return func(v V) bool {
		in = v
		_, more := next()
		if !more {
			stop()
			return false
		}
		return true
	}
}

There is an interactive demo. https://github.com/eihigh/morse-demo

fanout demo: https://go.dev/play/p/r2Wp7DUgAVy

@ncruces
Copy link
Contributor Author

ncruces commented Mar 6, 2025

OK, so I was wrong in golang-nuts and this can be implemented with iter.Pull. @eihigh's solution has a tiny bug (we need a break after yieldCoro), but it pointed me in the right direction.

To me that reduces the need for iter.Push, beyond the fact that this was really hard to figure out: iter.Pull is powerful enough.

If we do still want iter.Push, with my proposed or another API, here's an implementation based on iter.Pull:

func Push[V any](consumer func(seq iter.Seq[V])) (yield func(V) bool, stop func()) {
	var in V

	coro := func(yieldCoro func(struct{}) bool) {
		seq := func(yieldSeq func(V) bool) {
			for yieldSeq(in) {
				if !yieldCoro(struct{}{}) {
					break
				}
			}
		}
		consumer(seq)
	}

	next, stop := iter.Pull(coro)

	yield = func(v V) bool {
		in = v
		_, more := next()
		if !more {
			stop()
		}
		return more
	}

	return yield, stop
}

I do think it's valuable to return the stop function too.

@earthboundkid
Copy link
Contributor

I put that into a gist because I know I'm going to want it eventually and won't want to figure it out again. 😆

@jub0bs
Copy link

jub0bs commented Mar 7, 2025

Given

func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())

I would expect a function named Push exported by iter to have the following signature and semantics:

// Push converts the “pull-style” iterator
// accessed by the two functions next and stop
// into a “push-style” iterator sequence.
// Push essentially is the inverse of [iter.Pull].
// Note that you must consume the resulting iterator;
// otherwise, the underlying pull-based iterator may leak.
func Push[V any](next func() (V, bool), stop func()) iter.Seq[V]

See https://pkg.go.dev/github.com/jub0bs/iterutil#Push.

@ncruces
Copy link
Contributor Author

ncruces commented Mar 7, 2025

Do you have a practical purpose for a function with that signature? Your test/example simply consumes the output of Pull and puts it back into a Seq?

For example, can you implement fan-out, @eihigh and @shaj13 demonstrated?

I'm perfectly happy to withdraw my proposal, as Pull is clearly sufficient to express my needs. But I don't see what problem this new proposal is solving.

@jub0bs
Copy link

jub0bs commented Mar 8, 2025

@ncruces True, my Push function merely turns a pull-based iterator into a push-based one, and the example given in the doc doesn't really sell it.

What I'm questioning about your function is its the name rather than its expressive power. Given the prominence of the "push"/"pull" parlance in documentation about Go iterators, I would expect a function named "Push" to be the inverse of iter.Pull.

@ncruces
Copy link
Contributor Author

ncruces commented Mar 8, 2025

There's no agreement here, and iter.Pull clearly does everything that I needed (in terms of creating and safely exposing) optimized coroutines (avoiding unnecessary parallelism and data races).

Beyond iter.Pull, the real magic, IMO, the bit that I found really hard to figure out, is this bit of @eihigh's code:

coro := func(yieldCoro func(struct{}) bool) {
	seq := func(yieldSeq func(V) bool) {
		for yieldSeq(in) {
			if !yieldCoro(struct{}{}) {
				break
			}
		}
	}
	consumer(seq)
}

I'm not sure what's the best way to wrap it into a useful API, what name to give it, and how best to explain it.

Beyond the fact that both I and @eihigh independently invented the same API, with the same name, months apart (although I only managed to implement it through channels and goroutines), and apparently both used it to solve real world problems, I have nothing to recommend this API and name over others.

So I'll personally withdraw the proposal, though the Go team is obviously free to reopen it. Maybe the real solution is to first start by introducing a version of this in the xiter package. Consider this a strong upvote to @eihigh's submission there.

@ncruces ncruces closed this as completed Mar 8, 2025
@dmitshur dmitshur closed this as not planned Won't fix, can't repro, duplicate, stale Mar 8, 2025
@DeedleFake
Copy link

DeedleFake commented Mar 10, 2025

@ncruces, I just saw this. I actually have a function that can handle this scenario in my personal xiter package, especially after some updates to it that I just did after trying to implement this using it. I've put together a playground with an implementation of a simplified interface similar to AggregateFunction to show how it works. The function was originally written for the heck of it with the idea of exploiting closures to allow for a full coroutine-like interface implemented via iter.Pull.

@ncruces
Copy link
Contributor Author

ncruces commented Mar 11, 2025

Thanks @DeedleFake. For now I "inlined" everything around the "kernel" @eihigh came up with, just because for a given aggregate this can be called millions of times, so every bit counts (the channel based solution I had found was terrible at this).

But I agree that a nicer generalized API for coroutines (like yours? I'll definitely study it further) would probably be useful. I'm not sure iter.Pull is it (though it's great to use, as it's the only public interface to runtime.newcoro). I definitly found it challenging to coerce iter.Pull to do what I needed. At first glance, your implementation does look nicer and more straight forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Projects
None yet
Development

No branches or pull requests