Skip to content

iter: iter.Pull should forward panics and runtime.Goexit [freeze exception] #67712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hajimehoshi opened this issue May 30, 2024 · 25 comments
Closed
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@hajimehoshi
Copy link
Member

Go version

devel go1.23-a3a584e4ab Wed May 29 13:52:34 2024 +0000

Output of go env in your module/workspace:

n/a since this was tested on the playground (gotip)

What did you do?

Run this on the playground

package main

import (
	"fmt"
	"iter"
	"runtime"
)

func main() {
	fmt.Println(runtime.Version())
	defer func() {
		if err := recover(); err != nil {
			fmt.Println(err)
		}
	}()
	next, _ := iter.Pull[int](func(yield func(int) bool) {
		yield(1)
		panic("!")
	})
	fmt.Println(next())
	fmt.Println(next())
}

https://go.dev/play/p/_BjecTfmPE8?v=gotip

What did you see happen?

The panic is not caught at recover()

devel go1.23-a3a584e4ab Wed May 29 13:52:34 2024 +0000
1 true
panic: !

goroutine 4 [running]:
main.main.func2(0x0?)
	/tmp/sandbox4071758638/prog.go:18 +0x2e
iter.Pull[...].func1()
	/usr/local/go-faketime/src/iter/iter.go:75 +0x10b
created by iter.Pull[...] in goroutine 1
	/usr/local/go-faketime/src/iter/iter.go:59 +0x119

What did you expect to see?

The panic is caught at recover()

@hajimehoshi
Copy link
Member Author

cc @golang/compiler

@mknyszek
Copy link
Contributor

iter.Pull isn't explicitly defined as running on another goroutine, but in practice that is what it's doing, leading to the behavior you see here.

We've gone back and forth on whether it should be documented in terms of goroutine+channel semantics (that is, iter.Pull is defined as having the semantics of running the iterator on another goroutine, with next and stop communicating with it via channel), but decided to leave it more ambiguous for now since it accepts more possible implementations, including the current one (though goroutine+channel semantics are a valid implementation of iter.Pull).

I think maybe what does have to be documented is that panics do not propagate through an iter.Pull iterator. I think this must be true of any implementation (and AFAICT, it is true of all the implementations we've thought of so far), because the abstraction here is something like a coroutine, and it's not clear to me that panics should unwind through coroutine boundaries.

CC @dr2chase @aclements

@hajimehoshi
Copy link
Member Author

hajimehoshi commented May 30, 2024

iter.Pull isn't explicitly defined as running on another goroutine, but in practice that is what it's doing, leading to the behavior you see here.

IIUC, the actual implementation is a coroutine rather than a regular goroutine. If the semantics is the same as goroutine, I was wondering why a coroutine is used instead of a regular goroutine. Is this for performance? If so, now we have issues around Cgo with iter.Pull (#67499), isn't this a kind of early optimization? Or, might the semantics be changed in the future?

@mknyszek
Copy link
Contributor

mknyszek commented May 30, 2024

IIUC, the actual implementation is a coroutine rather than a regular goroutine.

It's really a fast, direct switch between goroutines that doesn't go through the scheduler. This is functionally similar to a coroutine switch, hence why it's called coroswitch in the implementation.

If the semantics is the same as goroutine, I was wondering why a coroutine is used instead of a regular goroutine.

The semantics are explicitly not the same as a goroutine. The semantics are a little less well-defined than that at the moment.

Also, even if we formalized the notion of coroutines in the language, I'm not sure I agree that a panic should be able to cross coroutine boundaries.

Is this for performance? If so, now we have issues around Cgo with iter.Pull (#67499), isn't this a kind of early optimization? Or, might the semantics be changed in the future?

Yes, it's for performance. The direct switch is very, very cheap compared to implementing the same thing with channels. I get what you're saying about the optimization possibly happening too early (this was discussed at length), but the performance of iter.Pull was considered too important. The issues around cgo will hopefully not be permanent, but they're also a big part of why the semantics are not defined as goroutines.

@jimmyfrasche
Copy link
Member

Are there many languages where an exception in a coroutine isn't propagated?

@hajimehoshi
Copy link
Member Author

hajimehoshi commented May 30, 2024

@mknyszek
Copy link
Contributor

mknyszek commented May 31, 2024

Oh, that's interesting... At least in JS it really makes sense that exceptions would propagate through. On the caller side (and the definition side!) it just looks almost like calling a regular function, especially with the syntax.

I think what's tricky about Go is that it's already possible for functions to do more complicated things with a closure (like sending it to another goroutine) inside of a function call, and this is a somewhat normal thing to do.

That being said, I do think you changed my mind about:

Also, even if we formalized the notion of coroutines in the language, I'm not sure I agree that a panic should be able to cross coroutine boundaries.

I think there's a good argument to be made about that.

However, for iter.Pull specifically, it was a soft goal of ours that a possible implementation of iter.Pull would be using just plain goroutines and channels as part of the semantics. That goal does seem to require panics not to propagate.

@jimmyfrasche
Copy link
Member

A goroutine implementation could capture the panic and send it to the other side to be re-panic'd when next/close get invoked. https://go.dev/play/p/ltXl_x_CPvm

@mknyszek
Copy link
Contributor

mknyszek commented May 31, 2024

@jimmyfrasche That's true, that you can re-panic, but what I meant by propagation is that the original panic stack follows (for example, if the panic is uncaught, you lose the context by re-panicking). But maybe that doesn't matter? The JS example at least doesn't seem to provide any context on an uncaught exception (though that alone isn't a reason to replicate the behavior).

I think I'm personally still leaning toward the panic not propagating just because that seems more consistent within the language. Is there any existing API that runs a function on another goroutine and possibly re-panics like this? Coroutines don't currently exist in the language, so it would be a little weird to say iter.Pull behaves like coroutines in other languages.

@mknyszek
Copy link
Contributor

mknyszek commented May 31, 2024

After discussing offline with @aclements, I think I'm convinced that we should propagate panics. We can start out by just re-panicking: the user-visible behavior is the most important part for this release cycle.

Here's the information that changed how I was leaning.

Although the primary example that came to mind (exec.Command) doesn't propagate panics from the goroutines it starts, there are a few examples of where, going forward or already today, we do propagate panics.

  • sync.OnceFunc explicitly propagates panics from its callback, and does so even if the sync.OnceFunc is accessed multiple times (it won't surprisingly succeed on future calls). Although it doesn't create a goroutine, it is an API that takes a function an executes it on behalf of the caller.
  • x/sync/errgroup: propagate panics and Goexits through Wait #53757 is an accepted proposal to propagate panics for golang.org/x/sync/errgroup through Wait.

Furthermore, we wanted to leave the door open to a compiler-driven CPS transform as a valid implementation of an iter.Pull call. That implementation actually does suggest propagating panics.

And this is a situation of "if we can, we should," and we definitely "can," since the implementation controls the creation of the iter.Pull. While std is currently a bit inconsistent about forwarding panics, my general sense is that the project wants to move toward propagating them.

I will send a CL later to forward panics.

@mknyszek mknyszek self-assigned this May 31, 2024
@mknyszek mknyszek added NeedsFix The path to resolution is known, but the work has not been done. release-blocker labels May 31, 2024
@mknyszek mknyszek added this to the Go1.23 milestone May 31, 2024
@aclements aclements changed the title iter: panic in iter.Pull is not recovered proposal: iter: iter.Pull should forward panics May 31, 2024
@aclements
Copy link
Member

I've retitled this issue to focus on the API question rather than the implementation. The core API question is what should happen if the sequence function passed to iter.Pull panics. Regardless of the mechanics of iter.Pull, we have two options: passing it through to the caller of iter.Pull, or crashing the program. Whether its implemented with goroutines+channels, coroutines, or CPS transform doesn't matter, since any of those implementations can support either semantics (though obviously which is more "natural" varies).

I'm also leaning in favor of passing the panic through. I worry about making changes like this so late in the release, but I think this will also be a simple change. I think once @mknyszek has prepared a CL, we'll have to assess its risk and weigh requesting a freeze exception.

Passing the panic through is consistent with the APIs @mknyszek pointed out above. I think it's also more consistent with where we landed on panic handling of for-range loops. Supposing you have a Zip implementation that uses iter.Pull for one of the two iterators, consider:

for x := range Zip(a, b) { ... }

It seems unfortunate that a panic in a passes through and runs defers on the stack (which may recover the panic), but a panic in b immediately crashes the program. If we instead pass panics through iter.Pull, then a panic in either will behave roughly the same.

@rsc
Copy link
Contributor

rsc commented May 31, 2024

I believe I intended panics to forward: https://research.swtch.com/coro#panic.

@mknyszek mknyszek changed the title proposal: iter: iter.Pull should forward panics proposal: iter: iter.Pull should forward panics [freeze exception] May 31, 2024
@mknyszek
Copy link
Contributor

mknyszek commented May 31, 2024

I have an implementation that propagates panics, and also converts a runtime.Goexit in the iterator into a panic originating from next and/or stop. Regarding runtime.Goexit, are those semantics we want? I picked that because it was convenient, but I'm not actually sure if there's anything better.

Also, the implementation is pretty simple, so I think it would be OK to land during the freeze, especially since nobody is using iter.Pull yet. (That is, the risk of breakage is low.) Furthermore, I don't think we want to ship with this not having the semantics we want. As soon as it's out there, it'll be depended on.

https://go.dev/cl/589136 and https://go.dev/cl/589137 are the CLs.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/589137 mentions this issue: iter: convert runtime.Goexit in an iterator passed to Pull into a panic

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/589136 mentions this issue: iter: propagate panics from the iterator passed to Pull

@mknyszek
Copy link
Contributor

mknyszek commented May 31, 2024

I'm requesting a freeze exception for this issue. See my previous comment for a rationale.

CC @golang/release

@hajimehoshi
Copy link
Member Author

Thanks!

https://go.dev/cl/589136

Woudn't stack trace info be continuous?

@aclements
Copy link
Member

converts a runtime.Goexit in the iterator into a panic originating from next and/or stop. Regarding runtime.Goexit, are those semantics we want?

My inclination would be to turn Goexit from the iterator into a Goexit on the caller. That's sort of "maximum" transparency to the implementation mechanism and most symmetric in my zip example.

@aclements
Copy link
Member

Woudn't stack trace info be continuous?

I'm not positive I understand your question, but with that implementation it's true that the stack trace from seq will be lost, and the stack trace will terminate a closure in Pull. Addressing that would require substantially more runtime mechanism, but I also think that's okay to do in a later release.

(Just recording my thoughts on this: I think we'd do this by having the panic unwinder know that it needs to hop goroutines when it leaves the Pull goroutine, and leave the Pull goroutine around until the panic ends. After that hop, any defers would have to run on the "parent" goroutine, and traceback from the defer would have to understand to jump to the Pull goroutine once it got out of the defers, and then back over to the parent goroutine when it reaches the end of the Pull goroutine. I believe this can be arbitrarily nested. This all sounds pretty complicated, but I think it may not be too bad if we have a mechanism for marking the frames that have these special goroutine jumps.)

@hajimehoshi
Copy link
Member Author

Yeah that's what I wanted to know, thanks

@mknyszek
Copy link
Contributor

mknyszek commented Jun 3, 2024

I updated the changes to propagate runtime.Goexit instead of converting it into a panic.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/590075 mentions this issue: iter: don't iterate if stop is called before next on Pull

@mknyszek
Copy link
Contributor

mknyszek commented Jun 3, 2024

Here is a summary of the semantics we plan to set for iter.Pull before release:

  1. If stop is called before next ever is, then the iterator passed to iter.Pull is guaranteed to never execute.
  2. If the iterator passed to iter.Pull panics, then that panic propagates once through to whoever is waiting on that iterator's latest iteration (a next or stop call). Future calls to next return the zero value and false. Future calls to stop do nothing.
  3. If the iterator passed to iter.Pull calls runtime.Goexit, then that runtime.Goexit propagates once through to whoever is waiting on that iterator's latest iteration (a next or stop call). This effectively destroys the entire chain of goroutines waiting on nested calls to iter.Pull. Future calls to next return the zero value and false. Future calls to stop do nothing.

These are the semantics I have implemented in CL 590075, CL 589136, and CL 589137 respectively.

Note: a stop may observe a panic or runtime.Goexit because the iterator needs a chance to clean up when yield returns false. It may panic during that clean up.

Motivating examples:

  • For (2), functions like Zip should behave the same out-of-the-box regardless of the order of their arguments. If Zip is implemented with one iterator being used with range-over-func and the other with iter.Pull, then it's compelling that panics should propagate through in either case, not just the first.
  • For (2), making the panic happen on each call into next or stop will result in code like defer stop() likely to panic-during-panic, if, say, next panics. This is messy and unexpected.
  • For (3), calling t.Fatal (which uses runtime.Goexit) from an iterator passed to iter.Pull will behave the same as if it were called in a range-over-func statement.

@mknyszek mknyszek changed the title proposal: iter: iter.Pull should forward panics [freeze exception] proposal: iter: iter.Pull should forward panics and runtime.Goexit [freeze exception] Jun 3, 2024
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Jun 5, 2024
@dmitshur
Copy link
Contributor

dmitshur commented Jun 5, 2024

We reviewed this in a release meeting, and agreed to approve its "freeze exception" bit. Thanks for letting us know.

@aclements
Copy link
Member

Discussed in proposal review. We decided that since forwarding panics was intended to be part of the behavior of iter.Pull, this is really just an implementation oversight. Taking out of the proposal process.

@aclements aclements changed the title proposal: iter: iter.Pull should forward panics and runtime.Goexit [freeze exception] iter: iter.Pull should forward panics and runtime.Goexit [freeze exception] Jun 5, 2024
@aclements aclements removed the Proposal label Jun 5, 2024
@dmitshur dmitshur removed this from Proposals Jun 5, 2024
gopherbot pushed a commit that referenced this issue Jun 7, 2024

Verified

This commit was signed with the committer’s verified signature. The key has expired.
tcharding Tobin C. Harding
This change propagates panics from the iterator passed to Pull through
next and stop. Once the panic occurs, next and stop become no-ops (the
iterator is invalidated).

For #67712.

Change-Id: I05e45601d4d10acdf51b53e3164bd891c1b324ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/589136
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Jun 7, 2024

Verified

This commit was signed with the committer’s verified signature. The key has expired.
tcharding Tobin C. Harding
Consider the following code snippet:

    next, stop := iter.Pull(seq)
    stop()

Today, seq will iterate exactly once before it notices that its
iteration is invalid to begin with. This effect is observable in a
variety of ways. For example, if the iterator panics, since that panic
must propagate to the caller of stop. But if the iterator is stateful in
anyway, then it may update some state.

This is somewhat unexpected and because it's observable, can be depended
upon. This behavior does not align well with other possible
implementations of Pull, like CPS performed by the compiler. It's also
just odd to let even one iteration happen, precisely because of
unexpected state modification.

Fix this by not iterating at all of the done flag is set before entering
the iterator.

For #67712.

Change-Id: I18162e29df45a2e8968f68379450d92e1de47c4d
Reviewed-on: https://go-review.googlesource.com/c/go/+/590075
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
Status: Done
Archived in project
Development

No branches or pull requests

7 participants