proposal: spec: pipe operator with explicit result passing #70826

Closed
1 of 4 tasks
DeedleFake opened this issue Dec 13, 2024 · 17 comments
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal Proposal-FinalCommentPeriod
Comments

@DeedleFake

DeedleFake commented Dec 13, 2024

Go Programming Experience

Experienced

Other Languages Experience

Elixir, JavaScript, Ruby, Kotlin, Dart, Python, C

Related Idea

  • Has this idea, or one like it, been proposed before?
  • Does this affect error handling?
  • Is this about generics?
  • Is this change backward compatible? Breaking the Go 1 compatibility guarantee is a large cost and requires a large benefit

Has this idea, or one like it, been proposed before?

Yes, several times. This variant directly addresses the main issues brought up in those proposals.

Does this affect error handling?

Not directly, though it possibly could in some cases.

Is this about generics?

Not directly, though it addresses a situation that has arisen as a result of generics.

Proposal

When a pipe operator has been proposed before (#33361), the primary issues with it were

  1. Not enough interest.
  2. Not explicit enough.
  3. Might lead to APIs being written specifically to accommodate it, thus potentially making them awkward for no reason.

I think that the first point is arguable as a reason not to consider a feature. More importantly, I think that the situation has changed: the continued discussion in #49085 is good evidence that some way to fix the issue a pipe operator would address is a very popular idea. With generics having been added, and now iterators too, some way to write chains of function/method calls in a left-to-right or top-to-bottom manner has, I think, gained a fair bit of usefulness that it didn't have back in 2019 (#33361). Simply using methods, as many other languages such as Rust do, has a lot of problems that have been pointed out in the above issue, but functions have none of those problems. Their only issue in this regard is syntactic.

Points 2 and 3, however, I think are very solvable in a simple way: Add a special variable that is defined for the scope of each piece of the pipe that contains the value of the previous expression instead of magically inserting function arguments. For example, assuming that the bikeshed is painted piped:

a() |> f1(piped, b) |> f2(c, piped)

The first |> operator creates a new scope that exists only for the expression immediately to its right, in this case a call to f1(). In that scope, it defines a variable, piped, containing the result of the expression to its left, in this case just a(). The second |> operator creates a new scope that shadows the existing piped, introducing a new piped variable containing the result of the expression to its left, in this case a() |> f1(piped, b). And so on with a longer pipeline.

This completely fixes problem 2, as it now makes piping extremely explicit. It mostly fixes problem 3 as it makes the operator significantly more flexible, reducing the need for writing APIs specifically to accommodate it. I think this is not completely solvable, though, as, at some point, someone will always write something that they probably shouldn't have.

It also allows the pipe operator to become non-exclusive to function calls. Any single-value expression now becomes valid at any point in a pipeline, allowing even things like

a() |> f1(piped, b) |> S{Value: piped} |> f2(c, piped)

For a more practical example, here's some iterator usage:

// Problem: Horrendously unreadable and uneditable.
xiter.Filter(
	func(v int) bool { return v > 0 },
	xiter.Map(
		func(v string) int {
			n, _ := strconv.ParseInt(v, 10, 0)
			return int(n)
		},
		strings.SplitSeq(input, "\n"),
	),
)

// Using multiple explicit variables:
// Problem: Better than the last one in terms of readability, but
// still difficult to edit by, for example, adding or removing
// operations in the middle because of needing to avoid variable name
// reuse if types change and also being sure to pass the correct one
// to the next in the chain.
lines := strings.SplitSeq(input, "\n")
ints := xiter.Map(func(v string) int {
	n, _ := strconv.ParseInt(v, 10, 0)
	return int(n)
}, lines)
ints = xiter.Filter(func(v int) bool { return v > 0 }, ints)

// With this proposal:
ints := strings.SplitSeq(input, "\n") |>
	xiter.Map(func(v string) int {
		n, _ := strconv.ParseInt(v, 10, 0)
		return int(n)
	}, piped) |>
	xiter.Filter(func(v int) bool { return v > 0 }, piped)

Side note: I'm not a huge fan of needing to put the |> operator at the end of a line. I think Elixir's way of doing it with the operator at the beginning of each of the subsequent lines looks way better. Unfortunately, Go's semicolon insertion rules kind of make this necessary unless someone can come up with a way to do it that doesn't involve special-casing the |> operator, which I definitely think would be unnecessary. For comparison's sake, here's that same iterator chain written the other way around:

strings.SplitSeq(input, "\n")
|> xiter.Map(func(v string) int {
	n, _ := strconv.ParseInt(v, 10, 0)
	return int(n)
}, piped)
|> xiter.Filter(func(v int) bool { return v > 0 }, piped)

Language Spec Changes

A section would have to be added about the |> operator. It shouldn't directly affect any existing parts of the spec, I don't think.

Informal Change

The |> operator allows expressions to be written in a left-to-right manner by implicitly passing the result of one into the next in the form of a variable called piped that is scoped only to the right-hand side of each usage of |>, shadowing any existing variables named piped in parent scopes, including previous |> usages in the same pipeline.
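
The scoping rule described here can be approximated in today's Go with nested blocks, because a short variable declaration in an inner block may shadow a name from the enclosing block. This is only a sketch of the intended semantics, not how the operator would be implemented: a, f1, f2, their int signatures, and the pipeline wrapper are all placeholders invented for illustration.

```go
package main

import "fmt"

// a, f1 and f2 are placeholder stages with made-up signatures.
func a() int          { return 2 }
func f1(x, b int) int { return x + b }
func f2(c, x int) int { return c * x }

// pipeline mimics a() |> f1(piped, b) |> f2(c, piped) using explicit blocks.
func pipeline(b, c int) int {
	var result int
	{
		piped := a() // value seen by the first right-hand side
		{
			// The right-hand piped still refers to the outer variable;
			// the new piped shadows it only after this declaration.
			piped := f1(piped, b)
			result = f2(c, piped)
		}
	}
	return result
}

func main() {
	fmt.Println(pipeline(3, 10)) // prints 50
}
```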

Is this change backward compatible?

Yes.

Orthogonality: How does this change interact or overlap with existing features?

It allows a compromise between adding generic types in method calls (#49085) and function calls having poor ergonomics for certain use cases.

Would this change make Go easier or harder to learn, and why?

Slightly harder as the idea of the specially-scoped variable and its automatic shadowing of its counterparts in previous pipeline stages would have to be explained.

Cost Description

Tiny compile-time cost. No runtime costs. Slight increase in language complexity. Slight increase in potential for poorly written code as some people might misuse the operator.

Changes to Go ToolChain

All tools that parse Go code would have to be updated. gofmt and goimports would be affected the most.

Performance Costs

Compile-time cost is minimal. Runtime cost is nonexistent.

Prototype

No response

@DeedleFake DeedleFake added LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal labels Dec 13, 2024
@gopherbot gopherbot added this to the Proposal milestone Dec 13, 2024
@apparentlymart

The way I'm understanding the proposal is that the following example from the proposal text...

a() |> f1(piped, b) |> f2(c, piped)

...would behave approximately like the following:

(func () {
    piped := a()
    piped := f1(piped, b)
    return f2(c, piped)
})()

Does that match what you were intending?

Do you expect that the |> operator would require that its left operand produce only one return value, or do you imagine there being some way to capture multiple return values and propagate them all forward together? (Or perhaps something else entirely that allows diverting extra return values into some other control path for error handling?)

@DeedleFake
Author

Does that match what you were intending?

Effectively, yeah. That exact code snippet isn't legal because you can't shadow a variable in the same scope, but if you could then that would be pretty much equivalent.

Do you expect that the |> operator would require that its left operand produce only one return value, or do you imagine there being some way to capture multiple return values and propagate them all forward together?

In this version, yeah, single-value only. I thought of a couple of ways to do multi-value, but none of them seemed sufficient. My original idea was to actually introduce a new operator to refer to the previous expression's result and that would work as a stand-in for multiple results, i.e. multiResultFunc() |> doStuff(::), but I decided that a variable would make more sense as all that would require is some special scoping details inside of the pipeline instead. Unfortunately, that does mean that it doesn't really make any sense with multiple results. Still, maybe something could be done along those lines.
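
One existing rule is relevant to the multi-value question: Go already permits a call of the form f(g()) when the results of g match the parameters of f one for one. The sketch below shows only that existing behaviour. The function names divmod and describe are invented for illustration and have nothing to do with the proposal itself.

```go
package main

import "fmt"

// divmod returns two results.
func divmod(a, b int) (int, int) { return a / b, a % b }

// describe takes exactly the two values that divmod returns.
func describe(q, r int) string { return fmt.Sprintf("q=%d r=%d", q, r) }

func main() {
	// All results of divmod are forwarded as the arguments of describe.
	fmt.Println(describe(divmod(7, 2))) // prints q=3 r=1
}
```

This forwarding only applies when the multi-value call is the entire argument list, so it would not by itself cover a stage that mixes the previous results with other arguments.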

@apparentlymart

apparentlymart commented Dec 13, 2024

The point about shadowing piped is well taken indeed ... I was trying to keep it as succinct as possible but really I suppose the two |>s should've been expanded as their own func () { ... } too, rather than squashing them both together as one.

I suppose the following is a more "honest" desugaring:

// (this would be easier to write out with something like #21498 ...)
(func () WhateverF2Returns {
    piped := (func () WhateverF1Returns {
        piped := a()
        return f1(piped, b)
    }())
    return f2(c, piped)
}())

That does, however, raise an interesting point about the design:

This design does implicitly shadow the symbol piped, presumably by creating a small nested scope just for the rhs of each |> operator. It's a little quirky to have a nested scope just for one operand, without some braces explicitly marking its bounds, but I admittedly can't think of any way to make that scope more explicit that doesn't quickly degenerate into something a lot like the desugaring I tried in my previous comment and above.

So I guess the key question here would be: is the readability improvement of the overall feature sufficient to overcome the (admittedly subjective) readability decrease caused by an implicit scope created by some syntax that isn't like anything else in today's Go. 🤔

@DeedleFake
Author

I admittedly can't think of any way to make that scope more explicit that doesn't quickly degenerate into something a lot like the desugaring I tried in my previous comment and above.

This is pretty much what happened to me, too. I tried to come up with syntaxes that would involve {}, for example, but it quickly became way more trouble than it was worth.

Another difference with the desugared version is that anonymous functions like that would need to have return types.

So I guess the key question here would be: is the readability improvement of the overall feature sufficient to overcome the (admittedly subjective) readability decrease caused by an implicit scope created by some syntax that isn't like anything else in today's Go.

I don't think the scoping is a readability problem. That's just a technical detail. Thinking about it simply as "piped is the result of the expression to the left of the |>." is more than good enough and makes it quite simple to reason about.

There are some potential complications if someone tries to do something with anonymous functions or change the order of operator precedence with parentheses. Something like

f() |> func() {
  f2() |> stuff(piped) // piped is f2(), not f().
}()

// or

f() |> (f2() |> stuff(piped)) |> f3(piped)

In other words, someone explicitly creating a new scope inside of the pipeline could be a bit confusing. In the latter case, I think that's pretty simple to just not allow. The compiler could give an error such as "nested pipelines are not allowed" or something.

In the former case, though, that could be a problem with calling a function that takes a function argument. For example,

strings.SplitSeq(input, "\n") |>
  xiter.Map(
    func(v string) string { v + piped }, // This is weird.
    piped,
  )

Maybe it's possible to not allow access to piped from inside of anonymous functions? In other words, it would not be in scope if a new subscope was created. Alternatively, maybe give a more explicit error if an attempt is made to access it from a manually created subscope such as an anonymous function. I don't think disallowing pipelines inside of anonymous functions inside of pipelines altogether makes sense, as creating a pipeline in such a situation is not uncommon in Elixir when performing, for example, a flat map.

@apparentlymart

To be more specific about what I was concerned about with this implicit scope, some details:

  • Unless I'm already familiar with this syntax, there's nothing here to tell me what piped represents, where it came from, and where it goes out of scope. There are no keywords I can search for to learn more, except for this piped variable.
  • If I ask a gopls-supported editor to "go to declaration" on that piped variable, it would presumably need to just send me to the previous expression in the pipeline, because there is no explicit declaration of that variable. That is a small clue as to what is going on with this syntax, but not a very strong one.
  • Each new pipe step effectively shadows the previous piped symbol, which isn't normally allowed in Go unless there's a nested scope. Of course, there is a nested scope here, but unless the reader is already familiar with this syntax it isn't clear where that scope starts and ends.

I don't mean to say that any of what I've been saying is a showstopper. Just some reactions I had while trying to understand the meaning of this new syntax from your proposal trying to put myself in the shoes of someone who doesn't have this proposal text to help them understand what it means.

I guess overall there's just a big tension here where the drive to make this as concise as possible unfortunately makes it quite unlike anything else in Go, because Go is not a very concise language overall. Making it less concise makes it less valuable as a language feature, because the conciseness is the whole point of this.


In my earlier comment I alluded to #21498 in a throwaway code comment in my code example, but it occurs to me that if we did have a more concise anonymous function syntax then it might be more defensible to do something like the earlier proposal #33361, with the requirement that the rhs of |> must be something that can be called with all of the return values from the lhs expression, and the expectation that most of the time the rhs would be a "lightweight anonymous function".

There's still lots of debate over there about exactly which syntax to use for those functions, so I'll just pick one arbitrarily from comments near the end of that issue for an example:

a() |>
    |aResult| f1(aResult, b) |>
    |f1Result| f2(c, f1Result)

Or, the less contrived examples from the older proposal:

s := strings.TrimSpace(x) |>
    |trimmed| strings.ToLower(trimmed) |>
    |lowered| strings.ReplaceAll(lowered, "/old/", "/new")

http.HandleFunc("/",
    LogMiddleware(IndexHandler) |>
    |m| SomeOtherMiddleware(m) |>
    |m| RequireAuthMiddleware(m)
)

I will concede right up front that I don't think this is substantially more readable than the original proposal, but it does at least avoid an implicit declaration of an arbitrary name in favor of having the author choose an argument name explicitly, and it supports expressions that generate more than one result. It also hopefully removes some of the mystery of what's going on for someone who was already familiar with the shorthand anonymous function syntax.

(Some of the other candidates for lightweight anonymous function syntax include the -> atom between the arguments and the result expression, which I think would make this more confusing to read due to the mix of alternating |> and -> tokens. But that's a general hazard of using assorted ASCII punctuation symbols for operators.)
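
For comparison, when every stage has the same input and output type, something close to the string example can be written today with a small generic helper and ordinary function values. This is a sketch under that assumption: Pipe and normalize are hypothetical names introduced here, not part of this proposal or of #33361, and the helper does not cover stages that change type or return more than one result.

```go
package main

import (
	"fmt"
	"strings"
)

// Pipe is a hypothetical helper: it feeds v through each stage in order.
// All stages must share one type T.
func Pipe[T any](v T, stages ...func(T) T) T {
	for _, stage := range stages {
		v = stage(v)
	}
	return v
}

// normalize trims, lower-cases, and rewrites a path prefix.
func normalize(x string) string {
	return Pipe(x,
		strings.TrimSpace,
		strings.ToLower,
		func(lowered string) string {
			return strings.ReplaceAll(lowered, "/old/", "/new/")
		},
	)
}

func main() {
	fmt.Println(normalize("  /OLD/Path  ")) // prints /new/path
}
```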

@seankhliao
Member

I think the proposed syntax is still unreadable compared to a simple for loop:

var ints []int
for s := range strings.SplitSeq(input, "\n") {
    if n, _ := strconv.ParseInt(s, 10, 0); n > 0 {
        ints = append(ints, int(n))
    }
}

and of the syntaxes listed above, I think the intermediate variables one is more readable since each operation is more clearly marked. maybe I just don't mind naming the variables p0, p1, p2, ..., and letting the unused var checker catch failing to pass it to the next operation.

and I don't see the point of f() |> |name| g(name), just use real intermediate variables instead of trying to create a one-liner.

@DeedleFake
Author

and I don't see the point of f() |> |name| g(name), just use real intermediate variables instead of trying to create a one-liner.

The problem isn't trying to create a one-liner. It's the ability to edit the code later. Using intermediate variables is highly error-prone. For example, let's say I'm constructing some pipeline that needs to do a map and then a filter, and then I come back later and need to insert a new filter in between the existing map and filter. The following happens:

mapped := xiter.Map(mapFunc, seq)
filtered := xiter.Filter(filterFunc, mapped)

// later

mapped := xiter.Map(mapFunc, seq)
filtered := xiter.Filter(insertedFilterFunc, mapped)
filtered = xiter.Filter(filterFunc, mapped) // Accidentally left mapped the same.

It might seem trivial in an example like this, but in larger, more complicated pipelines it can be a real issue.

I think the proposed syntax is still unreadable compared to a simple for loop:

That's a valid argument, but there are things that literally can not be done with the for loop variant because it requires an intermediary slice. Iterators allow you to pass unevaluated loops around as data. That being said, an argument could certainly be made, given that iterators are just functions in Go, that you could write that as

ints := func(yield func(int) bool) {
  for s := range strings.SplitSeq(input, "\n") {
    if n, _ := strconv.ParseInt(s, 10, 0); n > 0 {
      if !yield(int(n)) {
        return
      }
    }
  }
}

instead of constructing a pipeline of map/filter/whatever function calls. That's essentially what I've been doing myself in my code because of how unergonomic such pipelines currently are.

@DeedleFake
Author

@apparentlymart

I like that. Even without anonymous functions, just changing it to require a manual naming of each pipeline stage is a good one. That also allows you to put the name on the next line, somewhat avoiding the problem with previous lines ending with |> instead of next lines starting with it, as I think that putting that at the end causes some readability issues as I mentioned in the original proposal. It also avoids the problems with editing pipelines that I mentioned above that would happen with simple intermediate variables, as well as allowing name reuse even when types change.

One thing that could be confusing, though, is what exactly the scope of those names is if the naming is part of the |> syntax itself and not simply an anonymous function argument. Does it continue into the next stage of the pipeline?

@apparentlymart

Since I was imagining these as just normal anonymous functions, of course for me these symbols exist only inside the function just like any other function argument.

That does admittedly mean that if you want to make additional use of some value at a later pipeline stage then that would need to be done manually somehow, rather than the symbol automatically being in scope. None of the motivating examples we discussed so far required that sort of thing, so I didn't try to solve for it.

My initial instinct is that if you need more than just directly propagating the results from one expression into the directly-following expression then you should probably use a different technique so you can manage the symbols more explicitly. This proposal is already in a tricky part of the concise vs. readable vs. maintainable tradeoff and so I expect that complicating it any further would topple the scale.

@ianlancetaylor
Member

Thanks for the well thought out and well described proposal.

One of the goals of Go is that Go programs are mostly comprehensible to people unfamiliar with Go. The code in this proposal doesn't seem to meet that criterion: the new |> operator would be obscure to people who work with languages similar to Go. This is not a deal-breaker, but it is a significant hurdle.

This proposal introduces a new name, piped, that has no explicit declaration. It is presumably only valid on the right side of the |> operator. This too is not a deal-breaker, but the meaning and, especially, the type of the new name are not obvious to the reader.

Most importantly, this doesn't handle errors. Many Go functions return both a value and an error, and there is no way to handle such functions with this operator. That is, this might just apply to special cases unusable by much Go code. That makes the relative obscurity of the construct worse, in that it will be rare.

The problem this is solving is writing chains of functions using lists of variables (we agree that packing everything into a single expression is difficult to read). That problem does not seem important enough, and common enough, to deserve a fairly complex new language construct.

For these reasons, this is a likely decline. Leaving open for four weeks for final comments.

@DeedleFake
Author

DeedleFake commented Jan 9, 2025

@ianlancetaylor

Thanks for the well thought out and well described proposal.

Thank you.

One of the goals of Go is that Go programs are mostly comprehensible to people unfamiliar with Go. The code in this proposal doesn't seem to meet that criterion: the new |> operator would be obscure to people who work with languages similar to Go. This is not a deal-breaker, but it is a significant hurdle.

It is something not particularly common in Go-like languages, that's true, but it's not that uncommon in and of itself. Elm, Elixir, and OCaml all have very similar operators, and the operator itself was originally based on shell pipes, something that many people are familiar with regardless of the programming languages that they primarily use.

This proposal introduces a new name, piped, that has no explicit declaration. It is presumably only valid on the right side of the |> operator. This too is not a deal-breaker, but the meaning and, especially, the type of the new name are not obvious to the reader.

There was some previous discussion about this, including a suggestion to explicitly name the previous value (#70826 (comment)). Although, at that point I feel like an alternative proposal that just simply allows you to shadow variables in the same scope that they were declared in would be a better idea. For example,

iter := data()
^iter := xiter.Map(iter, func(v int) float64 { return float64(v / 2) })
^iter := xiter.Filter(iter, func(v float64) bool { return v >= 1 })
process(iter)

There are a few downsides to that, though, including that it can't be a single expression, which is probably not actually a problem, and that shadowing variables can have unexpected side-effects if they're captured somewhere. Languages that let you do that usually are either immutable or encourage immutability. It might be possible to work around that by simply limiting the usage, though, i.e. you can't shadow a variable that has been captured in between its declaration and the attempt to shadow it.

Or maybe some way to create immutable variables will be added in the future and they will just allow shadowing directly. I might put together a proposal for that based on an older comment I made in another issue somewhere that involved a special naming convention for immutable variables similar to how exporting works now so that you could see at a glance if a variable was immutable or not. Iterators would work well as immutable variables since they're just function values and thus don't have any internal state of their own.
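
The concern about shadowing a captured variable can be seen in today's Go by using a nested block, which is the only place shadowing is currently allowed. This is a sketch with invented names (captureDemo, get): a closure observes later assignments to the variable it captured, but a shadowing declaration creates a new variable that the closure never sees.

```go
package main

import "fmt"

// captureDemo reports what a closure observes after an assignment
// and after a shadowing declaration in a nested block.
func captureDemo() (afterAssign, afterShadow int) {
	v := 1
	get := func() int { return v } // captures the outer v

	v = 2
	afterAssign = get() // assignment is visible through the closure

	{
		v := 3 // a new variable; the closure still refers to the outer v
		_ = v
		afterShadow = get()
	}
	return afterAssign, afterShadow
}

func main() {
	fmt.Println(captureDemo()) // prints 2 2
}
```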

Most importantly, this doesn't handle errors. Many Go functions return both a value and an error, and there is no way to handle such functions with this operator. That is, this might just apply to special cases unusable by much Go code. That makes the relative obscurity of the construct worse, in that it will be rare.

I think that the previous suggestion might also solve this problem, since you could just handle errors like normal.

@myaaaaaaaaa

myaaaaaaaaa commented Jan 16, 2025

@DeedleFake Have you considered tweaking the original function composition operator instead?

// Given that f, g, and h are all functions...
fgh := f | g | h

// The above statement would be transformed into this pseudocode:
fgh := func(...inputs) ...outputs {
	return h(g(f(inputs...)))
}

Instead of executing immediately, | would simply take two func operands and return a combined func.

This definition has the advantage that it requires no changes to the parser - a new case simply needs to be added to the | operator.

Then the example in your proposal would become:

// Note: The xiter functions presented here have different definitions
// than the ones in the original proposal:
func xiter.Map[I, O any](func(I) O)    func(iter.Seq[I]) iter.Seq[O]
func xiter.Filter[T any](func(T) bool) func(iter.Seq[T]) iter.Seq[T]


getInts := strings.SplitSeq |
	xiter.Map(func(v string) int {
		n, _ := strconv.ParseInt(v, 10, 0)
		return int(n)
	}) |
	xiter.Filter(func(v int) bool { return v > 0 })

ints := getInts(lines, "\n")
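
The "take two funcs, return a combined func" behaviour can be sketched in current Go for the single-argument, single-result case with a generic helper. Compose, normalize, and length are hypothetical names used only for this illustration. The variadic inputs and outputs in the pseudocode above cannot be expressed this way, which is part of what an operator would add.

```go
package main

import (
	"fmt"
	"strings"
)

// Compose is a hypothetical helper: it returns a function that applies f,
// then g. It only handles functions with one parameter and one result.
func Compose[A, B, C any](f func(A) B, g func(B) C) func(A) C {
	return func(a A) C { return g(f(a)) }
}

// normalize trims whitespace and then lower-cases its input.
var normalize = Compose(strings.TrimSpace, strings.ToLower)

// length normalizes its input and then reports the byte length.
var length = Compose(normalize, func(s string) int { return len(s) })

func main() {
	fmt.Println(normalize("  Hello "), length("  Hello ")) // prints hello 5
}
```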

@apparentlymart

Composing functions without executing them is an interesting idea.

I sure hope, though, that it wouldn't encourage folks to write something like this:

ints := (
	strings.SplitSeq |
	xiter.Map(func(v string) int {
		n, _ := strconv.ParseInt(v, 10, 0)
		return int(n)
	}) |
	xiter.Filter(func(v int) bool { return v > 0 })
)(lines, "\n")

Perhaps it's just me, but I don't really find that any more readable than the nested form this proposal was presented as an alternative to. In particular, it took me quite some staring at this code (and, for that matter, the example with a separate getInts variable) to recognize that lines, "\n" are the arguments to strings.SplitSeq.

This does of course also still have the question of how one would actually handle errors. In contrived examples like this it's easy to just ignore the err from strconv.ParseInt, but I expect that in at least some real programs one would want to actually return an error.

(I realize the above is not what was proposed, but it seems like the above would be valid if that proposal were accepted.)

@myaaaaaaaaa

myaaaaaaaaa commented Jan 18, 2025

I agree that none of the examples presented so far are particularly convincing. Let me give it a shot:

Given a preexisting file that looks something like this:

[user1]
owner=root
files=hello.txt
other_metadata=...

[user2]
owner=groot
files=a.txt,b.txt

[user3]
owner=noot
files=hello.txt,c.txt

If we want to parse this into a deduplicated list of filenames, normally we'd have to write something like this:

fileSet := map[string]struct{}{}
for _, line := range strings.Split(data, "\n") {
	k, v, _ := strings.Cut(line, "=")
	if k != "files" {
		continue
	}

	for _, file := range strings.Split(v, ",") {
		fileSet[file] = struct{}{}
	}
}

sortedFiles := slices.Sorted(maps.Keys(fileSet))
// Do something with sortedFiles...

However, with the function composition operator |, along with an accompanying package, we could instead write this:

import . "golang.org/x/exp/shell"

var parseFilenames = strings.Lines | Grep("^files=") | Cut("-d= -f2-") | Grep("-oE '[^,]+'") | Sort("-u") | slices.Collect

func main() {
	sortedFiles := parseFilenames(data)
	// Do something with the files
}

// Note: signatures would be something like:
package shell
func Grep(args string) func(iter.Seq[string]) iter.Seq[string]

@ianlancetaylor
Member

No change in consensus.

@ianlancetaylor ianlancetaylor closed this as not planned Feb 5, 2025
@morlay

morlay commented Feb 12, 2025

A small demo that implements pipe with xiter:

#61898 (comment)


8 participants