proposal: spec: support for easy packing/unpacking of struct types #64613

Open
@griesemer

Description

Introduction and acknowledgements

This proposal is a restatement of ideas that were all originally proposed in #33080 in 2019 but somehow never gained much traction despite essentially very positive feedback.

Specifically, I propose that we take @urandom 's idea of permitting expressions representing multiple values (such as function calls returning multiple results) to create struct values. But instead of using conversions for that purpose, I suggest that we allow such multi-values in struct literals, as proposed by @ianlancetaylor in the same issue. Furthermore, I suggest that we expand the idea to arrays (but not slices), also mentioned by @ianlancetaylor in that issue.

And I propose that we use @urandom's and @bradfitz's suggestions (here and here), and write s... to unpack a struct value s.

In short, I propose we give the combined ideas in #33080 serious consideration. They cover much of the ground that tuple types would cover, without the need for tuple types. The ideas are clean and simple.

Proposal

  1. The ... may be used to unpack a struct or array value v: v... produces the list of element values of the struct or array value v. This list of values can be used exactly like a multi-valued function result (i.e., the same rules apply).

Example:

type Pair struct{a, b int}
var p Pair
x, y := p... // unpack the elements of the pair value p into a multi-value assigned to x, y
  2. Given a struct or array type T with n elements and an expression e that stands for n values (a multi-value) of matching types, the composite literal T{e} creates the struct or array value with the multiple values as the elements.

Example:

func position() (x, y int)
p := Pair{position()} // the multi-value returned by position() can be used directly in the composite literal
  3. ... is added to the list of tokens that cause an automatic semicolon insertion during lexical analysis. This is needed so that it is possible to have ... at the end of a line without the need to manually write a ;.
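
Example (a sketch, reusing the Pair type from above):

var p Pair
x, y := p... // under this rule, the lexer inserts the terminating semicolon after ... automatically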

This is the entire proposal.

Examples

The ... applied to a struct (or an array, which also has a compile-time-fixed size) produces the list of its elements, which can be used exactly like the result of a function returning multiple values.

s := struct{x int}{2}
x := s...  // same as x := s.x

type S2 struct{x, y int}
s2 := S2{1, 2}
a, b := s2...  // unpacking of s2; a = 1, b = 2; shortcut for a, b := s2.x, s2.y

Given s2 above, and

func f2(a, b int)

we can call f2 like so:

f2(s2...) // same as f2(s2.x, s2.y)

Instead of:

func f2() (x, y int)

a, b := f2()  // temporaries for use with composite literal below
s2 := S2{a, b}

we can write

s2 := S2{f2()}

leading to the tautology

s2 == S2{s2...}

In other words, s2... is essentially syntactic sugar for s2.x, s2.y except that we cannot mix and match with other elements. For instance

type Triple struct{x, y, z int}
_ = Triple{s2..., 3}  // cannot mix a multi-valued expression with other expressions

because we don't allow similar mixing with multi-valued function results passed to other functions. (Lifting this restriction is an independent discussion and should only be considered after having gained experience with this proposal.)
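
For comparison, current Go already imposes the same restriction on multi-valued function results: a call returning multiple values cannot be mixed with additional arguments. A minimal illustration (valid Go today; pair and sum3 are illustrative names):

func pair() (int, int)
func sum3(a, b, c int) int

_ = sum3(pair(), 3) // compile error today: a multi-valued call cannot be mixed with other arguments
x, y := pair()
_ = sum3(x, y, 3)   // fine: unpack into variables first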

The compiler always knows when there's a multi-value (a multi-valued function result, or an unpacked struct or array) and it will simply allow such multi-values in places where that exact number of values is permitted: as arguments to a function, or as elements for a struct or array composite literal that expects that exact number of elements of matching types.

This allows us to write something like

type data struct{ a, b int; msg string }

func produce() (a, b int, msg string)
func consume(a, b int, msg string)

// send produced data over a channel
var ch chan data
ch <- data{produce()}

// consume data from a channel
consume(<-ch...)

If one needs the comma-ok form, one would write:

d, ok := <-ch
if ok {
   consume(d...)
}

It also makes it easy to convert between structs and arrays that have the same number and types of elements:

p := Pair{1, 2}
a := [2]int{p...}

or even

a := [...]int{p...}  // this version will work even if the number of elements in p changes

and back

p := Pair{a...}

Discussion

Allowing a multi-value in a struct/array literal seems more natural than in a conversion (as proposed originally): for one, composite literals already accept a list of values, and conversions always work on a single value. Providing a multi-value to a composite literal is similar to passing a multi-value as an argument to a function call.

Using ... to unpack a struct or array value is similar in spirit to the existing use of ... to unpack a slice passed to a variadic function call.
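
For reference, this is the existing slice-unpack form in a variadic call (valid Go today; sum and nums are illustrative names). The proposal reuses the same spelling for structs and arrays, whose number of elements is known at compile time:

func sum(xs ...int) int

nums := []int{1, 2, 3}
total := sum(nums...) // existing meaning of ...: the slice elements become the variadic arguments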

The unpack operation ... requires a syntax change. Proposal #64457 explored the idea of ... as unpack operator with a concrete prototype implementation to identify potential lexical and grammatical problems (CL 546079). It turns out that to make unpack operations work nicely (without the need for an explicit semicolon), if the ... token appears at the end of a line, a semicolon must be automatically inserted into the token stream immediately after the ... token. This will break array literals using the [...]E{...} notation if the closing bracket ] appears on a different line than the ...; a situation that is extremely unlikely to occur in actual Go code:

// nobody writes code like this, and gofmt will have fixed it
var array = [...  // this will cause a syntax error because the lexer will introduce a semicolon after the ...
]int{1, 2, 3}

func f(args ...int)
var s []int
f(s...,  // here the problem doesn't occur because we need a comma anyway
)

@jba has pointed out a perceived issue with backward compatibility: if we allow e.g. S{f()} and later add a field to S without changing the signature of f, the code will break. For exported structs the recommendation is to use tagged literals (explicit field names). That said, when multiple function results are used in combination with a struct that packs them up, if one of them changes, the other will need to change, too. Tagged struct literals allow more flexibility, but they also invite possible bugs because one doesn't get an error if one forgets to set a field. In short, the perceived backward-compatibility issue is a double-edged sword. It may be that the proposed mechanism works best for non-exported code where one can make all the necessary changes without causing API problems. Or perhaps S{f()} could be permitted if f() produces a prefix of all the values needed by S. But that is a different rule from what is proposed here and should be considered separately.
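
A sketch of the concern, with hypothetical names: if S gains a field, the untagged literal S{f()} stops compiling, while a tagged literal keeps compiling but silently leaves the new field at its zero value:

type S struct{ a, b int } // adding a field c here breaks S{f()} below, but not the tagged literal

func f() (a, b int)

s1 := S{f()}        // proposed form: becomes a compile error as soon as S gains a field
a, b := f()
s2 := S{a: a, b: b} // tagged literal: keeps compiling, but a newly added field is silently left zero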

Implementation-wise, there is some work needed to allow ... in expressions, and it may require minimal AST adjustments. Type-checking is straightforward, and so is code generation (packing and unpacking can be seen as a form of syntactic sugar; there's no new machinery required).
