proposal: spec: expression to create pointer to simple types #45624

Open
robpike opened this issue Apr 19, 2021 · 164 comments
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal

@robpike
Contributor

robpike commented Apr 19, 2021

This notion was addressed in #9097, which was shut down rather summarily. Rather than reopen it, let me take another approach.

When &S{} was added to the language as a way to construct a pointer to a composite literal, it didn't quite feel right to me. The allocation was semi-hidden, magical. But I have gotten used to it, and of course now use it often.

But it still bothers me some, because it is a special case. Why is it only valid for composite literals? There are reasons for this, which we'll get back to, but it still feels wrong that it's easier to create a pointer to a struct:

p := &S{a:3}

than to create a pointer to a simple type:

a := 3
p := &a

I would like to propose two different solutions to this inconsistency.
Now it has been repeatedly suggested that we allow pointers to constants, as in

p := &3

but that has the nasty problem that 3 does not have a type, so that just won't work.

There are two ways forward that could work, though.

Option 1: new

We can add an optional argument to new. If you think about it,

p := &S{a:3}

can be considered to be shorthand for

p := new(S)
*p = S{a:3}

or

var _v = S{a:3}
p := &_v

That's two steps either way. If we focus first on the new version, we could reduce it to one line by allowing a second, optional argument to the builtin:

p := new(S, S{a:3})

That of course doesn't add much, and the stuttering is annoying, but it enables this form, making a number of previously clumsy pointer builds easy:

p1 := new(int, 3)
p2 := new(rune, 10)
p3 := new(Weekday, Tuesday)
p4 := new(Name, "unspecified")
... and so on

Seen in this light, this construct redresses the fact that it's harder to build a pointer to a simple type than to a compound one.

This construct creates an addressable form from a non-addressable one by explicitly allocating the storage for the expression.
It could be applied to lots of places, including function returns:

p := new(T, f())

Moreover, although we could leave out this step (but see Option 2), we could now redefine the & operator applied to a non-addressable typed expression to be

new(typeOfExpression, Expression)

That is,

p := &expression

where expression is not an existing memory location is now just defined to be shorthand for

p := new(typeOfExpression, expression)
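
For readers who want to try the shape of Option 1 today, here is a minimal sketch that approximates new(T, v) with a generic function. It assumes Go 1.18 or later; the name newOf is hypothetical and is not part of the proposal, and Weekday, Tuesday and Name are stand-in declarations for the types used in the examples above.

```go
package main

import "fmt"

// Weekday, Tuesday and Name are stand-ins for the named types in the thread.
type Weekday int

const Tuesday Weekday = 2

type Name string

// newOf is a hypothetical stand-in for the proposed new(T, v): it copies v
// into fresh storage and returns a pointer to that copy.
func newOf[T any](v T) *T {
	return &v
}

func main() {
	p1 := newOf[int](3)              // proposed spelling: new(int, 3)
	p2 := newOf[rune](10)            // proposed spelling: new(rune, 10)
	p3 := newOf[Weekday](Tuesday)    // proposed spelling: new(Weekday, Tuesday)
	p4 := newOf[Name]("unspecified") // proposed spelling: new(Name, "unspecified")
	fmt.Println(*p1, *p2, *p3, *p4)
}
```

The explicit type argument plays the role of the first argument to new, so the stuttering discussed above shows up here in the same way.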

Option 2

I am more of a fan of the new builtin than most. It's regular and easy to use, just a little verbose.
But a lot of people don't like it, for some reason.
So here's an approach that doesn't change new.
Instead, we define that conversions (and perhaps type assertions, but let's not worry about them here) are addressable.
This gives us another mechanism to define the type of that constant 3:

p := &int(3)

This works because a conversion must always create new storage.
By definition, a conversion changes the type of the result, so it must create a location of that type to hold the value.
We cannot say &3 because there is no type there, but by making the operation apply to a conversion, there is always a defined type.

Here are the examples above, rewritten in this form:

p1 := &int(3)
p2 := &rune(10)
p3 := &Weekday(Tuesday)
p4 := &Name("unspecified")
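
Since &int(3) does not compile in current Go, the following sketch shows two single-expression workarounds that do compile today, for comparison with the proposed form. The function names viaSlice and viaClosure are illustrative only; neither idiom is part of the proposal.

```go
package main

import "fmt"

// viaSlice relies on slice elements being addressable: it builds a
// one-element slice literal and takes the address of that element.
func viaSlice() *int {
	return &[]int{3}[0]
}

// viaClosure wraps the usual two-step form in a function literal that is
// called immediately, so the whole thing is still one expression.
func viaClosure() *rune {
	return func() *rune { r := rune(10); return &r }()
}

func main() {
	fmt.Println(*viaSlice(), *viaClosure())
}
```

Both forms work but hide the intent, which is the clumsiness the proposal is trying to remove.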

Discussion

Personally, I find both of these mechanisms attractive, although either one would scratch the itch.
I propose therefore that we do both, but of course the discussion may end up selecting only one.

Template

Would you consider yourself a novice, intermediate, or experienced Go programmer?

I have some experience.

What other languages do you have experience with?

Fortran, C, Forth, Basic, C, C++, Java, Python, and probably more. Just not JavaScript

Would this change make Go easier or harder to learn, and why?

Perhaps a little easier, but it's a niche problem.

Has this idea, or one like it, been proposed before?

Yes, in issue #9097 and probably elsewhere.

If so, how does this proposal differ?

A different justification and a new approach, with an extension of new.

Who does this proposal help, and why?

People annoyed by the difficulty of allocating pointers to simple values.

What is the proposed change?

See above.

Please describe as precisely as possible the change to the language.

See above.

What would change in the language spec?

The new builtin would get an optional second argument, and/or conversions would become addressable.

Please also describe the change informally, as in a class teaching Go.

See above.

Is this change backward compatible? Breaking the Go 1 compatibility guarantee is a large cost and requires a large benefit.

Yes. Don't worry.

Show example code before and after the change.

See above.

What is the cost of this proposal? (Every language change has a cost).

Fairly small compiler update compared to some others underway. Will need to touch documentation, spec, perhaps some examples.

How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?

Perhaps none? Not sure.

What is the compile time cost?

Nothing measurable.

What is the run time cost?

Nothing measurable.

Can you describe a possible implementation?

Yes.

Do you have a prototype? (This is not required.)

No.

How would the language spec change?

Answered above. Why is this question here twice?

Orthogonality: how does this change interact or overlap with existing features?

It is orthogonal.

Is the goal of this change a performance improvement?

No.

If so, what quantifiable improvement should we expect?

More regularity for this case, removing a restriction and making some (not terribly common, but irritating) constructs shorter.

How would we measure it?

Eyeballing.

Does this affect error handling?

No.

If so, how does this differ from previous error handling proposals?

N/A

Is this about generics?

No.

If so, how does this differ from the current design draft and the previous generics proposals?

N/A

@gopherbot gopherbot added this to the Proposal milestone Apr 19, 2021
@seebs
Contributor

seebs commented Apr 19, 2021

How much would it break things to let new() take either a type or an expression which has an unambiguous type? Thus, new(int) or new(fnReturningInt()), or possibly even new(int(3)), but not new(3) because that hasn't got an unambiguous type? This would address the stuttering, I guess?

I think I dislike the implicit allocation on taking the address of non-addressable things, because I think basically all this means is that we will finally be able to replace the "loop variable shadowing and goroutines" thing with "I took the address of a thing in a map but writes to it aren't changing it" as the most frequently asked question about Go. If it only happens with &conversion(), though, that seems significantly more clear; conversion is clearly logically creating a new object, even if you convert a thing to exactly the type it already is.

So far as I can tell, object names and type names are the same namespace, it's not like C's struct-tag madness, but at any given time a given identifier refers only to one or the other.
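
To make the map concern above concrete, here is a small sketch of what an allocating &m[k] would amount to, written with today's syntax. The helper name bumpCopy is hypothetical; the point is only that writes through a pointer to a copied map value never reach the map.

```go
package main

import "fmt"

// bumpCopy mimics what an allocating &m[k] would do: it copies the map
// value into fresh storage, writes through the pointer, and then returns
// the value that is still stored in the map.
func bumpCopy(m map[string]int, k string) int {
	v := m[k] // a copy of the entry; &m[k] itself does not compile today
	p := &v
	*p = 100 // changes the copy only
	return m[k]
}

func main() {
	m := map[string]int{"a": 1}
	fmt.Println(bumpCopy(m, "a")) // the map entry is unchanged
}
```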

@seankhliao seankhliao added v2 An incompatible library change LanguageChange Suggested changes to the Go language labels Apr 19, 2021
@clausecker

Alternatively, what's the problem with adding composite literals of simple types, like int{3}?

@JAicewizard

If we have this new new(typeOfExpression, Expression), would it be possible to do new(int32, int64(5))?? Not necessarily this specific case, but for any expression that does not match the specified type will there be an implicit conversion?

I think I like the new idea better, it is more explicit about what happens when you are taking an address of an unaddressable value.

Adding int{3} feels more of a workaround to me, than a solution. Adding a new way to do the same thing, just to solve a problem.

@faiface

faiface commented Apr 19, 2021

What about generalizing the second approach and simply allow taking the address of a function result?

p := &f(...) // for any f

Conversions are just special functions, so this would cover them.

@eaglebush

eaglebush commented Apr 19, 2021

I like the new proposal option to handle just the simple types and initialize to a value. I have created functions just for this. With the proposal approved, some of the constructs will soon look like this:

i := new(int, 42)

...much shorter than package-qualified helper calls like this:

i := stdutil.NewInt(42)

Consequently...

i := new(int, func() int {
   r := rand.New(rand.NewSource(99))
   return r.Int()
}())

@peterbourgon

I am more of a fan of the new builtin than most. It's regular and easy to use, just a little verbose. But a lot of people don't like it, for some reason.

I prefer &T{...} to new whenever possible, because it permits both construction and initialization in a single expression, which I think is important. The only circumstance where it didn't work is addressed by this proposal's second option. Nice! +1 from me.

As far as I can tell, this would make it possible to express all valid Go programs without using the new builtin. Bonus challenge: do the same for make :) I think it would boil down to extending the struct literal initialization syntax in some way that could cover just these 4 things:

make(chan T, n)
make(map[T]U, n)
make([]T, n)
make([]T, n, m)

@benhoyt
Contributor

benhoyt commented Apr 19, 2021

@peterbourgon Probably we shouldn't derail this to try to get rid of make. :-) My preference is also the &int(3) type conversion syntax, as I too almost never use new -- not because I dislike new, just because it's not usually necessary. I also want to link to other previous discussions for reference (aside from #9097):

I used to be in favor of &expression (is that Rob's option "1a"?), but now I think there are too many concerns with it. For example, it means & would mean something different for an expression vs a variable: &expression would always give a new address, but &variable would always give the same address -- that seems non-intuitive. Related to this is the point @seebs made that you could then write &m[k], which would make it look like map entries are addressable, but they're not. For these reasons, I think plain &expression is a bad idea, despite it being nice and terse.
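
The difference between a stable address and a fresh one can be sketched in current Go as follows. The generic helper addr is a hypothetical stand-in for an allocating &expression (Go 1.18 or later is assumed); it is not what the proposal specifies.

```go
package main

import "fmt"

// addr is a hypothetical stand-in for an allocating &expression.
func addr[T any](v T) *T { return &v }

// sameVariable takes the address of one variable twice; both pointers
// alias the same storage.
func sameVariable() bool {
	x := 3
	p, q := &x, &x
	return p == q
}

// freshCopies evaluates the allocating form twice; each call copies its
// argument into new storage, so the two pointers differ.
func freshCopies() bool {
	x := 3
	return addr(x) == addr(x)
}

func main() {
	fmt.Println(sameVariable(), freshCopies())
}
```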

@faiface

faiface commented Apr 19, 2021

@benhoyt If you only restrict & to variables and function calls, not arbitrary expressions, it's quite consistent because a result of a function call will naturally have a fresh address.

@benhoyt
Contributor

benhoyt commented Apr 19, 2021

@faiface Yeah, I think that would be fine -- it doesn't have the problems with &arbitraryExpression that I noted. My issue #22647 actually grew out of trying to type &time.Now() when I was fairly new to Go.

@clausecker

When supporting taking the address of return values, the question arises whether returning makes a copy of the return value. For example, consider code like this:

func addressTaker(x int, z **int) (y int) {
    y = x
    *z = &y
    return
}

func example() {
    var ptr *int
    x := &addressTaker(42, &ptr)

    // at this point, does x == ptr hold?
}

@benhoyt Not really in favour of the new(value) proposal, as it opens the can of worms that is having to distinguish between types and expressions in the parser (at least it seems so).
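
For reference, here is how the same experiment behaves in current Go, where the result has to be stored in a variable before its address can be taken. This is only a sketch of today's semantics; it does not settle what a hypothetical &f(...) would do. The wrapper sameStorage is an illustrative name.

```go
package main

import "fmt"

// addressTaker returns x through the named result y and also leaks the
// address of y through z.
func addressTaker(x int, z **int) (y int) {
	y = x
	*z = &y
	return
}

// sameStorage reports whether the caller's copy of the result shares
// storage with the callee's result variable.
func sameStorage() bool {
	var ptr *int
	v := addressTaker(42, &ptr) // today the result must be stored first
	x := &v
	return x == ptr
}

func main() {
	fmt.Println(sameStorage())
}
```

With today's rules the caller's variable v is distinct from the callee's y, so the two pointers differ even though both point at 42.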

@clausecker

I also kinda wonder why the obvious &int{3} idea is not mentioned. Though yes, the type conversion comes with the obvious advantage (or possibly disadvantage?) of being more flexible with the type of its argument. Supporting both uses might even be sensible (one for when you want a type conversion to happen, possible with a go vet if there is none) and one for when you do not want a type conversion.

@golang golang deleted a comment Apr 19, 2021
@mcandre

mcandre commented Apr 19, 2021

Rob, don't tell me about such a gap. I was implementing Bliss interface for so long.

@robpike
Contributor Author

robpike commented Apr 19, 2021

@clausecker Because why add a new construct (&int{3}) when you can use an existing one?

@clausecker

clausecker commented Apr 19, 2021

@robpike Compound literals too are an existing construct and taking the address of them is already legal. So it's as much “adding a new construct” as the &int(3) idea is; in both cases the rules need to be made more lenient to support a case that was previously not allowed with no syntactical changes; in case of &int(3) taking the address must be made legal, in case of &int{3} using a composite literal for a scalar.

@ninedraft

The new(T, value) variant has an unpleasant feature: for boolean and string values, it adds excessive visual noise. For example: new(bool, true), new(string, "bottle of ram").

As far as I understand, only numeric literals have a problem with unambiguous type inference.

With the above in mind, & + typecast seems like a more viable approach for me, if it will allow us to omit type in string and boolean cases.

Examples:

_ = &int(42)
_ = &true
_ = &"brains"

type Name string
_ = &Name("what's my name?")

type Count int64
_ = &Count(100500)

@thejerf

thejerf commented Apr 19, 2021

The Go 2 playground permits the function:

func PointerOf[T any](t T) *T {
	return &t
}

If I break this issue up into cases, I end up with either "I need this zero times in a module" (by far the dominant case), "I need this once or twice" in which case I would just take the couple of extra lines, and "I need this all over the place" in which case, either define that function or pull it in from somewhere once the generics are out. If one is using this a lot one may prefer a shorter name than PointerOf, I was just going for maximum clarity over length.

I'd suggest just waiting for generics to drop and writing/providing that function.

@zkosanovic

@ninedraft

With the above in mind, & + typecast seems like a more viable approach for me, if it will allow us to omit type in string and boolean cases.

But you can't omit the type. The description clearly says that type conversion will be addressable, not the values themselves.

It would have to be:

_ = &bool(true)
_ = &string("brains")

And TBH I'm fine with that. Having something like &"foobar" feels a bit... odd.

But either way, having the Option 2 would be very cool IMO.

@smasher164
Member

I'd suggest just waiting for generics to drop and writing/providing that function.

While it's true that generics would allow you to write the PointerOf function, I think this (second) proposal would make it much easier to learn the language. Having to write/use a function for something that has first-class syntax with composite literals is counterintuitive.

@sanggonlee

If I can add my voice here, I would much prefer option 2 to option 1.
The fact that simple type literals had deeper underlying types was hidden away from convenience syntax anyway (for example, 3 having int type as default while it could also have been int32).
Syntax in option 1 seems a bit awkward passing two args separately, one for type and one for expression even though the two are inherently bound with each other.
Technically the same goes for option 2, but in this case at least it gives a stronger visual cue that the 3 belongs to the int32 type in &int32(3), which seems more consistent with type conversion form used widely.

@sethvargo
Contributor

Do we have any data on new() vs &{} usage in the wild? Anecdotally (and supported by others on the threads), I feel like &{} is far more common than new, but it would be excellent if we had some data to back that up.

I'm definitely preferential to option 2 (&int64(11)).

@rh-kpatel4

Why not &(int64(11))? Wouldn't that be the proper scoping: evaluate what is inside the parentheses, then have &() return a pointer to the result?

@FiloSottile
Contributor

What about generalizing the second approach and simply allow taking the address of a function result?

p := &f(...) // for any f

Indeed, I understand the difference between conversion and function calls, but I feel like people learning Go will be confused by &int(3) working while &add(1, 2) doesn't.

Function calls have defined types, so I can't think of any issue with taking their pointer, and I definitely had to be reminded by the compiler that it wasn't allowed a few times.

I never use new() simply because I don't want to choose between two ways of doing the same thing, so I am partial to doing just Option 2, but the last part of Option 1 feels like a better landing place for & completeness.

@earthboundkid
Contributor

I like that Jerf's PointerOf function adds nothing to the language itself. It could be added to the builtins as newof or newval or something. With the addressTaker example above, it allocates a new pointer for x, which is unambiguous.

@bcmills
Contributor

bcmills commented Apr 19, 2021

All of the proposed options seem better than the status quo, but still have the downside of requiring types to be written out explicitly even when they are obvious from the value. Compare:

	d := time.Millisecond
	p1 := &d  // No noise from types!

vs.

	p1 := new(time.Duration, time.Millisecond)
	p2 := &time.Duration(time.Millisecond)

In contrast, the generic approach (#45624 (comment)) does not stutter on types, but requires the introduction of a new name for the generic function.

So I wonder if it would be preferable to add a generic builtin instead:

	d := ptrTo[time.Duration](time.Millisecond)

or

	d := ptrTo(time.Millisecond)

I don't feel strongly about the specific name, but I think the ergonomics of a generic function are much nicer than the proposed ergonomics of new.
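
As a rough illustration of the ergonomics described above, the sketch below defines ptrTo as an ordinary generic function (not a builtin, which is what the comment floats) and shows how type inference behaves for typed values and untyped constants. Go 1.18 or later is assumed.

```go
package main

import (
	"fmt"
	"time"
)

// ptrTo is a user-defined approximation of the builtin floated above.
func ptrTo[T any](v T) *T { return &v }

func main() {
	d := ptrTo(time.Millisecond) // T is inferred as time.Duration
	n := ptrTo(3)                // untyped constant: T defaults to int
	m := ptrTo[int64](3)         // explicit instantiation selects int64
	fmt.Printf("%T %T %T\n", d, n, m)
}
```

The type only needs to be written out when the default type of a constant is not the one wanted.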

@fkarakas

When @chai2010 initially made the first proposal, the response was that "adding a third syntax seems not a good plan". Now that Rob Pike proposes it, it is wonderful!!! So, Go maintainers, you can do whatever you like....

@ianlancetaylor
Contributor

Several people (@rogpeppe, @robpike and others) have commented that rather than both new(T, v) and new(v), we should only have new(T, v). That means that the first argument to new is always a type.

It's true that this makes the expression more verbose in some cases. However, it seems reasonable to guess that most cases where a long type is used are structs, and for which we have the &S{} literal notation. It seems less likely that people will want to write new(T, v) with a simple v and a complicated T. At least, it would be interesting to see specific examples where that comes up.

new(T, v) also matches make(T, length) syntactically, although the meaning is different.

As discussed above, there are several reasons why using an & syntax seems potentially confusing. The new(T, v) syntax should be fairly clear and never confusing.

@rogpeppe
Contributor

@ianlancetaylor That's a slight mischaracterisation of my comment. To repeat, my preference is still to choose a new spelling for a function that takes a value only. I'm not keen on shoehorning the functionality into new.

@Merovius
Contributor

Merovius commented Jul 20, 2023

I don't particularly like new(T, v) either. I think the similarity to make detracts from its appeal, instead of adding to it: There already is some confusion of why make is used for some types but new (or &T{}) used for others. I also don't think it really applies - in make(T, v), the v means something completely different from what it'd mean in new(T, v). Lastly, in my opinion the extra overhead of typing out the T is significant in many cases, where the value you want it initialized to can be inferred. new(int64(42)) isn't any more to type or read than new(int64, 42), but new(time.Second) is significantly better than new(time.Duration, time.Second). I don't think having the type in there really adds anything. We are already kind of used to inferring the type from a constant literal.

That being said, new(T, v) is still better than where we're at, so if we can't agree on new(v) and if we can't agree on a better name for pointerTo(v), then I'd live with new(T, v).

@icholy

icholy commented Jul 20, 2023

but new(time.Second) is significantly better than new(time.Duration, time.Second).

It's better if I already know that time.Second is a value. But if you're reading new(pkg.Ident), there's no way to tell which overload you're using without checking the definition of pkg.Ident.

@rogpeppe
Contributor

rogpeppe commented Jul 20, 2023

Another point: the new(T, v) form is also inconvenient in the not-uncommon case where we want to make a copy of the value behind a pointer.

e.g.

func f(x Foo) {
   x.field = new(int, *x.field)   // where int is the type of Foo.field
   // use x.field without worrying about shared pointers.
}

We have to mention the type where there's otherwise no need for it and it might not be obvious, making the code a little more brittle.

With a "pointerTo"-style function, it's nicer IMHO:

func f(x Foo) {
   x.field = ref(*x.field)
   // use x.field without worrying about shared pointers.
}    

// ... for some spelling of "ref"
func ref[T any](x T) *T { return &x }
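
A runnable version of the copying pattern might look like the sketch below. Foo and its field are hypothetical, detach is an illustrative name, and the pointer is dereferenced before the call so that the pointee, rather than the pointer itself, is copied.

```go
package main

import "fmt"

// Foo is a hypothetical struct with a pointer field, as in the comment above.
type Foo struct{ field *int }

// ref copies its argument into fresh storage and returns a pointer to it.
func ref[T any](x T) *T { return &x }

// detach returns a copy of x whose field no longer aliases the original.
func detach(x Foo) Foo {
	x.field = ref(*x.field) // copy the pointee, not the pointer
	return x
}

func main() {
	orig := Foo{field: ref(1)}
	c := detach(orig)
	*c.field = 2 // does not affect orig
	fmt.Println(*orig.field, *c.field)
}
```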

@earthboundkid
Contributor

I don’t like the new(v) form because it depends on the reader knowing if v is a value or a type. Between new(T, v) and ref(v), I could live with either, and it really just depends on if you think it’s better to not add another predeclared identifier or better to not add an overloaded form for an existing identifier.

@ianlancetaylor
Contributor

@rogpeppe Apologies for the mischaracterization.

@Merovius
Contributor

FWIW I agree that new(X) is ambiguous because it requires knowing whether X is a value or a type. I just used that to make the point that I don't like new(T, v). I think my favorite version would be ptr(v), but I assume ptr is not an acceptable name. Someone might have a better one.

But as I said, if we can't agree on any color of bikeshed for a type-argument-less version of this builtin, new(T, v) is still better than nothing.

@DeedleFake

If new(T, v) is the form that gets adopted, I will probably just find myself writing

func ptr[T any](v T) *T { return &v }

in random places anyways to avoid the hassle, though admittedly less often as new(T, v) will be useful sometimes. Overall I think overloading new() is the wrong way to go, especially if it's going to force it into a significantly less helpful syntax.

@willfaught
Contributor

We could lean into the syntax we already have:

&&value
&&time.Second
var p *int = &&123

@jimmyfrasche
Member

One of the subproposals discussed in #34515 is to omit the type in make/new whenever you could omit the type in a call to a generic function. new(Type, value) would likely take that option off the table as that brings it back to new(value).

@metux

metux commented Oct 2, 2023

Bonus challenge: do the same for make :) I think it would boil down to extending the struct literal initialization syntax in some way that could cover just these 4 things:

Apropos make():

I still haven't understood why map fields always need to be explicitly initialized by make(), instead of directly working off the zero value. This makes using maps in structs much more complicated.

@DeedleFake

Every type's zero value is literally the memory all set to zero. A map is a pointer to a struct internally, so its zero value is literally just a nil pointer. Trying to make the zero value behave differently would fail in the following situation, among others:

func addThing(m map[string]string) {
  // This would allocate an hmap, but only this m would get set to its address.
  m["example"] = "This is an example."
}

func main() {
  // Remember, this is a *hmap.
  var m map[string]string

  // addThing() gets a copy of the address, currently nil.
  addThing(m)
  // No matter what addThing() does, the local m is still nil at this point.
}
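
The behaviour of maps in current Go, as opposed to the hypothetical auto-allocating variant discussed above, can be checked with a short program. The helper writeNil is an illustrative name; the sketch shows that reads from a nil map succeed, that a write panics, and that a map created with make is shared with the callee.

```go
package main

import "fmt"

// writeNil tries to store into a nil map and reports whether that panicked.
func writeNil() (panicked bool) {
	defer func() { panicked = recover() != nil }()
	var m map[string]string // zero value: nil
	m["example"] = "x"      // assignment to an entry in a nil map panics
	return false
}

// addThing writes through its copy of the map value.
func addThing(m map[string]string) {
	m["example"] = "This is an example."
}

func main() {
	var m map[string]string
	fmt.Println(len(m), m["missing"] == "") // reads on a nil map are fine
	fmt.Println(writeNil())

	m = make(map[string]string)
	addThing(m) // the callee shares the caller's hash table
	fmt.Println(len(m))
}
```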

@metux

metux commented Oct 4, 2023

Every type's zero value is literally the memory all set to zero. A map is a pointer to a struct internally, so its zero value is literally just a nil pointer. Trying to make the zero value behave differently would fail in the following situation, among others:

func addThing(m map[string]string) {
  // This would allocate an hmap, but only this m would get set to its address.
  m["example"] = "This is an example."
}

Ah, so a map can be directly passed as value (instead of reference/pointer to it) while still
using the same underlying hashtable. Good news to me, wasn't sure that's really the case :)

Indeed, now an on-demand allocation would cause this kind of trouble. If we'd ever go that
way, this would need to be clearly documented and probably code checkers should look
for bugs where the programmer might have forgotten it ... certainly not nice.

But what would happen (besides extra compiler complexity) if we'd let it implicitly emit a
makemap() call, if it doesn't find another one (that probably asks for a different initial size)?
IIRC, the worst that could happen (when there is an explicit make that goes unnoticed) is that
we might have a wasted allocation or memory clear, but it shouldn't hurt semantics at all.

Am I missing something ?

By the way, still haven't fully understood how code generation and runtime code really work
together ... if a program doesn't use maps at all, does the deadcode elimination kick out
all the map-related runtime code ?

thx.

@adonovan
Member

adonovan commented Dec 6, 2023

Summarizing some comments:

  • &v is unclear as to whether it allocates, or returns an existing variable's address.
  • new(v) is concise but easily confused for new(T).
  • new(T, v) is clear and similar to make, but the type is redundant.
  • the function can be written as a one-liner: func addr[T any](v T) *T { return &v }

Given that it is trivial to write the helper function, a language change would add marginal value.
@robpike, do you still think there's a need to do anything here?

@robpike
Contributor Author

robpike commented Dec 7, 2023

@adonovan There is certainly no need, but I still find the imbalance troubling: it's easier to build a pointer to a complex thing than to a simple one.

None of the bullet points in your list seem fatal to me. The first one is irrelevant to what I suggested, the middle two are true but not clearly problems, while the ease of writing that function doesn't touch the fundamental asymmetry.

@findleyr
Contributor

Leaving open until someone brings this discussion to a consensus.
— rfindley for the language proposal review group

@perj

perj commented Jan 11, 2024

the function can be written as a one-liner: func addr[T any](v T) *T { return &v }

Judging from the past 1.5 years, I appear to be writing this function about once every second month, when I need it in a new package. The need especially arises with pointers to strings in unit test files, I've noticed.

Admittedly, I work a lot with code generated from API specifications. That code tends to use *string a lot, since it can't be certain that nil and empty string are equivalent (and rightly so, they aren't).

It's not very annoying, but does feel a bit like I'm littering my packages with this function, so not having to write it would be welcome. I do realise I can put it in a package I import, but that also seems overkill for a one-liner.
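
The generated-code situation described above might look roughly like this sketch. UpdateRequest and describe are hypothetical and only stand in for whatever an API generator would emit; addr is the one-liner quoted at the top of the comment.

```go
package main

import "fmt"

// UpdateRequest is a hypothetical generated type: a nil field means
// "absent", which is distinct from a present but empty string.
type UpdateRequest struct {
	Title   *string
	Comment *string
}

// addr is the one-liner helper quoted above.
func addr[T any](v T) *T { return &v }

// describe renders an optional field for display.
func describe(s *string) string {
	if s == nil {
		return "<absent>"
	}
	return fmt.Sprintf("%q", *s)
}

func main() {
	req := UpdateRequest{Title: addr("")} // empty title, no comment
	fmt.Println(describe(req.Title), describe(req.Comment))
}
```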

@smasher164
Member

As far as being explicit about allocation is concerned, I think Go is well past that point. Whether an object is on the stack or the heap is determined entirely by escape analysis. If anything, &v is more like ML's ref v, which says "Hey, here's a mutable reference to this value; whether it's on the stack or the heap is implementation-defined."

@andig
Contributor

andig commented Jan 12, 2024

Given that it is trivial to write the helper function, a language change would add marginal value.

The helper function can easily be called with a pointer value which would typically be a programming error. Having a language construct could prevent this, either by rule or by removing a layer of indirection.
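
The pitfall mentioned above is easy to reproduce with the one-liner helper. In this sketch (Go 1.18 or later assumed), passing a pointer compiles without complaint and silently yields a pointer to a pointer.

```go
package main

import "fmt"

// addr is the one-liner helper from the summary above.
func addr[T any](v T) *T { return &v }

func main() {
	n := 7
	p := &n
	pp := addr(p) // compiles: T is inferred as *int, so pp is a **int
	fmt.Printf("%T\n", pp)
	fmt.Println(*pp == p) // pp points at a copy of the pointer p
}
```

Whether a language-level construct would actually reject this case depends on how it is specified; the sketch only shows the behaviour of the helper.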

@earthboundkid
Contributor

  • new(T, v) is clear and similar to make, but the type is redundant.

If the type inference worked well enough (perhaps by just hard coding this) that you could write new(T, { field1, field2 }) instead of new(T, T{ field1, field2 }), it wouldn't be so bad. The comma is weird, but it exists so you don't confuse T with v, which is good.

@ianlancetaylor ianlancetaylor changed the title proposal: expression to create pointer to simple types proposal: spec: expression to create pointer to simple types Aug 6, 2024
@ianlancetaylor ianlancetaylor added LanguageChangeReview Discussed by language change review committee and removed v2 An incompatible library change labels Aug 6, 2024