
proposal: spec: allow cap(make([]T, m, n)) > n #24204

Open

mdempsky opened this issue Mar 1, 2018 · 46 comments

Labels: LanguageChange (Suggested changes to the Go language), LanguageChangeReview (Discussed by language change review committee), Proposal
Milestone: Proposal

@mdempsky (Contributor) commented Mar 1, 2018

Currently, the Go spec requires that cap(make([]T, n)) == n and cap(make([]T, m, n)) == n. I propose relaxing this constraint from == to >=.

Rationale: the Go runtime already pads allocations up to the next malloc bucket size. By treating the user-supplied capacity argument as a lower bound rather than an exact bound, this padding space can be put to use.

Also, if users really need an exact capacity (which seems like it would be very rare), they can write make([]T, n)[:n:n] or make([]T, m, n)[:m:n].

(See also discussion in #24163.)
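(A minimal sketch of the exact-capacity workaround, with illustrative values for n and m:)

n, m := 4, 8

// Under the proposal, make only guarantees cap >= the requested capacity.
a := make([]int, n, m) // len(a) == n, cap(a) >= m

// A full slice expression pins the capacity exactly, whatever make returned.
b := make([]int, n, m)[:n:m] // len(b) == n, cap(b) == m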

@mdempsky added the v2 (An incompatible library change) label Mar 1, 2018
@gopherbot added this to the Proposal milestone Mar 1, 2018
@bradfitz (Contributor) commented Mar 1, 2018

It would be hard to "gofix" this in some hypothetical "go1to2" tool. We'd need to conservatively rewrite every make into make([]T, m, n)[:m:n], which would make the rewritten code pretty ugly.

@josharian (Contributor)

It might be an interesting experiment to hack this into the compiler/runtime and do a regression test and see what breaks (@dsnet).

Yes, it'd be ugly, but the gofix rule would at least be very simple.

As I noted in #24163, this optimization could be made opt-in (instead of opt-out) with an appropriate runtime function.

There's another option in this arena, which is to say "roughly" rather than == or >= in the spec, as we do with maps. The compiler and runtime could use this freedom along with PGO or introspection to make better decisions, treating the info from the user as just a hint to be used when no other info was available.
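(For comparison, a minimal sketch of the existing map behavior; maps have no cap, so the hint is unobservable:)

// The size argument to make for a map is only a hint: the runtime
// allocates enough space for approximately that many elements and
// is otherwise free to size the allocation as it sees fit.
m := make(map[string]int, 100)
_ = m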

@cznic (Contributor) commented Mar 1, 2018

The three-index slice syntax was introduced precisely to give control over slice capacity. This proposal, even though it applies in a different place, goes in exactly the opposite direction. I realize it can bring some performance improvements, but that's IMO not a good reason to break backward compatibility with Go 1 programs. Please consider a new syntax enabling this feature. There are plenty of options.

@bradfitz (Contributor) commented Mar 1, 2018

Please consider a new syntax enabling this feature. There are plenty of options.

make([]T, 2, ~10)

Fortunately the bitwise complement operator is ^, freeing up tilde!

@cznic (Contributor) commented Mar 1, 2018

make([]T, 2, 10+)

@josharian (Contributor)

Also time.NewTicker(~3*time.Second). (I’m mostly joking, but macOS actually does support vague timers; they can be coalesced for power savings or spread to avoid cpu spikes.)

@mdempsky (Contributor, Author) commented Mar 1, 2018

As I noted in #24163, this optimization could be made opt-in (instead of opt-out) with an appropriate runtime function.

That becomes rather clumsy though, because to properly round you'll need to incorporate the element size too, not just the element count. So users would have to write something like:

make([]T, runtime.RoundUp(n * unsafe.Sizeof(T{})) / unsafe.Sizeof(T{}))

or

make([]T, runtime.RoundUp(n, unsafe.Sizeof(T{})))

(And of course T{} needs to be changed appropriately if T is something other than an array/struct/slice/map type.)

The three-index syntax of a slice was introduced to extend the precise control of slice cap.

That control is important, but seems exceedingly rarely used.

Looking through Google's internal Go code base, I've found only a couple uses of full slice expressions, and they're all of the form:

x = append(x[:len(x):len(x)], y)

where the original x slice came from somewhere else (e.g., function argument or global), and they want to make sure to create a new slice.

I strongly suspect the majority of users don't care how much capacity make allocates for them. I don't see any evidence that users are dissatisfied with the lack of control over exactly how much capacity append allocates when it needs to grow a slice, for example. If they were, I'd expect to see full slice expressions used in other contexts.
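(A sketch of why that idiom forces a copy; the helper name is illustrative:)

// Pinning cap == len with a full slice expression leaves append no
// spare capacity, so it must allocate a fresh backing array and the
// result never aliases x.
func appendNoAlias(x []int, y int) []int {
	return append(x[:len(x):len(x)], y)
}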

@dsnet added the LanguageChange (Suggested changes to the Go language) label Mar 1, 2018
@faiface commented Mar 1, 2018

(Don't get offended.) This proposal would basically cripple the language and make it less predictable (really, imagine debugging a bug caused by this) in order to accommodate a fairly minor, architecture-specific performance optimization. This mistake has been made many times in many languages. Don't make Go one of them.

@slrz commented Mar 1, 2018

I have recently written some code that would be broken by this change. Easy to fix, of course, but it'd be interesting to know how common this is.

The gist of it was:

var commonPrefix = make([]byte, N)

func f(arg Type) []byte {
        return append(commonPrefix, computeSuffix(arg)...)
}
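(The code relies on cap(commonPrefix) == N forcing every append to copy. If make could return spare capacity, append would write each suffix into commonPrefix's backing array in place, so successive calls to f would return slices that alias, and clobber, one another.)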

@mdempsky (Contributor, Author) commented Mar 1, 2018

This proposal basically proposes to cripple the language

I think that's strongly overstating the potential downsides.

Here's a simple experiment to just uniformly overallocate every make call of less than 1000 elements:

--- a/src/runtime/slice.go
+++ b/src/runtime/slice.go
@@ -58,6 +58,10 @@ func makeslice(et *_type, len, cap int) slice {
                panic(errorString("makeslice: cap out of range"))
        }
 
+       if uintptr(cap) < 1000 && et.size < 10000 {
+               cap++
+       }
+
        p := mallocgc(et.size*uintptr(cap), et, true)
        return slice{p, len, cap}
 }

This breaks a couple standard library and regress tests, but they all appear to be for verifying that make itself works according to spec.

There was one hairy crash in runtime/trace, which I've now diagnosed. The process for tracking it down was:

  1. I first searched for appearances of "cap(" in runtime/trace/*.go; finding none, I looked in runtime/*.go.
  2. I noticed "allp[:cap(allp)]" in runtime/trace.go, so then I looked at where allp was allocated.
  3. In proc.go, we allocate make([]*p, nprocs).
  4. I experimentally changed this to make([]*p, nprocs)[:nprocs:nprocs], and the crash went away.

I would argue the real issue here is that runtime is using an inconsistent mix of len(allp) and cap(allp). Another fix that works perfectly fine is just changing all cap(allp) to len(allp), which seems desirable if only to keep the code simpler and easier to understand.

(like really, imagine debugging a bug caused by this)

It didn't seem too bad. :)

@josharian (Contributor)

Here's a simple experiment

Now that you have it working, if you change it to round up instead, are there any notable stdlib benchmark changes?

Interestingly, allp cap issues were also what I hit with CLs 21813 and 22197, which changed append behavior; that's further evidence that just using len is probably better in this case.

@faiface commented Mar 2, 2018

Btw, if someone really wants to make this happen, I would suggest a slight modification, so that the requirements become:

cap(make([]T, n)) >= n
cap(make([]T, m, n)) == n

So, when you don't specify the capacity explicitly, the compiler can optimize that for you. This would be sort of acceptable, I guess.
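(A sketch of that variant's observable behavior, with illustrative values:)

a := make([]int, 5)    // no explicit capacity: cap(a) >= 5, the runtime may round up
b := make([]int, 5, 8) // explicit capacity: cap(b) == 8, exactly as requested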

@bcmills (Contributor) commented Mar 2, 2018

We'd need to conservatively rewrite all make into make([]T, m, n][:m:n] which would make the rewritten code pretty ugly.

Fortunately, make is a builtin and not a keyword, right?

So you add a new package, go1, with implementations of any builtins that changed, and rewrite make([]T, m, n) to go1.Make([]T, m, n).

Or, you add a three-arg “bounded make” which specifies a lower and upper bound, and rewrite make([]T, m, n) to make([]T, m, n, n).

@faiface commented Mar 3, 2018

Or, you add a three-arg “bounded make” which specifies a lower and upper bound, and rewrite make([]T, m, n) to make([]T, m, n, n).

I don't think you care about the lower/upper-bound at all. You either care about the exact capacity, or you don't, in which case the compiler can insert anything and it's fine.

@bcmills (Contributor) commented Mar 3, 2018

@slrz You could also write that code as

var commonPrefix = make([]byte, N)

func f(arg Type) []byte {
        return append(append([]byte{}, commonPrefix...), computeSuffix(arg)...)
}

If #18605 and #12854 are accepted, that would simplify further:

func f(arg Type) []byte {
        return append({}, commonPrefix..., computeSuffix(arg)...)
}

That seems much more difficult to go fix, but makes the behavior of the append call no longer depend on an allocation site that may become arbitrarily far away as the code evolves.

@josharian (Contributor)

A related question for this proposal: Is it ok for slice literals to have cap > len? E.g. can cap([]byte{0}) be > 1?

@ianlancetaylor (Contributor)

I tend to feel that if you ask for an explicit capacity, as in make([]T, n, m), you ought to get that capacity.

I think there is an argument for not precisely specifying the capacity for make([]T, n).

@ianlancetaylor added the NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one) label Apr 24, 2018
@mdempsky (Contributor, Author)

It occurs to me that even if make([]T, n, m) continues to reserve exactly m capacity, a user could write make([]T, m)[:n] to get the flex-capacity behavior. Also, make([]T, m)[:n] (to get flex-capacity using an exact-capacity make) is slightly less clunky than make([]T, n, m)[:n:m] (to get exact-capacity using a flex-capacity make).

@gopherbot (Contributor)

Change https://golang.org/cl/111646 mentions this issue: text/tabwriter: allow initial slice size to be rounded up

@josharian (Contributor) commented May 6, 2018

Now that CL 109816 is in, you can sorta kinda approximate this in user code:

s := append([]T(nil), make([]T, m)...)[:0]

It's more expensive than make merely rounding up, but not dramatically so.
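(A runnable sketch of what the trick buys you; the exact capacities depend on the runtime's malloc size classes:)

package main

import "fmt"

func main() {
	a := make([]byte, 0, 5)
	fmt.Println(cap(a)) // 5: make is exact today

	// The append trick goes through growslice, which rounds the
	// allocation up to a malloc size class (8 bytes here, on a
	// typical 64-bit runtime).
	b := append([]byte(nil), make([]byte, 5)...)[:0]
	fmt.Println(cap(b)) // typically 8
}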

CL 111646 shows it in action, along with some benchmark results. You can see that there is a pretty significant improvement in some cases. And the regressions in other cases would go away if this were a language change rather than a compiler optimization of a pretty awkward bit of code.

Might be interesting for Go 1.12 to look at other make([]T, 0, m) instances in the standard library and see whether analogous changes are worthwhile.

@martisch (Contributor) commented May 6, 2018

Interesting! I had not thought about this issue in relation to https://golang.org/cl/109517 " cmd/compile: optimize append(x, make([]T, y)...) slice extension".

I will have a look at recognizing append([]T(nil), make([]T, m)...)[:0] and rewriting it into a special makeslice variant (one that rounds the cap up to the allocation bucket size), and see how much better it can perform than the current extend-optimized version from cl/109816, which goes through growslice.

Others can then use that CL to run their own experiments and see if it helps, without us having to change the spec. If it turns out to be helpful relative to the complexity it adds to the compiler, we could submit the CL to recognize append([]T(nil), make([]T, m)...)[:0] or the like, making a spec change unnecessary for this optimization use case unless and until this proposal is accepted.

@gopherbot (Contributor)

Change https://golang.org/cl/111736 mentions this issue: cmd/compile: optimize append([]T(nil), make([]T, n)...)

@mdempsky (Contributor, Author) commented Nov 8, 2019

If we want to require users to opt-in to flexible capacity allocation, I think CL 111736 is the right solution: just recognize an existing valid code pattern like append([]T(nil), make([]T, n)...) (as horrible as that is).

My hypothesis with this proposal though was that most code would benefit from allowing the runtime to automatically increase capacity of initial slice allocations, just like how it automatically increases it for appends.

@alanfo commented Nov 8, 2019

Provided it can be done on a backwards-compatible, opt-in basis, this proposal looks worthwhile to me.

Although I haven't always been comfortable with some of the uses proposed for ... lately, @josharian's suggested syntax:

make([]T, n, ...)

looks ideal here.

@tandr commented Nov 8, 2019

make([]T, n, ...)

means at least n allocated.

What about "at least n+100" for example?

make([]T, n, n+100...)
make([]T, n, 100+n...)

does look a bit weird (and I am not sure about compiler complications).

@magical (Contributor) commented Nov 8, 2019

What about "at least n+100" for example?

make([]T, n+100, ...)

@josharian (Contributor)

@tandr or you could write make([]T, n+100, ...)[:n]. Not particularly graceful, but I also don't know how common this case is.

@mdempsky (Contributor, Author) commented Nov 8, 2019

I am not sure about compiler complications

make([]T, n...) and make([]T, n, _) are already valid syntax (e.g., they're already handled by gofmt today); they're just rejected by type-checking. This is a small change.

make([]T, n, ...) is not valid syntax. It's going to require every Go parser and every Go AST to change. This is a large change.

(That said, I'm not convinced new syntax is merited here.)

@tandr commented Nov 8, 2019

Thanks @josharian, it is indeed a possibility.

As I see it, for make the first param is a type, the second is what len() will return, and the third is what (or at least what :) ) cap() will return. Changing that to an "ungraceful" syntax breaks a mental model that has been here for 10 years already (congrats, btw :) ).
I see changes of this magnitude as something that should make code simpler to understand, and more elegant if possible. I don't look forward to a change that makes things easier in one scenario and more painful in other, already established, ones, sorry.

@tandr commented Nov 8, 2019

but I also don't know how common this case is.

We have places in our code where we know an upper bound, but not a lower one, for a slice or map, and start filling it up by filtering values from another (streaming) source. It is not common, but it does happen. Cases where the map or slice could grow past the initially known capacity mostly collapsed into using the default capacity instead, aka make([]T) or make([]T, n).

@bcmills (Contributor) commented Dec 4, 2020

This change as originally proposed does not comply with http://golang.org/design/28221-go2-transitions#language-redefinitions, so I don't think it could be adopted as-is: the spec clearly states that make(T, n) allocates a slice of type T with length n and capacity n, so this would be a “redefinition”.

@bcmills (Contributor) commented Dec 4, 2020

That said, if we get generics you could envision a generic library function that encapsulates some idiom that the compiler can recognize:

package slices

// Make allocates and returns a slice of n elements of type T, with an arbitrary capacity ≥ n.
func [type T]Make(n int) []T {
	return append([]T(nil), make([]T, n)...)
}
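(With the generics syntax that eventually shipped in Go 1.18, the same helper would read roughly:)

package slices

// Make allocates and returns a slice of n elements of type T,
// with an arbitrary capacity >= n.
func Make[T any](n int) []T {
	return append([]T(nil), make([]T, n)...)
}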

That gives call sites like

var s = slices.Make[T](n)

which to me doesn't seem appreciably worse than any of the other syntax alternatives mentioned above, and does not require any additional language or tooling changes beyond generics per se.

@seebs (Contributor) commented Aug 8, 2021

I ran into this recently, in the context of knowing a size I wanted to request, but it would definitely be preferable to actually use the whole space (just building buffers, so no benefit to me in a subslice of a buffer followed by unused space).

I'd been thinking about how to express this, and in something more like C, I'd probably advocate for make([]byte, n, -1) for "enough to hold n bytes, but whatever you have is fine". I don't think that would break existing code, but it feels pretty messy. I also thought of _ for a don't-care value, but again, that feels messy.

@seebs (Contributor) commented Sep 26, 2021

And here I am, running into exactly this again, having totally forgotten about it since the last time, which was... apparently a bit over a month ago.

I spent a little time noodling on concepts like, say, runtime.SizeRounded(...), but anything using that looks a lot messier; you end up having to handle size-per-item yourself, and so on, or you pass in an object and runtime is using reflect, or it's a compiler special case... and at that point, we already have make/append being compiler-magic, and doing the right thing.

One alternative that would require a bit of work, but would be more unambiguously new, would be a 4-argument make, so (type, len, cap, extra); but it's not obvious that "extra" could mean anything useful. Really, the only practical cases are (1) I want an exact cap, and (2) I want a cap rounded up to a size class. If I want more than one size class of rounding, I should probably just specify a larger goal anyway.

So really, I think that of the things I've seen, make([]foo, l, c...) is probably the clearest; it expresses that I definitely want a cap of at least c, which may be larger than l, but that I don't actually need it to be no more than that, and would prefer the size-class rounding if it applies.

@dsnet (Member) commented Sep 26, 2021

With #45955 we have:

// Grow grows the slice's capacity, if necessary, to guarantee space for
// another n elements. After Grow(n), at least n elements can be appended
// to the slice without another allocation. If n is negative or too large to
// allocate the memory, Grow will panic.
func Grow[S constraints.Slice[T], T any](s S, n int) S

which I assume has the freedom to allocate a larger capacity than len(s)+n, and also presumably one that lands on a size class boundary.

Would this suffice for what this issue is aiming to accomplish?

@Jorropo (Member) commented Apr 3, 2022

Problem

FYI, this is a real issue for me: I am implementing a slice optimization pass, and it's made more complex and less effective by the strict definition cap(make([]T, l, c)) == c.

What I write fuses slice operations together.
For example it turns:

a = append(append(a, b...), c...)

Into:

n := len(a) // save the original length before re-slicing
a = slices.Grow(a, len(b)+len(c))
a = a[:n+len(b)+len(c)]
copy(a[n:], b)
copy(a[n+len(b):], c)

Which is faster, does only one allocation, and is smaller.

Doing that with slices that come from make is not practical: it requires extensive escape analysis and proving that the shared part of the slice is not overwritten (which does not even work in all the cases we would want). Simply put, the current spec requires this behaviour:

a := make([]int, 10)
b := append(a, 1) // b MUST be a copy of a, because the required capacity is 10+1 and a's capacity is 10
doStuff(a)
fmt.Println(b[0]) // 0, even if doStuff does a[0] = 1

That forbids the optimization that we would want here:

a := make([]int, 10, 11)
b := a[:11]       // take a reference instead of copying; b shares a's backing array
b[10] = 1
doStuff(a)        // doStuff does a[0] = 1
fmt.Println(b[0]) // now 1, modified by doStuff (wrong: b was supposed to be a copy)

Solutions

On the solutions side of things: this is a breaking change to the spec, but I have never seen code that relies on this guarantee. It seems everyone uses new := append(a[:0:0], a...) or new := make([]typ, len(a)); copy(new, a) when they want a copy of a slice.
So maybe this would be fine to break? On the other hand, having what used to be copies silently become references sounds like a very insidious and really hard-to-find bug, and it isn't easily statically searchable either, because the make and the append can be separated by 3 package boundaries and 2 interface ones and this property still holds (although in that situation the optimizer would likely not be able to optimize this anyway).

I do like @bcmills's solution (#24204 (comment)).
But I don't know if it's worth teaching new Go developers that there are two makes, a slower builtin one and a potentially faster slices.Make, whose syntaxes don't even match, where the one you probably want is the longer one to write. So I don't know whether we should expect new Go devs to use it much.

@Jorropo (Member) commented Apr 3, 2022

@dsnet
Grow isn't really what this issue is asking for.
You can use it in a way that is equivalent:

s := slices.Grow([]Type(nil), c)[:l]

But that is very unergonomic.

slices.Grow is a more or less direct equivalent of C++'s vector.reserve call.
In day-to-day programming you would use it like this:

// assume data is a slice of whatever type
data = slices.Grow(data, len(more))
for _, v := range more {
  data = append(data, process(v))
}

@CAFxX (Contributor) commented Apr 26, 2022

Was thinking about this again, and there may be a way (that AFAICT has not been proposed yet) to get the same benefits without language changes: modify append to check whether there is any unused tail in the current allocation, past the slice capacity.

Let's say the user initially allocated the slice with make([]byte, 0, n). The slice capacity is n, but the actual allocation is rounded up to the next size class. When the user appends to the slice and the append can't be satisfied within the current capacity n, growslice would check whether extending the slice capacity to cover the allocation's size class would satisfy the append without a malloc+move: if so, it extends the capacity to the size class size and performs just the copy from the source slice.

This would, AFAIK, preserve all current semantics (I am not aware of any guarantees append makes about the resulting capacity) while allowing the whole allocation to be used when possible. It would not require users to know specific incantations, and it would have the added benefit of reducing the amount of unused space in allocation tails (because slice capacities would tend to end up size-class-aligned after repeated appends). The biggest problem would be figuring out how to know whether the allocation tail is really unused, or whether the user has manually shrunk the capacity (in which case the tail can't be used), but it doesn't seem like an impossible problem.
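(A sketch of the observable effect, assuming typical 64-bit size classes:)

s := make([]byte, 0, 5) // cap(s) == 5, but the allocation is an 8-byte size class
s = append(s, 1, 2, 3, 4, 5, 6)
// Today: the append exceeds cap, so growslice must allocate and copy.
// Under the idea above, growslice would notice the 3 spare bytes in the
// allocation's tail, extend cap to 8 in place, and skip the reallocation.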

@randall77 (Contributor)

@CAFxX I don't think we can quite do that, unfortunately, due to the problem you mention at the end.

s := make([]byte, n, n)
s[n-1] = 77
t := s[:n-1:n-1]

When we append to t, can we overwrite the 77? I don't think we can. Depending on n, I don't see how to distinguish the case of originally allocating the full size class and 3-arg slicing it down (in which case we can't use the tail space) from the case of allocating less than the full size class initially (in which case we can).

We could figure that out by recording the actual allocated size in the heap metadata somehow. But that brings up the additional problem of multiple threads simultaneously appending to t: how do we coordinate who gets to use that tail space? We'd need some atomic access guarding it.

It may be doable, but doesn't seem easy (coding-wise), and possibly seems expensive (runtime-wise).

@CAFxX (Contributor) commented Apr 26, 2022

Yes, the simplest (but untested and somewhat incomplete) idea I could come up with so far is storing a small flag in the tail (or head) of the allocation (AFAIK we really need only a single bit, signalling whether the tail can be used or not), and accessing it atomically during modifications to the slice capacity (which, assuming uncontended access, shouldn't be horribly slow).

Definitely, it wouldn't be trivial... but OTOH it would have pretty important upsides (works transparently for existing code, no language additions/redefinitions, no special syntaxes to get the compiler to do the right thing, ...)

@thepudds (Contributor)

Following up on the comment from @dsnet from last fall, one update is that Grow in x/exp/slices does now take advantage of the optimization @josharian described above so that the capacity gets rounded up to the next alloc size class:

func Grow[S ~[]E, E any](s S, n int) S {
	return append(s, make(S, n)...)[:len(s)]
}

A similar optimization might be possible in the Copy function there. (I will look at that and possibly send a CL if it makes sense).

And of course, people could have their own generic Make that follows a similar pattern.

@mateusz834 (Member)

Have you considered making this kind of change via the go directive (similarly to the loop variable semantics change)?

@ianlancetaylor changed the title from "proposal: Go 2: allow cap(make([]T, m, n)) > n" to "proposal: spec: allow cap(make([]T, m, n)) > n" Aug 6, 2024
@ianlancetaylor added the LanguageChangeReview (Discussed by language change review committee) label and removed the v2 (An incompatible library change) and NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one) labels Aug 6, 2024
@ianlancetaylor (Contributor)

@mateusz834 The loop variable semantics change was a one-off, done because we felt it was important enough to break our own rules. We aren't going to do it again.
