Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: detect and optimize slice insertion idiom append(sa, append(sb, sc...)...) #31592

Closed
zigo101 opened this issue Apr 21, 2019 · 12 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@zigo101
Copy link

zigo101 commented Apr 21, 2019

What version of Go are you using (go version)?

go version go1.12.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

package main

import (
	"testing"
)

type T = int
var sx = []T{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
var sy = []T{111, 222, 333, 444}
var index = 6

func SliceInsertion_OneLine(base, inserted []T, i int) []T {
	return append(base[:i], append(inserted, base[i:]...)...)
}

func SliceInsertion_Verbose(base, inserted []T, i int) []T {
	if cap(base)-len(base) >= len(inserted) {
		s := base[:len(base)+len(inserted)]
		copy(s[i+len(inserted):], s[i:])
		copy(s[i:], inserted)
		return s
	} else {
		s := make([]T, 0, len(inserted)+len(base))
		s = append(s, base[:i]...)
		s = append(s, inserted...)
		s = append(s, base[i:]...)
		return s
	}
}

var s1 []T
func Benchmark_SliceInsertion_OneLine(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s1 = SliceInsertion_OneLine(sx, sy, index)
	}
}

var s2 []T
func Benchmark_SliceInsertion_Verbose(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s2 = SliceInsertion_Verbose(sx, sy, index)
	}
}

What did you expect to see?

Small performance difference.

What did you see instead?

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: app/t
Benchmark_SliceInsertion_OneLine-4   	 5000000	       311 ns/op	     368 B/op	       2 allocs/op
Benchmark_SliceInsertion_Verbose-4   	10000000	       146 ns/op	     160 B/op	       1 allocs/op
PASS
ok  	app/t	3.535s
@zigo101
Copy link
Author

zigo101 commented Apr 21, 2019

I would say this optimization is more needed than this one, #21266, made in Go SDK 1.11, for it is used more popularly, and in fact, before Go 1.11, there is already a slice-extending method which is even more efficient than the optimization made in Go SDK 1.11. Evidence:

package main

import (
	"testing"
)

type T = int
var sx = []T{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

func SliceGrow_OneLine(base []T, newCapacity int) []T {
	return append(base, make([]T, newCapacity-cap(base))...)
}

func SliceGrow_VerboseCopy(base []T, newCapacity int) []T {
	m := make([]T, newCapacity)
	copy(m, base)
	return m
}

var sa []T
func Benchmark_SliceGrow_OneLine(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sa = SliceGrow_OneLine(sx, 100)
	}
}

var sc []T
func Benchmark_SliceGrow_VerboseCopy(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sc = SliceGrow_VerboseCopy(sx, 100)
	}
}

Benchmark result:

Benchmark_SliceGrow_OneLine-4       	 3000000	       426 ns/op	     896 B/op	       1 allocs/op
Benchmark_SliceGrow_VerboseCopy-4   	 5000000	       350 ns/op	     896 B/op	       1 allocs/op

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 28, 2019
@julieqiu
Copy link
Member

/cc @randall77 @griesemer

@josharian
Copy link
Contributor

cc @martisch

There are a lot of variants of this. E.g. splice, introduced in a go-diff commit, offered very substantial performance improvements.

I'm genuinely not sure how many of these we want to recognize in the compiler vs, say, writing a blog post illustrating how to write these efficiently or wait for generics and implement them in the standard library.

@martisch
Copy link
Contributor

I see at least 3 patterns here:

  • append(slice[:index],` append(elements, slice[index:]...)...)
  • append(slice[:index],` append(elements, slice[index+len(elements):]...)...)
  • append(slice[:index],` append(elements, slice[index+amount:]...)...)

I think the first 2 should be easiest to implement and subjectively also the most used.
If we have them recognized we could look at the how difficult and how much more complexity the third case adds.

If others dont think the first 2 are a no-go I can have go at it :) (but if someone else is eager to work on it dont hold off on it)

@ear7h
Copy link

ear7h commented Sep 11, 2019

I'd like to try this, but I think it could be generalized, as the original comment mentioned.

For nested appends, one would need at most 1 call to growslice (but more calls to memmove). append(s1, append(s2, append( s3, append(. . .)...)...)...) could be transformed to the following by walkappend:

init {
	// continue walking the node tree and place the args
	// in temporaries s1, s2, s3, ...
	newcap := len(s1) + len(s2) + len(s3) + ...
	if newcap > cap(s1) {
	    s1 = growslice(s1, newcap)
	}

	s1 = s1[:s1]
	idx := 0

	copy(s1[idx:], s2)
	idx += len(s2)
	if cap(s2) >= len(s3) {
	    copy(s2[:cap(s2)], s3)
	}

	copy(s1[idx:], s3)
	idx += len(s3)
	if cap(s3) >= len(s4) {
	    copy(s3[:cap(s3)], s4)
	}

	...
}
s

This only allocates for the resulting slice and mimics the behavior of actually calling append where it writes between len and cap iff there's enough room. Since we do the writing conditionally, the number of copies is also reduced (along with allocations). This could also have logic for handling single values, and make efficiently by checking types and ops when generating the additions and copies. In the make case, the copy is replaced with memclr.

I should mention that I think I understand most of what's going on in the slice generation except for this bit:

// General case, with no function calls left as arguments.
// Leave for gen, except that instrumentation requires old form.
if !instrumenting || compiling_runtime {
return n
}

@martisch
Copy link
Contributor

martisch commented Sep 12, 2019

Careful with newcap := len(s1) + len(s2) + len(s3) + ... this can overflow and wrap around making the resulting capacity to small. The adds need to be done one by one with overflow checked each time. If any problem is encountered on the way the up to then good appends need to be applied so there is no difference in semantics. There can be logic that the calculation can not overflow e.g. for an existing length (or lower)+constant insert append(slice[:index], append(elements, slice[index:]...)...)`.

I would still suggest to solve this for the simple case to understand the complexities involved better and then we can see if the generalization is worth even more complexity while still being used in production code.

@quasilyte
Copy link
Contributor

quasilyte commented Dec 6, 2021

To provide some statistics (source: gocorpus with all repositories checked):

  • append($b[:$i], append($_, $b[$i:]...)...) - 9 hits
  • append($b[:$i], append([]$_{$_}, $b[$i:]...)...) - 5 hits
  • $b = append($b, $_); copy($b[$i+1:], $b[$i:]); $b[$i] = $_ - 12 hits (1 match per ~866,636 SLOC)
1-2. output=append(output,0);copy(output[i+1:],output[i:]);output[i]=n (2 matches)
3-4. impDecl.Specs=append(impDecl.Specs,nil);copy(impDecl.Specs[insertAt+1:],impDecl.Specs[insertAt:]);impDecl.Specs[insertAt]=newImport (2 matches)
5. coords=append(coords,geom.Coord{});copy(coords[index+1:],coords[index:]);coords[index]=point.Coords()
6. l.data=append(l.data,nil);copy(l.data[i+1:],l.data[i:]);l.data[i]=s
7. t.inUseKeys=append(t.inUseKeys,writtenKeySpan{});copy(t.inUseKeys[start+1:],t.inUseKeys[start:]);t.inUseKeys[start]=writtenKeySpan{key:span,writer:w,txn:txn}
8. arr=append(arr,nil);copy(arr[idx+1:],arr[idx:]);arr[idx]=val
9. sb.endpoints=append(sb.endpoints,nil);copy(sb.endpoints[i+1:],sb.endpoints[i:]);sb.endpoints[i]=ep
10. s.Calls=append(s.Calls,Call{});copy(s.Calls[index+1:],s.Calls[index:]);s.Calls[index]=Call{Stack:gr.Stack}
11. archive.Files=append(archive.Files,txtar.File{});copy(archive.Files[gosumIdx+1:],archive.Files[gosumIdx:]);archive.Files[gosumIdx]=txtar.File{Name:"go.sum",Data:newGosumData}
12. es=append(es,muxEntry{});copy(es[i+1:],es[i:]);es[i]=e

For comparison, here are some results on the same corpus:

  • append($_, $_) - 37008 hits
  • append($_, $_, $_) - 723 hits

Which is understandable, 2-arguments append is far more common than any other form, but this gives us the answer by how much (~51 times).

(The search patterns are written in gogrep syntax, if you're curious.)


Just out of curiosity, does this slices.Insert perform the required operation in the comparable performance that we can achieve by teaching the compiler to make the suggested rewrite?

@dotaheor
Copy link

dotaheor commented Dec 7, 2021

Just out of curiosity, does this slices.Insert perform the required operation in the comparable performance that we can achieve by teaching the compiler to make the suggested rewrite?

No, it doesn't. Generics doesn't help here.
The make call still zero elements in the generic code.

func Insert[S ~[]E, E any](s S, i int, v ...E) S {
	tot := len(s) + len(v)
	if tot <= cap(s) {
		s2 := s[:tot]
		copy(s2[i+len(v):], s[i:])
		copy(s2[i:], v)
		return s2
	}
	s2 := make(S, tot)
	copy(s2, s[:i])
	copy(s2[i:], v)
	copy(s2[i+len(v):], s[i:])
	return s2
}

That's why I proposed to add merge builtin function (#23905).

On the other hand, the generic code does provide a chance to simplify the potential compiler optimization.

@quasilyte
Copy link
Contributor

quasilyte commented Dec 7, 2021

Excessive zeroing is a known issue. I think it's a separate fruit on its own.
I'm working on it by adding a memclrelim pass.
Right now it doesn't handle make/copy/etc, but there are plans to extend it (see the TODO notes, but there are actually more ideas).
https://go-review.googlesource.com/c/go/+/367496

@dotaheor
Copy link

dotaheor commented Dec 7, 2021

Right now it doesn't handle make/copy/etc

Mostly, except for the case handled by a Go 1.15 optimization:

var s = make([]T, n)
copy(s, x)
copy(s[len(x):], y)

In the above code, the elements within s[:len(x)] are not zeroed.

For the complexities in reality, the optimization has many limitations:

  • the make call must take exact two arguments,
  • x must be an identifier.
  • the elements within s[len(x):] are still zeroed even if the previous conditions are satisfied.

That means, currently, the above generic code could be a little faster:

func Insert[S ~[]E, E any](s S, i int, v ...E) S {
	tot := len(s) + len(v)
	if tot <= cap(s) {
		s2 := s[:tot]
		copy(s2[i+len(v):], s[i:])
		copy(s2[i:], v)
		return s2
	}
	x := s[:i]
	s2 := make(S, tot)
	copy(s2, x)
	copy(s2[i:], v)
	copy(s2[i+len(v):], s[i:])
	return s2
}

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@mknyszek mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022
@zigo101
Copy link
Author

zigo101 commented Jan 30, 2023

I decided to close this issue, in favor of slices.Insert: https://cs.opensource.google/go/x/exp/+/a960b378:slices/slices.go;l=142

The one line trick might cause data incomplete when the result slice length overflows int.

@zigo101 zigo101 closed this as completed Jan 30, 2023
@github-project-automation github-project-automation bot moved this from Triage Backlog to Done in Go Compiler / Runtime Jan 30, 2023
@zigo101
Copy link
Author

zigo101 commented Jan 30, 2023

Sorry, after rethinking, I think data will not become incomplete even if the result slice length overflows int. (Edit: It is still possible when out of memory in the second step).
Anyway, the one-line trick is not recommended to use. So keep closed.

@golang golang locked and limited conversation to collaborators Jan 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

10 participants