Skip to content

Commit

Permalink
Speed-up profile management.
Browse files Browse the repository at this point in the history
Time taken for "top" listing for a large (34MB) profile drops by 15%:

```
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   11.2s ± 2%  -14.72%  (p=0.008 n=5+5)
```

Furthermore, the time taken to merge/diff 34MB profiles drops by 53%:

```
Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)
```

Details follow:

The cost of a trivial merge was very high (4s for 34MB profile).
We now just skip such a merge and save the 4s.

* Only create a Sample the first time a sample key is seen.
* Faster ID to *Location mapping by creating a dense array that handles
  small IDs (this is almost always true).
* Faster sampleKey generation during merging by emitting binary encoding
  of numbers and using a strings.Builder instead of repeated fmt.Sprintf.

The preceding changes drop the cost of merging two copies of the same 34MB
profile by 53%:

```
name        old time/op  new time/op  delta
Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)
```

* Use temporary storage when decoding to reduce allocations.
* Pre-allocate space for all locations in one shot when creating a Profile.

The preceding speed up decoding by 13% and encoding by 7%:

```
name      old time/op  new time/op  delta
Parse-12   2.00s ± 4%   1.74s ± 3%  -12.99%  (p=0.008 n=5+5)
Write-12   679ms ± 2%   629ms ± 1%   -7.44%  (p=0.008 n=5+5)
```

When used in interactive mode, each command needs to make a fresh copy
of the profile since a command may mutate the profile. This used to be
done by serializing/compressing/decompressing/deserializing the
profile per command.  We now store the original data in serialized
uncompressed form so that we just need to deserialize the profile per
command. This change can be seen in the improvement in the time needed
to generate the "top" output:

```
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.4s ± 0%  -5.84%  (p=0.008 n=5+5)
```

* Avoid filtering cost when there are no filters to apply.
* Avoid location munging when there are no tag roots or leaves to add.
* Faster stack entry pruning by caching the result of demangling and
  regexp matching for a given function name.

```
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.3s ± 2%  -6.33%  (p=0.008 n=5+5)
```

* Added benchmarks for profile parsing, serializing, merging.
* Added benchmarks for a few web interface endpoints.
* Added a large profile (1.2MB) to proftest/testdata. This profile is from
  a synthetic program that contains ~23K functions that are exercised
  by a combination of stack traces so that we end up with a larger
  profile than typical. Note that the benchmarks above are from an
  even larger profile (34MB) from a real system, but that profile is
  too big to be added to the repository.
  • Loading branch information
ghemawat committed Oct 31, 2022
1 parent 131d412 commit 0730752
Show file tree
Hide file tree
Showing 17 changed files with 447 additions and 95 deletions.
24 changes: 22 additions & 2 deletions internal/driver/driver.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,9 +59,8 @@ func PProf(eo *plugin.Options) error {
return interactive(p, o)
}

// generateRawReport is allowed to modify p.
func generateRawReport(p *profile.Profile, cmd []string, cfg config, o *plugin.Options) (*command, *report.Report, error) {
p = p.Copy() // Prevent modification to the incoming profile.

// Identify units of numeric tags in profile.
numLabelUnits := identifyNumLabelUnits(p, o.UI)

Expand Down Expand Up @@ -110,6 +109,7 @@ func generateRawReport(p *profile.Profile, cmd []string, cfg config, o *plugin.O
return c, rpt, nil
}

// generateReport is allowed to modify p.
func generateReport(p *profile.Profile, cmd []string, cfg config, o *plugin.Options) error {
c, rpt, err := generateRawReport(p, cmd, cfg, o)
if err != nil {
Expand Down Expand Up @@ -365,3 +365,23 @@ func valueExtractor(ix int) sampleValueFunc {
return v[ix]
}
}

// profileCopier can be used to obtain a fresh copy of a profile.
// It is useful since reporting code may mutate the profile handed to it.
type profileCopier []byte

func makeProfileCopier(src *profile.Profile) profileCopier {
// Pre-serialize the profile. We will deserialize every time a fresh copy is needed.
var buf bytes.Buffer
src.WriteUncompressed(&buf)
return profileCopier(buf.Bytes())
}

// newCopy returns a new copy of the profile.
func (c profileCopier) newCopy() *profile.Profile {
p, err := profile.ParseUncompressed([]byte(c))
if err != nil {
panic(err)
}
return p
}
45 changes: 45 additions & 0 deletions internal/driver/driver_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1029,6 +1029,16 @@ func symzProfile() *profile.Profile {
}
}

func largeProfile(tb testing.TB) *profile.Profile {
tb.Helper()
input := proftest.LargeProfile(tb)
prof, err := profile.Parse(bytes.NewBuffer(input))
if err != nil {
tb.Fatal(err)
}
return prof
}

var autoCompleteTests = []struct {
in string
out string
Expand Down Expand Up @@ -1577,6 +1587,41 @@ func TestSymbolzAfterMerge(t *testing.T) {
}
}

func TestProfileCopier(t *testing.T) {
type testCase struct {
name string
prof *profile.Profile
}
for _, c := range []testCase{
{"cpu", cpuProfile()},
{"heap", heapProfile()},
{"contention", contentionProfile()},
{"symbolz", symzProfile()},
{"long_name_funcs", longNameFuncsProfile()},
{"large", largeProfile(t)},
} {
t.Run(c.name, func(t *testing.T) {
copier := makeProfileCopier(c.prof)

// Muck with one copy to check that fresh copies are unaffected
tmp := copier.newCopy()
tmp.Sample = tmp.Sample[:0]

// Get new copy and check it is same as the original.
want := c.prof.String()
got := copier.newCopy().String()
if got != want {
t.Errorf("New copy is not same as original profile")
diff, err := proftest.Diff([]byte(want), []byte(got))
if err != nil {
t.Fatalf("Diff: %v", err)
}
t.Logf("Diff:\n%s\n", string(diff))
}
})
}
}

type mockObjTool struct{}

func (*mockObjTool) Open(file string, start, limit, offset uint64, relocationSymbol string) (plugin.ObjFile, error) {
Expand Down
5 changes: 5 additions & 0 deletions internal/driver/fetch.go
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,11 @@ func combineProfiles(profiles []*profile.Profile, msrcs []plugin.MappingSources)
return nil, nil, err
}

// Avoid expensive work for the common case of a single profile/src.
if len(profiles) == 1 && len(msrcs) == 1 {
return profiles[0], msrcs[0], nil
}

p, err := profile.Merge(profiles)
if err != nil {
return nil, nil, err
Expand Down
3 changes: 2 additions & 1 deletion internal/driver/interactive.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ func interactive(p *profile.Profile, o *plugin.Options) error {
interactiveMode = true
shortcuts := profileShortcuts(p)

copier := makeProfileCopier(p)
greetings(p, o.UI)
for {
input, err := o.UI.ReadLine("(pprof) ")
Expand Down Expand Up @@ -110,7 +111,7 @@ func interactive(p *profile.Profile, o *plugin.Options) error {

args, cfg, err := parseCommandLine(tokens)
if err == nil {
err = generateReportWrapper(p, args, cfg, o)
err = generateReportWrapper(copier.newCopy(), args, cfg, o)
}

if err != nil {
Expand Down
4 changes: 4 additions & 0 deletions internal/driver/tagroot.go
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,10 @@ func addLabelNodes(p *profile.Profile, rootKeys, leafKeys []string, outputUnit s
leafm = true
}

if len(leavesToAdd)+len(rootsToAdd) == 0 {
continue
}

var newLocs []*profile.Location
newLocs = append(newLocs, leavesToAdd...)
newLocs = append(newLocs, s.Location...)
Expand Down
9 changes: 6 additions & 3 deletions internal/driver/webui.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,13 +36,14 @@ import (
// webInterface holds the state needed for serving a browser based interface.
type webInterface struct {
prof *profile.Profile
copier profileCopier
options *plugin.Options
help map[string]string
templates *template.Template
settingsFile string
}

func makeWebInterface(p *profile.Profile, opt *plugin.Options) (*webInterface, error) {
func makeWebInterface(p *profile.Profile, copier profileCopier, opt *plugin.Options) (*webInterface, error) {
settingsFile, err := settingsFileName()
if err != nil {
return nil, err
Expand All @@ -52,6 +53,7 @@ func makeWebInterface(p *profile.Profile, opt *plugin.Options) (*webInterface, e
report.AddSourceTemplates(templates)
return &webInterface{
prof: p,
copier: copier,
options: opt,
help: make(map[string]string),
templates: templates,
Expand Down Expand Up @@ -96,7 +98,8 @@ func serveWebInterface(hostport string, p *profile.Profile, o *plugin.Options, d
return err
}
interactiveMode = true
ui, err := makeWebInterface(p, o)
copier := makeProfileCopier(p)
ui, err := makeWebInterface(p, copier, o)
if err != nil {
return err
}
Expand Down Expand Up @@ -266,7 +269,7 @@ func (ui *webInterface) makeReport(w http.ResponseWriter, req *http.Request,
catcher := &errorCatcher{UI: ui.options.UI}
options := *ui.options
options.UI = catcher
_, rpt, err := generateRawReport(ui.prof, cmd, cfg, &options)
_, rpt, err := generateRawReport(ui.copier.newCopy(), cmd, cfg, &options)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
ui.options.UI.PrintErr(err)
Expand Down
43 changes: 39 additions & 4 deletions internal/driver/webui_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,13 +32,11 @@ import (
"github.com/google/pprof/profile"
)

func TestWebInterface(t *testing.T) {
func makeTestServer(t testing.TB, prof *profile.Profile) *httptest.Server {
if runtime.GOOS == "nacl" || runtime.GOOS == "js" {
t.Skip("test assumes tcp available")
}

prof := makeFakeProfile()

// Custom http server creator
var server *httptest.Server
serverCreated := make(chan bool)
Expand All @@ -60,8 +58,16 @@ func TestWebInterface(t *testing.T) {
HTTPServer: creator,
}, false)
<-serverCreated
defer server.Close()

// Close the server when the test is done.
t.Cleanup(server.Close)

return server
}

func TestWebInterface(t *testing.T) {
prof := makeFakeProfile()
server := makeTestServer(t, prof)
haveDot := false
if _, err := exec.LookPath("dot"); err == nil {
haveDot = true
Expand Down Expand Up @@ -305,3 +311,32 @@ func TestIsLocalHost(t *testing.T) {
}
}
}

func BenchmarkTop(b *testing.B) { benchmarkURL(b, "/top", false) }
func BenchmarkFlame(b *testing.B) { benchmarkURL(b, "/flamegraph2", false) }
func BenchmarkDot(b *testing.B) { benchmarkURL(b, "/", true) }

func benchmarkURL(b *testing.B, path string, needDot bool) {
if needDot {
if _, err := exec.LookPath("dot"); err != nil {
b.Skip("dot not available")
}
}
prof := largeProfile(b)
server := makeTestServer(b, prof)
url := server.URL + path
b.ResetTimer()
for i := 0; i < b.N; i++ {
res, err := http.Get(url)
if err != nil {
b.Fatal(err)
}
data, err := io.ReadAll(res.Body)
if err != nil {
b.Fatal(err)
}
if i == 0 && testing.Verbose() {
b.Logf("%-12s : %10d bytes", path, len(data))
}
}
}
15 changes: 15 additions & 0 deletions internal/proftest/BUILD
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Description:
# Auto-imported from github.com/google/pprof/internal/proftest

licenses(["notice"])

package(
default_applicable_licenses = ["//third_party/golang/pprof:license"],
default_visibility = ["//third_party/golang/pprof/internal:friends"],
)

go_library(
name = "proftest",
srcs = ["proftest.go"],
embedsrcs = ["testdata/large.cpu"],
)
29 changes: 28 additions & 1 deletion internal/proftest/proftest.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,19 @@ package proftest

import (
"encoding/json"
"flag"
"fmt"
"io"
"os"
"os/exec"
"regexp"
"testing"

_ "embed" // For embedding profiles needed by tests and benchmarks
)

var flagLargeProfile = flag.String("large_profile", "", "The name of a file that contains a profile to use in benchmarks. If empty, a profile of a synthetic program is used.")

// Diff compares two byte arrays using the diff tool to highlight the
// differences. It is meant for testing purposes to display the
// differences between expected and actual output.
Expand Down Expand Up @@ -76,7 +81,7 @@ func EncodeJSON(x interface{}) []byte {
// Also tracks the number of times the error matches AllowRx in
// NumAllowRxMatches.
type TestUI struct {
T *testing.T
T testing.TB
Ignore int
AllowRx string
NumAllowRxMatches int
Expand Down Expand Up @@ -137,3 +142,25 @@ func (ui *TestUI) WantBrowser() bool {
// SetAutoComplete is not supported by the test UI.
func (ui *TestUI) SetAutoComplete(_ func(string) string) {
}

// LargeProfile returns a large profile that may be useful in benchmarks.
//
// If the flag --large_profile is set, the contents of the file
// named by the flag are returned. Otherwise an embedded profile (~1.2MB)
// for a synthetic program is returned.
func LargeProfile(tb testing.TB) []byte {
tb.Helper()
if f := *flagLargeProfile; f != "" {
// Use custom profile.
data, err := os.ReadFile(f)
if err != nil {
tb.Fatalf("custom profile file: %v\n", err)
}
return data
}

return largeProfileData
}

//go:embed testdata/large.cpu
var largeProfileData []byte
Binary file added internal/proftest/testdata/large.cpu
Binary file not shown.
Loading

0 comments on commit 0730752

Please sign in to comment.