Speed-up profile management. #729

Merged 1 commit into google:main on Nov 2, 2022
ghemawat (Contributor) commented:

Time taken for "top" listing for a large (34MB) profile drops by 15%:

```
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   11.2s ± 2%  -14.72%  (p=0.008 n=5+5)
```

Furthermore, the time taken to merge/diff 34MB profiles drops by 53%:

```
name         old time/op  new time/op  delta
Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)
```

Details follow:

The cost of a trivial merge was very high (4s for 34MB profile).
We now just skip such a merge and save the 4s.
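The trivial-merge shortcut amounts to an early return before the general merge path. A minimal sketch, where `profile`, `merge`, and `combine` are stand-ins for illustration, not pprof's actual API:

```go
package main

import "fmt"

// profile is a stand-in for the real Profile type.
type profile struct{ samples int }

// merge is a placeholder for the expensive general merge machinery.
func merge(ps []*profile) *profile {
	total := 0
	for _, p := range ps {
		total += p.samples
	}
	return &profile{samples: total}
}

// combine skips the merge entirely when there is only one profile,
// mirroring the trivial-merge shortcut described above.
func combine(ps []*profile) *profile {
	if len(ps) == 1 {
		return ps[0] // trivial case: no merge work, no copy
	}
	return merge(ps)
}

func main() {
	p := &profile{samples: 5}
	fmt.Println(combine([]*profile{p}) == p) // same object returned
}
```

The shortcut pays off precisely because a "merge" of a single 34MB profile previously rebuilt every sample, location, and key from scratch.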

* Only create a Sample the first time a sample key is seen.
* Faster ID to *Location mapping by creating a dense array that handles
  small IDs (the common case).
* Faster sampleKey generation during merging by emitting binary encoding
  of numbers and using a strings.Builder instead of repeated fmt.Sprintf.
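The key-building bullet can be sketched as follows. The function name and exact encoding here are assumptions (the real key also covers labels and values), but the strings.Builder-plus-varint pattern is the point: one growing buffer instead of an allocation per fmt.Sprintf call, and varints are self-delimiting so concatenated IDs cannot collide.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// sampleKey builds a map key from location IDs by appending each ID's
// varint encoding to a strings.Builder. This avoids the per-field
// allocations of repeated fmt.Sprintf calls.
func sampleKey(locIDs []uint64) string {
	var sb strings.Builder
	var buf [binary.MaxVarintLen64]byte
	for _, id := range locIDs {
		n := binary.PutUvarint(buf[:], id)
		sb.Write(buf[:n])
	}
	return sb.String()
}

func main() {
	k1 := sampleKey([]uint64{1, 2, 3})
	k2 := sampleKey([]uint64{1, 2, 3})
	k3 := sampleKey([]uint64{1, 2, 4})
	fmt.Println(k1 == k2, k1 == k3) // true false
}
```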

The preceding changes drop the cost of merging two copies of the same 34MB
profile by 53%:

```
name        old time/op  new time/op  delta
Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)
```

* Use temporary storage when decoding to reduce allocations.
* Pre-allocate space for all locations in one shot when creating a Profile.
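Pre-allocating in one shot can look like the sketch below (`location` is a hypothetical stand-in for the real type): one backing array for all the values and one slice for the pointers, rather than a heap allocation per location.

```go
package main

import "fmt"

// location is a stand-in for the real Location type.
type location struct{ id uint64 }

// decodeLocations allocates all locations in a single backing array
// and hands out pointers into it, instead of one allocation per item.
func decodeLocations(n int) []*location {
	backing := make([]location, n) // single allocation for the values
	locs := make([]*location, n)   // single allocation for the pointers
	for i := range backing {
		backing[i].id = uint64(i + 1)
		locs[i] = &backing[i]
	}
	return locs
}

func main() {
	locs := decodeLocations(3)
	fmt.Println(len(locs), locs[2].id) // 3 3
}
```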

The preceding changes speed up decoding by 13% and encoding by 7%:

```
name      old time/op  new time/op  delta
Parse-12   2.00s ± 4%   1.74s ± 3%  -12.99%  (p=0.008 n=5+5)
Write-12   679ms ± 2%   629ms ± 1%   -7.44%  (p=0.008 n=5+5)
```

When used in interactive mode, each command needs to make a fresh copy
of the profile since a command may mutate the profile. This used to be
done by serializing/compressing/decompressing/deserializing the
profile per command.  We now store the original data in serialized
uncompressed form so that we just need to deserialize the profile per
command. This change can be seen in the improvement in the time needed
to generate the "top" output:

```
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.4s ± 0%  -5.84%  (p=0.008 n=5+5)
```
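The store-serialized, decode-per-command pattern can be sketched as below, using encoding/gob as a stand-in for the pprof protobuf codec; the type and method names are assumptions for illustration. Each command decodes its own mutable copy from the cached uncompressed bytes, so the compress/decompress round trip disappears.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// profile is a stand-in type; the real code deserializes the pprof
// protobuf, but the pattern is the same.
type profile struct {
	Samples []int64
}

// session keeps the profile in serialized *uncompressed* form, so each
// command pays only a decode, not compress+encode+decompress+decode.
type session struct {
	raw []byte
}

func newSession(p *profile) (*session, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		return nil, err
	}
	return &session{raw: buf.Bytes()}, nil
}

// freshCopy gives each command its own mutable copy of the profile.
func (s *session) freshCopy() (*profile, error) {
	var p profile
	err := gob.NewDecoder(bytes.NewReader(s.raw)).Decode(&p)
	return &p, err
}

func main() {
	s, _ := newSession(&profile{Samples: []int64{1, 2, 3}})
	a, _ := s.freshCopy()
	b, _ := s.freshCopy()
	a.Samples[0] = 99          // mutating one command's copy...
	fmt.Println(b.Samples[0])  // ...leaves the other untouched: 1
}
```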

* Avoid filtering cost when there are no filters to apply.
* Avoid location munging when there are no tag roots or leaves to add.
* Faster stack entry pruning by caching the result of demangling and
  regexp matching for a given function name.

```
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.3s ± 2%  -6.33%  (p=0.008 n=5+5)
```
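The caching bullet above amounts to a per-name memo table in front of the regexp match (illustrative sketch only; the real code also caches demangling results, and these names are assumptions):

```go
package main

import (
	"fmt"
	"regexp"
)

// matchCache memoizes the regexp-match result per function name, so a
// name that appears on many stack frames is tested only once.
type matchCache struct {
	re    *regexp.Regexp
	cache map[string]bool
}

func newMatchCache(pattern string) *matchCache {
	return &matchCache{
		re:    regexp.MustCompile(pattern),
		cache: map[string]bool{},
	}
}

func (c *matchCache) matches(name string) bool {
	if hit, ok := c.cache[name]; ok {
		return hit // cached: skip the regexp entirely
	}
	m := c.re.MatchString(name)
	c.cache[name] = m
	return m
}

func main() {
	c := newMatchCache(`^runtime\.`)
	fmt.Println(c.matches("runtime.mallocgc"), c.matches("main.main")) // true false
	fmt.Println(len(c.cache))                                          // 2 cached results
}
```

This trades a small map for repeated regexp evaluation, which wins because the set of distinct function names is far smaller than the number of stack entries.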

* Added benchmarks for profile parsing, serializing, merging.
* Added benchmarks for a few web interface endpoints.
* Added a large profile (1.2MB) to proftest/testdata. This profile is from
  a synthetic program that contains ~23K functions that are exercised
  by a combination of stack traces so that we end up with a larger
  profile than typical. Note that the benchmarks above are from an
  even larger profile (34MB) from a real system, but that profile is
  too big to be added to the repository.
aalexand merged commit 76f304f into google:main on Nov 2, 2022.
ghemawat deleted the speedup branch on Nov 2, 2022.