
btf: Optimizing BTF parsing by merging readTypes and inflateRawTypes #1211

Merged

Conversation

dylandreimerink
Member

After profiling the BTF parsing code, it became apparent that much of the
time in `readTypes` was spent allocating rawTypes. These rawTypes are
only used as an intermediate step to create the final inflated types, so
all of the allocation work gets thrown away.

This commit merges `readTypes` and `inflateRawTypes` into a single
function. This allows us to re-use the intermediate objects and only
allocate the final inflated types.

This results in the following performance improvements:

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                │ before.txt  │          after.txt          │
                │   sec/op    │   sec/op     vs base        │
ParseVmlinux-16   49.05m ± 1%   46.08m ± 1%  -6.06% (n=100)

                │  before.txt  │           after.txt           │
                │     B/op     │     B/op      vs base         │
ParseVmlinux-16   31.45Mi ± 0%   26.65Mi ± 0%  -15.28% (n=100)

                │ before.txt  │          after.txt           │
                │  allocs/op  │  allocs/op   vs base         │
ParseVmlinux-16   534.1k ± 0%   467.5k ± 0%  -12.48% (n=100)
```

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
dylandreimerink force-pushed the feature/merge-type-parsing-readerat branch 2 times, most recently from b69fd09 to 97891cc on November 7, 2023 16:39
btf/btf_types.go:

```
@@ -300,6 +326,17 @@ const (
btfIntBitsShift = 0
)

var btfIntLen = packedSize[btfInt]()

func unmarshalBtfInt(bi *btfInt, b []byte, bo binary.ByteOrder) (int, error) {
```
Collaborator

Why not make these methods on btfInt and so on?

dylandreimerink (Member, Author)

That works for the single types, but for types where we have slices, such as []btfMember and []btfParam, it's nice to have a function that marshals the whole slice and does the bounds checks etc.

I thought this worked just as well.

We could also go the method route, but we would have to declare a named slice type to be able to add methods to it:

```
type btfMembers []btfMember

func (bm btfMembers) Marshal(b []byte, bo binary.ByteOrder) (int, error) { ....
```

But that seemed like too much added code for the same effect.

Collaborator

Fair enough. Medium term we need to get rid of all of these manual marshalers anyway.

dylandreimerink force-pushed the feature/merge-type-parsing-readerat branch 2 times, most recently from 8a2f371 to 37d0488 on November 8, 2023 18:27
During profiling, `binary.Read` used up a significant amount of CPU
time. This seems to be because it uses reflection to calculate the
number of bytes to read at runtime and does not cache these results.

By doing manual `io.ReadFull` calls, pre-calculating struct sizes, and
reusing types and buffers where possible, we can reduce the CPU time
spent in `readAndInflateTypes` by almost 25%.
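The pre-calculated struct sizes can be captured once at package init, in the style of the `packedSize[btfInt]()` helper visible in the diff hunk above. This sketch assumes a minimal helper built on `binary.Size` and a single-field `btfInt`; the PR's actual implementation may differ:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// btfInt is assumed here to carry one 32-bit encoding word,
// as in the kernel's BTF encoding of BTF_KIND_INT.
type btfInt struct {
	Raw uint32
}

// packedSize computes the encoded size of T exactly once, so hot paths
// can read fixed-size chunks without the per-call reflection that
// binary.Read performs.
func packedSize[T any]() int {
	var t T
	sz := binary.Size(&t)
	if sz < 0 {
		panic("type has no fixed wire size")
	}
	return sz
}

// Computed once at package init, then reused for every record read.
var btfIntLen = packedSize[btfInt]()

func main() {
	fmt.Println(btfIntLen) // Prints: 4
}
```

Paying the reflection cost a single time per type, rather than once per `binary.Read` call, is what makes the manual `io.ReadFull` path cheap.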

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                │ before.txt  │          after.txt           │
                │   sec/op    │   sec/op     vs base         │
ParseVmlinux-16   46.08m ± 1%   34.59m ± 2%  -24.93% (n=100)

                │  before.txt  │           after.txt           │
                │     B/op     │     B/op      vs base         │
ParseVmlinux-16   26.65Mi ± 0%   23.49Mi ± 0%  -11.87% (n=100)

                │ before.txt  │          after.txt           │
                │  allocs/op  │  allocs/op   vs base         │
ParseVmlinux-16   467.5k ± 0%   267.7k ± 0%  -42.73% (n=100)
```

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
lmb merged commit 1a65b78 into cilium:main on Nov 9, 2023
13 checks passed