
btf: Optimizing BTF parsing by merging readTypes and inflateRawTypes #1211

Merged

Conversation

dylandreimerink
Member

After profiling the BTF parsing code, it became apparent that much of the
time in `readTypes` was spent allocating rawTypes. These rawTypes are
only used as an intermediate step to create the final inflated types, so
all of the allocation work gets thrown away.

This commit merges `readTypes` and `inflateRawTypes` into a single
function. This allows us to re-use the intermediate objects and only
allocate the final inflated types.

This results in the following performance improvements:

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                │ before.txt  │          after.txt          │
                │   sec/op    │   sec/op     vs base        │
ParseVmlinux-16   49.05m ± 1%   46.08m ± 1%  -6.06% (n=100)

                │  before.txt  │           after.txt           │
                │     B/op     │     B/op      vs base         │
ParseVmlinux-16   31.45Mi ± 0%   26.65Mi ± 0%  -15.28% (n=100)

                │ before.txt  │          after.txt           │
                │  allocs/op  │  allocs/op   vs base         │
ParseVmlinux-16   534.1k ± 0%   467.5k ± 0%  -12.48% (n=100)
```

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
dylandreimerink force-pushed the feature/merge-type-parsing-readerat branch 2 times, most recently from b69fd09 to 97891cc on November 7, 2023 16:39
btf/btf_types.go:

```
@@ -300,6 +326,17 @@ const (
btfIntBitsShift = 0
)

var btfIntLen = packedSize[btfInt]()

func unmarshalBtfInt(bi *btfInt, b []byte, bo binary.ByteOrder) (int, error) {
```
Collaborator

Why not make these methods on btfInt and so on?

dylandreimerink (Member, Author)

That works for the single types, but for types where we have slices, such as []btfMember and []btfParam, it's nice to have a function that marshals the whole slice and does the bounds checks etc.

I thought this worked just as well.

We could also go the method route, but we would have to declare a named slice type to be able to add methods to it:

```
type btfMembers []btfMember

func (bm btfMembers) Marshal(b []byte, bo binary.ByteOrder) (int, error) { ....
```

But that seemed like too much added code for the same effect.

Collaborator

Fair enough. Medium term we need to get rid of all of these manual marshalers anyway.

dylandreimerink force-pushed the feature/merge-type-parsing-readerat branch 2 times, most recently from 8a2f371 to 37d0488 on November 8, 2023 18:27
During profiling, `binary.Read` used up a significant amount of CPU
time. This seems to be because it uses reflection to calculate the
number of bytes to read at runtime and does not cache these results.

By doing manual `io.ReadFull` calls, pre-calculating struct sizes, and
reusing types and buffers where possible, we can reduce the CPU time
spent in `readAndInflateTypes` by almost 25%.
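The pre-calculated struct sizes can be captured once at package init, in the style of the `packedSize[btfInt]()` helper visible in the diff hunk above. This sketch assumes a minimal helper built on `binary.Size` and a single-field `btfInt`; the PR's actual implementation may differ:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// btfInt is assumed here to carry one 32-bit encoding word,
// as in the kernel's BTF encoding of BTF_KIND_INT.
type btfInt struct {
	Raw uint32
}

// packedSize computes the encoded size of T exactly once, so hot paths
// can read fixed-size chunks without the per-call reflection that
// binary.Read performs.
func packedSize[T any]() int {
	var t T
	sz := binary.Size(&t)
	if sz < 0 {
		panic("type has no fixed wire size")
	}
	return sz
}

// Computed once at package init, then reused for every record read.
var btfIntLen = packedSize[btfInt]()

func main() {
	fmt.Println(btfIntLen) // Prints: 4
}
```

Paying the reflection cost a single time per type, rather than once per `binary.Read` call, is what makes the manual `io.ReadFull` path cheap.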

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                │ before.txt  │          after.txt           │
                │   sec/op    │   sec/op     vs base         │
ParseVmlinux-16   46.08m ± 1%   34.59m ± 2%  -24.93% (n=100)

                │  before.txt  │           after.txt           │
                │     B/op     │     B/op      vs base         │
ParseVmlinux-16   26.65Mi ± 0%   23.49Mi ± 0%  -11.87% (n=100)

                │ before.txt  │          after.txt           │
                │  allocs/op  │  allocs/op   vs base         │
ParseVmlinux-16   467.5k ± 0%   267.7k ± 0%  -42.73% (n=100)
```

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
lmb merged commit 1a65b78 into cilium:main on Nov 9, 2023
13 checks passed