feat(perf): support reading symbols from jitdump #1051

Merged on Feb 8, 2023 (27 commits)

Member

@maxbrunet maxbrunet commented Nov 18, 2022

Motivation

The agent already supports reading symbols from perf map files, and there is a second format called jitdump:

https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jitdump-specification.txt

Some languages with jitdump support:

See also https://developer.arm.com/documentation/101816/0800/JIT-profiling-support/JIT-profiling

Changes

  • Add a parser for the JITDUMP format
    • Everything is parsed, except the EH Frames (if any) for now
    • It is currently optimistic and warns if a field cannot be read, hoping the rest can be (happy to discuss whether that is good or bad)
    • There may be some refactoring possible regarding how reads are done, I expect a lot of feedback 😀
  • Add JITDUMP as a fallback to Perf map, looking for the file path in /proc/<pid>/maps (the file is mmap()'d so it can be used as a marker)
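
For readers unfamiliar with the format, here is a minimal sketch of decoding the fixed-size file header described in the jitdump specification linked above. The field order and the magic value come from the specification; the Go type, function, and error names are assumptions made for illustration and are not the ones used in pkg/jit.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Values taken from the jitdump specification: the magic spells "JiTD"
// and the fixed header is 40 bytes long.
const (
	jitdumpMagic uint32 = 0x4A695444
	headerSize          = 40
)

var (
	errShortHeader = errors.New("jitdump: header too short")
	errBadMagic    = errors.New("jitdump: unrecognized magic")
)

// header mirrors the file header fields in specification order.
// The Go names are illustrative, not those used in pkg/jit.
type header struct {
	Magic, Version, TotalSize, ElfMach, Pad1, Pid uint32
	Timestamp, Flags                              uint64
}

// parseHeader infers the byte order from the magic, then decodes the fields.
func parseHeader(b []byte) (header, binary.ByteOrder, error) {
	if len(b) < headerSize {
		return header{}, nil, errShortHeader
	}
	var order binary.ByteOrder
	switch {
	case binary.LittleEndian.Uint32(b) == jitdumpMagic:
		order = binary.LittleEndian
	case binary.BigEndian.Uint32(b) == jitdumpMagic:
		order = binary.BigEndian
	default:
		return header{}, nil, errBadMagic
	}
	return header{
		Magic:     order.Uint32(b[0:]),
		Version:   order.Uint32(b[4:]),
		TotalSize: order.Uint32(b[8:]),
		ElfMach:   order.Uint32(b[12:]),
		Pad1:      order.Uint32(b[16:]),
		Pid:       order.Uint32(b[20:]),
		Timestamp: order.Uint64(b[24:]),
		Flags:     order.Uint64(b[32:]),
	}, order, nil
}

func main() {
	// A hand-built little-endian header for PID 1234.
	b := make([]byte, headerSize)
	binary.LittleEndian.PutUint32(b[0:], jitdumpMagic)
	binary.LittleEndian.PutUint32(b[4:], 1)
	binary.LittleEndian.PutUint32(b[8:], headerSize)
	binary.LittleEndian.PutUint32(b[20:], 1234)

	h, order, err := parseHeader(b)
	fmt.Println(h.Pid, order, err)
}
```

Returning the detected byte order alongside the header lets the caller decode the records that follow with the same order.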

Currently, only JIT Code Load records are used, to mimic a Perf map, but the format has potential for more. It could be used to build Debug Info and upload it to Parca; the problem is that JIT code does not have a build ID. Perf maps keep priority over jitdumps as they are more lightweight, but that might change in the future if we find advantages in the richer format. (Otherwise, I might reconfigure the parser to discard unnecessary data as an optimization.)
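
To illustrate why a JIT Code Load record is enough to stand in for a perf map entry, the sketch below renders records in the perf map text format (`START SIZE symbolname`, hexadecimal without a `0x` prefix). This is purely illustrative: the agent builds an in-memory map rather than writing such a file, and the `codeLoad` type and `perfMapLines` function are hypothetical names.

```go
package main

import "fmt"

// codeLoad keeps the subset of a JIT Code Load record needed for
// symbolization. The Go names are illustrative.
type codeLoad struct {
	CodeAddr uint64 // address where the jitted code is loaded
	CodeSize uint64 // size of the jitted code in bytes
	Name     string // function name
}

// perfMapLines renders each record as a perf map line: "START SIZE symbolname",
// with START and SIZE in hexadecimal without a 0x prefix.
func perfMapLines(loads []codeLoad) []string {
	lines := make([]string, 0, len(loads))
	for _, l := range loads {
		lines = append(lines, fmt.Sprintf("%x %x %s", l.CodeAddr, l.CodeSize, l.Name))
	}
	return lines
}

func main() {
	for _, line := range perfMapLines([]codeLoad{
		{CodeAddr: 0x7f0000001000, CodeSize: 0x40, Name: "fib"},
	}) {
		fmt.Println(line)
	}
}
```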

Testing

I plan to add jitdumps and their JSON snapshots to https://github.com/parca-dev/testdata

Right now, you can:

  • Run NodeJS with the --perf-prof flag
  • Run Erlang with the ERL_FLAGS='+JPperf dump' environment variable
  • Run Wasmtime with the --jitdump flag

@maxbrunet maxbrunet requested a review from a team as a code owner November 18, 2022 18:43
@v-thakkar v-thakkar self-requested a review November 21, 2022 12:12
Member Author

@maxbrunet maxbrunet left a comment

I will not work further on this until I get a 1st round of feedback.

Some comments from a self-review:

  • Define static errors, so they can easily be caught by caller
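
A minimal sketch of what static (sentinel) errors could look like, assuming the standard `errors.Is` and `%w` wrapping idiom. The error names and the `checkHeader` helper are hypothetical and not taken from pkg/jit; the check also ignores the byte-swapped magic for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the names are not taken from pkg/jit.
var (
	ErrWrongMagic         = errors.New("jitdump: wrong magic")
	ErrUnsupportedVersion = errors.New("jitdump: unsupported version")
)

// checkHeader wraps a sentinel with context using %w, so callers can still
// match it with errors.Is.
func checkHeader(magic, version uint32) error {
	if magic != 0x4A695444 {
		return fmt.Errorf("magic %#x: %w", magic, ErrWrongMagic)
	}
	if version != 1 {
		return fmt.Errorf("version %d: %w", version, ErrUnsupportedVersion)
	}
	return nil
}

// isWrongMagic shows how a caller catches one specific failure.
func isWrongMagic(err error) bool { return errors.Is(err, ErrWrongMagic) }

func main() {
	err := checkHeader(0xdeadbeef, 1)
	fmt.Println(err, isWrongMagic(err))
}
```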

Member

@kakkoyun kakkoyun left a comment

This is great. We can improve some code structure and start testing it.
Thanks for realizing and enhancing these parts.

var m Map
var err error
switch {
case strings.HasSuffix(perfFile, ".map"):
Member

How about converting the Map to an interface type and have 2 different implementations for .map and .dump?

Member

Lookup(addr uint64) (string, error) can be the only method.

Member Author

Sounds good, the idea went through my mind, I wired things up quickly to get it to work

Member Author

Actually, the interface feels like over-engineering to me. MapFromDump() would still be needed to compute end addresses and sort them to allow for binary search, and we would still have a switch-case to instantiate one or the other.

Member

I don't think it's over-engineering. It will make this cleaner and easier to read. We write for humans to read and computers to run 😄

Member Author

hmm my thought exactly, simplicity is good for humans, the pattern would feel a bit forced to me 🙂

Anyway, I'd like to keep changes to a minimum in this file; we can revisit in another PR. I will only fix the performance issues I introduced.
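
For context on the thread above, this is a rough sketch of the two ideas being weighed: a single-method interface with `Lookup(addr uint64) (string, error)`, and a jitdump-backed implementation that computes end addresses and sorts them so lookups can use binary search. All type and function names here are assumptions for illustration, the ranges are assumed not to overlap, and none of this is the code that was merged.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

var errNotFound = errors.New("no symbol for address")

// symbolMap is the single-method interface suggested in the review.
type symbolMap interface {
	Lookup(addr uint64) (string, error)
}

// codeLoad is an illustrative subset of a JIT Code Load record.
type codeLoad struct {
	Addr, Size uint64
	Name       string
}

type addrRange struct {
	start, end uint64 // end is exclusive
	symbol     string
}

// dumpMap is a hypothetical jitdump-backed implementation of symbolMap.
type dumpMap []addrRange

// newDumpMap computes end addresses and sorts by them to allow binary search.
// It assumes the ranges do not overlap.
func newDumpMap(loads []codeLoad) dumpMap {
	m := make(dumpMap, 0, len(loads))
	for _, l := range loads {
		m = append(m, addrRange{start: l.Addr, end: l.Addr + l.Size, symbol: l.Name})
	}
	sort.Slice(m, func(i, j int) bool { return m[i].end < m[j].end })
	return m
}

// Lookup finds the first range ending after addr and checks it contains addr.
func (m dumpMap) Lookup(addr uint64) (string, error) {
	i := sort.Search(len(m), func(i int) bool { return addr < m[i].end })
	if i == len(m) || addr < m[i].start {
		return "", errNotFound
	}
	return m[i].symbol, nil
}

func main() {
	var m symbolMap = newDumpMap([]codeLoad{
		{Addr: 0x2000, Size: 0x10, Name: "b"},
		{Addr: 0x1000, Size: 0x20, Name: "a"},
	})
	fmt.Println(m.Lookup(0x1008))
}
```

Either way, a constructor along the lines of `newDumpMap` is still needed, which is the point made in the reply about `MapFromDump()`.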

)

// JITHeader represent a jitdump file header.
type JITHeader struct {
Member

nit: We should consider which types and fields we export if we don't depend on reflection or serialization.

Member Author

Agreed, but I am not sure how the package will be used yet, so I do not have an opinion on that. The usefulness-to-cost ratio needs to be taken into account too.

Member

@kakkoyun kakkoyun left a comment

I'm happy to merge this. But I see several to do items. Do you plan to handle them in this PR or shall we merge this? @maxbrunet

@maxbrunet
Member Author

I'm happy to merge this. But I see several to do items. Do you plan to handle them in this PR or shall we merge this?

Yes, I'd like to handle them here, but before I do so, I would appreciate some feedback from @v-thakkar and/or @javierhonduco, since a review was self-requested 🙏

@maxbrunet
Member Author

maxbrunet commented Dec 3, 2022

@kakkoyun Ready to merge! 🙇 I'll do further work in other PRs. 🙂

@@ -116,7 +116,7 @@ func MapFromDump(logger log.Logger, fs fs.FS, fileName string) (Map, error) {
}
// Some runtimes update their dump all the time (e.g. libperf_jvmti.so),

Yeah this will be the norm in case you attach to a running process like parca will do.

@kakkoyun
Member

kakkoyun commented Dec 5, 2022

I want to proceed with this. Vaishali is on PTO. @javierhonduco could you take a look at this? Is there a blocker on your side?

@javierhonduco
Contributor

javierhonduco commented Dec 5, 2022

Haven't had the time, yet. I will make sure I review it by end of the week.

edit: Sorry I haven't gotten around to reviewing this. Will do next week

@maxbrunet
Member Author

I added tests, here is a first benchmark:

$ ls -l testdata/jitdump/*.dump
-rw-r--r-- 1 maxime users   506419 Dec  9 12:11 testdata/jitdump/dotnet.dump
-rw-r--r-- 1 maxime users 13040830 Dec  9 12:11 testdata/jitdump/erlang.dump
-rw-r--r-- 1 maxime users    10582 Dec  9 12:11 testdata/jitdump/julia.dump
-rw-r--r-- 1 maxime users 13082624 Dec  9 12:11 testdata/jitdump/libperf-jvmti.dump
-rw-r--r-- 1 maxime users  1562469 Dec  9 12:11 testdata/jitdump/nodejs.dump
-rw-r--r-- 1 maxime users     6516 Dec  9 12:11 testdata/jitdump/php.dump
-rw-r--r-- 1 maxime users   129246 Dec  9 12:11 testdata/jitdump/wasmtime.dump

$ go test -bench=. -benchmem ./pkg/jit
goos: linux
goarch: amd64
pkg: github.com/parca-dev/parca-agent/pkg/jit
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkLoadJITDump/dotnet-8               1468            969714 ns/op          692432 B/op      14414 allocs/op
BenchmarkLoadJITDump/erlang-8                 34          34686926 ns/op        22501897 B/op     835150 allocs/op
BenchmarkLoadJITDump/julia-8               21128             49061 ns/op           26057 B/op       1147 allocs/op
BenchmarkLoadJITDump/libperf-jvmti-8          39          27826079 ns/op        20026386 B/op     695169 allocs/op
BenchmarkLoadJITDump/nodejs-8                448           2826668 ns/op         2078581 B/op      39746 allocs/op
BenchmarkLoadJITDump/php-8                 38241             28755 ns/op           17012 B/op        538 allocs/op
BenchmarkLoadJITDump/wasmtime-8             7779            217346 ns/op          174055 B/op       3231 allocs/op
PASS
ok      github.com/parca-dev/parca-agent/pkg/jit        11.512s

@vchuravy

Would love to see this merged!

Contributor

@javierhonduco javierhonduco left a comment

Looking great so far! ❤️ Did a first pass of reviews and things are very close to being ready. Pre-emptively approving but let's ensure that we get one more maintainer onboard 😄

@javierhonduco
Contributor

Oh, another minor thing, now we have the latest testdata pulled in, but whenever we grep something in the repo (e.g. using ripgrep) it might match the binary files. Could you please add them to the exclusion list for binary / generated files? Cheers!

Member

@kakkoyun kakkoyun left a comment

This PR looks awesome. Let's try to reduce allocations. We can run the escape analyzer to measure. Let's merge this and iterate, I'd say.
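
One common way to cut allocations in a binary parser is to decode every fixed-width field through a single reusable scratch buffer instead of allocating per field. The sketch below shows that technique under stated assumptions: `fieldReader` and `decodePair` are hypothetical names, and the thread does not say which change produced the later improvement. Escape-analysis decisions can be inspected with `go build -gcflags=-m`, and the effect measured with `go test -bench=. -benchmem`.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// fieldReader decodes fixed-width integers through one scratch buffer that is
// reused for every field. The first error is remembered and later reads
// become no-ops, so callers check err once at the end.
type fieldReader struct {
	r     io.Reader
	order binary.ByteOrder
	buf   [8]byte
	err   error
}

// fill reads exactly n bytes into the scratch buffer unless an earlier read failed.
func (f *fieldReader) fill(n int) bool {
	if f.err != nil {
		return false
	}
	_, f.err = io.ReadFull(f.r, f.buf[:n])
	return f.err == nil
}

func (f *fieldReader) u32() uint32 {
	if !f.fill(4) {
		return 0
	}
	return f.order.Uint32(f.buf[:4])
}

func (f *fieldReader) u64() uint64 {
	if !f.fill(8) {
		return 0
	}
	return f.order.Uint64(f.buf[:8])
}

// decodePair reads a little-endian uint32 followed by a uint64.
func decodePair(b []byte) (uint32, uint64, error) {
	f := &fieldReader{r: bytes.NewReader(b), order: binary.LittleEndian}
	id := f.u32()
	size := f.u64()
	return id, size, f.err
}

func main() {
	id, size, err := decodePair([]byte{7, 0, 0, 0, 0x10, 0, 0, 0, 0, 0, 0, 0})
	fmt.Println(id, size, err)
}
```

Remembering the first error also keeps call sites short, at the cost of continuing to call no-op reads after a failure.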

@maxbrunet
Member Author

Could you please add them to the exclusion list for binary / generated files?

Sorry, what exclusion list are you referring to? I assume that would be in the testdata repo

@maxbrunet
Member Author

New benchmark: -60% allocs! Thank you so much @javierhonduco @brancz for spotting this!

$ go test -bench=. -benchmem ./pkg/jit
goos: linux
goarch: amd64
pkg: github.com/parca-dev/parca-agent/pkg/jit
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkLoadJITDump/dotnet-8               1426            752507 ns/op          630455 B/op       4453 allocs/op
BenchmarkLoadJITDump/erlang-8                 44          25866619 ns/op        19224474 B/op     295672 allocs/op
BenchmarkLoadJITDump/julia-8               26508             38855 ns/op           22103 B/op        445 allocs/op
BenchmarkLoadJITDump/libperf-jvmti-8          51          22224994 ns/op        17721051 B/op     272435 allocs/op
BenchmarkLoadJITDump/nodejs-8                504           2339107 ns/op         1906934 B/op      12791 allocs/op
BenchmarkLoadJITDump/php-8                 56421             20795 ns/op           14843 B/op        180 allocs/op
BenchmarkLoadJITDump/wasmtime-8             9472            148831 ns/op          160221 B/op       1010 allocs/op
PASS
ok      github.com/parca-dev/parca-agent/pkg/jit        10.304s
Benchstat before/after
goos: linux
goarch: amd64
pkg: github.com/parca-dev/parca-agent/pkg/jit
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                            │   old.txt    │               new.txt                │
                            │    sec/op    │    sec/op     vs base                │
LoadJITDump/dotnet-8          1005.9µ ± 6%   690.8µ ± 12%  -31.32% (p=0.000 n=10)
LoadJITDump/erlang-8           34.77m ± 1%   24.12m ±  1%  -30.63% (p=0.000 n=10)
LoadJITDump/julia-8            51.37µ ± 5%   36.74µ ±  3%  -28.48% (p=0.000 n=10)
LoadJITDump/libperf-jvmti-8    28.41m ± 1%   21.14m ±  2%  -25.62% (p=0.000 n=10)
LoadJITDump/nodejs-8           2.990m ± 7%   2.145m ±  9%  -28.26% (p=0.000 n=10)
LoadJITDump/php-8              27.45µ ± 2%   20.86µ ±  2%  -24.00% (p=0.000 n=10)
LoadJITDump/wasmtime-8         204.3µ ± 6%   149.3µ ±  3%  -26.90% (p=0.000 n=10)
geomean                        978.0µ        704.9µ        -27.93%

                            │   old.txt    │               new.txt                │
                            │     B/op     │     B/op      vs base                │
LoadJITDump/dotnet-8          676.2Ki ± 0%   615.7Ki ± 0%   -8.95% (p=0.000 n=10)
LoadJITDump/erlang-8          21.46Mi ± 0%   18.33Mi ± 0%  -14.57% (p=0.000 n=10)
LoadJITDump/julia-8           25.45Ki ± 0%   21.58Ki ± 0%  -15.18% (p=0.000 n=10)
LoadJITDump/libperf-jvmti-8   19.10Mi ± 0%   16.90Mi ± 0%  -11.51% (p=0.000 n=10)
LoadJITDump/nodejs-8          1.982Mi ± 0%   1.819Mi ± 0%   -8.26% (p=0.000 n=10)
LoadJITDump/php-8             16.61Ki ± 0%   14.50Ki ± 0%  -12.75% (p=0.000 n=10)
LoadJITDump/wasmtime-8        170.0Ki ± 0%   156.5Ki ± 0%   -7.95% (p=0.000 n=10)
geomean                       636.6Ki        564.4Ki       -11.35%

                            │   old.txt    │               new.txt               │
                            │  allocs/op   │  allocs/op   vs base                │
LoadJITDump/dotnet-8          14.414k ± 0%   4.453k ± 0%  -69.11% (p=0.000 n=10)
LoadJITDump/erlang-8           835.1k ± 0%   295.7k ± 0%  -64.60% (p=0.000 n=10)
LoadJITDump/julia-8            1147.0 ± 0%    445.0 ± 0%  -61.20% (p=0.000 n=10)
LoadJITDump/libperf-jvmti-8    695.2k ± 0%   272.4k ± 0%  -60.81% (p=0.000 n=10)
LoadJITDump/nodejs-8           39.75k ± 0%   12.79k ± 0%  -67.82% (p=0.000 n=10)
LoadJITDump/php-8               538.0 ± 0%    180.0 ± 0%  -66.54% (p=0.000 n=10)
LoadJITDump/wasmtime-8         3.231k ± 0%   1.010k ± 0%  -68.74% (p=0.000 n=10)
geomean                        18.21k        6.247k       -65.69%

@javierhonduco
Contributor

Amazing, thanks so much!

Member

@kakkoyun kakkoyun left a comment

LGTM ❤️ Awesome work! 🤩

@kakkoyun kakkoyun merged commit 62f49d0 into parca-dev:main Feb 8, 2023