
Flamegraph takes a really long time to generate the svg #199

Closed
codedcosmos opened this issue Apr 7, 2022 · 7 comments

Comments

@codedcosmos

I'm just running cargo flamegraph, letting my application run for 10 seconds, and then closing the application.

At the end I see

[ perf record: Woken up 518 times to write data ]
[ perf record: Captured and wrote 132.523 MB perf.data (16443 samples) ]

Maybe that's a lot, but I'm using the default settings, so I don't know.

Now it's just constantly running a single addr2line and then respawning a new one...

codedcosmos@codedcosmos-UBUNTU:~$ pstree -p 40304
cargo-flamegrap(40304)───perf(41559)───sh(43099)───addr2line(43100)
codedcosmos@codedcosmos-UBUNTU:~$ pstree -p 40304
cargo-flamegrap(40304)───perf(41559)───sh(43501)───addr2line(43502)
codedcosmos@codedcosmos-UBUNTU:~$ pstree -p 40304
cargo-flamegrap(40304)───perf(41559)───sh(44021)───addr2line(44022)
codedcosmos@codedcosmos-UBUNTU:~$ 

These were taken several minutes apart, so I'm getting bottlenecked by this single-threaded addr2line process.

I started it 40 minutes ago and it's still stuck on that "perf record: Captured and wrote" line. I can press Ctrl+C and it will generate an SVG, but the results are pretty clearly not accurate and have a lot of [unknown] sections.

It's a Vulkan-based game engine, so maybe the raw amount of complexity is causing this issue: calling Vulkan methods will often go through quite complicated driver functions, and the stack traces can sometimes get funky.

Also, I'm using a Ryzen 9 5950X with 32 GB of RAM, and I only have SSDs.

@mstange
Contributor

mstange commented Apr 7, 2022

perf's approach of running addr2line individually for every address is just super slow. You can work around it in cargo-flamegraph by opting out of inline information with --no-inline, but it means you'll get worse stacks. Alternatively, there's some information on how to build a faster perf locally in this post: https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement
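
To make the workaround and the slow path concrete, here's a rough sketch; the binary path and addresses are hypothetical placeholders, not values from this issue:

```sh
# Workaround: skip inline information entirely (fast, but stacks lose
# frames that the compiler inlined away).
cargo flamegraph --no-inline

# Roughly what perf does for inline info: spawn a fresh addr2line process
# for every single address, paying process startup and DWARF parsing
# costs each time.
addr2line -e ./target/release/my-game -f -C -i 0x5a2d10
addr2line -e ./target/release/my-game -f -C -i 0x5a2e44
# ...repeated for thousands of sampled addresses.

# addr2line itself can batch addresses over stdin, which avoids the
# per-address process spawn:
printf '0x5a2d10\n0x5a2e44\n' | addr2line -e ./target/release/my-game -f -C -i
```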

Is your app open source? I'm currently collecting reproducible instances of perf slowness because I'm considering working on a faster replacement.

@mstange
Contributor

mstange commented Apr 7, 2022

Fixing this is outside of the scope of cargo-flamegraph though, short of switching to a faster perf alternative (once one exists).

@codedcosmos
Author

--no-inline causes it to create the SVG instantly, but for some reason it thinks that vkCreateInstance is taking up most of the CPU time. That simply isn't true: it's called once, the log for it seems to show it taking a few milliseconds, and I'm running the program for many seconds.

So I guess those are the worse stacks.

It's not open source, but I might be able to reproduce it with about a day's worth of work.

@djc
Contributor

djc commented Apr 7, 2022

> I'm currently collecting reproducible instances of perf slowness because I'm considering working on a faster replacement.

That sounds amazing!

> Fixing this is outside of the scope of cargo-flamegraph though, short of switching to a faster perf alternative (once one exists).

Yeah, there's not much we can do in flamegraph. Going to close this (but feel free to keep discussing possible strategies here).

@djc djc closed this as completed Apr 7, 2022
@codedcosmos
Author

@mstange
What repo houses the tool that might need a rewrite?
Perhaps a Rust rewrite would solve our problems?

@mstange
Contributor

mstange commented Apr 10, 2022

> --no-inline causes it to create the SVG instantly, but for some reason it thinks that vkCreateInstance is taking up most of the CPU time. That simply isn't true: it's called once, the log for it seems to show it taking a few milliseconds, and I'm running the program for many seconds.

> So I guess those are the worse stacks.

Well, that's a bit weird. The stacks shouldn't be that much worse; they should just be missing function calls that were inlined away by the compiler, and the remaining symbols should still make sense. You could check whether your binaries contain the symbols you'd expect by running nm on them. If the functions you'd expect don't show up, maybe symbols are getting stripped somewhere in your build system.
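
A minimal sketch of that check; the binary path and function name below are hypothetical placeholders for whatever your build actually produces:

```sh
# List the (demangled) symbols in the release binary and search for a
# function you expect to see in the profile.
nm -C ./target/release/my-game | grep -i render_frame

# If nm reports "no symbols", the symbol table has been stripped somewhere
# in the build, which would also explain the [unknown] frames.
nm ./target/release/my-game | head
```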

> @mstange What repo houses the tool that might need a rewrite? Perhaps a Rust rewrite would solve our problems?

perf is in the Linux source tree: https://github.com/torvalds/linux/tree/master/tools/perf
It's written in C. The only reason I'm interested in a Rust rewrite (of some select parts of it) is that it's easier to use the addr2line Rust crate from Rust code, and a lot of performance optimization has gone into that crate. (And personally I'm not interested in contributing to a C project.)

Anyway, the first pieces of my perf.data parser are here: https://github.com/mstange/linux-perf-data
It doesn't do symbolication yet.

@davidhewitt

davidhewitt commented Jul 20, 2023

Sharing here because I had a related issue on Ubuntu 22.04 with perf being incredibly slow despite using a version with the patch above.

Updating to Ubuntu 23.04 seems to have fixed it for me. I think the root cause was this binutils bug https://sourceware.org/bugzilla/show_bug.cgi?id=28588, which lists the affected version as binutils 2.38 (the version that ships in Ubuntu 22.04). The fix commit listed in that bug report shipped in binutils 2.39, and Ubuntu 23.04 has binutils 2.40.
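
If anyone else hits this, a quick way to check which binutils (and therefore which addr2line) your system is using, assuming the GNU tools are on your PATH:

```sh
# The version string shows the binutils release; 2.38 is affected by the
# bug above, while 2.39 and later contain the fix.
addr2line --version | head -n1
ld --version | head -n1
```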
