Description
I'd like to be able to use perf on Linux to profile CoreCLR quickly and effectively, and to dig into JIT'd code with perf annotate (i.e. more than just flamegraphs and top functions).
perf has various collection controls and knobs that are not directly available through the PerfCollect script, such as collecting specific hardware events. For example:
export COMPlus_PerfMapEnabled=1
perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores ./SomeCoreCLRExecutable
I can get a nice report like this:
Performance counter stats for './SomeCoreCLRExecutable':
754,522,181 L1-dcache-loads
1,727,116 L1-dcache-load-misses # 0.23% of all L1-dcache hits
439,282,834 L1-dcache-stores
0.658335100 seconds time elapsed
0.626475000 seconds user
0.023790000 seconds sys
But if I try to collect the same sort of information with perf record and display it with perf annotate to see where and how this is happening inside the JIT'd code, I can't get an annotation of the code generated by the JIT: there is no executable on disk that perf can inspect with objdump, so perf complains, often like this, while trying to annotate JIT'd code:
No output from objdump -M intel --start-address=0x00007f7fd56697e0 --stop-address=0x00007f7fd5669b74 -l -d --no-show-raw -C "$1" 2>/dev/null|grep -v "$1:"|expand
It appears such support could be added by providing an alternative executable that mimics objdump as far as perf is concerned, and pointing perf at it with the --objdump argument.
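To make the alternative-objdump idea a bit more concrete, here is a minimal sketch of what such a stand-in might do with the arguments perf passes (the same ones visible in the error message above). Everything in it is an assumption for illustration: the function names are hypothetical, `disassemble_jit_range` is only a placeholder, and how a real tool would obtain the JIT'd code bytes, or exactly what output layout perf expects back, is not covered here.

```python
"""Hypothetical objdump stand-in for `perf annotate --objdump=...` (sketch only)."""
import subprocess
import sys


def parse_address_range(argv):
    """Extract (start, stop, target) from an objdump-style argument list."""
    start = stop = None
    positional = []
    for arg in argv:
        if arg.startswith("--start-address="):
            start = int(arg.split("=", 1)[1], 16)  # base 16 accepts the 0x prefix
        elif arg.startswith("--stop-address="):
            stop = int(arg.split("=", 1)[1], 16)
        elif not arg.startswith("-"):
            positional.append(arg)
    # The file to disassemble is the last positional argument
    # (earlier ones, such as "intel" after -M, are option values).
    target = positional[-1] if positional else None
    return start, stop, target


def disassemble_jit_range(start, stop):
    """Placeholder (assumption): return objdump-style text for JIT'd code, or None.

    A real tool would need some way to get the code bytes for [start, stop)
    from the runtime; this sketch does not attempt that.
    """
    return None


def main(argv):
    """Serve JIT'd ranges ourselves if possible, else defer to the real objdump."""
    start, stop, _target = parse_address_range(argv)
    text = None
    if start is not None and stop is not None:
        text = disassemble_jit_range(start, stop)
    if text is None:
        # Fall back to the system objdump with the arguments unchanged.
        return subprocess.call(["objdump", *argv])
    sys.stdout.write(text)
    return 0
```

A script built on this would call `main(sys.argv[1:])` as its entry point and would be handed to perf via `perf annotate --objdump=<path to the script>`. Falling back to the real objdump keeps annotation of ordinary native binaries working unchanged.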
It would be amazing if CoreCLR could come with support of some sort for this, so that we could use the native provided tools to quickly analyze and dive into what's happening to our code.