feat: add profiling info to benchmarks #52

ielashi · 2023-02-03T11:55:32Z

Problem

The benchmarks currently return the number of instructions it took to execute each benchmark. While this number is useful for measuring performance, it doesn't show where those instructions are spent or where the performance bottlenecks are. Without this information, making informed performance optimizations requires a lot of trial and error.

Solution

The typical solution to this problem is to use some kind of profiler. ic-repl already supports profiling and can output a flamegraph of where instructions are being spent, but it has a few drawbacks that make it difficult to use:

The names of Rust methods are mangled, even when debug = 1 is turned on, making it hard to make sense of the output.
Each benchmark first runs setup logic, and we only want to profile what comes after it, so we'd need a way to programmatically tell the profiler to reset its measurements.
Often we'd like to benchmark blocks of code that aren't functions.
It's difficult to extend, meaning that we can't easily swap the "instruction count" for some other performance measurement, such as "pages written".

To address the issues above, this commit introduces a "poor man's profiler". This profiler is manual, in the sense that the developer adds hints to the code for the parts they care about profiling. In this PR, I added a few of these hints, and the benchmarks now return an output that looks like this:

dfx canister call benchmarks btreemap_insert_blob_256_1024
[Canister rwlgt-iiaaa-aaaaa-aaaaa-cai] {
    "btreemap_insert": "3_511_533_555 (97%)",
    "load_keys": "1_145_464_345 (31%)",
    "load_values": "876_374_691 (24%)",
    "node_load": "2_808_121_190 (77%)",
    "node_save": "182_500_379 (5%)",
}
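For illustration, a manual profiler of this kind could be sketched roughly as follows. This is not the code from this PR: `profile`, `reset`, `results`, `burn`, and `run_benchmark` are hypothetical names, and `instruction_count` is a simulated counter so that the sketch runs natively (in a canister it would read the real instruction counter instead).

```rust
use std::cell::{Cell, RefCell};
use std::collections::BTreeMap;

thread_local! {
    // Simulated instruction counter (assumption: stands in for the real one).
    static COUNTER: Cell<u64> = Cell::new(0);
    // Accumulated instructions per hint label.
    static TOTALS: RefCell<BTreeMap<&'static str, u64>> = RefCell::new(BTreeMap::new());
}

/// Hypothetical stand-in for the canister's instruction counter.
fn instruction_count() -> u64 {
    COUNTER.with(|c| c.get())
}

/// Simulates executing `n` instructions (only needed for this native sketch).
fn burn(n: u64) {
    COUNTER.with(|c| c.set(c.get() + n));
}

/// Runs `f` and adds the instructions it consumed to the total for `label`.
fn profile<R>(label: &'static str, f: impl FnOnce() -> R) -> R {
    let start = instruction_count();
    let result = f();
    let spent = instruction_count() - start;
    TOTALS.with(|t| {
        *t.borrow_mut().entry(label).or_insert(0) += spent;
    });
    result
}

/// Clears all measurements, e.g. once benchmark setup has finished.
fn reset() {
    TOTALS.with(|t| t.borrow_mut().clear());
}

/// Returns a copy of the per-label totals.
fn results() -> BTreeMap<&'static str, u64> {
    TOTALS.with(|t| t.borrow().clone())
}

/// Example benchmark: setup is measured, then discarded by `reset`.
fn run_benchmark() -> BTreeMap<&'static str, u64> {
    profile("node_save", || burn(1_000)); // setup work we do not want to report
    reset();
    profile("node_load", || {
        profile("load_keys", || burn(30));
        profile("load_values", || burn(20));
        burn(10); // work in node_load outside the nested hints
    });
    profile("node_save", || burn(5));
    results()
}
```

Because the hints wrap arbitrary closures, this style can measure blocks of code that aren't functions, and swapping `instruction_count` for another counter changes what is measured without touching the hints.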

This approach is simple and effective, but it does have drawbacks.

It makes the instruction count inaccurate. The profiling logic itself consumes instructions, which inflates the measurements. I think we can limit this inaccuracy by making the profiler crate account for its own overhead internally and deduct it from its measurements.
It doesn't currently support depth. In this specific example, the profiler is unaware that load_keys is part of node_load, but that's easy to introduce.
Syntax isn't the cleanest, but it can be improved with something like a macro.
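As a rough sketch of how the depth and syntax drawbacks might be addressed, the example below keeps a stack of active labels and wraps the hint in a macro. Everything here is hypothetical (`Scope`, the `profile!` macro, the `/`-separated path keys, and the simulated counter `burn`/`instruction_count`); it does not describe what the PR implements, and it does not attempt to deduct the profiler's own overhead.

```rust
use std::cell::{Cell, RefCell};
use std::collections::BTreeMap;

thread_local! {
    // Simulated instruction counter (assumption: stands in for the real one).
    static COUNTER: Cell<u64> = Cell::new(0);
    // Labels of the scopes that are currently active, outermost first.
    static STACK: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
    // Accumulated instructions per label path, e.g. "node_load/load_keys".
    static TOTALS: RefCell<BTreeMap<String, u64>> = RefCell::new(BTreeMap::new());
}

fn instruction_count() -> u64 {
    COUNTER.with(|c| c.get())
}

/// Simulates executing `n` instructions (only needed for this native sketch).
fn burn(n: u64) {
    COUNTER.with(|c| c.set(c.get() + n));
}

/// Guard that starts measuring when created and records the result on drop.
struct Scope {
    start: u64,
}

impl Scope {
    fn enter(label: &'static str) -> Scope {
        STACK.with(|s| s.borrow_mut().push(label));
        Scope { start: instruction_count() }
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        let spent = instruction_count() - self.start;
        // Build the key from all enclosing labels, then leave this scope.
        let path = STACK.with(|s| {
            let mut stack = s.borrow_mut();
            let path = stack.join("/");
            stack.pop();
            path
        });
        TOTALS.with(|t| {
            *t.borrow_mut().entry(path).or_insert(0) += spent;
        });
    }
}

/// Hypothetical macro that profiles a block under the given label.
macro_rules! profile {
    ($label:expr, $body:block) => {{
        let _scope = Scope::enter($label);
        $body
    }};
}

/// Example: load_keys is recorded as a child of node_load.
fn nested_example() -> BTreeMap<String, u64> {
    profile!("node_load", {
        profile!("load_keys", { burn(30) });
        burn(10);
    });
    TOTALS.with(|t| t.borrow().clone())
}
```

With path keys, a report can show that `node_load/load_keys` is contained in `node_load` rather than listing the two as unrelated entries.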

roman-kashitsyn · 2023-02-03T12:58:39Z

profiler/src/lib.rs

+
+    #[cfg(not(target_arch = "wasm32"))]
+    {
+        0


Random idea: we can use CPU time as instruction count in native code.
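A minimal sketch of this idea, under some loud assumptions: the standard library does not expose per-process CPU time, so the example uses monotonic wall-clock nanoseconds from `std::time::Instant` as a rough proxy, and `native_instruction_count` is a hypothetical helper whose result would take the place of the `0` in the non-wasm branch.

```rust
use std::time::Instant;

thread_local! {
    // Reference point, initialized on first use in each thread.
    static START: Instant = Instant::now();
}

/// Native stand-in for the instruction counter: nanoseconds elapsed since
/// the first call on this thread. This is wall-clock time, not true CPU
/// time, so it is only a rough proxy for work done.
#[cfg(not(target_arch = "wasm32"))]
fn native_instruction_count() -> u64 {
    START.with(|start| start.elapsed().as_nanos() as u64)
}
```

Values produced this way are only comparable within a single native run and are not comparable with instruction counts measured in wasm.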

roman-kashitsyn · 2023-02-03T12:59:27Z

Cargo.toml

+[features]
+profile = ["profiler"]


We won't have to declare the feature if it has the same name as the optional dependency.

dsarlis · 2023-02-03T13:55:13Z

Looks like a reasonable approach to me.

Reading the list of drawbacks you mention for the profiler included with ic-repl, I'm wondering whether it would make sense to try to fix those drawbacks instead of introducing our own custom profiler. It's always a bit easier to say "the current tooling isn't working well for us, let's create our own version" but I'm worried that we're reinventing our own approach when there might be a path forward that allows us to reuse existing machinery as much as possible.

That said, I'd also assume you've done some research already on the potential of reusing or what it would take to extend the current profiling tooling to cover our needs and maybe it's not worth it. Mostly I want to make sure we're making a conscious decision. In any case, I'd give this feedback for ic-repl to Yan and see if we can make improvements long term. The problems you noticed might be issues that other developers have hit or will hit at some point in the future.

ielashi · 2023-08-23T07:37:19Z

I've reincarnated this draft in #116.

profiling

a073c94

roman-kashitsyn approved these changes Feb 3, 2023

dsarlis approved these changes Feb 3, 2023

ielashi closed this Aug 23, 2023

ielashi mentioned this pull request Aug 23, 2023

feat: add basic profiling to benchmarks #116

Merged
