Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add profiling info to benchmarks #52

Closed
wants to merge 1 commit into from
Closed

Conversation

ielashi
Copy link
Contributor

@ielashi ielashi commented Feb 3, 2023

Problem

The benchmarks currently return the number of instructions it took to execute each benchmark. While this number is useful to measure performance, it doesn't provide insight into where these instructions are being used and where the performance bottle necks are. Without this information, making informed performance optimizations would require a lot of trial and error.

Solution

The typical solution to this problem is to use some kind of profiler. ic-repl already supports profiling and can output a flamegraph of where instructions are being spent, but it has a few drawbacks that makes it difficult to use:

  1. The names of rust methods are mangled, even when debug = 1 is turned on, making it hard to make sense of the output.
  2. Each benchmark includes logic to first setup, and only after setup would we want to profile, so we'd need a way to programmatically tell the profiler to reset its measurements.
  3. Often we'd like to benchmark blocks of code that aren't functions.
  4. It's difficult to extend, meaning that we can't easily swap the "instruction count" for some other performance measurement, such as "pages written".

To address the issues above, this commit introduces a "poor man profiler". This profiler is manual, in the sense that the developer adds to the code hints for what they care about profiling. In this PR, I added a few of these hints, and the benchmarks now return an output that looks like this:

dfx canister call benchmarks btreemap_insert_blob_256_1024
[Canister rwlgt-iiaaa-aaaaa-aaaaa-cai] {
    "btreemap_insert": "3_511_533_555 (97%)",
    "load_keys": "1_145_464_345 (31%)",
    "load_values": "876_374_691 (24%)",
    "node_load": "2_808_121_190 (77%)",
    "node_save": "182_500_379 (5%)",
}

This approach is simple and effective, but it does have drawbacks.

  1. It makes the instructions count inaccurate. The profiling logic itself consumes cycles, making the instructions count inaccurate. I think we can limit this inaccuracy by making the profiler crate internally account for its own overhead and deducting those from its measurements.
  2. It doesn't currently support depth, meaning that the profiler doesn't know that, in this specific example, it's unaware that load_keys is part node_load, but that's easy to introduce.
  3. Syntax isn't the cleanest, but it can be improved with something like a macro.


#[cfg(not(target_arch = "wasm32"))]
{
0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random idea: we can use CPU time as instruction count in native code.

Comment on lines +23 to +24
[features]
profile = ["profiler"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't have to declare the feature if it has the same name as the optional dependency.

@dsarlis
Copy link
Member

dsarlis commented Feb 3, 2023

Looks like a reasonable approach to me.

Reading the list of drawbacks you mention for the profiler included with ic-repl, I'm wondering whether it would make sense to try to fix those drawbacks instead of introducing our own custom profiler. It's always a bit easier to say "the current tooling isn't working well for us, let's create our own version" but I'm worried that we're reinventing our own approach when there might be a path forward that allows us to reuse existing machinery as much as possible.

That said, I'd also assume you've done some research already on the potential of reusing or what it would take to extend the current profiling tooling to cover our needs and maybe it's not worth it. Mostly I want to make sure we're making a conscious decision. In any case, I'd give this feedback for ic-repl to Yan and see if we can make improvements long term. The problems you noticed might be issues that other developers have hit or will hit at some point in the future.

@ielashi
Copy link
Contributor Author

ielashi commented Aug 23, 2023

I've reincarnated this draft in #116.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants