Adding benchmark workflows #923

Merged: 2 commits merged into angularsen:master on Apr 30, 2021

Conversation

@lipchev (Collaborator) commented Apr 30, 2021

  • updated the UnitsNet.Benchmark project (updated NuGet packages & added target frameworks)
  • added BenchmarkCategories to the existing benchmarks
  • added a few scripts for local testing (individual, multi-platform + rplots)
  • added an automatic benchmark workflow that runs on pushes to [master] touching certain folders, targeting ["netcoreapp50", "netcoreapp21", "net472"]
  • added workflows triggered on workflow_dispatch for running a benchmark (with options for comparing against a baseline)

Closes #920

First, a few ground rules:

  1. As of this writing, manually triggered workflows (a.k.a. workflow_dispatch) are only visible from the master branch.
  2. Composite actions have some nasty limitations.
  3. The Continuous Benchmark action needs to run in a context with a commit in order to work.
  4. It publishes commits (with optional comments and alerts) targeting a given folder on a separate branch (typically gh-pages/benchmark/). Here is an example (all but the last benchmarks were run using a very low iteration count, a sort of DryRun, so the wild fluctuations are expected). Here is another (that's consistently slower).
  5. It does not support multi-line charts (out of the box).
  6. I read somewhere that the average performance of the virtual machines varies by about 10-15%.
  7. We can run the benchmarks for any framework from net472 onward (Windows required).
  8. We can easily configure all aspects of a benchmark using command-line arguments (there is also a dotnet benchmark tool, but it seems to do exactly the same thing, although I suspect it will also get some 'compare results' functionality at some point, which would likely remove the need for the next step). See the sketch after this list.
  9. There are a few result-comparing extensions out there, but the one I liked most is the ResultsComparer from dotnet/performance.
  10. Just a few days ago, I knew almost none of that.
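To make rules 3, 4 and 8 concrete, here is roughly what the run-and-publish part of a job looks like. This is a minimal sketch, not the exact YAML from this PR; step names, file paths and exporter flags are illustrative:

```yaml
# Sketch only - not the exact workflow from this PR; paths and flags are illustrative.
- name: Run benchmarks (BenchmarkDotNet is configured via command-line arguments)
  run: >
    dotnet run -c Release --project UnitsNet.Benchmark --
    --filter '*'
    --runtimes ${{ matrix.framework }}
    --exporters json
    --artifacts BenchmarkDotNet.Artifacts

# Continuous Benchmark: needs a commit in its context and pushes the chart data
# to a folder on a separate branch (typically gh-pages/benchmark/).
- name: Publish results to gh-pages
  uses: rhysd/github-action-benchmark@v1
  with:
    tool: 'benchmarkdotnet'
    # the report name follows the benchmark class; the exact file name may differ
    output-file-path: BenchmarkDotNet.Artifacts/results/UnitsNet.Benchmark.UnitsNetBenchmarks-report-full-compressed.json
    gh-pages-branch: gh-pages
    benchmark-data-dir-path: benchmark/${{ matrix.framework }}
    github-token: ${{ secrets.GITHUB_TOKEN }}
    auto-push: true
    comment-on-alert: true
```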

The workflows are well documented; I think you'll easily figure out what's going on. Here are the highlights (with a trigger sketch after the list):

  1. The automatic workflow runs the benchmarks (multiple frameworks), exporting each result to gh-pages/benchmarks (one sub-folder per framework).
  2. It is only supposed to run on the master branch & only on certain occasions (see here).
  3. For each folder (framework) we have the chart data, data.js & index.html (results are appended here), and the results folder containing the latest benchmark results as exported by BenchmarkDotNet (these are overwritten each time the script executes).
  4. We can then reference / compare results across branches & repositories (a fork doesn't need to have gh-pages in order to compare against upstream/gh-pages/benchmark/xxx/).
  5. This is done using the second action, which can be used to generate a report (the typical results folder) that can then be compared against a baseline (that step only runs if the baseline parameter points to a valid "*-report-full.json").
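The trigger behind highlights 1 and 2 is roughly this (a sketch; the real branch and path filters live in the workflow file):

```yaml
# Sketch only - the actual branch/path filters are defined in the workflow file.
on:
  push:
    branches: [ master ]
    paths:
      - 'UnitsNet/**'    # illustrative path filter; only certain folders trigger a run
```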

PS Some of the links point to lipchev/UnitsNet-Benchmarks, which is the repository I used for testing. Unfortunately, something got messed up during the rebase, so I made the PR from a new branch (of my main fork); as expected, this did not trigger a benchmark run, since I pushed to a non-master branch. As to why the manual workflow isn't showing up in the Actions menu, see ground rule 1.

PS2 You should still have access to the test repository (lipchev/UnitsNet-Benchmarks) if you want to test out the workflows. Feel free to commit to it if you like; I plan to delete it after this is merged.

lipchev added 2 commits on April 30, 2021 06:46 (the commit messages mirror the bullet list in the PR description above).
@angularsen (Owner)

What the hell did you do in these commits? 😆
[image]

@angularsen (Owner)

I mean, wow, great job and great outline! I love it ❤️
I say, pull the trigger!

Letting you merge it, in case there is anything else to do before it merges.

The only thing I'm curious about is how we best use this tool to monitor regressions over time.
If I understand correctly, this only runs on master, so we can't easily use it when reviewing PRs. Is there any way to alert us, to give a heads-up if something regresses significantly, say 50% slower on some tests?

Also, are there any limits on the free plan for actions we need to know about?

@lipchev merged commit 0f39bec into angularsen:master on Apr 30, 2021
@lipchev (Collaborator, Author) commented Apr 30, 2021

There, it's on: the action is running.

@lipchev (Collaborator, Author) commented Apr 30, 2021

Here are the free plan limits:

Job execution time - Each job in a workflow can run for up to 6 hours of execution time. If a job reaches this limit, the job is terminated and fails to complete.

Workflow run time - Each workflow run is limited to 72 hours. If a workflow run reaches this limit, the workflow run is cancelled.

I'm currently running the matrix in parallel, so there's no way we could reach 6 hours. I'm not yet sure whether running the benchmarks in parallel is any worse (I assume each job gets a random machine assignment).

Also, I've been conservative: only 3 frameworks, but we could just as well test only one (or maybe 5 or 6).
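For context, the parallelism comes from the job matrix: each framework becomes its own job on its own VM. A minimal sketch, assuming a layout like this:

```yaml
# Sketch only - each matrix entry runs as a separate job on its own VM.
jobs:
  benchmark:
    runs-on: windows-latest    # net472 requires Windows
    strategy:
      fail-fast: false         # illustrative: let the other frameworks finish if one job fails
      matrix:
        framework: [ netcoreapp50, netcoreapp21, net472 ]
```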

@lipchev (Collaborator, Author) commented Apr 30, 2021

As for testing the PRs: we wouldn't want to run the benchmarks for something like adding a new quantity. For the performance-affecting PRs, the person proposing the modification (like you and me, or anyone else, even without privileges) can run the 'run-benchmarks' workflow on their branch (targeting one framework at a time*) using the baseline that's available on your gh-pages branch (which they would not need to pull or anything).
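As an aside, here is a minimal sketch of how a manually triggered workflow can expose those options; the input names are illustrative, not necessarily the ones used in this PR:

```yaml
# Sketch only - input names are illustrative.
on:
  workflow_dispatch:
    inputs:
      framework:
        description: 'Target framework (netcoreapp50, netcoreapp21 or net472)'
        required: true
        default: 'netcoreapp50'
      baseline:
        description: 'Path/URL of a baseline *-report-full.json to compare against (optional)'
        required: false
```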

I'll try to give an example in a few minutes.

@angularsen (Owner)

Sounds perfect

@lipchev deleted the benchmark-workflows branch on April 30, 2021 22:03
@lipchev (Collaborator, Author) commented May 1, 2021

Ok, here is an example run from my newly created fork that returns a new list for each unknown abbreviation:
[image]

Here is the action log (net50)

And here are the results:

Runtime=.NET Core 5.0 Toolchain=netcoreapp50

|          Method |     Mean |    Error |   StdDev |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|---------------- |---------:|---------:|---------:|-------:|-------:|------:|----------:|
|           Parse | 49.42 μs | 0.980 μs | 1.048 μs | 1.7700 | 0.0610 |     - |  32.56 KB |
|   TryParseValid | 49.98 μs | 0.736 μs | 0.653 μs | 1.7700 | 0.0610 |     - |  32.54 KB |
| TryParseInvalid | 54.05 μs | 1.061 μs | 1.090 μs | 1.7090 | 0.0610 |     - |  32.16 KB |

...and the results of the comparison to the baseline* (an illustrative invocation of the comparison follows the output):
* (the comparison doesn't cover allocations, but if you open the baseline from the link above you should be able to spot the difference)

summary:
worse: 3, geomean: 1.072
total diff: 3

| Slower                                                | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ----------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| UnitsNet.Benchmark.UnitsNetBenchmarks.TryParseInvalid |      1.10 |         48927.15 |         53800.22 |         |
| UnitsNet.Benchmark.UnitsNetBenchmarks.TryParseValid   |      1.08 |         46153.61 |         49879.57 |         |
| UnitsNet.Benchmark.UnitsNetBenchmarks.Parse           |      1.04 |         47511.59 |         49251.15 |         |

No Faster results for the provided threshold = 0.001% and noise filter = 0.3ns.
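(The output above is the format produced by the ResultsComparer tool from dotnet/performance; a comparison step along these lines would produce it. Paths are illustrative, and the threshold matches the one reported above:)

```yaml
# Sketch only - paths are illustrative; ResultsComparer is the tool from dotnet/performance.
- name: Compare against the baseline
  run: >
    dotnet run -c Release --project performance/src/tools/ResultsComparer --
    --base baseline/results
    --diff BenchmarkDotNet.Artifacts/results
    --threshold 0.001%
```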

@lipchev (Collaborator, Author) commented May 1, 2021

Ok I only needed to fix one little thing. This in turn triggered a re-run, so now we should get two points on the chart.

Overall the benchmarks appear to be quite noisy; some of the operations seem to be too short. For those cases Microsoft runs a bunch of operations in a loop. I'm thinking of stealing that test and putting it side by side with ours (sort of like a baseline for the IComparable interface, for which there is currently no benchmark).

@lipchev (Collaborator, Author) commented May 1, 2021

It's worse than I expected. I re-ran the workflow to see what a third point (of the same code base) would look like, and it looks like "the machine" was (almost) consistently slower or faster across each run (except for ToUnit on net472, where we seem to have an inflection, but that's probably due to the benchmark itself).

It would probably smooth out if I ran it a couple more times (at different times), but I'm not sure how useful that would be for comparing results between branches. Maybe if I ran the tests over a longer period, we would get a multi-modal distribution that, averaged out, would give us a sample of the performance of a typical VM.

Anyway, the workflows seem to be doing what they're supposed to. I'm sure an automatic comparison workflow could be devised that executes [on: pull_request], but I don't think it's worth it for now (the manual way doesn't take that many steps anyway).

This would be pretty cool to see run inside a stable self-hosted runner.

@angularsen (Owner)

I guess it might be tricky to get stable numbers from a shared pool of worker VMs.

This one observed 10-20% fluctuation.
https://github.com/rhysd/github-action-benchmark#stability-of-virtual-environment

@lipchev (Collaborator, Author) commented May 7, 2021

Yeah, pretty useless for any between-commit comparisons (it might look OK in the long run, though I think I'm going to disable the auto-comment feature).

Also, I hadn't foreseen it triggering on release, but yeah, I guess the project file is inside the UnitsNet folder :)

Anyway, we got lucky this time: the Intel Xeon CPU E5-2673 v3 2.40GHz appears to have almost the same performance as the Intel Xeon Platinum 8171M CPU 2.60GHz.

PS I'm playing with a little hack for re-scaling the results (only for the charts), which would probably smooth out the lines a little.

PS1 There is a bug in the QuantityFrom benchmark (I've fixed it locally).

PS2 There are at least two performance improvements I'm going to open PRs for soon (so this won't have been a complete waste of time...).

@angularsen (Owner)

You are on 🔥 🤩
