Adding benchmark workflows #923
Conversation
- updated the UnitsNet.Benchmark project (updated nugets & added frameworks)
- added BenchmarkCategories to the existing benchmarks (see the sketch below)
- added a few scripts for local testing (individual, multi-platform + rplots)
- added an automatic benchmark workflow on pushing to [master] (some folders), running for ["netcoreapp50", "netcoreapp21", "net472"]
- added workflows triggered on workflow_dispatch for running a benchmark (with options for comparing against a baseline)
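Since the PR adds BenchmarkCategories to the existing benchmarks, here is a minimal sketch of what a categorized BenchmarkDotNet class could look like; the class, method and category names are illustrative and not necessarily the ones used in UnitsNet.Benchmark:

```csharp
using BenchmarkDotNet.Attributes;
using UnitsNet;
using UnitsNet.Units;

// Illustrative only: the real UnitsNet.Benchmark classes and category names may differ.
[CategoriesColumn]
public class ConversionBenchmarks
{
    private readonly Length _length = Length.FromMeters(3.0);

    [Benchmark]
    [BenchmarkCategory("ToUnit")]
    public double ToCentimeters() => _length.ToUnit(LengthUnit.Centimeter).Value;

    [Benchmark]
    [BenchmarkCategory("ToString")]
    public string Format() => _length.ToString();
}
```

Categories like these can then be selected at run time with BenchmarkDotNet's --anyCategories / --allCategories filters, which is one way a workflow could narrow a run down to a specific group of benchmarks.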
I mean, wow, great job and great outline! I love it ❤️ Letting you merge it, in case there is anything else to do before it merges. The only thing I'm curious about is: how do we best use this tool to monitor regressions over time? Also, are there any limits on the free plan for actions we need to know about?
There, it's on - the action is running.
Here are the free plan limits:
I'm currently running the matrix in parallel, so there is no way we could reach 6 hours. I'm not yet sure if running the benchmarks in parallel is any worse (I assume we get a random machine assignment). Also, I've been conservative - only 3 frameworks - but we could just as well test only one (or maybe 5-6).
As for testing the PRs - we wouldn't want to run the benchmarks for something like adding a new quantity. As for the performance-affecting PRs - the person proposing the modification (like you and me, or anyone else, even without privileges) can run the 'run-benchmarks' workflow on his branch (targeting one framework at a time*) using the baseline that's available on your gh-pages branch (which the person would not need to pull or anything). I'll try to give an example in a few minutes.
Sounds perfect
Ok, here is an example run from my newly created fork that returns a new list for each unknown abbreviation. Here is the action log (net50), and here are the results (Runtime=.NET Core 5.0, Toolchain=netcoreapp50).
Ok, I only needed to fix one little thing. This in turn triggered a re-run, so now we should get two points on the chart. Overall the benchmarks appear to be quite noisy - some of the operations seem to be too short. For those cases Microsoft is doing a bunch of operations in a loop. I'm thinking of stealing this test and putting it side by side with ours (sort of like a baseline for the IComparable interface, for which there is currently no benchmark).
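A hedged sketch of that "operations in a loop" pattern (the style used in dotnet/performance for very short operations) applied to an IComparable-style benchmark; the class name, iteration count and values below are made up for illustration:

```csharp
using BenchmarkDotNet.Attributes;
using UnitsNet;

public class CompareToBenchmarks
{
    private const int Iterations = 1000;
    private readonly Length _left = Length.FromMeters(1.0);
    private readonly Length _right = Length.FromCentimeters(99.0);

    // OperationsPerInvoke tells BenchmarkDotNet to divide the measured time by the loop
    // count, so a very short operation still gets a meaningful per-call number.
    [Benchmark(OperationsPerInvoke = Iterations)]
    public int CompareToLoop()
    {
        int acc = 0;
        for (int i = 0; i < Iterations; i++)
            acc += _left.CompareTo(_right);
        return acc; // return the accumulator so the JIT cannot eliminate the loop
    }
}
```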
It's worse than what I expected - I re-ran the workflow to see what a third point (of the same code base) would look like, and it kinda looks like "the machine" was (almost) consistently slower/faster on each run (except for ToUnit on net472, where we seem to have an inflection, but that's probably due to the benchmark itself). It would probably smooth out if I ran it a couple more times (at different times), but I'm not sure how useful it would be for comparing results between branches. Maybe if I ran the tests for a longer period, we would get some multi-modal distribution that, when averaged out, would give us a sample of the performance on a typical VM. Anyways, the workflows seem to be doing what they're supposed to. I'm sure an automatic comparison workflow can be devised that executes [ This would be pretty cool to see run inside a stable self-hosted runner.
I guess it might be tricky to get stable numbers from a shared pool of worker VMs. This one observed 10-20% fluctuation.
Yeah, pretty useless for any between-commit comparisons (it might look ok in the long run, though I think I'm going to disable the auto-comment feature). Also, I hadn't foreseen it triggering on release, but yeah - I guess the project file is inside the UnitsNet folder :) Anyways, we got lucky this time: the Intel Xeon CPU E5-2673 v3 2.40GHz appears to have almost the same performance as the Intel Xeon Platinum 8171M CPU 2.60GHz.

PS I'm playing with a little hack for re-scaling the results (only for the charts) - that would probably smooth out the lines a little.
PS1 There is a bug in the QuantityFrom benchmark (I've fixed it locally).
PS2 There are at least two performance improvements I'm going to PR for soon (so that this wouldn't have been a complete waste of time...).
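The re-scaling hack itself isn't shown in the thread, so purely as a guess at the idea: normalizing each run's means against a reference benchmark measured in the same run would cancel out machine-to-machine speed differences on the charts. The record and class names below are hypothetical, not from the PR:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical re-scaling for charting: divide every benchmark's mean by the mean of a
// designated reference benchmark from the same run, so charts track relative performance.
public record BenchmarkResult(string Name, double MeanNanoseconds);

public static class ChartRescaler
{
    public static Dictionary<string, double> Rescale(
        IReadOnlyList<BenchmarkResult> run, string referenceName)
    {
        double reference = run.Single(r => r.Name == referenceName).MeanNanoseconds;
        return run.ToDictionary(r => r.Name, r => r.MeanNanoseconds / reference);
    }
}
```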
You are on 🔥 🤩
Closes #920
First, a few ground rules:

1) Manually triggered workflows (workflow_dispatch) are only visible from the master branch. (There is also the dotnet benchmark tool, but that seems to be doing the exact same things - although I suspect it's also going to get some 'compare results' functionality at some point, which would likely remove the need for the next step.)

The workflows are well documented; I think you would easily figure out what's going on. Here are the highlights:

- data.js & index.html (here we append results) and the results folder containing the latest benchmark results, as exported by BenchmarkDotNet (we overwrite these each time the script is executed)

PS Some of the links point to lipchev/UnitsNet-Benchmarks, which is the repository I used for testing. Unfortunately, something got messed up during the rebase, so I made the PR in a new branch (of my main fork) - this (as expected) did not trigger a benchmark run, as I pushed on a non-master branch. As to why the manual flow isn't showing up in the actions menu - see 1).
PS2 You should still have access to the test repository (lipchev/UnitsNet-Benchmarks) if you want to test out the workflows - feel free to commit on it if you want; I plan to delete it after this is merged.