-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Baseline benchmarking tests #54
Conversation
at first glance it looks good, is there a easy way to publish the results? |
@RonnyPfannschmidt good question. Hopefully @ionelmc can comment. |
testing/test_multicall.py
Outdated
|
||
@pytest.mark.benchmark | ||
def test_hookwrapper_speed(benchmark): | ||
benchmark(inner_exec, [w1, w2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have something like
execute = MC(methods, {'arg1': 1, 'arg2': 2, 'arg3': 3}).execute
benchmark(execute, [w1, w2])
to avoid including the wrapper overhead in your timings. This would matter if the execute method is really really fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ionelmc the wrapper overhead is actually more true to call usage by a _HookCaller
which internally calls its _HookCaller._hookexec()
which in turn eventually invokes the PlugginManager._inner_hookexec()
.
My next move once this stuff lands is to add benchmarks around those higher level components such that we speed test the full API code paths.
@ionelmc by the way did you see our question about how to better plubish the benchmarking results for CI or on github? I noticed in your docs that you've got notes on how to render results to json but is there any known way to present results here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right. I don't know much about the pluggy internals.
About displaying results, and running pytest-benchmark in CI ... well, it's not good. CI will run each with lots of external bias, so timings aren't consistent. You could look at perf counters or profile data (my plugin only shows some profile data but not well integrated in the stats yet) or you could compare against a baseline (eg, you have a test that only does 10 function calls to establish a baseline to compare against - if there's a perf regression you will get slower than the baseline.
Just to restate, saving benchmark data somewhere and comparing against previous runs would be unreliable due to virtualization and contention in TravisCI. We can talk about instruction counters I guess (since those shouldn't care about preemption right?), what would be useful?
And testing on pypy further complicates things because of warmup (the plugin has such feature but you need to set it correctly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ionelmc I didn't even think about the effects of the CI server.. very good point.
or you could compare against a baseline (eg, you have a test that only does 10 function calls to establish a baseline to compare against - if there's a perf regression you will get slower than the baseline.
Can you elaborate on this? Are you suggesting this approach will work in CI?
I think I'll still include a baseline from the machine I develop on at least as a reference.
We can talk about instruction counters I guess (since those shouldn't care about preemption right?), what would be useful?
Yes this would be super useful. Can I pull this from the profiling data; I'm guessing it can't be used with --benchmark-compare-fail
?
4a2f399
to
d8d6fdd
Compare
I like the output, nice work! How do we plan to track if some change improves or decreases performance? Manually? Any idea if it would be possible to automate it without too much effort? |
@nicoddemus I was planning a couple progress tracking ideas:
I was going to add this in my next PR which will attempt to address #23 with accompanying benchmarks to make everyone confident about the changes. This PR should be viewed as just an initial (and non-invasive) introduction of |
Also note the appveyor builds are failing because of missing |
@tgoodlet awesome, sounds like a good plan. One idea: how about creating a new environment just to run the tests for pytest benchmark? That would avoid cluttering all runs with it, plus would give us flexibility to run different comparisons on the CI. |
Well it is a good practice, given that we also test all python versions in both Linux and Windows. It should be working though, pytest uses the same mechanism basically. |
Actually |
@nicoddemus ah I see the error now: @nicoddemus in terms of separate (tox) envs that sounds fine to me although won't we still be interested in performance diffs on each |
It seems to me the tests are abruptly terminating, which indicates a crash to me.
We might have different benchmark envs (bench-py27-pytest28, etc), although I feel that initially we should keep things simple with a single benchmark env; we want to keep benchmark logs on the repository for comparison, and adding another level of complexity might be overkill at this point.
Good point. In fact we can name it so pytest won't pick it by default ( [testenv:bench]
commands= py.test testing/benchmarks.py
deps=
pytest
pytest-benchmark |
You can also stick |
This begins an effort to incorporate run-time speed tests using `pytest-benchmark`. This initial test set audits the `_MultiCall` loop with hookimpls, hookwrappers and the combination of both. The intention is to eventually have a reliable set of tests which enable making core component modifications without disrupting performance as per #37.
@nicoddemus @RonnyPfannschmidt alright made the requested changes. If possible I'd like to get this PR in before adding more tests. |
Just made it even better by parametrizing over multiple hook/wrapper counts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Just a note: you might want to consider to push to your branches to your fork instead of the main repository; this avoids running the CI services twice. That's our recommendation in pytest and plugins at least. 👍
@nicoddemus I'm fine to do that if it's preferred. What do you mean by CI running twice? Seems to me it's the same either way? |
The end result is the same, is just that Travis and AppVeyor will run twice each: 1 for the PR and another for the branch (at least in the current configuration). Also this leaves the main repository cleaner without topic branches. But really, it is no big deal and I'm mentioning just because 4 builds might take longer than just two. 👍 |
This adds some basic speed tests for the
_MultiCall
loop.The goal is to extend this out so what we avoid performance degradation in addressing #23 and #15.
The next set of tests will include full code path(s) initiated by
_HookCaller
methods.I was also thinking of adding collection speed tests.
Treat this PR as a first draft to get some opinions going.
So far the results are about what I'd expect -
pypy
killing it,py27
andpy36
the runners up.