Set up airspeed velocity benchmarks #1049
Conversation
After getting this more or less set up according to the proposal in #419 (comment) (set up a nightly job to benchmark any new commits to master and publish results to GH Pages), here are some thoughts:
Timings available here: https://pvlib-benchmarker.github.io/pvlib-benchmarks/
Thanks @kanderso-nrel! I agree that the timings look pretty stable. cc @alorenzo175 (and thanks for your help with that VM). It makes sense to store the results and the runner script in a separate repository, but what is the motivation for storing the configuration file there too? I think locally running the performance suite without too much trouble is going to be important for keeping up with it.
The motivation was because I didn't think asv would be amenable to storing environments and results in a different directory from the configuration file. I still think that is the case, but a simple workaround would be to keep the official configuration file on pvlib-python/master and have the scheduled job just copy the config file wherever it needs to go after fetching updates from master. It will also have to do some minor edits to the copy (the …)
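A minimal sketch of what that copy-and-edit step could look like, assuming the job does it in Python; the file locations, repository URL, and directory values here are assumptions, not the actual pvlib-benchmarks layout (only the key names come from asv's config schema):

# Hypothetical nightly-job step: copy the official asv config from the
# pvlib-python checkout into the benchmarks repo and adjust a few paths.
import json
import re
from pathlib import Path

src = Path("pvlib-python/asv.conf.json")      # official config (assumed path)
dst = Path("pvlib-benchmarks/asv.conf.json")  # where the runner expects it

# asv.conf.json allows // comments; strip whole-line comments so the file
# parses as plain JSON (a real job might use asv's own config loader instead).
text = re.sub(r"^\s*//.*$", "", src.read_text(), flags=re.MULTILINE)
conf = json.loads(text)

# "repo", "results_dir", and "html_dir" are standard asv config keys; the
# values below are placeholders for wherever the VM keeps its clone/results.
conf["repo"] = "https://github.com/pvlib/pvlib-python.git"
conf["results_dir"] = "results"
conf["html_dir"] = "html"

dst.write_text(json.dumps(conf, indent=4))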
That seems reasonable. Any idea what pandas and xarray are doing to manage that? They both have …
I think much of the pydata stack (including pandas and xarray) gets run by https://github.com/asv-runner/asv-runner. I'm not sure I understand it fully, but it seems like they copy the timing results over to another server (the pydata webserver, maybe?) instead of into a GitHub repo, which means they can just run the benchmarks in-place in the main repo. I'm not very enthusiastic about trying to get it to work for our case. I've mostly been looking at the benchmarks for astropy and sympy, but they both keep the benchmark functions themselves in the second repo. I haven't run into any packages with the same situation as ours (benchmark functions in the main repo, results in another), but I haven't looked very hard.
This section of the astropy repo seems reasonable enough. I'll defer to you on the best course of action.
Ok, I think this is in a good spot. It should be relatively painless to get it working locally, and still pretty easy to automate for the nightly job. Anybody interested in trying it out? Hopefully …
A few minor comments below. Can you also add a 0.8.1 what's new file?
LGTM.
Any comments from @mikofski or @cwhanse? If not, I'll merge this tomorrow night and get the automated job set up for real. FYI here's an example showing timings from the two environments on my computer. Fairly dramatic difference, but the CPU was busy doing other things while running the benchmarks so it might not be a real effect.
I had a quick look through the readme, benchmarks, and asv.confs; it all looks great. Thank you!
No comments; all this is rather beyond me. Is there a way to judge where additional coverage could be important?
Great question. I think we should examine the patterns of other packages using ASV. I think it makes sense to benchmark any functions that have the potential to become performance bottlenecks. Also, any time a function is enhanced specifically to improve performance, a benchmark should be required as part of the PR to demonstrate the improvement. To be safe, I think broad coverage is best: if a future modification adversely affects existing performance, we can blame the specific commit responsible. Building the coverage has the potential to engage some "good 4 first timers", similar to the examples gallery, and it won't affect our CI build/test time because the benchmarks run on a separate server.
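As a rough illustration for would-be contributors, an asv benchmark is just a class with a setup method and time_-prefixed methods. The thread mentions an aoi benchmark already exists; the sketch below is illustrative rather than copied from it:

# Sketch of a minimal contributed benchmark; asv times any method whose name
# starts with time_, and setup() runs outside the timed region.
import numpy as np
from pvlib import irradiance

class AOI:
    def setup(self):
        n = 100_000
        rng = np.random.default_rng(0)
        self.solar_zenith = rng.uniform(0, 90, n)
        self.solar_azimuth = rng.uniform(0, 360, n)

    def time_aoi(self):
        # fixed tilt/azimuth, array-valued solar position
        irradiance.aoi(20, 180, self.solar_zenith, self.solar_azimuth)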
Test failures are unrelated. Thanks @kanderso-nrel!
        solarposition.ephemeris(self.times_localized, self.lat, self.lon)

    def time_spa_python(self):
        solarposition.spa_python(self.times_localized[::5], self.lat, self.lon)
@kanderso-nrel any objection to removing the slice? Will it cause a blip in the asv results, or will the test be rerun over the whole commit history? I ask because the slice makes the tests a little harder to reason about, and it seems like the run time would still be reasonable. I also wonder about the slice performance on a pandas index. It looks like the [::30] below is needed to keep a reasonable run time in that function. I still wonder if we'd be better off with a new index, e.g. self.days = pd.date_range(start='20180101', end='20190101', freq='24h')
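For concreteness, a sketch of how such a daily index could live in setup (the class name, timezone, and choice of sun_rise_set_transit_spa are illustrative, not the actual benchmark file):

# Sketch only: a daily index defined once in setup(), so the timed method
# does no slicing of its own.
import pandas as pd
from pvlib import solarposition

class SolarPositionDaily:
    def setup(self):
        self.days = pd.date_range(start='20180101', end='20190101',
                                  freq='24h', tz='Etc/GMT+7')
        self.lat = 35.1
        self.lon = -106.6

    def time_sun_rise_set_transit_spa(self):
        solarposition.sun_rise_set_transit_spa(self.days, self.lat, self.lon)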
Yep, those slices were trying to keep runtime to something more than milliseconds and less than tens of seconds. You raise a good point about the performance of the slice itself -- it was careless to include that in the timing function. Do you think it is valuable to use the same index for all the functions (meaning their timings can be compared apples-to-apples), or is it better to have separate indexes with various lengths defined in setup so that each function runs about the same amount of time?
Will it cause a blip in the asv results, or will the test be rerun over the whole commit history?

I think asv will automatically invalidate the old results when the function changes (see 'version' here), but I don't think it would run the new function over the commit history unless you told it to. In this case I would probably delete the old results and regenerate them with the new function. It is easy enough to run timings for a single benchmark; like pytest, you can specify which functions you want to run.

Just checking, is there anything special about the index you suggested other than the number of elements? For example, is there a difference between 365 points across a year vs. 365 points across a single day?
Thanks for the info. I can make a PR for this.
something more than milliseconds
I skimmed the documentation for guidance on a minimum test time and didn't find anything. Milliseconds seems reasonable to me, but I'm wondering if you have a reference.
Do you think it is valuable to use the same index for all the functions (meaning their timings can be compared apples-to-apples), or is it better to have separate indexes with various lengths defined in setup so that each function runs about the same amount of time?
I vote for apples to apples so long as the runtime constraints are satisfied.
Just checking, is there anything special about the index you suggested other than the number of elements?
Nothing special.
Milliseconds seems reasonable to me, but I'm wondering if you have a reference.

I do not; it just seemed like a good idea at the time. I see that the aoi function is ~2.5 ms and looks fine, so no pushback from me on having some fast benchmarks 👍
I can make a PR for this.
If you want, you could also fix a typo in the fuentes benchmark where I was inconsistent with windspeed vs wind_speed. It didn't show up before the v0.8 release because that benchmark only runs for >=v0.8.0.
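For context, asv skips a benchmark whose setup raises NotImplementedError, so a version gate along the lines of the sketch below is one way to keep a benchmark off pre-0.8 commits; the fuentes inputs shown are placeholders, not the actual benchmark:

# Sketch of a version-gated asv benchmark; the inputs to temperature.fuentes
# are placeholders, not the values used in the real benchmark.
import pandas as pd
import pvlib
from pkg_resources import parse_version
from pvlib import temperature

class Fuentes:
    def setup(self):
        # asv skips a benchmark whose setup() raises NotImplementedError,
        # so this never runs against commits older than pvlib v0.8.0
        # (temperature.fuentes was added in 0.8.0).
        if parse_version(pvlib.__version__) < parse_version('0.8.0'):
            raise NotImplementedError

        times = pd.date_range(start='20200601', freq='1min', periods=1440)
        self.poa_global = pd.Series(1000.0, index=times)
        self.temp_air = pd.Series(25.0, index=times)
        self.wind_speed = pd.Series(1.0, index=times)

    def time_fuentes(self):
        temperature.fuentes(self.poa_global, self.temp_air, self.wind_speed,
                            noct_installed=45)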
#1059 removes the slices
- Tests added
- Updates entries to docs/sphinx/source/api.rst for API changes.
- Updates entries to docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).

Not ready for review yet; just opening the PR now because I need a branch on this repo to fetch onto the VM.

Benchmark results hosted here: https://pvlib-benchmarker.github.io/pvlib-benchmarks/#/