
Set up airspeed velocity benchmarks #1049

Merged (33 commits) Sep 10, 2020

Conversation

@kandersolar (Member) commented Sep 5, 2020

  • Closes #419 (Add a performance benchmarking tool)
  • I am familiar with the contributing guidelines
  • Tests added
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Not ready for review yet; just opening the PR now because I need a branch on this repo to fetch onto the VM.

Benchmark results hosted here: https://pvlib-benchmarker.github.io/pvlib-benchmarks/#/

@kandersolar (Member, Author)

After getting this more or less set up according to the proposal in #419 (comment) (set up a nightly job to benchmark any new commits to master and publish results to GH Pages), here are some thoughts:

  • The timings are pretty stable on the UA VM. Big performance improvements (e.g. 1c0c3e74) are obvious in the timeseries performance plots. The noise seems to be below 5-10% so far, which is pretty good IMO.
  • Benchmarking across API changes can be done, but it's a little clumsy. The version-check trick I used in the SAPM cell temp benchmark works, although it emits deprecation warnings for the commits between the one that deprecated the old function (5b4ba72) and v0.7.0. There may be a cleaner way to filter by commit date. (A sketch of the version-check idea is included at the end of this comment.)
  • The timing results are stored in a separate repo (https://github.com/pvlib-benchmarker/pvlib-benchmarks). The asv.conf.json configuration file is stored there as well. Unfortunately that complicates running the benchmark suite locally like @wholmgren wanted to be able to do (add airspeed velocity config and performance tests #530 (comment)). Is that a big deal?

Timings available here: https://pvlib-benchmarker.github.io/pvlib-benchmarks/
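
For anyone curious, here is a minimal sketch of the version-check idea, assuming packaging is available in the benchmark environment. The class name, index, and temperature-model parameters are illustrative, not the exact benchmark code in this PR:

```python
# Hedged sketch of a version-gated asv benchmark; names and values are
# illustrative, not the exact code added in this PR.
import pandas as pd
import pvlib
from packaging.version import Version


class SAPMCellTemp:
    def setup(self):
        self.times = pd.date_range('2020-06-01', periods=1440, freq='1min')
        self.poa_global = pd.Series(800.0, index=self.times)
        self.temp_air = pd.Series(25.0, index=self.times)
        self.wind_speed = pd.Series(1.0, index=self.times)

    def time_sapm_cell_temp(self):
        if Version(pvlib.__version__) >= Version('0.7.0'):
            # post-deprecation API: pvlib.temperature.sapm_cell
            params = pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS[
                'sapm']['open_rack_glass_glass']
            pvlib.temperature.sapm_cell(
                self.poa_global, self.temp_air, self.wind_speed, **params)
        else:
            # pre-0.7 API (deprecated by the commit mentioned above)
            pvlib.pvsystem.sapm_celltemp(
                self.poa_global, self.wind_speed, self.temp_air)
```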

@wholmgren (Member)

Thanks @kanderso-nrel!

I agree that the timings look pretty stable. cc @alorenzo175 (and thanks for your help with that VM).

It makes sense to store the results and the runner script in a separate repository, but what is the motivation for storing the configuration file there too? I think being able to run the performance suite locally without too much trouble is going to be important for keeping up with it.

@kandersolar (Member, Author)

The motivation was that I didn't think asv would be amenable to storing environments and results in a different directory from the configuration file. I still think that is the case, but a simple workaround would be to keep the official configuration file on pvlib-python/master and have the scheduled job copy the config file wherever it needs to go after fetching updates from master. It will also have to make some minor edits to the copy (the benchmark_dir path, specifically), but I think that should be straightforward.
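
For concreteness, a hedged sketch of what that copy-and-edit step might look like; the paths and the target benchmark_dir value are assumptions drawn from this discussion, not the actual scheduled-job script:

```python
# Hypothetical nightly-job helper: copy the canonical config out of the
# pvlib-python checkout and repoint benchmark_dir.  Paths are illustrative.
import re
import shutil

SRC = 'pvlib-python/asv.conf.json'      # canonical config kept on master
DST = 'pvlib-benchmarks/asv.conf.json'  # copy used by the scheduled job

shutil.copy(SRC, DST)

with open(DST) as f:
    text = f.read()

# asv.conf.json files often contain // comments, so edit the raw text
# rather than round-tripping through json.load
text = re.sub(r'"benchmark_dir"\s*:\s*"[^"]*"',
              '"benchmark_dir": "../pvlib-python/benchmarks"',
              text)

with open(DST, 'w') as f:
    f.write(text)
```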

@wholmgren (Member)

That seems reasonable. Any idea what pandas and xarray are doing to manage that? They both have asv_bench/asv.conf.json files.

@kandersolar (Member, Author)

I think much of the pydata stack (including pandas and xarray) is run by https://github.com/asv-runner/asv-runner. I'm not sure I understand it fully, but it seems like they copy the timing results over to another server (the pydata webserver, maybe?) instead of into a GitHub repo, which means they can just run the benchmarks in place in the main repo. I'm not very enthusiastic about trying to get it to work for our case.

I've mostly been looking at the benchmarks for astropy and sympy, but they both keep the benchmark functions themselves in the second repo. I haven't run into any packages in the same situation as ours (benchmark functions in the main repo, results in another repo), but I haven't looked very hard.

@wholmgren (Member)

This section of the astropy repo seems reasonable enough. I'll defer to you on the best course of action.

@kandersolar marked this pull request as ready for review September 9, 2020 04:06
@kandersolar (Member, Author)

Ok, I think this is in a good spot. It should be relatively painless to get it working locally, and still pretty easy to automate for the nightly job. Anybody interested in trying it out? Hopefully benchmarks/README.md will provide sufficient guidance...

@wholmgren added this to the 0.8.1 milestone Sep 9, 2020
@wholmgren (Member) left a comment

A few minor comments below. Can you also add a 0.8.1 what's new file?

@wholmgren (Member) left a comment

LGTM.

@kandersolar (Member, Author)

Any comments from @mikofski or @cwhanse? If not, I'll merge this tomorrow night and get the automated job set up for real.

FYI here's an example showing timings from the two environments on my computer. Fairly dramatic difference, but the CPU was busy doing other things while running the benchmarks so it might not be a real effect.

[screenshot: benchmark timings from the two environments]

@mikofski (Member) left a comment

I had a quick look through the README, the benchmarks, and the asv.conf files - it all looks great. Thank you!

@cwhanse (Member) commented Sep 10, 2020

No comments, all this is rather beyond me. Is there a way to judge where additional coverage could be important?

@mikofski (Member)

Is there a way to judge where additional coverage could be important?

Great question. I think we should examine the patterns of other packages using ASV. I think it might make sense to benchmark any functions that have the potential to become performance bottlenecks. Also, I think any time a function is enhanced specifically to improve performance, a benchmark should be required as part of the PR to demonstrate the improvement.

To be safe, I think broad coverage is best: if a future modification adversely affects existing performance, we can pin down the specific commit responsible. Building the coverage has the potential to engage some "good 4 first timers" similar to the examples gallery, and it won't affect our CI build/test time because the benchmarks run on a separate server.

@wholmgren (Member)

Test failures are unrelated. Thanks @kanderso-nrel!

@wholmgren merged commit 30d784e into pvlib:master Sep 10, 2020
@kandersolar deleted the asv_setup branch September 14, 2020 02:00
```python
        solarposition.ephemeris(self.times_localized, self.lat, self.lon)

    def time_spa_python(self):
        solarposition.spa_python(self.times_localized[::5], self.lat, self.lon)
```
Member

@kanderso-nrel any objection to removing the slice? Will it cause a blip in the asv results, or will the test be rerun over the whole commit history? I ask because the slice makes the tests a little harder to reason about, and it seems like the run time would still be reasonable. I also wonder about the slice performance on a pandas index.

It looks like the [::30] below is needed to keep a reasonable run time in that function. I still wonder if we'd be better off with a new index, e.g. self.days = pd.date_range(start='20180101', end='20190101', freq='24h') (a sketch follows at the end of this comment).

xref SolarArbiter/solarforecastarbiter-core#567 (comment)
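
For reference, a hedged sketch of that alternative: define a separate, coarser index in setup() so the timed body contains no slicing. The location, time zone, and date ranges are placeholders, not the benchmark code in this PR:

```python
# Illustrative only: slower functions get their own coarse index defined
# in setup(), so nothing is sliced inside the timed body.
import pandas as pd
from pvlib import solarposition


class SolarPositionSlow:
    def setup(self):
        self.lat, self.lon = 35.1, -106.6
        # coarse daily index for the slower algorithms
        self.days = pd.date_range(start='20180101', end='20190101',
                                  freq='24h', tz='Etc/GMT+7')

    def time_spa_python(self):
        # the whole index is timed; no [::5] in the benchmark body
        solarposition.spa_python(self.days, self.lat, self.lon)
```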

Member Author

Yep, those slices were trying to keep runtime to something more than milliseconds and less than tens of seconds. You raise a good point about the performance of the slice itself -- it was careless to include that in the timing function. Do you think it is valuable to use the same index for all the functions (meaning their timings can be compared apples-to-apples), or is it better to have separate indexes with various lengths defined in setup so that each function runs about the same amount of time?

Will it cause a blip in the asv results or will it the test be rerun over the whole commit history?

I think asv will automatically invalidate the old results when the function changes (see 'version' here), but I don't think it would run the new function over the commit history unless you told it to. In this case I would probably delete the old results and regenerate them with the new function. It is easy enough to run timings for a single benchmark; it's like pytest in that you can specify which functions you want to run.
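
As a hedged illustration of that 'version' mechanism (my reading of the asv docs, not something exercised in this PR), the attribute can also be pinned explicitly on a benchmark class:

```python
# Illustrative only: asv normally derives a benchmark's version from a
# hash of its source and ignores stored results recorded under a
# different version; pinning the attribute keeps results comparable (or
# lets you invalidate them deliberately by bumping the string).
import pandas as pd
from pvlib import solarposition


class Ephemeris:
    version = "1"  # bump manually to discard previously stored results

    def setup(self):
        self.lat, self.lon = 35.1, -106.6
        self.times = pd.date_range(start='20180601', end='20180602',
                                   freq='1min', tz='Etc/GMT+7')

    def time_ephemeris(self):
        solarposition.ephemeris(self.times, self.lat, self.lon)
```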

Just checking, is there anything special about the index you suggested other than the number of elements? For example, is there a difference between 365 points spread across a year vs. 365 across a single day?

Member

Thanks for the info. I can make a PR for this.

something more than milliseconds

I skimmed the documentation for guidance on a minimum test time and didn't find anything. Milliseconds seems reasonable to me, but I'm wondering if you have a reference.

Do you think it is valuable to use the same index for all the functions (meaning their timings can be compared apples-to-apples), or is it better to have separate indexes with various lengths defined in setup so that each function runs about the same amount of time?

I vote for apples to apples so long as the runtime constraints are satisfied.

Just checking, is there anything special about the index you suggested other than the number of elements?

Nothing special.

Member Author

Milliseconds seems reasonable to me but I'm wondering if you have a reference

I do not; it just seemed like a good idea at the time. I see that the aoi function is ~2.5 ms and looks fine, so no pushback from me on having some fast benchmarks 👍

I can make a PR for this.

If you want, you could also fix a typo in the fuentes benchmark where I was inconsistent with windspeed vs. wind_speed. It didn't show up before the v0.8 release because that benchmark only runs for >= v0.8.0.

Member

#1059 removes the slices
