
Add performance test #275

Merged (18 commits) Dec 28, 2024

Conversation

@jagerber48 (Contributor) commented Dec 16, 2024

  • Closes Benchmarking tests #274
  • Executed pre-commit run --all-files with no errors
  • The change is fully covered by automated unit tests
  • Documented in docs/ as appropriate
  • Added an entry to the CHANGES file

Add a performance benchmark test. This test is especially important to ensure #262 doesn't introduce a performance regression.

codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.61%. Comparing base (969324d) to head (6927711).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #275      +/-   ##
==========================================
+ Coverage   96.50%   96.61%   +0.10%     
==========================================
  Files          16       17       +1     
  Lines        1919     1947      +28     
==========================================
+ Hits         1852     1881      +29     
+ Misses         67       66       -1     
Flag Coverage Δ
macos-latest-3.10 95.06% <100.00%> (+0.12%) ⬆️
macos-latest-3.11 95.01% <100.00%> (+0.07%) ⬆️
macos-latest-3.12 95.01% <100.00%> (+0.07%) ⬆️
macos-latest-3.8 95.01% <100.00%> (+0.07%) ⬆️
macos-latest-3.9 95.01% <100.00%> (+0.07%) ⬆️
no-numpy 75.14% <100.00%> (+0.36%) ⬆️
ubuntu-latest-3.10 95.01% <100.00%> (+0.07%) ⬆️
ubuntu-latest-3.11 95.01% <100.00%> (+0.07%) ⬆️
ubuntu-latest-3.12 95.01% <100.00%> (+0.07%) ⬆️
ubuntu-latest-3.8 95.01% <100.00%> (+0.07%) ⬆️
ubuntu-latest-3.9 95.01% <100.00%> (+0.07%) ⬆️
windows-latest-3.10 95.01% <100.00%> (+0.07%) ⬆️
windows-latest-3.11 95.01% <100.00%> (+0.07%) ⬆️
windows-latest-3.12 95.01% <100.00%> (+0.07%) ⬆️
windows-latest-3.8 95.01% <100.00%> (+0.07%) ⬆️
windows-latest-3.9 95.01% <100.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@jagerber48 (Contributor, Author):

For the raw benchmark threshold I set that str(sum(ufloat(1, 0.1) for _ in range(100000))) should execute within 0.75 * 4 seconds. The 0.75 s baseline came from this code executing in ~0.75 s on my laptop.

However, this threshold was already too stringent for the GitHub Actions runners. My question: how should we set up this benchmark? Should I just increase the factor to 8? Do something else? I'm not sure of a good way to test for a performance regression, since absolute timing is so system-dependent.

@andrewgsavage (Contributor):

There are libraries built specifically for this purpose, rather than writing your own tests like that. pint uses codspeed.

@jagerber48 (Contributor, Author):

Perfect, that's exactly the sort of thing I was looking for. I will try to integrate codspeed.

@jagerber48 (Contributor, Author):

I made a complexity-only test which runs the benchmark for n = [10, 100, ...] and checks that log10(t/t0) equals log10(n/n0) to within 10%. Looking at t vs n on a log plot, the data is very linear, so I expect this test to be relatively robust. It should definitely catch regressions where t becomes proportional to n^2 instead of n. The worry is that it will occasionally fail erroneously when there is no real regression.
[Figure: x axis is n, y axis is t in seconds; solid lines are linear interpolations between points (no fit or anything).]
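The check described above amounts to estimating an empirical complexity exponent from two timings. A small sketch of that arithmetic (the function name is mine, not from the PR):

```python
import math

def scaling_exponent(n0, t0, n, t):
    """Empirical exponent p in t ~ n**p, from a baseline timing
    (n0, t0) and a second timing (n, t):
    p = log10(t / t0) / log10(n / n0)."""
    return math.log10(t / t0) / math.log10(n / n0)

# The PR's assertion
#   0.9 * log10(n / n0) < log10(t / t0) < 1.1 * log10(n / n0)
# is equivalent (for n > n0) to 0.9 < scaling_exponent(...) < 1.1.

# Synthetic illustration: a 10x size increase costing 10x the time is
# linear (p = 1); costing 100x the time is quadratic (p = 2).
print(scaling_exponent(10, 1.0, 100, 10.0))   # → 1.0
print(scaling_exponent(10, 1.0, 100, 100.0))  # → 2.0
```

Because the exponent is a ratio of two timings on the same machine, it cancels out much of the absolute-speed dependence that makes the raw threshold fragile, though timing noise can still push it outside the 10% band.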

@jagerber48 (Contributor, Author):

I'm hitting a variety of issues with establishing the connection to codspeed.io, but I'll keep working at it.

@newville (Member):

@jagerber48 @andrewgsavage Thanks for this. I approved the codspeed integration request.

I also agree with the observation in #274 that the lazy evaluation from @lebigot is very impressive and nicely avoids calculating uncertainties on intermediate quantities that are never needed in isolation. Clever! And, yes, I can see that this would be easy to break when refactoring. So, thanks, and sorry I can't be more help at this time.

@jagerber48 (Contributor, Author) commented Dec 17, 2024

@newville thank you for approving the integration request. However, I see that further configuration is needed: I got a message that the configuration can only be done by an organization owner (which I am not).

This is the configuration that I think needs to be done
[Screenshot: the configuration page described above]

I access it from the repositories page on my codspeed.io account, then I click to import a new repository (+) and configure the lmfit organization. It looks like @newville and @wshanks are owners within the uncertainties team in the lmfit org.

I've never used codspeed, so I don't know whether further "owner-only" configuration will be necessary after this.

@jagerber48
Copy link
Contributor Author

Ok, it looks like I am past the issue from my last comment. I'm not sure whether that is due to someone else doing something, or whether it just started working. Either way, the ball is in my court now to set up the CI to push some runs of the benchmarks up to CodSpeed.
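For reference, hooking a test suite up to CodSpeed from GitHub Actions typically looks something like the sketch below. This is not the workflow from this PR; the action version, job name, install line, and secret name are all assumptions:

```yaml
# Hypothetical workflow sketch, not the PR's actual CI configuration.
name: benchmarks
on: [push, pull_request]

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install . pytest-codspeed
      - uses: CodSpeedHQ/action@v3
        with:
          token: ${{ secrets.CODSPEED_TOKEN }}
          run: pytest tests/ --codspeed
```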

codspeed-hq bot commented Dec 18, 2024

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 5 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

  • test_repeated_summation_speed[100000] (2.9 s)
  • test_repeated_summation_speed[10000] (275.4 ms)
  • test_repeated_summation_speed[1000] (27.1 ms)
  • test_repeated_summation_speed[100] (2.8 ms)
  • test_repeated_summation_speed[10] (402.3 µs)

assert 0.9 * log10(n / n0) < log10(t / t0) < 1.1 * log10(n / n0)


@pytest.mark.benchmark
Review comment (Contributor):

you can probably parameterise this test with n_list and have it benchmark each num

Reply (Contributor, Author):

Nesting the benchmark and parametrize marks works in my local environment, but it seems to make the GitHub Action hang forever while running the test.

Reply (Contributor, Author):

Ok, got it. The problem was that I was mixing my internal timeit-based benchmarking with what CodSpeed was trying to do. The tool is very nice.

I'll clean up the tests a little now that I know what's going on, then get this ready for review.

@jagerber48 (Contributor, Author):

This is ready for review. @andrewgsavage you seem like the natural person for this one.

return result


def test_repeated_summation_complexity():
Review comment (Contributor):

I've seen a test similar to this that kept randomly failing, so I don't know how reliable this test will be. I think test_repeated_summation_speed will catch any increases in time, so these tests are testing the same thing. I'm not opposed to this test; we can leave it in for now, and if we find it unreliable we can remove it.

I'll add a link to this PR with the graph you plotted, as it helps make sense of this test. It'll be good to see this plotted for your other PR too.

@jagerber48 jagerber48 merged commit 61f688f into lmfit:master Dec 28, 2024
22 checks passed
@jagerber48 jagerber48 deleted the feature/benchmark_test branch December 28, 2024 02:12
Successfully merging this pull request may close these issues.

Benchmarking tests
3 participants