
Blr pedestal and performance #724

Merged · 5 commits · Nov 19, 2020
Conversation

@andLaing (Collaborator) commented Jun 8, 2020

Refactors the blr function, moving the baseline calculation out of the cythonised functions so that alternative baseline functions can be tested.

Also replaces the cythonised PMT loop with a standard map to improve performance; the per-event time drops from ~200 ms to ~25 ms on a local machine.
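The refactoring described above might be sketched roughly as follows. All names here (`mean_baseline`, `deconvolve_pmt`, `deconvolve_event`) are illustrative, not the actual IC API, and the "deconvolution" is a toy stand-in; the point is the structure: baseline estimation is pluggable and computed outside the core routine, and the per-PMT loop becomes a map.

```python
import numpy as np

def mean_baseline(wf, n_samples=1000):
    # Estimate the baseline from the first n_samples of the waveform.
    return wf[:n_samples].mean()

def deconvolve_pmt(wf, baseline, coeff):
    # Toy stand-in for the BLR deconvolution of a single PMT waveform;
    # the real algorithm is more involved.
    return (wf - baseline) * coeff

def deconvolve_event(pmt_wfs, coeffs, baseline_fn=mean_baseline):
    # Baselines come from a pluggable function, computed outside the
    # deconvolution itself, so alternative estimators can be tested.
    baselines = [baseline_fn(wf) for wf in pmt_wfs]
    # A standard map over PMTs replaces the hand-written loop.
    return np.array(list(map(deconvolve_pmt, pmt_wfs, baselines, coeffs)))
```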

@andLaing andLaing requested review from jjgomezcadenas and Aretno June 8, 2020 10:48
@andLaing andLaing force-pushed the blr-pedestal branch 2 times, most recently from 29a7de7 to e33965a Compare July 7, 2020 15:53
@andLaing (Collaborator, Author)
I've rebased and redone some performance checks, both with NEW data (R7985) and with some NEXT100 simulations passed through buffy and diomira (3 ms buffers with the S2 in the centre), using np.repeat(NEWCOEFFICIENTS, 5) as the blr coefficients, since we don't have values for NEXT100 as yet.

On my local machine the comparison with timeit gives the following time per event:

  • NEW data event, MASTER: 238 ± 17.8 ms, PR: 25.1 ± 1.1 ms
  • NEXT100 sim. event, MASTER: 2.07 ± 0.28 s, PR: 225 ± 2 ms

I ran a simple loop over 10 events for each case too, and the improvement is of the same order (NEW 2.19 s / 331 ms, N100 20.5 s / 3.02 s). I've attached a couple of plots of the output for the PMT sum. As expected (the algorithm itself isn't changed), both branches give the same result.
(Attached plots: N100_PRdeconvComp and NEW_PRdeconvComp, comparing the PMT-sum output of the two branches.)
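For context, a per-event timeit comparison like the one quoted above might be run along these lines. This is a hypothetical harness: `deconvolve_event` is a stand-in for the real BLR deconvolution and the waveform shape is invented, not taken from IC.

```python
import timeit
import numpy as np

def deconvolve_event(wfs):
    # Stand-in workload: subtract each PMT's mean from its waveform.
    return wfs - wfs.mean(axis=1, keepdims=True)

# Fake event: 12 PMTs, 32k samples each (shapes are illustrative).
wfs = np.random.rand(12, 32_000)

# Time single-event calls and report mean +/- spread in ms.
times = timeit.repeat(lambda: deconvolve_event(wfs), number=1, repeat=20)
print(f"{np.mean(times) * 1e3:.2f} +/- {np.std(times) * 1e3:.2f} ms per event")
```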

@jjgomezcadenas (Collaborator) left a comment

This PR is true to the advertising lines:

  • Refactors the blr function, moving the baseline calculation out of the cythonised functions so that alternative baseline functions can be tested.

True, and tests are added

  • Also replaces the cythonised PMT loop with a standard map to improve performance; the per-event time drops from ~200 ms to ~25 ms on a local machine.

Also true, and a demonstration is added.

Code concise and to the point. Good job.

It does exactly what it claims

@jacg (Collaborator) commented Nov 16, 2020

IIUC, moving some code from Cython to pure-python-using-numpy resulted in a speedup by a factor of 8.

Do you have some high-level explanation for why the Cython was that much slower, or some general lesson that can be learned from this?

@andLaing (Collaborator, Author)
A very empirical opinion, maybe, but the Cython function that was replaced used astype on the PMT array argument (a copy), then defined an output array for the individual PMTs using zeros, then looped over each PMT, appending the deconvolution results to a list. By using map we avoid the copies, and map is optimised for that type of looping.

The baseline calculation didn't really make a difference to performance (maybe even slightly slower, but within error), but it gives the user more flexibility.

My takeaway, without really having used Cython, is that you don't seem to gain anything unless you're doing something novel that can't use the internal optimisations of numpy or Python itself.
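The two patterns being contrasted can be sketched in miniature. This is not the actual IC code; `process` is a toy stand-in for the per-PMT deconvolution, and the shapes are invented. The first version mirrors the description above (explicit copy via `astype`, hand-written loop, list append); the second uses map over the rows with no up-front copy.

```python
import numpy as np

def process(wf):
    # Stand-in for the per-PMT deconvolution.
    return wf - wf.mean()

# Fake event: 12 PMT waveforms of 1000 samples each.
wfs = np.arange(12_000, dtype=np.float64).reshape(12, 1000)

def loop_version(wfs):
    wfs_copy = wfs.astype(np.float64)    # astype returns a copy by default
    out = []
    for i in range(wfs_copy.shape[0]):   # hand-written loop over PMTs
        out.append(process(wfs_copy[i]))
    return np.array(out)

def map_version(wfs):
    # map iterates over the rows directly, with no up-front copy.
    return np.array(list(map(process, wfs)))
```

Both versions produce identical output; the difference is purely in copies and loop overhead.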

@jacg (Collaborator) commented Nov 16, 2020

A very empirical opinion, maybe, but the Cython function that was replaced used astype on the PMT array argument (a copy), then defined an output array for the individual PMTs using zeros, then looped over each PMT, appending the deconvolution results to a list. By using map we avoid the copies, and map is optimised for that type of looping.

Yes, in general you want to let Numpy do any looping over its arrays, rather than writing your own loops over numpy arrays.

(Well, I have seen Numba make hand-coded loops run faster than internal Numpy loops, but the gains were certainly not worth the hassle.)

My takeaway, without really having used Cython, is that you don't seem to gain anything unless you're doing something novel that can't use the internal optimisations of numpy or Python itself.

Indeed, this is the rule of thumb that everyone should follow (for Numpy; for plain Python it's a completely different story). If it's not obvious to everyone, it's worth highlighting here: hand-coded Cython is unlikely to outperform things that Numpy can do with its internal loops!

So one might say that the problem with the original Cython function was that

  • It tried to re-invent a wheel that was already available in Numpy, and

  • as tends to be the case with re-invented wheels, it wasn't as good as the wheel which already existed.
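The rule of thumb above is easy to demonstrate in pure Python vs NumPy terms (the same element-by-element overhead is what hand-coded Cython has to beat). A minimal, self-contained illustration, unrelated to the IC code itself:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

def hand_loop(x):
    # Python-level loop: one interpreter round-trip per element.
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = 2.0 * x[i] + 1.0
    return out

def vectorised(x):
    # NumPy loops internally in C; typically orders of magnitude faster
    # (exact speedup is machine-dependent).
    return 2.0 * x + 1.0

assert np.allclose(hand_loop(x), vectorised(x))
```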

Thanks for the summary ... and the PR!

@carmenromo carmenromo merged commit 7191327 into next-exp:master Nov 19, 2020
@andLaing andLaing deleted the blr-pedestal branch November 19, 2020 12:46