
Cython functions for detsim #756

Merged
merged 15 commits into from
Feb 22, 2021

Conversation

mmkekic
Copy link
Collaborator

@mmkekic mmkekic commented Nov 18, 2020

This PR adds Cython classes for the SiPM and PMT light tables for the S2 signal, and a loop over ionization electrons that produces a waveform-shaped matrix of expected photon values.

@mmkekic mmkekic requested a review from jmalbos November 18, 2020 18:29
@mmkekic mmkekic changed the title cython functions for detsim Cython functions for detsim Nov 18, 2020
@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 13, 2020

I have rebased on the accepted PR #757, since they share some common functions and test files.

@jacg
Copy link
Collaborator

jacg commented Dec 14, 2020

Are there any benchmarks that justify the use of Cython?

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 14, 2020

The problem is that there are no reasonable alternatives that solve the exact problem: producing a matrix of
(number_of_electrons (~100 000) ✕ number_of_time_slices (10)) ✕ number_of_sensors (~3000), to then be rebinned into
some_arbitrary_binning ✕ number_of_sensors. Placing the original matrix in memory is impossible without approximations (a naive estimate is > 20 GB). @gondiaz tried to follow this approach with some approximations, managing ~5 s per event and < 1.5 GB, in https://github.com/gondiaz/IC/tree/detsim_v0. However, this approach lowers our desired resolution and does not scale well with the type of event.
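For reference, the naive memory estimate quoted above can be checked with a quick back-of-the-envelope calculation (the electron and sensor counts are the approximate figures from this thread):

```python
n_electrons = 100_000   # ionization electrons per event (approximate)
n_slices    = 10        # EL time partitions
n_sensors   = 3_000     # sensors (approximate)
bytes_f64   = 8         # one float64 per matrix entry

# size of the full (electrons x slices) x sensors matrix, in GB
size_gb = n_electrons * n_slices * n_sensors * bytes_f64 / 1e9
print(f"{size_gb:.0f} GB")  # -> 24 GB, consistent with the "> 20 GB" estimate
```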
Looping over a NumPy array is never a good idea, but that was also tried, in https://github.com/gondiaz/IC/tree/detsim_loop_based, again with some approximations. The memory is < 0.7 GB and the time is ~20 s per event. Without the unjustified approximation the time was measured in minutes (we needed to use the slow numpy.add.at function).
The Cython one keeps the same memory consumption (it is basically the memory of the input and output arrays) and runs at ~6 s per event, without using any approximation. However, it is a bit ugly: in order to make it modular and flexible but still fast I had to use a C pointer directly; other approaches implied slicing a NumPy memory view, which was ~2x slower. Some Cython trials are in the https://github.com/mmkekic/detsim_checks repository (warning: the code there is really just a messy draft).
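As a minimal illustration of the accumulation step mentioned above (with made-up toy data): `np.add.at` is the correct way to accumulate with repeated indices in pure NumPy, but it is much slower than `np.bincount`, which does the same 1D accumulation in C:

```python
import numpy as np

rng = np.random.default_rng(0)
n_contrib, n_bins = 1_000, 50
bins    = rng.integers(0, n_bins, size=n_contrib)  # time bin of each contribution
photons = rng.random(n_contrib)                    # expected photon values

# Correct but slow: np.add.at handles repeated indices
# (a plain wf[bins] += photons silently drops duplicates)
wf_slow = np.zeros(n_bins)
np.add.at(wf_slow, bins, photons)

# Much faster equivalent for 1D accumulation
wf_fast = np.bincount(bins, weights=photons, minlength=n_bins)

assert np.allclose(wf_slow, wf_fast)
```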


@jacg
Copy link
Collaborator

jacg commented Dec 14, 2020

Is there somewhere I can read a high-level description of the computational task?

@jacg
Copy link
Collaborator

jacg commented Dec 14, 2020

> Is there somewhere I can read a high-level description of the computational task?

Put another way: can you describe

  1. the structure and size of the data that go into the process

  2. the structure and size of the data that come out

  3. The relationship between 1. and 2.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 14, 2020

> Is there somewhere I can read a high-level description of the computational task?

I am afraid not, but I have tried to prepare a document that explains the process. If any parts are unclear, please ask.
s2_description.pdf
I hope it can be useful to @jmalbos as well.

@jacg
Copy link
Collaborator

jacg commented Dec 15, 2020

My current understanding is something like this:

Input

  1. es: Electrons, ~1.5 ✕ 10⁵

    x y t nph
    f64 f64 f64 u64

    Total space: 4 ✕ 8 ✕ 1.5 ✕ 10⁵ bytes -> 5 MB

    Important question: are these already ordered on t?

  2. LT(z, r) -> p: Light table

    AFAICT, this is a function: (f64, f64) -> f64 implemented as a lookup table
    with 10 different z-slices, and an unspecified number of r-slices, which
    therefore occupies

    f64 ✕ 10 (zs) ✕ ? (rs) -> 1MB / 12 r-slices

Output

f64 ✕ number of sensors ✕ number of time bins

  1. SiPM responses

    f64 ✕ 4000 ✕ 2000 -> 64 MB

  2. PMT responses

    f64 ✕ 60 ✕ 200000 -> 96 MB

(You also mention applying a Poisson distribution, but it seems that it is not
part of this calculation.)

Calculation

Very roughly, for SiPMs

dv = drift velocity
zs = z-positions of 10 uniformly distributed slices of EL plane
for each electron
    for each sensor
        r = electron-sensor distance in x-y plane
        ps = LT(r, zs) # 10 different probabilities
        for p, z in zip(ps, zs)
            time_bin = bin(t + z / dv)
            response_contribution = p * nph / 10 
            add response_contribution to appropriate (sensor, time_bin)

For PMTs it's almost the same, just that the z-distribution is collapsed to a
single point, for the purpose of the light table probabilities.
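The pseudocode above can be turned into a small runnable sketch. All names, shapes, and the toy light table below are invented for illustration, and the sensor loop is vectorized with NumPy rather than written out explicitly:

```python
import numpy as np

def sipm_response(electrons, sensor_xy, lt, zs, drift_v, bin_width, n_bins):
    """Expected-photon waveforms, one row per sensor (illustrative sketch).

    electrons: array of (x, y, t, nph) rows; lt(r, z) -> detection probability.
    """
    wf = np.zeros((len(sensor_xy), n_bins))
    for x, y, t, nph in electrons:
        # electron-sensor distance in the x-y plane, for all sensors at once
        r = np.hypot(sensor_xy[:, 0] - x, sensor_xy[:, 1] - y)
        for z in zs:  # the 10 uniformly distributed EL z-slices
            time_bin = int((t + z / drift_v) // bin_width)
            if time_bin < n_bins:
                # each z-slice carries 1/len(zs) of the electron's photons
                wf[:, time_bin] += lt(r, z) * nph / len(zs)
    return wf

# Toy example with an invented light table
sensors = np.array([[0.0, 0.0], [10.0, 0.0]])
lt = lambda r, z: 1.0 / (1.0 + r**2 + z**2)       # made-up probability, falls with r
els = np.array([[1.0, 0.0, 0.0, 100.0]])          # one electron: x, y, t, nph
wfs = sipm_response(els, sensors, lt, np.linspace(0.0, 5.0, 10), 1.0, 1.0, 20)
```

The nearer sensor receives more expected light, as one would hope; the real implementation replaces the toy `lt` with the light-table lookup.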

Discussion

In your earlier comment you allude to memory being a problem because there are
10⁵ electrons ✕ 10 time slices ✕ 3000 sensors.

I assume (trust?, hope!) that the detector spits out results in chronological
order. If this is the case, you only ever need to hold 10 time-slices
(corresponding to the EL-layer z-slices) in memory: as soon as an electron
appears whose t lies after the end of your earliest in-memory time-slice, that
time slice can be yielded, and a fresh one added to your in-memory time-slice
buffer.

You also mentioned arbitrary_binning ✕ number_of_sensors in the context of
excessive memory use. Again, if the input is time-ordered, then your first
binning pass can produce a stream of time-ordered time slices, which do not need
to be kept in memory simultaneously. A second pass (or a second component just
downstream, in the same pass) can then regroup these into arbitrarily binned
slices.
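The streaming idea sketched above might look like the following. This is a hypothetical generator; the names, the contribution format, and the fixed-lag eviction policy are all assumptions, not the proposed implementation:

```python
import numpy as np

def stream_slices(contributions, n_sensors, n_live=10):
    """Yield (slice_index, per-sensor accumulator) pairs, holding only a few
    slices in memory.  contributions must arrive roughly in time order: a
    slice older than (current index - n_live) never receives anything again.
    """
    buf = {}  # slice_index -> per-sensor accumulator
    for slice_idx, sensor_idx, value in contributions:
        buf.setdefault(slice_idx, np.zeros(n_sensors))[sensor_idx] += value
        # slices that can no longer change are streamed out immediately
        for old in sorted(k for k in buf if k < slice_idx - n_live):
            yield old, buf.pop(old)
    for old in sorted(buf):  # flush whatever is left at end of stream
        yield old, buf.pop(old)

# toy usage: 3 sensors, contributions given as (time_slice, sensor, value)
out = dict(stream_slices([(0, 1, 2.0), (0, 2, 1.0), (15, 0, 3.0)], n_sensors=3))
```

Peak memory is then bounded by the number of live slices, not by the total event length.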

My original question was about benchmarks which justify the use of Cython. That
means there should be a pure python implementation and an equivalent Cython
implementation, and their run-times (or memory usage) should be compared. I
don't see how the discussion of memory is pertinent to this question: Cython
doesn't magically make numpy arrays occupy less memory.

Fundamentally, you have to show that your Cython solution is significantly more
efficient than your equivalent pure Python solution. Without that[*], you
shouldn't take on the burden of replacing Python with a less flexible language.

[*] Or some other compelling advantage. For example, it might be reasonable to
rewrite Python code in Rust, even if the Rust version offers no significant
speed improvement, because of the safety guarantees and lower long-term cost of
development and maintenance of code written in Rust.

Cython offers very little advantage over Python, beyond speed of execution, but
it is significantly more painful to work in. So you really should make sure that
it is significantly faster, before choosing Cython over pure Python.

Questions

  1. Have you tried streaming your data, rather than keeping everything in
    memory at once?

  2. What do you do with these values once they have been generated? (Write to
    file? Use immediately? If the latter, how?)

  3. What is an acceptable run time, per event?

  4. What benefits does Cython bring to this problem, that are not available in
    pure Python?

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 15, 2020

Ah, ok, I think I was not clear enough.

Looping over a NumPy array in Python is guaranteed to be slower than a Cython loop over the same array, but using fast (C-implemented) NumPy operations on whole vectors usually is not. Hence the main goal was to solve the problem using NumPy operations on arrays directly, as in our first approach. This required putting number_of_electrons ✕ number_of_partitions ✕ number_of_sensors in memory and doing multiplications and histogrammings, which is memory-consuming (20+ GB).
The actual input/output array sizes are not a problem for now (I agree that having all of these arrays in memory is not optimal and could become a problem in the future, but this would need to be fixed in other parts of IC as well, for example irene/diomira, where the full array is read and manipulated).

The second approach was a loop over electrons, with NumPy vector manipulations inside that loop. This took on the order of 10 minutes. Doing all three loops explicitly is slower than that.

Then we tried Cython.

I tried several other approaches; the most promising one was to first group the electrons into voxels determined by the PSF file (i.e. distance and EL-time partitions), and then loop over sensors, doing the calculations with NumPy's fast bincount. This was around 1.2 minutes per event.

The conclusion: we didn't manage to think of any pure Python approach that comes remotely close to Cython's speed without excessive memory usage.
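The bincount-based approach mentioned above (group the per-electron contributions, then accumulate per sensor) can be sketched roughly as follows; all names, shapes, and the random toy data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_electrons, n_sensors, n_bins = 1_000, 5, 100

time_bins = rng.integers(0, n_bins, size=n_electrons)  # time bin of each electron
probs = rng.random((n_electrons, n_sensors))           # light-table prob. per sensor
nph = rng.integers(1, 10, size=n_electrons)            # photons per electron

# loop over sensors; np.bincount does the per-time-bin accumulation in C
wfs = np.stack([
    np.bincount(time_bins, weights=probs[:, s] * nph, minlength=n_bins)
    for s in range(n_sensors)
])
assert wfs.shape == (n_sensors, n_bins)
```

The Python-level loop runs only n_sensors times; the per-electron work stays in vectorized NumPy, which is why this variant is much faster than explicit per-electron loops, though still slower than the Cython version.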

@jjgomezcadenas
Copy link
Collaborator

jjgomezcadenas commented Dec 16, 2020 via email

@jacg
Copy link
Collaborator

jacg commented Dec 16, 2020

By all means accept this PR, and get on with your lives. This is the best thing
you can do in the short term. [You should probably stop reading this now, and
ignore what follows.]

However, as you know, I take the long-term view: Investing in some careful
thought now, pays handsome dividends in the future. Avoid technical debt.

In general, one shouldn't use Cython unless it solves a problem that cannot
adequately be solved in pure Python. On the evidence I have seen and understood
so far, I remain unconvinced that this is the case here.

Some technical details:

  • The only argument I have seen that sounds vaguely convincing, is that a
    hand-written loop over a numpy array is completely unviable in pure Python,
    and rewriting this loop in Cython makes it acceptable.

  • I suspect that the overall structure of the algorithm and the data structures
    used are misguided, which leads to the illusion that chucking Cython at it is
    a good solution. I wouldn't be surprised if completely rethinking the
    algorithm led to a pure Python solution which significantly outperforms the
    proposed Cython solution. However, I am not (yet) in a position to make a
    pronouncement on this.

(As for Rust, it offers many, many advantages which Cython does not, but it
also brings greater costs than Cython.)

@jacg
Copy link
Collaborator

jacg commented Dec 16, 2020

Are there data somewhere in the IC repo that can be fed in to these functions in order to make a benchmark?

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 16, 2020

No, the data are too large. In the above-mentioned repository, https://github.com/mmkekic/detsim_checks, the README has a Dropbox link to download the data, which should be run through the full city, as well as the full light tables.

The branch that has a working version of the full city is https://github.com/mmkekic/IC/tree/detsim_unification; to make it work you would need to check out the branch, compile the Cython, and run the standard
city detsim detsim.conf, where the detsim.conf file is also provided in the https://github.com/mmkekic/detsim_checks repository (or use the run_detsim.py script that I used to run the memory profiler). Note that this runs the full city flow and not just the function in question.

The other option is to feed in random x, y, time, number_of_photons arrays (of size ~110k), but this might not describe reality well if you plan to try some grouping or voxelization.

@jacg
Copy link
Collaborator

jacg commented Dec 16, 2020

Why don't you add

  1. the code in https://github.com/mmkekic/detsim_checks

  2. the files in the Dropbox you mention

to the IC detsim_unification branch?

The first can be placed in an arbitrary convenient subdirectory of IC, the
second can be added via Git LFS. Both can be removed from the branch before the
branch is merged into master (if the branch is supposed to be merged).

As it stands, with everything scattered all over the place (across different
repositories and file-sharing services) the activation energy to get going with
this is unhelpfully huge, and the maintainability and feeding back of others'
contributions will also suffer greatly.

(Hard-wired absolute paths to one developer's local filesystem, don't help
either!)

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 16, 2020

I added the files to the testfiles folder, and the detsim_benchmark folder has the corresponding config and run_detsim.py script; you need to check out the detsim_unification branch and run the script (after the typical IC setup plus Cython compilation).
I haven't added the other trials from my scratch repository because they were mostly used to test different implementations, all involving Cython.
If you have issues setting this up, or further questions about the flow, please ask.

(I don't think it is good practice to add unnecessary files to our LFS server, but I believe in this case it is justified :) )

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

Thanks. This should make it much easier to contribute.

I have some comments up-front:

> I don't think it is good practice to add unnecessary files to our LFS server

  • Benchmarks are necessary if you are contemplating adding Cython to the
    codebase (or indeed, doing any type of optimization).

  • These files are necessary for the benchmarks to run.

  • Ergo, these files are necessary and therefore should be added to the
    repository.

  • These files are too big to reasonably add as normal git files.

  • Ergo, these files must live on the LFS server.

It is incorrect to classify these files as unnecessary. It makes about as much
sense as saying that we shouldn't commit tests, because they're not necessary.
Benchmarks, like tests, give vital and independently-verifiable information
about the behaviour of the code. They are absolutely necessary.

Consequently, the commit title

Add files for detsim benchmarks (DO NOT MERGE THIS COMMIT TO MASTER)

is wrong. This commit must be merged into master (if the code that it helps to
benchmark gets merged). The commit should be reverted before merging: these
files shouldn't pollute our working trees in day-to-day work, but we should be
able to retrieve them from history, so that we can run the benchmarks easily at
any time we need to, in the future.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 17, 2020

I partially agree, but the problem (as usual) is that we are developing without a clear idea of what flexibility we will need in the future. For example, the light-table format and dependencies can change, hence the light-table reading should be flexible (in fact the format of the table did change during development, so we would have needed to commit the files twice if we had uploaded them to the server from the start). The event type can change, and there could be events with more hits (i.e. electrons), so whatever implementation we choose should be at most linear in the number of electrons (in both time and memory consumption).
Basically, I wouldn't take these files as an ultimate benchmark test, but rather as guidance for the current implementation, since we need this code ASAP...
(I also realized these files are ~100 MB, so I really shouldn't make such a fuss about storing them on our server.)

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

I'm confused: Should I be able to find the function called create_wfs in this PR, anywhere in the detsim_unification branch?

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

> I'm confused: Should I be able to find the function called create_wfs in this PR, anywhere in the detsim_unification branch?

OK, so it's approximately electron_loop.pyx/electron_loop.

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

What is the relationship between the detsim_v2.0 branch (this PR) and detsim_unification? Is it that the former aims to merge a subset of the work done in the latter, with the rest of the latter following in a later PR?

It's really annoying to be faced with two versions of the code which are similar but different. To review this PR it must be possible to run benchmarks for it. Having to switch to another branch and run a different version of the code in order to run the benchmarks is not really a reasonable way of working!

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 17, 2020

I agree! @gonzaponte was against the idea of putting the full city in one PR, so we divided it into a bunch of small ones. However, they are not independent and were developed simultaneously, so we ended up with spaghetti code and merge conflicts.
The detsim_unification branch was a quick fix of the merge conflicts providing a working version, but it is not the final one, since a bunch of cosmetic changes were made in the 'official' branches. I am not sure what is the best way to proceed.
The PRs needed for the full detsim city are #694, #695 and #757, which should all be merged before this one. The final city-flow PR will depend on all of them. The main issue is that we started developing before having a clear algorithm, and the correlations between modules kept changing (@gondiaz, who started implementing detsim, has 14 detsim branches, which gives you an idea of the mess we have).

If it would make things easier, I (or @gondiaz, since he is supposed to open the final city-body PR once all the rest are approved) can create (yet another!) branch that more closely resembles a final version...

@gonzaponte
Copy link
Collaborator

Originally the work was divided into different tasks that were (in principle) independent of each other. I am not really sure how we got here, but I guess they were not so independent after all. My involvement has been discontinuous and perhaps counterproductive, since I couldn't really follow its evolution.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 18, 2020

@jacg, I have rebased the detsim_unification branch on this PR (and the other PRs). You would need to do a hard reset, since there were conflicts.
What is the status of this? Will anyone actually try a better approach, or shall we proceed with the review, accepting Cython?

@jmalbos
Copy link
Collaborator

jmalbos commented Feb 2, 2021

I think this is in much better shape now, but I'd suggest some commit squashing before merging.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Feb 2, 2021

Commits are squashed and the branch is rebased on #757

@jmalbos
Copy link
Collaborator

jmalbos commented Feb 2, 2021

I approve this PR.

@jmalbos jmalbos closed this Feb 2, 2021
@jmalbos jmalbos reopened this Feb 2, 2021
@jmalbos jmalbos self-requested a review February 2, 2021 12:45
@gondiaz gondiaz mentioned this pull request Feb 3, 2021
carmenromo pushed a commit that referenced this pull request Feb 22, 2021
#757

[author: gondiaz]

This PR contains the detsim functions that simulate the S1 signal from the total
number of photons using the S1 LightTable. It is a single module `simulate_S1.py`
and its corresponding test module. Also, the S1 LightTable test file is added.

This PR, along with PR #756, overrides PRs #698, #699 and #734.

[reviewer: mmkekic]

This PR adds functionality for the simulation of the S1 scintillation light needed
for the detsim city. The code is clear and tested, good job!

  • All light table classes should inherit from this one
  • Cryptic lines have explanatory comments
  • The position of z inside the EL gap is hardcoded to the middle of the EL
  • The private __extend_lt_bounds method extends the light tables to
    active_radius size
  • Tests added:
      • test for LT_SiPM checks el_gap and active_r are set properly
      • test for LT_SiPM get_values
      • test for LT_PMT get_values
      • test for LT_PMT values with extended radius
      • test that get_values returns 0s for non-existing sensors
  • The function loops over ionization electrons and creates waveforms of
    expected values (the output should be Poissonised for real waveforms)
  • Cryptic parts of the code have explanatory comments
  • The spreading is to be used in the case of PMT light simulation, since the
    photon distribution of one electron is assumed to be uniform across the EL gap
  • Tests are:
      • test for correct output shape
      • test for tmin parameter
      • tests for the waveforms' integrated signal
      • test for the time-distribution signal on PMTs
      • test for the time-distribution signal on SiPMs
  • This slows down the code by around 10-20%
@carmenromo carmenromo merged commit 09f8760 into next-exp:master Feb 22, 2021
carmenromo pushed a commit that referenced this pull request Feb 22, 2021
#769

[author: gondiaz]

This is the last of the series of PRs #757, #756, #695 and #694.

It contains the detsim city flow module, which performs a fast detector simulation.

From MC hits, it simulates ionization-electron creation, drift and diffusion.
Then it simulates the photon propagation based on previously computed
light tables, which depend on the detector geometry. The final output is the
signal waveforms at the sensors (without electronics simulation).

[reviewer: mmkekic]

This PR is the last piece of the long-lasting effort to implement fast detector
light simulation in IC. It introduces the final city flow, using the
already-approved functionality. The code is readable and appropriate tests are
added. Good job!
@mmkekic mmkekic deleted the detsim_v2.0 branch February 22, 2021 19:33