
Cython functions for detsim #756

Merged
merged 15 commits into from
Feb 22, 2021

Conversation

mmkekic
Copy link
Collaborator

@mmkekic mmkekic commented Nov 18, 2020

This PR adds Cython classes for the SiPM and PMT light tables for the S2 signal, and a loop over ionization electrons that produces a waveform-shaped matrix of expected photon values.

@mmkekic mmkekic requested a review from jmalbos November 18, 2020 18:29
@mmkekic mmkekic changed the title cython functions for detsim Cython functions for detsim Nov 18, 2020
@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 13, 2020

I have rebased on the accepted PR #757, since they share some common functions and test files.

@jacg
Copy link
Collaborator

jacg commented Dec 14, 2020

Are there any benchmarks that justify the use of Cython?

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 14, 2020

The problem is that there are no reasonable alternatives that solve the exact problem: producing a matrix of
(number_of_electrons (~100 000) ✕ number_of_time_slices (10)) ✕ number_of_sensors (~3000), to then be rebinned into
some_arbitrary_binning ✕ number_of_sensors. Placing the original matrix in memory is impossible without approximations (a naive estimate is > 20 GB). @gondiaz tried to follow this approach with some approximations, managing ~5 s per event and < 1.5 GB, in https://github.com/gondiaz/IC/tree/detsim_v0. However, this approach lowers our desired resolution and does not scale well with the type of event.
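For reference, the naive memory estimate quoted above can be checked with a quick back-of-the-envelope calculation (the electron and sensor counts are the approximate figures from this thread):

```python
n_electrons = 100_000   # ionization electrons per event (approximate)
n_slices    = 10        # EL time partitions
n_sensors   = 3_000     # sensors (approximate)
bytes_f64   = 8         # one float64 per matrix entry

# size of the full (electrons x slices) x sensors matrix, in GB
size_gb = n_electrons * n_slices * n_sensors * bytes_f64 / 1e9
print(f"{size_gb:.0f} GB")  # -> 24 GB, consistent with the "> 20 GB" estimate
```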
Looping over a NumPy array is never a good idea, but that was also tried, in https://github.com/gondiaz/IC/tree/detsim_loop_based, again with some approximations. The memory is < 0.7 GB and the time is ~20 s per event. Without the unjustified approximation the time was measured in minutes (we needed to use the slow numpy.add.at function).
The Cython one keeps the same memory consumption (it is basically the memory of the input and output arrays) and runs at ~6 s per event, without using any approximation. However, it is a bit ugly: in order to make it modular and flexible but still fast I had to use a C pointer directly; other approaches implied slicing a NumPy memory view, which was ~2x slower. Some Cython trials are in the https://github.com/mmkekic/detsim_checks repository (warning: the code there is really just a messy draft).
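As a minimal illustration of the accumulation step mentioned above (with made-up toy data): `np.add.at` is the correct way to accumulate with repeated indices in pure NumPy, but it is much slower than `np.bincount`, which does the same 1D accumulation in C:

```python
import numpy as np

rng = np.random.default_rng(0)
n_contrib, n_bins = 1_000, 50
bins    = rng.integers(0, n_bins, size=n_contrib)  # time bin of each contribution
photons = rng.random(n_contrib)                    # expected photon values

# Correct but slow: np.add.at handles repeated indices
# (a plain wf[bins] += photons silently drops duplicates)
wf_slow = np.zeros(n_bins)
np.add.at(wf_slow, bins, photons)

# Much faster equivalent for 1D accumulation
wf_fast = np.bincount(bins, weights=photons, minlength=n_bins)

assert np.allclose(wf_slow, wf_fast)
```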


@jacg
Copy link
Collaborator

jacg commented Dec 14, 2020

Is there somewhere I can read a high-level description of the computational task?

@jacg
Copy link
Collaborator

jacg commented Dec 14, 2020

> Is there somewhere I can read a high-level description of the computational task?

Put another way: can you describe

  1. the structure and size of the data that go into the process

  2. the structure and size of the data that come out

  3. The relationship between 1. and 2.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 14, 2020

> Is there somewhere I can read a high-level description of the computational task?

I am afraid not, but I have tried to prepare a document that explains the process. If any parts are unclear, please ask.
s2_description.pdf
I hope it can be useful to @jmalbos as well.

@jacg
Copy link
Collaborator

jacg commented Dec 15, 2020

My current understanding is something like this:

Input

  1. es: Electrons, ~1.5 ✕ 10⁵

    x y t nph
    f64 f64 f64 u64

    Total space: 4 ✕ 8 ✕ 1.5 ✕ 10⁵ bytes -> 5 MB

    Important question: are these already ordered on t?

  2. LT(z, r) -> p: Light table

    AFAICT, this is a function: (f64, f64) -> f64 implemented as a lookup table
    with 10 different z-slices, and an unspecified number of r-slices, which
    therefore occupies

    f64 ✕ 10 (zs) ✕ ? (rs) -> 1MB / 12 r-slices

Output

f64 ✕ number of sensors ✕ number of time bins

  1. SiPM responses

    f64 ✕ 4000 ✕ 2000 -> 64 MB

  2. PMT responses

    f64 ✕ 60 ✕ 200000 -> 96 MB

(You also mention applying a Poisson distribution, but it seems that it is not
part of this calculation.)

Calculation

Very roughly, for SiPMs

dv = drift velocity
zs = z-positions of 10 uniformly distributed slices of EL plane
for each electron
    for each sensor
        r = electron-sensor distance in x-y plane
        ps = LT(r, zs) # 10 different probabilities
        for p, z in zip(ps, zs)
            time_bin = bin(t + z / dv)
            response_contribution = p * nph / 10 
            add response_contribution to appropriate (sensor, time_bin)

For PMTs it's almost the same, just that the z-distribution is collapsed to a
single point, for the purpose of the light table probabilities.
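The pseudocode above can be turned into a small runnable sketch. All names, shapes, and the toy light table below are invented for illustration, and the sensor loop is vectorized with NumPy rather than written out explicitly:

```python
import numpy as np

def sipm_response(electrons, sensor_xy, lt, zs, drift_v, bin_width, n_bins):
    """Expected-photon waveforms, one row per sensor (illustrative sketch).

    electrons: array of (x, y, t, nph) rows; lt(r, z) -> detection probability.
    """
    wf = np.zeros((len(sensor_xy), n_bins))
    for x, y, t, nph in electrons:
        # electron-sensor distance in the x-y plane, for all sensors at once
        r = np.hypot(sensor_xy[:, 0] - x, sensor_xy[:, 1] - y)
        for z in zs:  # the 10 uniformly distributed EL z-slices
            time_bin = int((t + z / drift_v) // bin_width)
            if time_bin < n_bins:
                # each z-slice carries 1/len(zs) of the electron's photons
                wf[:, time_bin] += lt(r, z) * nph / len(zs)
    return wf

# Toy example with an invented light table
sensors = np.array([[0.0, 0.0], [10.0, 0.0]])
lt = lambda r, z: 1.0 / (1.0 + r**2 + z**2)       # made-up probability, falls with r
els = np.array([[1.0, 0.0, 0.0, 100.0]])          # one electron: x, y, t, nph
wfs = sipm_response(els, sensors, lt, np.linspace(0.0, 5.0, 10), 1.0, 1.0, 20)
```

The nearer sensor receives more expected light, as one would hope; the real implementation replaces the toy `lt` with the light-table lookup.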

Discussion

In your earlier comment you allude to memory being a problem because there are
10⁵ electrons ✕ 10 time slices ✕ 3000 sensors.

I assume (trust?, hope!) that the detector spits out results in chronological
order. If this is the case, you only ever need to hold 10 time-slices
(corresponding to the EL-layer z-slices) in memory: as soon as an electron
appears whose t lies after the end of your earliest in-memory time-slice, that
time slice can be yielded, and a fresh one added to your in-memory time-slice
buffer.

You also mentioned arbitrary_binning ✕ number_of_sensors in the context of
excessive memory use. Again, if the input is time-ordered, then your first
binning pass can produce a stream of time-ordered time slices, which do not need
to be kept in memory simultaneously. A second pass (or a second component just
downstream, in the same pass) can then regroup these into arbitrarily binned
slices.
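The streaming idea sketched above might look like the following. This is a hypothetical generator; the names, the contribution format, and the fixed-lag eviction policy are all assumptions, not the proposed implementation:

```python
import numpy as np

def stream_slices(contributions, n_sensors, n_live=10):
    """Yield (slice_index, per-sensor accumulator) pairs, holding only a few
    slices in memory.  contributions must arrive roughly in time order: a
    slice older than (current index - n_live) never receives anything again.
    """
    buf = {}  # slice_index -> per-sensor accumulator
    for slice_idx, sensor_idx, value in contributions:
        buf.setdefault(slice_idx, np.zeros(n_sensors))[sensor_idx] += value
        # slices that can no longer change are streamed out immediately
        for old in sorted(k for k in buf if k < slice_idx - n_live):
            yield old, buf.pop(old)
    for old in sorted(buf):  # flush whatever is left at end of stream
        yield old, buf.pop(old)

# toy usage: 3 sensors, contributions given as (time_slice, sensor, value)
out = dict(stream_slices([(0, 1, 2.0), (0, 2, 1.0), (15, 0, 3.0)], n_sensors=3))
```

Peak memory is then bounded by the number of live slices, not by the total event length.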

My original question was about benchmarks which justify the use of Cython. That
means there should be a pure python implementation and an equivalent Cython
implementation, and their run-times (or memory usage) should be compared. I
don't see how the discussion of memory is pertinent to this question: Cython
doesn't magically make numpy arrays occupy less memory.

Fundamentally, you have to show that your Cython solution is significantly more
efficient than your equivalent pure Python solution. Without that[*], you
shouldn't take on the burden of replacing Python with a less flexible language.

[*] Or some other compelling advantage. For example, it might be reasonable to
rewrite Python code in Rust, even if the Rust version offers no significant
speed improvement, because of the safety guarantees and lower long-term cost of
development and maintenance of code written in Rust.

Cython offers very little advantage over Python, beyond speed of execution, but
it is significantly more painful to work in. So you really should make sure that
it is significantly faster, before choosing Cython over pure Python.

Questions

  1. Have you tried streaming your data, rather than keeping everything in
    memory at once?

  2. What do you do with these values once they have been generated? (Write to
    file? Use immediately? If the latter, how?)

  3. What is an acceptable run time, per event?

  4. What benefits does Cython bring to this problem, that are not available in
    pure Python?

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 15, 2020

Ah, ok, I think I was not clear enough.

Looping over a NumPy array in Python is guaranteed to be slower than a Cython loop over the same array, but using fast (C-implemented) NumPy operations on whole vectors usually is not. Hence the main goal was to solve the problem using NumPy operations on arrays directly, as in our first approach. This required putting number_of_electrons ✕ number_of_partitions ✕ number_of_sensors in memory and doing multiplications and histogrammings, which is memory-consuming (20+ GB).
The actual input/output array sizes are not a problem for now (I agree that having all of these arrays in memory is not optimal and could become a problem in the future, but this would need to be fixed in other parts of IC as well, for example irene/diomira, where the full array is read and manipulated).

The second approach was a loop over electrons, with NumPy vector manipulations inside that loop. This took on the order of 10 minutes. Doing all three loops explicitly is slower than that.

Then we tried Cython.

I tried several other approaches; the most promising one was to first group the electrons into voxels determined by the PSF file (i.e. distance and EL-time partitions), and then loop over sensors, doing the calculations with NumPy's fast bincount. This was around 1.2 minutes per event.

The conclusion: we didn't manage to think of any pure Python approach that comes remotely close to Cython's speed without excessive memory usage.
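The bincount-based approach mentioned above (group the per-electron contributions, then accumulate per sensor) can be sketched roughly as follows; all names, shapes, and the random toy data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_electrons, n_sensors, n_bins = 1_000, 5, 100

time_bins = rng.integers(0, n_bins, size=n_electrons)  # time bin of each electron
probs = rng.random((n_electrons, n_sensors))           # light-table prob. per sensor
nph = rng.integers(1, 10, size=n_electrons)            # photons per electron

# loop over sensors; np.bincount does the per-time-bin accumulation in C
wfs = np.stack([
    np.bincount(time_bins, weights=probs[:, s] * nph, minlength=n_bins)
    for s in range(n_sensors)
])
assert wfs.shape == (n_sensors, n_bins)
```

The Python-level loop runs only n_sensors times; the per-electron work stays in vectorized NumPy, which is why this variant is much faster than explicit per-electron loops, though still slower than the Cython version.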

@jjgomezcadenas
Copy link
Collaborator

jjgomezcadenas commented Dec 16, 2020 via email

@jacg
Copy link
Collaborator

jacg commented Dec 16, 2020

By all means accept this PR, and get on with your lives. This is the best thing
you can do in the short term. [You should probably stop reading this now, and
ignore what follows.]

However, as you know, I take the long-term view: Investing in some careful
thought now, pays handsome dividends in the future. Avoid technical debt.

In general, one shouldn't use Cython unless it solves a problem that cannot
adequately be solved in pure Python. On the evidence I have seen and understood
so far, I remain unconvinced that this is the case here.

Some technical details:

  • The only argument I have seen that sounds vaguely convincing, is that a
    hand-written loop over a numpy array is completely unviable in pure Python,
    and rewriting this loop in Cython makes it acceptable.

  • I suspect that the overall structure of the algorithm and the data structures
    used are misguided, which leads to the illusion that chucking Cython at it is
    a good solution. I wouldn't be surprised if completely rethinking the
    algorithm led to a pure Python solution which significantly outperforms the
    proposed Cython solution. However, I am not (yet) in a position to make a
    pronouncement on this.

(As for Rust, it offers many, many advantages which Cython does not, but it
also brings greater costs than Cython.)

@jacg
Copy link
Collaborator

jacg commented Dec 16, 2020

Are there data somewhere in the IC repo that can be fed in to these functions in order to make a benchmark?

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 16, 2020

No, the data are too large. In the above-mentioned repository, https://github.com/mmkekic/detsim_checks, the README has a Dropbox link to download the data, which should be run through the full city, as well as the full light tables.

The branch that has a working version of the full city is https://github.com/mmkekic/IC/tree/detsim_unification; to make it work you would need to check out the branch, compile the Cython, and run the standard
city detsim detsim.conf, where the detsim.conf file is also provided in the https://github.com/mmkekic/detsim_checks repository (or use the run_detsim.py script that I used to run the memory profiler). Note that this runs the full city flow and not just the function in question.

The other option is to feed in random x, y, time, number_of_photons arrays (of size ~110k), but this might not describe reality well if you plan to try some grouping or voxelization.

@jacg
Copy link
Collaborator

jacg commented Dec 16, 2020

Why don't you add

  1. the code in https://github.com/mmkekic/detsim_checks

  2. the files in the Dropbox you mention

to the IC detsim_unification branch?

The first can be placed in an arbitrary convenient subdirectory of IC, the
second can be added via Git LFS. Both can be removed from the branch before the
branch is merged into master (if the branch is supposed to be merged).

As it stands, with everything scattered all over the place (across different
repositories and file-sharing services) the activation energy to get going with
this is unhelpfully huge, and the maintainability and feeding back of others'
contributions will also suffer greatly.

(Hard-wired absolute paths to one developer's local filesystem, don't help
either!)

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 16, 2020

I added the files to the testfiles folder, and the detsim_benchmark folder has the corresponding config and run_detsim.py script; you need to check out the detsim_unification branch and run the script (after the typical IC setup plus Cython compilation).
I haven't added the other trials from my scratch repository because they were mostly used to test different implementations, all involving Cython.
If you have issues setting this up, or further questions about the flow, please ask.

(I don't think it is good practice to add unnecessary files to our LFS server, but I believe in this case it is justified :) )

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

Thanks. This should make it much easier to contribute.

I have some comments up-front:

> I don't think it is good practice to add unnecessary files to our LFS server

  • Benchmarks are necessary if you are contemplating adding Cython to the
    codebase (or indeed, doing any type of optimization).

  • These files are necessary for the benchmarks to run.

  • Ergo, these files are necessary and therefore should be added to the
    repository.

  • These files are too big to reasonably add as normal git files.

  • Ergo, these files must live on the LFS server.

It is incorrect to classify these files as unnecessary. It makes about as much
sense as saying that we shouldn't commit tests, because they're not necessary.
Benchmarks, like tests, give vital and independently-verifiable information
about the behaviour of the code. They are absolutely necessary.

Consequently, the commit title

Add files for detsim benchmarks (DO NOT MERGE THIS COMMIT TO MASTER)

is wrong. This commit must be merged into master (if the code that it helps to
benchmark gets merged). The commit should be reverted before merging: these
files shouldn't pollute our working trees in day-to-day work, but we should be
able to retrieve them from history, so that we can run the benchmarks easily at
any time we need to, in the future.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 17, 2020

I partially agree, but the problem (as usual) is that we are developing without a clear idea of what flexibility we will need in the future. For example, the light-table format and dependencies can change, hence the light-table reading should be flexible (in fact the format of the table did change during development, so we would have needed to commit the files twice if we had uploaded them to the server from the start). The event type can change, and there could be events with more hits (i.e. electrons), so whatever implementation we choose should be at most linear in the number of electrons (in both time and memory consumption).
Basically, I wouldn't take these files as an ultimate benchmark test, but rather as guidance for the current implementation, since we need this code ASAP...
(I also realized these files are ~100 MB, so I really shouldn't make such a fuss about storing them on our server.)

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

I'm confused: Should I be able to find the function called create_wfs in this PR, anywhere in the detsim_unification branch?

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

> I'm confused: Should I be able to find the function called create_wfs in this PR, anywhere in the detsim_unification branch?

OK, so it's approximately electron_loop.pyx/electron_loop.

@jacg
Copy link
Collaborator

jacg commented Dec 17, 2020

What is the relationship between the detsim_v2.0 branch (this PR) and detsim_unification? Is it that the former aims to merge a subset of the work done in the latter, with the rest of the latter following in a later PR?

It's really annoying to be faced with two versions of the code which are similar but different. To review this PR it must be possible to run benchmarks for it. Having to switch to another branch and run a different version of the code in order to run the benchmarks is not really a reasonable way of working!

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 17, 2020

I agree! @gonzaponte was against the idea of putting the full city in one PR, so we divided it into a bunch of small ones. However, they are not independent and were developed simultaneously, so we ended up with spaghetti code and merge conflicts.
The detsim_unification branch was a quick fix of the merge conflicts providing a working version, but it is not the final one, since a bunch of cosmetic changes were made in the 'official' branches. I am not sure what is the best way to proceed.
The PRs needed for the full detsim city are #694, #695 and #757, which should all be merged before this one. The final city-flow PR will depend on all of them. The main issue is that we started developing before having a clear algorithm, and the correlations between modules kept changing (@gondiaz, who started implementing detsim, has 14 detsim branches, which gives you an idea of the mess we have).

If it would make things easier, I (or @gondiaz, since he is supposed to open the final city-body PR once all the rest are approved) can create (yet another!) branch that more closely resembles a final version...

@gonzaponte
Copy link
Collaborator

Originally the work was divided into different tasks that were (in principle) independent of each other. I am not really sure how we got here, but I guess they were not so independent after all. My involvement has been discontinuous and perhaps counterproductive, since I couldn't really follow its evolution.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Dec 18, 2020

@jacg, I have rebased the detsim_unification branch on this PR (and the other PRs). You would need to do a hard reset, since there were conflicts.
What is the status of this? Will anyone actually try a better approach, or shall we proceed with the review, accepting Cython?

@jmalbos
Copy link
Collaborator

jmalbos commented Feb 2, 2021

I think this is in much better shape now, but I'd suggest some commit squashing before merging.

@mmkekic
Copy link
Collaborator Author

mmkekic commented Feb 2, 2021

Commits are squashed and the branch is rebased on #757

@jmalbos
Copy link
Collaborator

jmalbos commented Feb 2, 2021

I approve this PR.

@jmalbos jmalbos closed this Feb 2, 2021
@jmalbos jmalbos reopened this Feb 2, 2021
@jmalbos jmalbos self-requested a review February 2, 2021 12:45
@gondiaz gondiaz mentioned this pull request Feb 3, 2021
carmenromo pushed a commit that referenced this pull request Feb 22, 2021
#757

[author: gondiaz]

This PR contains the detsim functions that simulate the S1 signal from the total
number of photons using the S1 LightTable. It is a single module `simulate_S1.py`
and its corresponding test module. Also, the S1 LightTable test file is added.

This PR, along with PR #756, overrides PRs #698, #699 and #734.

[reviewer: mmkekic]

This PR adds functionality for the simulation of the S1 scintillation light needed
for the detsim city. The code is clear and tested, good job!

  • All light table classes should inherit from this one
  • Cryptic lines have explanatory comments
  • The position of z inside the EL gap is hardcoded to the middle of the EL
  • The private __extend_lt_bounds method extends the light tables to
    active_radius size
  • Tests added:
      • test for LT_SiPM checks el_gap and active_r are set properly
      • test for LT_SiPM get_values
      • test for LT_PMT get_values
      • test for LT_PMT values with extended radius
      • test that get_values returns 0s for non-existing sensors
  • The function loops over ionization electrons and creates waveforms of
    expected values (the output should be Poissonised for real waveforms)
  • Cryptic parts of the code have explanatory comments
  • The spreading is to be used in the case of PMT light simulation, since the
    photon distribution of one electron is assumed to be uniform across the EL gap
  • Tests are:
      • test for correct output shape
      • test for tmin parameter
      • tests for the waveforms' integrated signal
      • test for the time-distribution signal on PMTs
      • test for the time-distribution signal on SiPMs
  • This slows down the code by around 10-20%
@carmenromo carmenromo merged commit 09f8760 into next-exp:master Feb 22, 2021
carmenromo pushed a commit that referenced this pull request Feb 22, 2021
#769

[author: gondiaz]

This is the last of the series of PRs #757, #756, #695 and #694.

It contains the detsim city flow module, which performs a fast detector simulation.

From MC hits, it simulates ionization-electron creation, drift and diffusion.
Then it simulates the photon propagation based on previously computed
light tables, which depend on the detector geometry. The final output is the
signal waveforms at the sensors (without electronics simulation).

[reviewer: mmkekic]

This PR is the last piece of the long-lasting effort to implement fast detector
light simulation in IC. It introduces the final city flow, using the
already-approved functionality. The code is readable and appropriate tests are
added. Good job!
@mmkekic mmkekic deleted the detsim_v2.0 branch February 22, 2021 19:33