
Bufferization functions #737

Merged (9 commits) on Sep 16, 2020
Conversation

@andLaing (Collaborator) commented Aug 8, 2020

Adds functions and utilities used to sort MC sensor data (full optical simulation) into buffers for IC processing.

Addresses point 6 of issue #691

Functions can be seen in context in https://github.com/andLaing/detsim/tree/IC-integration-tests where the position_signal.py script will form the basis for a future IC city.

@andLaing andLaing requested a review from gonzaponte August 8, 2020 17:30
@andLaing (Collaborator, author) commented Aug 8, 2020

There's probably some scope for change, and we should certainly work out how all or part of these functions could also be used as the final part of the detsim flow.

@gonzaponte (Collaborator) left a comment:

I haven't taken a look at the tests just yet, but they will probably change a bit with this.

from .. reco.peak_functions import split_in_peaks


@wraps(np.histogram)
gonzaponte:

I think your intention is to indicate that this is just a specialized np.histogram call, but @wraps is misleading here: it implies exactly the same call signature, which is not the case. If this function is only used in wf_binner.bin_data, you can define it in place, if you want.

return np.histogram(data.time, weights=data.charge, bins=bins)[0]
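A minimal sketch of the point being made: the helper is a plain function with its own signature (a DataFrame-like object plus bin edges), so decorating it with @wraps(np.histogram) would advertise the wrong interface. The names and the toy data below are illustrative, not the actual detsim code.

```python
import numpy as np
import pandas as pd

def weighted_histogram(data: pd.DataFrame, bins: np.ndarray) -> np.ndarray:
    # Plain helper, no @wraps(np.histogram): the call signature here
    # (data with .time/.charge, plus bin edges) differs from np.histogram's.
    return np.histogram(data.time, weights=data.charge, bins=bins)[0]

hits = pd.DataFrame({'time': [0.5, 1.5, 1.6], 'charge': [1.0, 2.0, 3.0]})
wf   = weighted_histogram(hits, np.array([0, 1, 2]))  # charge per time bin
```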


def wf_binner(max_buffer: int) -> Callable:
gonzaponte:

This looks like a city component, rather than a generic function. Maybe the scheme should rather be to define bin_data as a generic function with an extra argument max_buffer and wf_binner a city component that calls it fixing that argument.
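The split being proposed could look roughly like this: a generic bin_data that takes max_buffer as an ordinary argument, and a thin wf_binner closure that fixes it from configuration. All names and the binning logic are illustrative assumptions, not the real implementation.

```python
from typing import Callable, Tuple
import numpy as np
import pandas as pd

def bin_data(sensors: pd.DataFrame, bin_width: float,
             max_buffer: int) -> Tuple[np.ndarray, np.ndarray]:
    # Hypothetical generic version: max_buffer is an explicit argument.
    t0   = sensors.time.min()
    tmax = min(sensors.time.max(), t0 + max_buffer)
    bins = np.arange(t0, tmax + bin_width, bin_width)
    wf, _ = np.histogram(sensors.time, weights=sensors.charge, bins=bins)
    return bins[:-1], wf

def wf_binner(max_buffer: int) -> Callable:
    # City component: a closure that only fixes max_buffer from config.
    def bin_fixed(sensors: pd.DataFrame, bin_width: float):
        return bin_data(sensors, bin_width, max_buffer)
    return bin_fixed
```

With this split, bin_data stays testable and reusable on its own, while the component carries the configuration plumbing.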

Comment on lines 63 to 65
def signal_finder(buffer_len : float,
bin_width : float,
bin_threshold: int) -> Callable:
gonzaponte:

Same. This looks like a city component.

Comment on lines 83 to 84
stand_off = int(buffer_len * units.mus / bin_width)
def find_signal(wfs: pd.Series) -> List[int]:
gonzaponte:

In this case, find_signal could take stand_off and bin_threshold as parameters, and signal_finder should calculate stand_off.
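A sketch of that refactor, under assumed semantics: find_signal takes stand_off and bin_threshold explicitly, while the signal_finder component does only the unit bookkeeping. The trigger-selection logic and the MUS constant here are placeholders for illustration (the real code uses the project's units module).

```python
from typing import Callable, List
import numpy as np
import pandas as pd

MUS = 1000  # illustrative stand-in for units.mus in an ns-based system

def find_signal(wfs: pd.Series, stand_off: int,
                bin_threshold: int) -> List[int]:
    # Hypothetical: bins where the summed waveform crosses the threshold,
    # keeping candidates at least stand_off samples apart.
    summed     = np.sum(wfs.tolist(), axis=0)
    candidates = np.where(summed >= bin_threshold)[0]
    triggers: List[int] = []
    for idx in candidates:
        if not triggers or idx - triggers[-1] >= stand_off:
            triggers.append(int(idx))
    return triggers

def signal_finder(buffer_len   : float,
                  bin_width    : float,
                  bin_threshold: int) -> Callable:
    # The component computes stand_off once, then delegates.
    stand_off = int(buffer_len * MUS / bin_width)
    return lambda wfs: find_signal(wfs, stand_off, bin_threshold)
```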

PE threshold for selection
"""

stand_off = int(buffer_len * units.mus / bin_width)
gonzaponte:

Maybe add an inline comment explaining what stand_off means in this context? I think I get it from how it is used, but a short clarification would help.

Comment on lines 155 to 157
def generate_slices(triggers: List) -> Tuple:

for trg in triggers:
gonzaponte:

Seeing how you use it, I think it would be more useful to define it for a single trigger and move the loop outside.
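A possible shape for that change, with hypothetical names and a simple pre/post-window slicing policy assumed for illustration:

```python
import numpy as np

def slices_for_trigger(trigger: int, wfs: np.ndarray,
                       pre_samples: int, post_samples: int) -> np.ndarray:
    # Single-trigger version: slice a (sensors x samples) array around
    # the trigger, clipped to the waveform bounds.
    start = max(trigger - pre_samples, 0)
    stop  = min(trigger + post_samples, wfs.shape[1])
    return wfs[:, start:stop]

# The loop then lives at the call site:
# buffers = [slices_for_trigger(trg, wfs, pre, post) for trg in triggers]
```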

sipm_q = np.empty((0,0))\
if sipm_charge.empty else np.array(sipm_charge.tolist())
slice_and_pad = slice_generator(pmt_bins ,
np.array(pmt_charge.tolist()),
gonzaponte:

pmt_charge.values?

andLaing (author):

That was my first thought, but .values on a Series whose elements are lists (they're histograms for each sensor) gives an array of arrays rather than the 2D np.ndarray we want, which we need so we can use slicing to get everything into buffers.
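The distinction can be seen with a toy Series (index values chosen arbitrarily here): .values yields a 1D object array of lists, while np.array(series.tolist()) produces a proper 2D array that supports column slicing.

```python
import numpy as np
import pandas as pd

# A Series whose elements are equal-length lists: one histogram per sensor.
charges = pd.Series([[1, 2], [3, 4]], index=[11, 12])

as_values = charges.values              # 1D object array holding two lists
as_matrix = np.array(charges.tolist())  # genuine 2D ndarray

# as_values.shape == (2,)   -- cannot be sliced as rows x columns
# as_matrix.shape == (2, 2) -- supports as_matrix[:, 0] style slicing
```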

Comment on lines 74 to 75
pmt_ord = pmt_ids [ pmt_ids.isin( pmt_resp.index.tolist())].index
sipm_ord = sipm_ids[sipm_ids.isin(sipm_resp.index.tolist())].index
gonzaponte:

is tolist() necessary?

andLaing (author):

No, I must have forgotten to check after changing something. I'll sort it.

return order_and_pad


def get_no_sensors(detector_db: str, run_number: int) -> Tuple:
gonzaponte:

May I suggest changing no to n or number_of?

@andLaing (Collaborator, author) commented Sep 2, 2020

I think I addressed the initial comments in the last set of pushes. Whenever the reviewers have some time, the PR can be checked again.

@andLaing andLaing mentioned this pull request Sep 8, 2020
@gonzaponte (Collaborator) left a comment:

Overall I get the feeling that we should have named outputs in several places. Specifically, I am very tempted to just declare type synonyms like

ChargeArray = np.ndarray # 1D
WFset       = np.ndarray # 2D
Waveforms   = pd.Series  # 2D

(names not optimal). I think this would increase readability and make it clearer what data types are going through each function, because it's not always obvious.

This PR looks very good. Although there are lots of comments, most of them are minor.

Comment on lines 243 to 252
buffer_len : float
Configured buffer length in mus
bin_width : float
Sampling width for sensors
bin_threshold : int
PE threshold for selection
"""
# The stand_off is the minimum number of samples
# necessary between candidate triggers.
stand_off = int(buffer_len * units.mus / bin_width)
gonzaponte:

We are lacking the city here, but I suppose buffer_len will be read from a config file, and if so, will it not already come with units? In general, units should only be applied when inputting data into the program (either by hardcoding parameters or by reading them from somewhere they are assumed to be in a specific unit). (Most) functions should assume everything comes in "default" units.
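The convention being described can be sketched as follows. The unit values and config shape here are illustrative only; the real ones would come from the project's units module and configuration machinery.

```python
# Illustrative base units for an ns-based system (placeholder values):
ns  = 1.0
mus = 1000.0 * ns

raw_config = {'buffer_len': 800.0}           # stated in mus in the config file
buffer_len = raw_config['buffer_len'] * mus  # units applied once, at input

def stand_off_samples(buffer_len: float, bin_width: float) -> int:
    # No unit conversion inside: both arguments arrive in base units.
    return int(buffer_len / bin_width)
```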

Comment on lines 39 to 43
bins = np.arange(min_bin, max_bin, bin_width)
b_pad = bins[-1] + bin_width
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram ,
np.append(bins, b_pad))
return bins, bin_sensors
gonzaponte:

Maybe there is an edge case where this doesn't work, but isn't this simpler:

Suggested change
bins = np.arange(min_bin, max_bin, bin_width)
b_pad = bins[-1] + bin_width
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram ,
np.append(bins, b_pad))
return bins, bin_sensors
bins = np.arange(min_bin, max_bin + bin_width, bin_width)
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram, bins)
return bins[:-1], bin_sensors

Also, is it intended to not return the last upper bin edge?
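For concreteness, the two constructions produce the same edges on inputs where max_bin is a whole number of steps from min_bin; floating-point rounding at the endpoint is the edge case alluded to above. A quick check (values arbitrary):

```python
import numpy as np

min_bin, max_bin, bin_width = 0.0, 10.0, 2.5

# Original: build bins, then append one padding edge for np.histogram.
bins    = np.arange(min_bin, max_bin, bin_width)
edges_a = np.append(bins, bins[-1] + bin_width)

# Suggested: extend the range by one step; the caller drops the last edge.
edges_b = np.arange(min_bin, max_bin + bin_width, bin_width)

# For these inputs both give [0., 2.5, 5., 7.5, 10.]; with non-commensurate
# floats the endpoint handling of np.arange can make them differ.
```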

bin_width : float ,
t_min : float ,
t_max : float ,
max_buffer: int ) -> Tuple:
gonzaponte:

Probably the output annotation should be Tuple[np.ndarray, pd.Series]. Also in the corresponding city component.

return np.histogram(data.time, weights=data.charge, bins=bins)[0]


def bin_sensors(sensors : pd.Series,
gonzaponte:

sensors' type hint should be pd.DataFrame, right? If so, check also the city component.

nsamp_pmt = 5000
nsamp_sipm = 5

sensor_order = order_sensors(detector_db, run_number,
gonzaponte:

swap names. order_sensors = sensor_order(er?)(...)

Comment on lines 60 to 65
pmt_resp = pd.Series([[1]*nsamp_pmt ]*len(id_dict[ 'pmt_ids']),
index = id_dict[ 'pmt_ids'])
sipm_resp = pd.Series([[1]*nsamp_sipm]*len(id_dict['sipm_ids']),
index = id_dict['sipm_ids'])
pmt_q = np.array( pmt_resp.tolist())
sipm_q = np.array(sipm_resp.tolist())
gonzaponte:

Can this be defined in the same fixture id_dict (changing the name accordingly)?

Comment on lines 70 to 75
assert pmt_out.shape == (n_pmt, nsamp_pmt)
pmt_nonzero = np.argwhere(pmt_out.sum(axis=1) != 0)
assert np.all(pmt_nonzero.flatten() == id_dict['pmt_ord'])
assert sipm_out.shape == (n_sipm, nsamp_sipm)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])
gonzaponte:

Suggested change
assert pmt_out.shape == (n_pmt, nsamp_pmt)
pmt_nonzero = np.argwhere(pmt_out.sum(axis=1) != 0)
assert np.all(pmt_nonzero.flatten() == id_dict['pmt_ord'])
assert sipm_out.shape == (n_sipm, nsamp_sipm)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])
assert pmt_out.shape == (n_pmt , nsamp_pmt)
assert sipm_out.shape == (n_sipm, nsamp_sipm)
pmt_nonzero = np.argwhere( pmt_out.sum(axis=1) != 0)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all( pmt_nonzero.flatten() == id_dict[ 'pmt_ord'])
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])

A bit of cosmetics for readability.

Comment on lines 78 to 82
def test_get_n_sensors(pmt_ids, sipm_ids):
npmt, nsipm = get_n_sensors('new', 6400)

assert npmt == len( pmt_ids)
assert nsipm == len(sipm_ids)
gonzaponte:

parametrize to test positive and negative run numbers

andLaing (author):

I'm not sure what we gain by parametrizing the test. Positive and negative run numbers are essentially identical, as load_db uses the absolute value. The +/- difference is only used to invoke the copy of MC tables in the cities.

gonzaponte:

Right. The idea was to make sure it worked in both cases. Since this code is going to be used in MC, you should use a negative run number, right?

andLaing (author):

It seems irrelevant, but I can make it negative. As I mentioned above, load_db uses the absolute value, so the difference should be tested there.

Comment on lines 86 to 87
file_in = os.path.join(ICDATADIR ,
'Kr83_full_nexus_v5_03_01_ACTIVE_7bar_1evt.sim.h5')
gonzaponte:

This filename is defined in the fixture mc_waveforms. It should probably be defined in an independent fixture to be reused here. Actually I've just seen that the same file is also used in other modules, so it should probably go in the global conftest.py file and be fixed later in the other modules.

@gonzaponte (Collaborator) left a comment:

Just some very minor changes left. Also, please run pyflakes on the modified files to look for unused variables or imports.

@gonzaponte (Collaborator):

Neat.

@gonzaponte (Collaborator) left a comment:

A complete set of functions and components to split events into buffers and create waveforms in each of them. It reuses existing code as much as needed and tests every function implemented. All functions have some documentation. Very nice job.

@carmenromo carmenromo merged commit 6b9b047 into next-exp:master Sep 16, 2020
@andLaing andLaing deleted the buffer_functions branch September 16, 2020 15:35
Comment on lines +42 to +43
## !! to-do: clarify for non-pmt versions of next
## !! to-do: Check on integral instead of only threshold?
gonzaponte:

Many development tools recognize TODO (exactly four characters, all uppercase, no spaces, no dashes) and offer assistance with finding, managing, and warning about stuff that has been marked to be done, but this relies on it being written exactly like this: TODO.

So let us all, please, develop the habit of marking things to be done exactly thus: TODO.
