
Bufferization functions #737

Merged (9 commits) on Sep 16, 2020
Conversation

@andLaing (Collaborator) commented Aug 8, 2020

Adds functions and utilities used to sort MC sensor data (full optical simulation) into buffers for IC processing.

Addresses point 6 of issue #691

Functions can be seen in context in https://github.com/andLaing/detsim/tree/IC-integration-tests where the position_signal.py script will form the basis for a future IC city.

@andLaing andLaing requested a review from gonzaponte August 8, 2020 17:30
@andLaing (Collaborator, author) commented Aug 8, 2020

There's probably some scope for change, and we should certainly work out how all or part of these functions could also be used as the final part of the detsim flow.

@gonzaponte (Collaborator) left a comment:

I haven't taken a look at the tests just yet, but they will probably change a bit with this.

from .. reco.peak_functions import split_in_peaks


@wraps(np.histogram)
gonzaponte:

I think your intention is to indicate that this is just a specialized np.histogram call, but @wraps is misleading here: it implies exactly the same call signature, which is not the case. If this function is only used in wf_binner.bin_data, you can define it in place, if you want.

return np.histogram(data.time, weights=data.charge, bins=bins)[0]
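A minimal sketch of the point being made: the helper is a plain function with its own signature (a DataFrame-like object plus bin edges), so decorating it with @wraps(np.histogram) would advertise the wrong interface. The names and the toy data below are illustrative, not the actual detsim code.

```python
import numpy as np
import pandas as pd

def weighted_histogram(data: pd.DataFrame, bins: np.ndarray) -> np.ndarray:
    # Plain helper, no @wraps(np.histogram): the call signature here
    # (data with .time/.charge, plus bin edges) differs from np.histogram's.
    return np.histogram(data.time, weights=data.charge, bins=bins)[0]

hits = pd.DataFrame({'time': [0.5, 1.5, 1.6], 'charge': [1.0, 2.0, 3.0]})
wf   = weighted_histogram(hits, np.array([0, 1, 2]))  # charge per time bin
```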


def wf_binner(max_buffer: int) -> Callable:
gonzaponte:

This looks like a city component, rather than a generic function. Maybe the scheme should rather be to define bin_data as a generic function with an extra argument max_buffer and wf_binner a city component that calls it fixing that argument.
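The split being proposed could look roughly like this: a generic bin_data that takes max_buffer as an ordinary argument, and a thin wf_binner closure that fixes it from configuration. All names and the binning logic are illustrative assumptions, not the real implementation.

```python
from typing import Callable, Tuple
import numpy as np
import pandas as pd

def bin_data(sensors: pd.DataFrame, bin_width: float,
             max_buffer: int) -> Tuple[np.ndarray, np.ndarray]:
    # Hypothetical generic version: max_buffer is an explicit argument.
    t0   = sensors.time.min()
    tmax = min(sensors.time.max(), t0 + max_buffer)
    bins = np.arange(t0, tmax + bin_width, bin_width)
    wf, _ = np.histogram(sensors.time, weights=sensors.charge, bins=bins)
    return bins[:-1], wf

def wf_binner(max_buffer: int) -> Callable:
    # City component: a closure that only fixes max_buffer from config.
    def bin_fixed(sensors: pd.DataFrame, bin_width: float):
        return bin_data(sensors, bin_width, max_buffer)
    return bin_fixed
```

With this split, bin_data stays testable and reusable on its own, while the component carries the configuration plumbing.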

Comment on lines 63 to 65
def signal_finder(buffer_len : float,
bin_width : float,
bin_threshold: int) -> Callable:
gonzaponte:

Same. This looks like a city component.

Comment on lines 83 to 84
stand_off = int(buffer_len * units.mus / bin_width)
def find_signal(wfs: pd.Series) -> List[int]:
gonzaponte:

In this case, find_signal could take stand_off and bin_threshold as parameters, and signal_finder should calculate stand_off.
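A sketch of that refactor, under assumed semantics: find_signal takes stand_off and bin_threshold explicitly, while the signal_finder component does only the unit bookkeeping. The trigger-selection logic and the MUS constant here are placeholders for illustration (the real code uses the project's units module).

```python
from typing import Callable, List
import numpy as np
import pandas as pd

MUS = 1000  # illustrative stand-in for units.mus in an ns-based system

def find_signal(wfs: pd.Series, stand_off: int,
                bin_threshold: int) -> List[int]:
    # Hypothetical: bins where the summed waveform crosses the threshold,
    # keeping candidates at least stand_off samples apart.
    summed     = np.sum(wfs.tolist(), axis=0)
    candidates = np.where(summed >= bin_threshold)[0]
    triggers: List[int] = []
    for idx in candidates:
        if not triggers or idx - triggers[-1] >= stand_off:
            triggers.append(int(idx))
    return triggers

def signal_finder(buffer_len   : float,
                  bin_width    : float,
                  bin_threshold: int) -> Callable:
    # The component computes stand_off once, then delegates.
    stand_off = int(buffer_len * MUS / bin_width)
    return lambda wfs: find_signal(wfs, stand_off, bin_threshold)
```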

PE threshold for selection
"""

stand_off = int(buffer_len * units.mus / bin_width)
gonzaponte:

Maybe add an inline comment explaining what stand_off means in this context? I think I get it from how it is used, but a short clarification would help.

Comment on lines 155 to 157
def generate_slices(triggers: List) -> Tuple:

for trg in triggers:
gonzaponte:

Seeing how you use it, I think it would be more useful to define it for a single trigger and move the loop outside.
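A possible shape for that change, with hypothetical names and a simple pre/post-window slicing policy assumed for illustration:

```python
import numpy as np

def slices_for_trigger(trigger: int, wfs: np.ndarray,
                       pre_samples: int, post_samples: int) -> np.ndarray:
    # Single-trigger version: slice a (sensors x samples) array around
    # the trigger, clipped to the waveform bounds.
    start = max(trigger - pre_samples, 0)
    stop  = min(trigger + post_samples, wfs.shape[1])
    return wfs[:, start:stop]

# The loop then lives at the call site:
# buffers = [slices_for_trigger(trg, wfs, pre, post) for trg in triggers]
```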

sipm_q = np.empty((0,0))\
if sipm_charge.empty else np.array(sipm_charge.tolist())
slice_and_pad = slice_generator(pmt_bins ,
np.array(pmt_charge.tolist()),
gonzaponte:

pmt_charge.values?

andLaing (author):

That was my first thought, but .values on a Series whose elements are lists (they're histograms for each sensor) gives an array of arrays rather than the 2D np.ndarray we want, which we need so we can use slicing to get everything into buffers.
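The distinction can be seen with a toy Series (index values chosen arbitrarily here): .values yields a 1D object array of lists, while np.array(series.tolist()) produces a proper 2D array that supports column slicing.

```python
import numpy as np
import pandas as pd

# A Series whose elements are equal-length lists: one histogram per sensor.
charges = pd.Series([[1, 2], [3, 4]], index=[11, 12])

as_values = charges.values              # 1D object array holding two lists
as_matrix = np.array(charges.tolist())  # genuine 2D ndarray

# as_values.shape == (2,)   -- cannot be sliced as rows x columns
# as_matrix.shape == (2, 2) -- supports as_matrix[:, 0] style slicing
```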

Comment on lines 74 to 75
pmt_ord = pmt_ids [ pmt_ids.isin( pmt_resp.index.tolist())].index
sipm_ord = sipm_ids[sipm_ids.isin(sipm_resp.index.tolist())].index
gonzaponte:

is tolist() necessary?

andLaing (author):

No, I must have forgotten to check after changing something. I'll sort it.

return order_and_pad


def get_no_sensors(detector_db: str, run_number: int) -> Tuple:
gonzaponte:

May I suggest changing no to n or number_of?

@andLaing (Collaborator, author) commented Sep 2, 2020

I think I addressed the initial comments in the last set of pushes. Whenever the reviewers have some time, the PR can be checked again.

@andLaing andLaing mentioned this pull request Sep 8, 2020
@gonzaponte (Collaborator) left a comment:

Overall I get the feeling that we should have named outputs in several places. Specifically, I am very tempted to just declare type synonyms like

ChargeArray = np.ndarray # 1D
WFset       = np.ndarray # 2D
Waveforms   = pd.Series  # 2D

(names not optimal). I think this would increase readability and make it clearer what data types are going through each function, because it's not always obvious.

This PR looks very good. Although there are lots of comments, most of them are minor.

Comment on lines 243 to 252
buffer_len : float
Configured buffer length in mus
bin_width : float
Sampling width for sensors
bin_threshold : int
PE threshold for selection
"""
# The stand_off is the minimum number of samples
# necessary between candidate triggers.
stand_off = int(buffer_len * units.mus / bin_width)
gonzaponte:

We are lacking the city here, but I suppose buffer_len will be read from a config file, and if so, will it not already come with units? In general, units should only be applied when inputting data into the program (either by hardcoding parameters or by reading them from somewhere they are assumed to be in a specific unit). (Most) functions should assume everything comes in "default" units.
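The convention being described can be sketched as follows. The unit values and config shape here are illustrative only; the real ones would come from the project's units module and configuration machinery.

```python
# Illustrative base units for an ns-based system (placeholder values):
ns  = 1.0
mus = 1000.0 * ns

raw_config = {'buffer_len': 800.0}           # stated in mus in the config file
buffer_len = raw_config['buffer_len'] * mus  # units applied once, at input

def stand_off_samples(buffer_len: float, bin_width: float) -> int:
    # No unit conversion inside: both arguments arrive in base units.
    return int(buffer_len / bin_width)
```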

Comment on lines 39 to 43
bins = np.arange(min_bin, max_bin, bin_width)
b_pad = bins[-1] + bin_width
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram ,
np.append(bins, b_pad))
return bins, bin_sensors
gonzaponte:

Maybe there is an edge case where this doesn't work, but isn't this simpler:

Suggested change
bins = np.arange(min_bin, max_bin, bin_width)
b_pad = bins[-1] + bin_width
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram ,
np.append(bins, b_pad))
return bins, bin_sensors
bins = np.arange(min_bin, max_bin + bin_width, bin_width)
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram, bins)
return bins[:-1], bin_sensors

Also, is it intended to not return the last upper bin edge?
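For concreteness, the two constructions produce the same edges on inputs where max_bin is a whole number of steps from min_bin; floating-point rounding at the endpoint is the edge case alluded to above. A quick check (values arbitrary):

```python
import numpy as np

min_bin, max_bin, bin_width = 0.0, 10.0, 2.5

# Original: build bins, then append one padding edge for np.histogram.
bins    = np.arange(min_bin, max_bin, bin_width)
edges_a = np.append(bins, bins[-1] + bin_width)

# Suggested: extend the range by one step; the caller drops the last edge.
edges_b = np.arange(min_bin, max_bin + bin_width, bin_width)

# For these inputs both give [0., 2.5, 5., 7.5, 10.]; with non-commensurate
# floats the endpoint handling of np.arange can make them differ.
```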

bin_width : float ,
t_min : float ,
t_max : float ,
max_buffer: int ) -> Tuple:
gonzaponte:

Probably the output annotation should be Tuple[np.ndarray, pd.Series]. Also in the corresponding city component.

return np.histogram(data.time, weights=data.charge, bins=bins)[0]


def bin_sensors(sensors : pd.Series,
gonzaponte:

sensors' type hint should be pd.DataFrame, right? If so, check also the city component.

nsamp_pmt = 5000
nsamp_sipm = 5

sensor_order = order_sensors(detector_db, run_number,
gonzaponte:

swap names. order_sensors = sensor_order(er?)(...)

Comment on lines 60 to 65
pmt_resp = pd.Series([[1]*nsamp_pmt ]*len(id_dict[ 'pmt_ids']),
index = id_dict[ 'pmt_ids'])
sipm_resp = pd.Series([[1]*nsamp_sipm]*len(id_dict['sipm_ids']),
index = id_dict['sipm_ids'])
pmt_q = np.array( pmt_resp.tolist())
sipm_q = np.array(sipm_resp.tolist())
gonzaponte:

Can this be defined in the same fixture id_dict (changing the name accordingly)?

Comment on lines 70 to 75
assert pmt_out.shape == (n_pmt, nsamp_pmt)
pmt_nonzero = np.argwhere(pmt_out.sum(axis=1) != 0)
assert np.all(pmt_nonzero.flatten() == id_dict['pmt_ord'])
assert sipm_out.shape == (n_sipm, nsamp_sipm)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])
gonzaponte:

Suggested change
assert pmt_out.shape == (n_pmt, nsamp_pmt)
pmt_nonzero = np.argwhere(pmt_out.sum(axis=1) != 0)
assert np.all(pmt_nonzero.flatten() == id_dict['pmt_ord'])
assert sipm_out.shape == (n_sipm, nsamp_sipm)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])
assert pmt_out.shape == (n_pmt , nsamp_pmt)
assert sipm_out.shape == (n_sipm, nsamp_sipm)
pmt_nonzero = np.argwhere( pmt_out.sum(axis=1) != 0)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all( pmt_nonzero.flatten() == id_dict[ 'pmt_ord'])
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])

A bit of cosmetics for readability.

Comment on lines 78 to 82
def test_get_n_sensors(pmt_ids, sipm_ids):
npmt, nsipm = get_n_sensors('new', 6400)

assert npmt == len( pmt_ids)
assert nsipm == len(sipm_ids)
gonzaponte:

parametrize to test positive and negative run numbers

andLaing (author):

I'm not sure what we gain by parametrizing the test. Positive and negative run numbers are essentially identical, as load_db uses the absolute value. The +/- difference is only used to invoke the copy of MC tables in the cities.

gonzaponte:

Right. The idea was to make sure it worked in both cases. Since this code is going to be used in MC, you should use a negative run number, right?

andLaing (author):

It seems irrelevant, but I can make it negative. As I mentioned above, load_db uses the absolute value, so the difference should be tested there.

Comment on lines 86 to 87
file_in = os.path.join(ICDATADIR ,
'Kr83_full_nexus_v5_03_01_ACTIVE_7bar_1evt.sim.h5')
gonzaponte:

This filename is defined in the fixture mc_waveforms. It should probably be defined in an independent fixture to be reused here. Actually I've just seen that the same file is also used in other modules, so it should probably go in the global conftest.py file and be fixed later in the other modules.

@gonzaponte (Collaborator) left a comment:

Just some very minor changes left. Also, please run pyflakes on the modified files to look for unused variables or imports.

@gonzaponte (Collaborator):

Neat.

@gonzaponte (Collaborator) left a comment:

A complete set of functions and components to split events into buffers and create waveforms in each of them. It reuses existing code as much as needed and tests every function implemented. All functions have some documentation. Very nice job.

@carmenromo carmenromo merged commit 6b9b047 into next-exp:master Sep 16, 2020
@andLaing andLaing deleted the buffer_functions branch September 16, 2020 15:35
Comment on lines +42 to +43
## !! to-do: clarify for non-pmt versions of next
## !! to-do: Check on integral instead of only threshold?
gonzaponte:

Many development tools recognize TODO (exactly four characters, all uppercase, no spaces, no dashes) and offer assistance with finding, managing, and warning about stuff that has been marked to be done, but this relies on it being written exactly like this: TODO.

So let us all, please, develop the habit of marking things to be done exactly thus: TODO.
