Bufferization functions #737
Conversation
There's probably some scope for change and certainly we should work out the way that all or part of these functions could be used as the final part of the detsim flow too.
I haven't taken a look at the tests just yet, but they will probably change a bit with this.
from .. reco.peak_functions import split_in_peaks

@wraps(np.histogram)
I think your intention is to indicate that this is just a special np.histogram call, but wraps would be misleading, as you would expect exactly the same call signature, which is not the case. If this function is only used in wf_binner.bin_data, you can define it in place, if you want.
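A minimal sketch of the alternative being suggested: drop the @wraps decorator and document the helper directly. The function body matches the snippet under discussion; the surrounding data and names are illustrative only.

```python
import numpy as np
import pandas as pd

def weighted_histogram(data: pd.DataFrame, bins: np.ndarray) -> np.ndarray:
    """Charge-weighted histogram of hit times. Deliberately NOT decorated
    with @wraps(np.histogram), since its signature differs from np.histogram."""
    return np.histogram(data.time, weights=data.charge, bins=bins)[0]

# Illustrative data: two hits at t=1 and t=3 carrying charges 2 and 5.
hits  = pd.DataFrame({'time': [1.0, 3.0], 'charge': [2.0, 5.0]})
edges = np.array([0.0, 2.0, 4.0])
print(weighted_histogram(hits, edges))   # [2. 5.]
```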
return np.histogram(data.time, weights=data.charge, bins=bins)[0]

def wf_binner(max_buffer: int) -> Callable:
This looks like a city component, rather than a generic function. Maybe the scheme should rather be to define bin_data as a generic function with an extra argument max_buffer, and wf_binner as a city component that calls it, fixing that argument.
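A sketch of the suggested split: bin_data as the generic function, wf_binner as the component that fixes the configured argument. The function bodies here are placeholders, not the PR's actual binning logic.

```python
from functools import partial
from typing import Callable
import pandas as pd

def bin_data(sensors: pd.DataFrame, max_buffer: int) -> pd.Series:
    # Generic function: max_buffer is an ordinary argument.
    # Placeholder body; the real binning logic would live here.
    return sensors.groupby('sensor_id').size().clip(upper=max_buffer)

def wf_binner(max_buffer: int) -> Callable:
    # City component: fixes the configured argument once.
    return partial(bin_data, max_buffer=max_buffer)

binner = wf_binner(max_buffer=10)
df     = pd.DataFrame({'sensor_id': [0, 0, 1]})
print(binner(df).to_dict())   # {0: 2, 1: 1}
```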
def signal_finder(buffer_len   : float,
                  bin_width    : float,
                  bin_threshold: int) -> Callable:
Same. This looks like a city component.
stand_off = int(buffer_len * units.mus / bin_width)
def find_signal(wfs: pd.Series) -> List[int]:
In this case, find_signal could work taking stand_off and bin_threshold as parameters, and signal_finder should calculate stand_off.
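A hypothetical sketch of that refactor: find_signal becomes a generic function taking stand_off and bin_threshold, while signal_finder only computes stand_off and fixes the arguments. The trigger-selection body below is illustrative, not the PR's implementation (which uses split_in_peaks and the units module).

```python
from typing import Callable, List
import numpy as np
import pandas as pd

def find_signal(wfs: pd.Series, stand_off: int, bin_threshold: int) -> List[int]:
    # Generic: indices of bins over threshold, kept at least stand_off bins apart.
    candidates = np.where(wfs.values >= bin_threshold)[0]
    triggers: List[int] = []
    for idx in candidates:
        if not triggers or idx - triggers[-1] >= stand_off:
            triggers.append(int(idx))
    return triggers

def signal_finder(buffer_len: float, bin_width: float,
                  bin_threshold: int) -> Callable:
    # Component: calculates stand_off once (units conversion elided here).
    stand_off = int(buffer_len / bin_width)
    return lambda wfs: find_signal(wfs, stand_off, bin_threshold)

finder = signal_finder(buffer_len=3, bin_width=1, bin_threshold=5)
print(finder(pd.Series([0, 5, 0, 0, 6, 0])))   # [1, 4]
```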
PE threshold for selection
"""

stand_off = int(buffer_len * units.mus / bin_width)
Maybe an inline comment explaining what stand_off means in this context? I think I get it, seeing how it is used, but it would help to have a short clarification.
def generate_slices(triggers: List) -> Tuple:

    for trg in triggers:
Seeing how you use it, I think it would be more useful to define it for a single trigger and move the loop outside.
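A sketch of what moving the loop outside could look like, with a single-trigger generate_slice. The name and the slice arithmetic here are purely illustrative.

```python
from typing import Tuple

def generate_slice(trigger: int, width: int) -> Tuple[int, int]:
    # Handles exactly one trigger; the caller owns the loop.
    start = max(trigger - width // 2, 0)
    return start, start + width

triggers = [10, 50]
slices   = [generate_slice(trg, width=8) for trg in triggers]
print(slices)   # [(6, 14), (46, 54)]
```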
sipm_q = np.empty((0,0)) \
         if sipm_charge.empty else np.array(sipm_charge.tolist())
slice_and_pad = slice_generator(pmt_bins,
                                np.array(pmt_charge.tolist()),
pmt_charge.values?
That was my first thought, but .values on a Series with lists as values (they're histograms for each sensor) gives an array of arrays and not a 2D np.ndarray, which is what we want so we can use slicing to get everything into buffers.
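The difference is easy to demonstrate on a toy Series of per-sensor histograms:

```python
import numpy as np
import pandas as pd

wfs = pd.Series([[1, 2, 3], [4, 5, 6]])   # one histogram (list) per sensor

obj = wfs.values                # 1-D object array holding the two lists
arr = np.array(wfs.tolist())    # true 2-D ndarray, ready for slicing

print(obj.shape, obj.dtype)     # (2,) object
print(arr.shape)                # (2, 3)
print(arr[:, 1:])               # 2-D slicing works on arr, not on obj
```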
pmt_ord  = pmt_ids [ pmt_ids.isin( pmt_resp.index.tolist())].index
sipm_ord = sipm_ids[sipm_ids.isin(sipm_resp.index.tolist())].index
Is tolist() necessary?
No, I must have forgotten to check after changing something. I'll sort it.
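For reference, Series.isin accepts an Index directly, so the tolist() call can simply be dropped (toy data below):

```python
import pandas as pd

pmt_ids  = pd.Series([0, 1, 2, 3])
pmt_resp = pd.Series([10.0, 20.0], index=[1, 3])

with_tolist    = pmt_ids[pmt_ids.isin(pmt_resp.index.tolist())].index
without_tolist = pmt_ids[pmt_ids.isin(pmt_resp.index)].index

print(without_tolist.tolist())   # [1, 3]
```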
return order_and_pad

def get_no_sensors(detector_db: str, run_number: int) -> Tuple:
May I suggest changing no to n or number_of?
I think I addressed the initial comments in the last set of pushes. Whenever the reviewers have some time, the PR can be checked again.
Overall I get the feeling that we should have named outputs in several places. Specifically, I am very tempted to just declare type synonyms like

ChargeArray = np.ndarray  # 1D
WFset       = np.ndarray  # 2D
Waveforms   = pd.Series   # 2D

(names not optimal). I think this would probably increase the readability and understanding of what data types are going through each function, because it's not always obvious.
This PR looks very good. Although there are lots of comments, most of them are minor.
buffer_len    : float
    Configured buffer length in mus
bin_width     : float
    Sampling width for sensors
bin_threshold : int
    PE threshold for selection
"""
# The stand_off is the minimum number of samples
# necessary between candidate triggers.
stand_off = int(buffer_len * units.mus / bin_width)
We are lacking the city here, but I suppose buffer_len will be read from a config file, and if so, will it not come already with units? In general, units should only be applied when inputting data to the program (either by hardcoding parameters or by reading them from somewhere where they are assumed to be in a specific unit). (Most) functions should assume everything comes in "default" units.
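The convention being described can be sketched with a toy units module. The numeric values mimic IC's system_of_units (where ns is the base time unit), but treat everything here as illustrative.

```python
# Toy units: ns is the base unit, so mus carries a factor of 1000.
ns  = 1.0
mus = 1000 * ns

# Units are applied once, at the input boundary (e.g. when the config is read):
buffer_len = 800 * mus            # the config file said "800 mus"

def stand_off_in_samples(buffer_len: float, bin_width: float) -> int:
    # Inside the function everything is already in default units,
    # so no further "* units.mus" conversion is needed.
    return int(buffer_len / bin_width)

print(stand_off_in_samples(buffer_len, bin_width=25 * ns))   # 32000
```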
bins        = np.arange(min_bin, max_bin, bin_width)
b_pad       = bins[-1] + bin_width
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram,
                                                 np.append(bins, b_pad))
return bins, bin_sensors
Maybe there is an edge case where this doesn't work (?) but, isn't it simpler:

bins        = np.arange(min_bin, max_bin + bin_width, bin_width)
bin_sensors = sensors.groupby('sensor_id').apply(weighted_histogram, bins)
return bins[:-1], bin_sensors
Also, is it intended to not return the last upper bin edge?
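For integer-valued inputs the two forms agree, including the final upper edge. With float bin widths, np.arange's endpoint handling can differ, which may be the edge case alluded to above.

```python
import numpy as np

min_bin, max_bin, bin_width = 0, 10, 2

# Current form: inner edges plus one appended padding edge.
bins    = np.arange(min_bin, max_bin, bin_width)
edges_a = np.append(bins, bins[-1] + bin_width)

# Suggested form: extend the stop by one bin width.
edges_b = np.arange(min_bin, max_bin + bin_width, bin_width)

print(edges_a.tolist())   # [0, 2, 4, 6, 8, 10]
print(edges_b.tolist())   # [0, 2, 4, 6, 8, 10]
```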
bin_width : float,
t_min     : float,
t_max     : float,
max_buffer: int  ) -> Tuple:
Probably the output annotation should be Tuple[np.ndarray, pd.Series]. Also in the corresponding city component.
return np.histogram(data.time, weights=data.charge, bins=bins)[0]

def bin_sensors(sensors : pd.Series,
sensors' type hint should be pd.DataFrame, right? If so, check also the city component.
nsamp_pmt  = 5000
nsamp_sipm = 5

sensor_order = order_sensors(detector_db, run_number,
Swap the names: order_sensors = sensor_order(er?)(...)
pmt_resp  = pd.Series([[1]*nsamp_pmt ]*len(id_dict[ 'pmt_ids']),
                      index = id_dict[ 'pmt_ids'])
sipm_resp = pd.Series([[1]*nsamp_sipm]*len(id_dict['sipm_ids']),
                      index = id_dict['sipm_ids'])
pmt_q  = np.array( pmt_resp.tolist())
sipm_q = np.array(sipm_resp.tolist())
Can this be defined in the same fixture id_dict (changing the name accordingly)?
assert pmt_out.shape == (n_pmt, nsamp_pmt)
pmt_nonzero = np.argwhere(pmt_out.sum(axis=1) != 0)
assert np.all(pmt_nonzero.flatten() == id_dict['pmt_ord'])
assert sipm_out.shape == (n_sipm, nsamp_sipm)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])
Suggested:

assert  pmt_out.shape == (n_pmt , nsamp_pmt )
assert sipm_out.shape == (n_sipm, nsamp_sipm)
pmt_nonzero  = np.argwhere( pmt_out.sum(axis=1) != 0)
sipm_nonzero = np.argwhere(sipm_out.sum(axis=1) != 0)
assert np.all( pmt_nonzero.flatten() == id_dict[ 'pmt_ord'])
assert np.all(sipm_nonzero.flatten() == id_dict['sipm_ord'])

A bit of cosmetics for readability.
def test_get_n_sensors(pmt_ids, sipm_ids):
    npmt, nsipm = get_n_sensors('new', 6400)

    assert npmt  == len( pmt_ids)
    assert nsipm == len(sipm_ids)
Parametrize to test positive and negative run numbers.
I'm not sure what we gain by parametrizing the test. Positive and negative run numbers are essentially identical, as load_db uses the absolute value. The +/- difference is only used to invoke the copy of MC tables in the cities.
Right. The idea was to make sure it worked in both cases. Since this code is going to be used in MC, you should use a negative run number, right?
It seems irrelevant, but I can make it negative. As I mentioned above, load_db uses the absolute value, so the difference should be tested there.
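The abs-value behaviour being discussed can be illustrated with a stand-in for the database lookup. The helper name and the sensor counts below are hypothetical, for illustration only.

```python
# Stand-in for get_n_sensors/load_db: only |run_number| is used for the lookup,
# which is why positive and negative run numbers behave identically here.
def get_n_sensors_stub(detector_db: str, run_number: int):
    key = abs(run_number)                  # mirrors load_db discarding the sign
    return {('new', 6400): (12, 1792)}[(detector_db, key)]   # made-up counts

print(get_n_sensors_stub('new',  6400))
print(get_n_sensors_stub('new', -6400))
```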
file_in = os.path.join(ICDATADIR,
                       'Kr83_full_nexus_v5_03_01_ACTIVE_7bar_1evt.sim.h5')
This filename is defined in the fixture mc_waveforms. It should probably be defined in an independent fixture to be reused here. Actually, I've just seen that the same file is also used in other modules, so it should probably go in the global conftest.py file and be fixed later in the other modules.
Just some very minor changes left. Also, please run pyflakes on the modified files to look for unused variables or imports.
Force-pushed from fb33d8e to 6bc7363.
Neat.
A complete set of functions and components to split events into buffers and create waveforms in each of them. It reuses existing code as much as needed and tests every function implemented. All functions have some documentation. Very nice job.
## !! to-do: clarify for non-pmt versions of next
## !! to-do: Check on integral instead of only threshold?
Many development tools recognize TODO (exactly four characters, all uppercase, no spaces, no dashes) and offer assistance with finding, managing, and warning about stuff that has been marked to be done, but this relies on it being written exactly like this: TODO.

So let us all, please, develop the habit of marking things to be done exactly thus: TODO.
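A quick way to see why the exact spelling matters: tooling typically searches for the literal marker, so variant spellings silently escape it. The scratch file name below is just for illustration.

```shell
# Create a scratch file with one canonical marker and one ad-hoc variant.
printf '## TODO: clarify for non-pmt versions of next\n## !! to-do: check on integral\n' > /tmp/todo_demo.py

# Only the canonical TODO line is found; the 'to-do' spelling is missed.
grep -c 'TODO' /tmp/todo_demo.py   # prints 1
```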
Adds functions and utilities used to sort MC sensor data (full optical simulation) into buffers for IC processing.
Addresses point 6 of issue #691
Functions can be seen in context in https://github.com/andLaing/detsim/tree/IC-integration-tests where the position_signal.py script will form the basis for a future IC city.