-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Valdrada, the trigger simulation city #795
base: master
Are you sure you want to change the base?
Conversation
A BLR function is added that computes the baseline as a moving average. It applies the high-pass filter after substracting such baseline (and thus it's done within the loop) and applies deconvolution. As happens in the FPGA, the returned signal is equal to the raw signal whenever the BLR is not active.
These functions extract the trigger parameters and check for coincidences between trigger candidates.
Add class that contains the simualted trigger info table.
Adds the writer for the simulated trigger dataframe.
Adds valdrada, this city takes rwf and produces a dataframe with the simulated trigger information. It takes a configurable set of parameters that imitate the detector acquisition ones and their naming convention. It provides all possible trigger candidates above a given width and flags them as valid or not based on the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the code in depth, but I've cached a few technical details already.
This PR is algorithmically complex and requires knowing many non-trivial details. Who else besides you is familiar with this? It would be great if this person could review the algorithmic side of things at least.
invisible_cities/conftest.py
Outdated
<<<<<<< HEAD | ||
======= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unresolved conflict! Fixed in the last commit, but shouldn't be here anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea how to remove this from the history.
elements = [] | ||
for _ in range(6): | ||
elements.append(draw(lists(integers(min_value=0, max_value=np.iinfo(np.uint32).max), | ||
min_size=size, max_size=size))) | ||
for _ in range(4): | ||
elements.append(draw(lists(booleans(), min_size=size, max_size=size))) | ||
for _ in range(5): | ||
elements.append(draw(lists(integers(min_value=np.iinfo(np.int32).min, max_value=np.iinfo(np.int32).max), | ||
min_size=size, max_size=size))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these numbers have specific meanings? Maybe it's better to give them names lke
a_meaningful_name = lists(integers(min_value=0, max_value=np.iinfo(np.uint32).max)
and then draw(a_meaninful_name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see that later these are assigned to each column in col_names
. I think in terms of self-documentation it's better for the composite to draw each value individually, but this might be too slow (haven't checked), so at least you should leave a comment annotating the columns they correspond to.
I think I would also transpose the data drawing: instead of "for each column generate all rows" -> "for each row generate all columns". This would be more natural if we had a composite that generates a single row and then another one that generates a list of rows of a given size. If this doesn't affect performance, I would go for it.
|
||
assert np.all([tname == cname for tname, cname in zip(trigger_dst.columns.values, col_names)]) | ||
|
||
# Check values stored, the if is needed because the valid_all condition is done in the writer, not on the input trigger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The writer shouldn't manipulate the data. Why not add that column to the input instead?
@fixture(scope="session") | ||
def double_square_wf(): | ||
# Fake signal with two-nearby square pulses and a smaller additional one at the beginning. | ||
n_baseline = 64000 | ||
base_window = 1024 | ||
thr = 5 | ||
wf = np.zeros(n_baseline) | ||
start1 = np.random.randint(2*base_window, n_baseline // 2) | ||
length = np.random.randint(n_baseline // 50, n_baseline // 4) | ||
stop1 = start1 + length | ||
start2 = stop1 + 10 | ||
stop2 = start2 + length | ||
|
||
wf[start1:stop1] = np.full(length, np.random.randint(thr+1, 50)) | ||
wf[start2:stop2] = np.full(length, np.random.randint(thr+1, 50)) | ||
wf[1500 : 1520] = np.full(20 , np.random.randint(thr+1, 50)) # Extra pulse to check discard_width | ||
return wf, thr, length, start1, start2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never remember how to handle this, but using random numbers is something we should avoid. At the very least we should fix the seed for the tests or fixtures that use random numbers to produce data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a composite now, I guess it should be ok.
for i, element in enumerate(elements): | ||
assert element == triggers[0][i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not elements == triggers[0]
?
multi_candidate = ((t1[4] >= multi_w ) & | ||
(t1[3] > multi_q ) & | ||
(t1[2] <= t0[2] + t0[4] + multi_t+2)) # Start of trigger + peak width + time window + FPGA delay (2) | ||
if not multi_candidate: | ||
continue | ||
# In case the later peak ends after the multipeak protection window ends | ||
if (t1[2]+t1[4]) > (t0[2] + t0[4] + multi_t+2): # Event partially contained in the multipeak window |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these magic numbers have meaning, so again, a namedtuple would be better.
# Order by time, index so we can use this to order both times and ids | ||
order_idx = np.lexsort((ids, times)) | ||
order_time = times[order_idx] | ||
pmts = np.full(ids.shape, -1) | ||
# Calculate all time differences between each trigger and all later ones | ||
time_diffs = np.array([order_time[i+1:] - t for i, t in enumerate(order_time)]) | ||
# Since pmts are ordered by time, closest pmt will be the next one in the array. | ||
pmts[:-1] = ids[order_idx][1:] | ||
# Shortest time is the first value of time_diff for each trigger | ||
shortest = np.array([t[0] if len(t)>0 else -1 for t in time_diffs]) | ||
# Count how many triggers are within the coincidince window of each trigger | ||
n_coinc = np.array([len(t[t<coinc_window])+1 for t in time_diffs]) | ||
# Undo the original ordering | ||
old_order = np.argsort(order_idx) | ||
# Assign the coincidence results to the trigger array | ||
triggers[valid_sel, -3] = n_coinc [old_order] | ||
triggers[valid_sel, -2] = shortest[old_order] | ||
triggers[valid_sel, -1] = pmts [old_order] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add empty lines in between blocks to make it a bit more readable.
order_time = times[order_idx] | ||
pmts = np.full(ids.shape, -1) | ||
# Calculate all time differences between each trigger and all later ones | ||
time_diffs = np.array([order_time[i+1:] - t for i, t in enumerate(order_time)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting it to an array is misleading. This would be a 1D array of type object
each object being a 1D array of different lengths. Better leave it as a list.
invisible_cities/io/trigger_io.py
Outdated
row["valid_w" ] = valid_w | ||
row["valid_h" ] = valid_h | ||
row["valid_peak" ] = valid_peak | ||
row["valid_all" ] = (valid_q + valid_w + valid_h + valid_peak) == 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the writer shouldn't need to know anything about what it is writing. you may also want to consider creating a dataframe and saving it using the dataframe writer?
also, since these are booleans, it's more appropriate to use all
.
Some things were answered in the comments of the reviewer, anothers talked peer to peer, but I think all the issues pointed out so far in the review have been taken into account. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ignoring cosmetics for now.
- q_min, q_max : Range for the integrated charge of the peak (q_min < q < q_max). | ||
In ADC counts. | ||
- time_min, time_max : Range for the peak width (time_min <= width < time_max). | ||
In time mus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In time mus. | |
In mus. |
valdrada | ||
----------------------------------------------------------------------- | ||
|
||
This city finds simulates the trigger procedure at the FPGA over PMT waveforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This city finds simulates the trigger procedure at the FPGA over PMT waveforms. | |
This city simulates the trigger procedure at the FPGA over PMT waveforms. |
|
||
A number of general configuration parameters are needed (input as a dict, trigger_config): | ||
- coincidence_window : Time window (in time bins) in which valid trigger coincidences are counted | ||
- discard_width : Any trigger with width less than this parameter is discarded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
units?
if trigger_config['multipeak'] is not None: | ||
trigger_config['multipeak']['time_min' ] /= 25*units.ns | ||
trigger_config['multipeak']['time_after'] /= 25*units.ns | ||
for k in channel_config.keys(): | ||
channel_config[k]['time_min'] /= 25*units.ns | ||
channel_config[k]['time_max'] /= 25*units.ns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't these be forced to be ints?
assert_tables_equality(got, expected) | ||
|
||
|
||
def test_valdrada_exact_result(ICDATADIR, trigger_config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
@mark.parametrize("multipeak", (True, False))
def test_valdrada_exact_result(ICDATADIR, trigger_config, multipeak)
...
if not multipeak:
conf['trigger_config']['multipeak'] = None
to combine both tests? You would need to add true_out to the parametrize as well, but it's worth it.
else: | ||
new_val = False | ||
if not new_val: break | ||
triggers[i] = triggers[i]._replace(valid_peak=new_val, valid_all=all([triggers[i].valid_all, new_val])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
triggers[i] = triggers[i]._replace(valid_peak=new_val, valid_all=all([triggers[i].valid_all, new_val])) | |
triggers[i] = triggers[i]._replace(valid_peak=new_val, valid_all=triggers[i].valid_all and new_val) |
def retrieve_trigger_information(channel_config : Dict[int, int] | ||
,trigger_config : Dict | ||
) -> Callable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a city component, but maybe it just looks like one. I leave to you the decision of moving it to components.
channel_dict = {'q_min' : 5000, 'q_max' : 50000, | ||
'time_min' : 2000, 'time_max': 10000, | ||
'baseline_dev' : 5, 'amp_max' : 1000, | ||
'pulse_valid_ext': 15} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
units
def trigger_time_and_validity(draw): | ||
# Set of trigger times an validity flags, to test trigger coincidence | ||
size = draw (integers(min_value=1, max_value=100)) | ||
flag = lists(integers(min_value=0, max_value=1 ), min_size=4, max_size=4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be booleans?
top[k] = ped/base_window | ||
|
||
return np.asarray(signal_r, dtype='int16'), np.asarray(top, dtype='int16') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be good to clip the result before casting it to an int. Otherwise, 33000 would become -32536.
The limits for int16 are -32768 and +32767
Adds valdrada, the trigger which simulates the trigger in the FPGA.
This city takes rwf and produces a dataframe with the simulated trigger information. It takes a configurable set of parameters that imitate the detector acquisition ones and their naming convention. It provides all possible trigger candidates above a given width and flags them as valid or not based on the configuration.
Details of the output information are in the city main file but basically consist on the values used for triggering in the FPGA. Aside from the trigger implementation itself, a modified BLR algorithm that imitates the behavior of the one in the FPGA has been added.
This notebook ilustrates the flow of Valdrada and shows how the less immediate parameters impact the trigger results.
https://github.com/Aretno/Notebooks/blob/master/TriggerSim/Valdrada.ipynb
The implementation has been based on extensive discussion with Raúl and has been throughly tested prior to the PR. This was done jointly with Raúl, evaluating the FPGA output he obtained with the output of this implementation. Full agreement was found between the fpga simulation and the python implementation. A discrepance of 0.16% in the total number of valid triggers was found between already acquired data and data passing the trigger simulation (done for several runs). This discrepance also appeared in the FPGA simulation which is a copy-paste of the existing code. Such discrepance was traced to the high pass filter, which starts with a different value in the online acquisition (depends on previous time bins, which we do not stored) and yielded a different starting value for the coefficients. This could cause an ADC deviation which modify the overall result in edge cases.
Moreover, the tool has already been used for extensive trigger analysis with no visible deviations from the expected behavior.