-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add protection for empty waveforms #776
Conversation
We haven't thought about this possibility indeed. I think the most sensible is to filter out events with empty waveforms, instead of this hacky solution of tricking signal finding.
|
In 3. you mean rerun the test? |
I think there's even a filter function in |
Yeah, I think it can go to commonly used
Well notice that we do 2 things here, fix a bug so the 'weird' file doesn't produce the error, but also add a new filter table that we want to test that it contains the 'weird' event. My suggestion was to, in the first commit, introduce a test that will just run the city over the file (and fail), ie lines of a type
and in step 3 add We could, of course, do this in 2 separate tests, one that is simply showing the bug fix and the other that is testing the filtering, in which case we run over the same file 2x instead of reusing the output of one run only. |
I'm sorry if I haven't got the time to carefully understand the specific details of this case. In general, I recommend the following procedure:
Now, as for this specific case: I guess that I don't understand sufficiently what is meant exactly by 'filter table' and the weird event being in it. Is the bug fixed by filtering out the weird event, or by keeping the event and adding tolerance for such weird events to the implementation? |
It is fixed by filtering out the event, it is an empty event hence we do not want to keep it. So, in addition to code not breaking with unexpected error, we also want to test that the particular event number is saved in the Filters table in the output file (filtered out events are saved in Filters/filter_name tables). |
When an empty waveform is created in detsim the signal finder does not find any pulse, breaking the city flow.
18e2252
to
c320631
Compare
My proposed solution requires to add a filter for empty signals in Note also that the simplistic solution of filter empty waveforms directly in the city flow is somehow buggy, since those waveforms can contain signal which could not be found by |
@@ -55,6 +55,7 @@ def find_signal_start(wfs : Union[pd.Series, np.ndarray], | |||
eng_sum = wfs.sum() | |||
indices = indices_and_wf_above_threshold(eng_sum, | |||
bin_threshold).indices | |||
if len(indices) == 0: return [] | |||
## Just using this and the stand_off for now | |||
## taking first above sum threshold. | |||
## !! To-do: make more robust with min int? or similar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really not worth opening a separate PR for it, so, while we're touching this file, could you please add a single commit which fixes all the to-do
s in this file (and any others that you are already touching in this PR), changing them to this precise spelling: TODO
(exactly four characters, all uppercase, no spaces, no dashes). This is because many development tools recognize TODO
and can help managing them.
invisible_cities/cities/detsim.py
Outdated
fl.branch("event_number" , | ||
evtnum_collect.sink), | ||
buffer_calculation), | ||
fl.sink(lambda x: x)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of this? Is it to have a /dev/null
sink? (One that just throws the data away?)
If so, then simply don't branch off at the previous step!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could sink the pipe inside calculate_and_save_buffers
, but I wanted to keep the evtnum_collect
counter. I dont know how to sink
the main pipe afterwards so I added this dummy sink
which is irrelevant for the city itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. At the moment you have
buffer_calculation ----> /dev/null
\
---> pick event_number -> evtnum_collect.sink
Why can't you do the following instead?
buffer_calculation ---> pick event_number -> evtnum_collect.sink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. My bad because didn't really understand what "event_number"
did inside a pipe.
filter_events_signal, | ||
fl.branch(write_signal_filter), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, lining the commas up vertically helps to avoid mistakes and misunderstandings, and this can be especially important in dataflow pipelines. Now, adjusting the commas at the end of the line is a PITA, and adds unnecessary noise to the diffs.
I recommend that we adopt the following style. Rather than
fn(foo,
baaaaaaaa,
baz,
quux,
quuuuuux)
where comma-errors or surprises are difficult to spot, or
fn(foo ,
baaaaaaaa,
baz ,
quux ,
quuuuuux )
which is a PITA to keep aligned, do this:
fn( foo
, baaaaaaaa
, baz
, quux
, quuuuuux)
This will really freak out the PEP8-ists, (and many of you) but, turst me, it's the best way forward :-)
buffer_writer_ = sink(buffer_writer( h5out | ||
, run_number = run_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm partial to aligning the commas directly under the opening bracket (and I'm pretty sure that is de-facto standard for this style), but I won't kick up a fuss if you leave it as it is.
2fd07ec
to
90e7bc7
Compare
If there are no more issues/comments, this PR should be ready to approval @jacg |
I'm tied up elsewhere at the moment. Apart from the stylistic comments I've already made, I wanted to have a look at the semantics, but I won't have time for that in the next 24h, at least. Maybe someone else can have a look. |
I had a look and it seems like a minimal change that address the issue : if no pulses are found, filter the event. |
Avoid unnecessary branching in detsim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds a filter in case no signal pulses were found in the waveforms. New filter is used both in detsim and buffy (generalizing the previous buffy filter that was triggered for only all-0s waveforms). The test to show unwanted behavior in detsim is added - it fails before the changes and passes afterwards.
The code changed is simple and clean. Thanks!
This PR add a protection for empty waveforms in signal finder.
I realized about this issue throught detsim simulation of background events close to the chamber edges at radious higher than the active radious.
In such cases the waveforms created by detsim are empty, ie they are a list of zeros. Current implementation is not prepared to deal this case. Also, note that a
trigger_thr
parameter high enought, no signals are found and the same problem arises. This issue also affects buffy city.