Add protection for empty waveforms #776

Merged
merged 7 commits into from
Mar 2, 2021

Conversation

gondiaz
Collaborator

@gondiaz gondiaz commented Feb 26, 2021

This PR adds protection for empty waveforms in the signal finder.
I came across this issue through detsim simulation of background events close to the chamber edges, at radii larger than the active radius.
In such cases the waveforms created by detsim are empty, i.e. they are a list of zeros. The current implementation is not prepared to deal with this case. Also, note that with a sufficiently high trigger_thr parameter no signals are found and the same problem arises. This issue also affects the buffy city.

@gondiaz gondiaz requested review from andLaing and mmkekic February 26, 2021 09:16
@mmkekic mmkekic removed the request for review from andLaing February 26, 2021 09:20
@mmkekic
Collaborator

mmkekic commented Feb 26, 2021

We hadn't thought about this possibility, indeed. I think the most sensible approach is to filter out events with empty waveforms, instead of this hacky solution of tricking the signal finding.
I suggest you:

  1. provide a test (and a 1-event test file) where simply running detsim will fail
  2. add a filter to the city flow to filter out events with empty waveforms
  3. update the test to check that the events are filtered

@jacg
Collaborator

jacg commented Feb 26, 2021

In 3. you mean rerun the test?

@andLaing
Collaborator

I think there's even a filter function in buffy.py maybe worth moving it to components or something so it's generally useful.

@mmkekic
Collaborator

mmkekic commented Feb 26, 2021

I think there's even a filter function in buffy.py maybe worth moving it to components or something so it's generally useful.

Yeah, I think it can go into the commonly used calculate_and_save_buffers

In 3. you mean rerun the test?

Well, notice that we do 2 things here: fix a bug so the 'weird' file doesn't produce the error, but also add a new filter table, which we want to test contains the 'weird' event. My suggestion was to introduce, in the first commit, a test that will just run the city over the file (and fail), i.e. lines such as

conf.update(dict(files_in='weird_file.h5', file_out='tmp_file.h5'))
detsim(**conf)

and in step 3 add
check Filter table inside tmp_file.h5

We could, of course, do this in 2 separate tests, one that is simply showing the bug fix and the other that is testing the filtering, in which case we run over the same file 2x instead of reusing the output of one run only.

@jacg
Collaborator

jacg commented Feb 26, 2021

I'm sorry that I haven't had the time to carefully understand the specific details of this case.

In general, I recommend the following procedure:

  1. Commit a test that points out a problem. This test should fail.

  2. Commit a solution to the problem.

  3. The same, unmodified test introduced in the first step should now pass, proving that

    • there was a problem that needed fixing
    • the problem has been fixed
    • the latest commit is the solution to that problem.
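As a rough illustration of this procedure, the sketch below writes one check and runs it, unmodified, against a pre-fix and a post-fix implementation. All names here (`first_index_before_fix`, `first_index_after_fix`, `check_empty_input`) are hypothetical and are not taken from the PR; they only mimic the kind of failure an empty list of pulse candidates can cause.

```python
from typing import Callable, List, Optional


def first_index_before_fix(indices: List[int]) -> Optional[int]:
    # Hypothetical pre-fix behaviour: assumes at least one index exists.
    return indices[0]


def first_index_after_fix(indices: List[int]) -> Optional[int]:
    # Hypothetical fix: tolerate an empty list of candidates.
    return indices[0] if indices else None


def check_empty_input(implementation: Callable[[List[int]], Optional[int]]) -> bool:
    """The test, written once and never edited: empty input must not raise."""
    try:
        implementation([])
    except IndexError:
        return False
    return True


# Step 1: the check fails against the old code.
fails_before = not check_empty_input(first_index_before_fix)
# Step 3: the same check passes against the fixed code.
passes_after = check_empty_input(first_index_after_fix)
```

Because the check itself never changes, the transition from failing to passing can be attributed to the fix commit alone.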

Now, as for this specific case: I guess that I don't understand sufficiently what is meant exactly by 'filter table' and the weird event being in it.

Is the bug fixed by filtering out the weird event, or by keeping the event and adding tolerance for such weird events to the implementation?

@mmkekic
Collaborator

mmkekic commented Feb 26, 2021

Is the bug fixed by filtering out the weird event, or by keeping the event and adding tolerance for such weird events to the implementation?

It is fixed by filtering out the event: it is an empty event, hence we do not want to keep it. So, in addition to the code not breaking with an unexpected error, we also want to test that the particular event number is saved in the Filters table in the output file (filtered-out events are saved in Filters/filter_name tables).
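The bookkeeping described here can be sketched in plain Python. This is a toy stand-in and not the actual city code: `filter_and_record`, the `has_pulses` predicate, and the in-memory `filter_table` are hypothetical names, and the row layout (event number plus a pass flag) is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


def filter_and_record(events: Iterable[Event],
                      predicate: Callable[[Event], bool],
                      filter_table: List[Tuple[int, bool]]) -> List[Event]:
    """Keep events that satisfy `predicate` and log every decision.

    `filter_table` is a hypothetical in-memory stand-in for a
    Filters/<filter_name> table: one (event_number, passed) row per event.
    """
    kept = []
    for evt in events:
        passed = predicate(evt)
        filter_table.append((evt["event_number"], passed))
        if passed:
            kept.append(evt)
    return kept


def has_pulses(evt: Event) -> bool:
    # Hypothetical predicate: an event passes if the signal finder
    # returned at least one pulse candidate.
    return len(evt["pulses"]) > 0


events = [
    {"event_number": 10, "pulses": [125]},
    {"event_number": 11, "pulses": []},        # the 'weird' empty event
    {"event_number": 12, "pulses": [40, 310]},
]

filter_table: List[Tuple[int, bool]] = []
kept = filter_and_record(events, has_pulses, filter_table)
rejected = [number for number, passed in filter_table if not passed]
```

A test in the spirit of step 3 would then assert that the empty event's number (11 in this toy data) appears among the rejected rows.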

When an empty waveform is created in detsim the signal finder does
not find any pulse, breaking the city flow.
@gondiaz
Collaborator Author

gondiaz commented Feb 26, 2021

My proposed solution requires adding a filter for empty signals in the calculate_and_save_buffers function. In order to maintain the evtnum_collect, the city pipes of detsim and buffy must be rewritten.

Note also that the simplistic solution of filtering empty waveforms directly in the city flow is somewhat buggy, since non-empty waveforms can still contain signal that find_signal_start cannot find. Now, if this function finds no signal, it returns an empty list, and the event is filtered afterwards in calculate_and_save_buffers.
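The difference between the two checks can be made concrete with a toy version of the signal search. This is only a sketch under simplifying assumptions: `find_signal_start_sketch` and `is_all_zero` are hypothetical helpers that operate on plain lists, the comparison is assumed to be strictly greater than the threshold, and the stand-off handling of the real find_signal_start is ignored.

```python
from typing import List, Sequence


def find_signal_start_sketch(wfs: Sequence[Sequence[float]],
                             bin_threshold: float) -> List[int]:
    """Return the first bin whose summed charge exceeds the threshold.

    Returns an empty list when no bin qualifies, so the caller can
    filter the event instead of failing on an empty set of indices.
    """
    # Sum over sensors, bin by bin (toy analogue of wfs.sum()).
    eng_sum = [sum(bin_values) for bin_values in zip(*wfs)]
    indices = [i for i, charge in enumerate(eng_sum) if charge > bin_threshold]
    if len(indices) == 0:
        return []            # no pulse candidates: leave it to the filter
    return [indices[0]]


def is_all_zero(wfs: Sequence[Sequence[float]]) -> bool:
    # The simplistic check: only catches waveforms with no charge at all.
    return all(value == 0 for wf in wfs for value in wf)


empty_wfs = [[0, 0, 0, 0], [0, 0, 0, 0]]
faint_wfs = [[0, 1, 0, 0], [0, 2, 0, 0]]   # non-zero, but below threshold
clear_wfs = [[0, 1, 9, 0], [0, 2, 8, 0]]

threshold = 5
```

Here `faint_wfs` is not all zeros, yet the search returns an empty list for it, which is why filtering on the search result covers more cases than filtering on all-zero waveforms.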

@gondiaz gondiaz marked this pull request as ready for review February 26, 2021 19:09
@@ -55,6 +55,7 @@ def find_signal_start(wfs : Union[pd.Series, np.ndarray],
     eng_sum = wfs.sum()
     indices = indices_and_wf_above_threshold(eng_sum,
                                              bin_threshold).indices
+    if len(indices) == 0: return []
     ## Just using this and the stand_off for now
     ## taking first above sum threshold.
     ## !! To-do: make more robust with min int? or similar
Collaborator

It's really not worth opening a separate PR for it, so, while we're touching this file, could you please add a single commit which fixes all the to-dos in this file (and any others that you are already touching in this PR), changing them to this precise spelling: TODO (exactly four characters, all uppercase, no spaces, no dashes). This is because many development tools recognize TODO and can help manage them.

fl.branch("event_number" ,
evtnum_collect.sink),
buffer_calculation),
fl.sink(lambda x: x)),
Collaborator

What is the point of this? Is it to have a /dev/null sink? (One that just throws the data away?)

If so, then simply don't branch off at the previous step!

Collaborator Author

@gondiaz gondiaz Feb 27, 2021

We could sink the pipe inside calculate_and_save_buffers, but I wanted to keep the evtnum_collect counter. I don't know how to sink the main pipe afterwards, so I added this dummy sink, which is irrelevant for the city itself.

Collaborator

I don't follow. At the moment you have

buffer_calculation ----> /dev/null
                  \
                    ---> pick event_number -> evtnum_collect.sink

Why can't you do the following instead?

buffer_calculation ---> pick event_number -> evtnum_collect.sink

Collaborator Author

Yes, you are right. My bad: I didn't really understand what "event_number" did inside a pipe.

Comment on lines 911 to 912
filter_events_signal,
fl.branch(write_signal_filter),
Collaborator

Generally speaking, lining the commas up vertically helps to avoid mistakes and misunderstandings, and this can be especially important in dataflow pipelines. Now, adjusting the commas at the end of the line is a PITA, and adds unnecessary noise to the diffs.

I recommend that we adopt the following style. Rather than

fn(foo,
   baaaaaaaa,
   baz,
   quux,
   quuuuuux)

where comma-errors or surprises are difficult to spot, or

fn(foo      ,
   baaaaaaaa,
   baz      ,
   quux     ,
   quuuuuux )

which is a PITA to keep aligned, do this:

fn( foo
  , baaaaaaaa
  , baz
  , quux
  , quuuuuux)

This will really freak out the PEP8-ists (and many of you) but, trust me, it's the best way forward :-)

Comment on lines 901 to 902
buffer_writer_ = sink(buffer_writer( h5out
, run_number = run_number
Collaborator

I'm partial to aligning the commas directly under the opening bracket (and I'm pretty sure that is de-facto standard for this style), but I won't kick up a fuss if you leave it as it is.

@gondiaz
Collaborator Author

gondiaz commented Mar 1, 2021

If there are no more issues/comments, this PR should be ready for approval @jacg

@jacg
Collaborator

jacg commented Mar 1, 2021

I'm tied up elsewhere at the moment. Apart from the stylistic comments I've already made, I wanted to have a look at the semantics, but I won't have time for that in the next 24h, at least. Maybe someone else can have a look.

@mmkekic
Collaborator

mmkekic commented Mar 1, 2021

I had a look and it seems like a minimal change that addresses the issue: if no pulses are found, filter the event.
This is a generalization of the previous filter in buffy, where we filtered out waveforms that had no positive signal, hence implicitly assuming that the threshold for pulses would always be 0.
The main changes are in find_signal_start in buffer_functions.py, where an empty list is returned in case there are no pulse candidates, and in the pipe calculate_and_save_buffers in components.py, where the filter itself is added. If nobody complains I would approve this PR, and the only thing left is some history squashing.

gondiaz added 4 commits March 2, 2021 10:25
Avoid unnecessary branching in detsim
Fix typo in test comment

Rename subpipe

Re-align commas in calculate_and_save_buffers

Re-align commas in detsim

Re-align commas in buffy

Rename TODOs in file
Collaborator

@mmkekic mmkekic left a comment

This PR adds a filter for the case where no signal pulses are found in the waveforms. The new filter is used in both detsim and buffy (generalizing the previous buffy filter, which was triggered only for all-zero waveforms). A test showing the unwanted behavior in detsim is added: it fails before the changes and passes afterwards.
The code changed is simple and clean. Thanks!

@carmenromo carmenromo merged commit bf374af into next-exp:master Mar 2, 2021
@gondiaz gondiaz deleted the signal_protect branch September 14, 2021 07:13

5 participants