Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New tracking city #755

Merged
merged 6 commits into from
Jul 22, 2021
Merged

New tracking city #755

merged 6 commits into from
Jul 22, 2021

Conversation

ausonandres
Copy link
Collaborator

According to issue #749, the aim of this PR is to add a city (to be run just after Beersheba, for the moment), that computes all the tracking information. Since the tracking-related functions will be shared between Esmeralda and this new city (so-called Isaura (credits to @jjgomezcadenas)), they will be moved to other places (components.py, in most of the cases).

There are two issues that I faced while doing the city. The first one was already pointed out in #749, and it is related to the kdst information. It would be interesting that the output of this tracking city contains kdst information (nS2 variable is used for the posterior analysis), however current Beersheba version doesn't save this table. Therefore, for the time being, we can access to this information just opening the cdst files, although I believe that in the near futurue we should add this table in Beersheba and then in Isaura.
The second issue concerns the tracking algorithm. As you may know, it seems that Paolina, and particularly reco/paolina_functions/find_extrema() funcion is not deterministic. This is because in some events there are more than one possible extreme voxels for a given track, and it appear that in those cases they are chosen randomly. The solution that I propose (and add here, in order to check that the output was correct -event though maybe this is not the place to do it-) consists in save all the possibilities, and then choose the extreme where the sum of their energy takes the maximum value (in order to avoid, if possible, the usage of paolina_functions/drop_end_point_voxels() function). Nonetheless, there would be other solutions, but the choice of one or another will be a bit arbitrary at the end, in my opinion.

@ausonandres ausonandres force-pushed the new_tracking_city branch 4 times, most recently from a4ae48d to 9a8c154 Compare November 16, 2020 15:02
@jacg
Copy link
Collaborator

jacg commented Apr 6, 2021

This PR is almost 5 months old, and has received no attention whatsoever. It relates to an issue (#749) which solicited the wisdom of: @pnovella @msorel @paolafer @ausonandres @Aretno @jahernando @jjgomezcadenas @gonzaponte

7 tests fail, most of them for fairly basic reasons. Please fix. Then rebase, so that the tests run on our new CI infrastructure (GitHub Actions).

Please don't leave PRs with red crosses (failing tests) in the queue

  • If the failures are spurious, re run the CI jobs.
  • If the failures are not spurious, fix the problems.
  • If you do not know how to fix the problem, ask for help.

A PR queue full of red crosses encourages people to ignore PRs.

@ausonandres
Copy link
Collaborator Author

Hi, sorry i's my fault... I was trying to solve the test failing issues, however some other urgent work that I had to attend to came up at that moment. After that, I completely forgot to come back to this PR. I'll do it as soon as I have some time.

@jacg
Copy link
Collaborator

jacg commented Apr 6, 2021

When it is ready for review, please explicitly ask someone to review it, using the buttons/links near the top-right of the PR page. If it is not yet ready for review, please mark it as a draft, using the link at the bottom of the reviewers section: 'Still in progress? convert to draft'.

@ausonandres ausonandres marked this pull request as draft April 6, 2021 18:37
@ausonandres ausonandres force-pushed the new_tracking_city branch 3 times, most recently from 49bea46 to be15740 Compare May 18, 2021 10:54
@gondiaz gondiaz force-pushed the new_tracking_city branch from be15740 to 5d98b10 Compare July 6, 2021 14:36
event_info = load_dst(path, 'Run', 'events')
for evt in dhits_df.event.unique():
timestamp = event_info[event_info.evt_number==evt].timestamp.values[0]
dhits = hits_from_df(dhits_df.loc[dhits_df.event==evt])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct since the DECO hits for a single event can contain multiple peaks. Since no peak information is included in the final track information output, we can not directly check if a multiple-track event comes from a real spliting or by a multiple-S2 event. This is not a relevant issue as long as commonly used 1S2 and 1-track cut are performed, but it is biasing the efficiencies.

This PR also modifies the search of track extremes in order to be deterministic but it does not adapt the tests, also affecting esmeralda tests and making the PR more difficult to review. I think this modifications should be a PR appart. I will do a different PR implementing the same functionality of this city, essentially running paolina algorithm over deconvoluted hits, without modifying the track extreme search.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand completely your plan. If you want to add a new PR that exclusively adds the city to run paolina over deconvoluted hits, what would you do with this PR? just ignore it? would you use this just for solving the extreme issue? There is another thing to take into account, this PR apart from doing both things, also moves some functions from Esmeralda to components.py, would you also add that functionality in your new PR?

In my opinion I believe that the best approach could be removing all the extreme-related issue from this PR, and then add that new PR that you commented to take it into account. So the goal of this PR will keep being to add the tracking city.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as you might see in #798 I just did another PR since I didn't aim to undo your changes related with the extreme finder. I didn't move the functions still from esmeralda.py to ease the review, but I intend to do it once the PR is reviewed. Then the main difference with this PR is that I left the paolina functions unmodified, which in my opinion should be done, if ever, in a different PR. It also changes the source to solve the issue with multipeak events, such that those events are filtered.

@ausonandres ausonandres force-pushed the new_tracking_city branch 3 times, most recently from 08444ee to b70a7ba Compare July 14, 2021 14:04
@@ -589,6 +595,31 @@ def MC_hits_from_files(files_in : List[str], rate: float) -> Generator:
timestamp = timestamp(evt))


def dhits_from_files(paths: List[str]) -> Iterator[Dict[str,Union[HitCollection, pd.DataFrame, MCInfo, int, float]]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that
hits_and_kdst_from_files, cdst_from_files and now a newly added dhits_from_files are all doing almost the same thing!
They all read hits (from different tables) and optionally kdst and summary_df, but do we really need 3 different readers? Cant we just have one reader with hits- table and group name as parameters and optionally read kdst and summary_df? @gonzaponte , @Aretno

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all that needs some refactoring, but I would put it in another PR.
There is dst_from_files which is almost what you request, but there are some subtleties done in some of those readers that might not be easy to implement in a general reader.

return copy_energy_to_Ep_hit_attribute_


types_dict_summary = OrderedDict({'event' : np.int32 , 'evt_energy' : np.float64, 'evt_charge' : np.float64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit cumbersome, maybe is better to place types_dict_... in ic_types.py?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I agree

write_summary = fl.sink( summary_writer (h5out=h5out) , args="event_info" )
write_topology_filter = fl.sink( event_filter_writer(h5out, "topology_select"), args=("event_number", "topology_passed" ))

if hits_type:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt it better to have
write_hits_paolina be None or fl.branch(fl.sink..)) and then filter the fn_list for Nones?
pipe(*filter(None, fn_list)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also put this writer definition in esmeralda and pass it as argument to this function, if we are sure not to use this in other cities?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll try that. Regarding your second comment, my intention here was to generalize this part of the function in order to allow writing the deconvoluted hits if this tracking city is run before Esmeralda's hit correction (maybe in the future). But I'll change it if you want.

write_topology_filter = fl.sink( event_filter_writer(h5out, "topology_select"), args=("event_number", "topology_passed" ))

if hits_type:
group_table = hits_type.split("/")[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hits_type is very misleading name, isnt it better to simply have 2 parameters directly? It would also be less error prone since you wont depend on the user passing string with this explicit pattern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have discussed offline, it doesn't make a lot of sense the possibility to save the deconvoluted hits, therefore I remove the option for allowing to change the name of the table. Now there is just a flag called save_paolina_hits that will be True, in case the function is executed inside Esmeralda (saving the hits in the table CHITS/highTh) and it will correspond to False, if the function is run inside Isaura.

args="paolina_hits" )
else: write_hits_paolina = fl.sink(lambda _: None)

fn_list = (create_extract_track_blob_info ,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should add the E or Ec to Ep copying here, it would simply need a hit_type as input and used in both cities.

you should also add hits filter (and filter writer) here since it is also shared between the cities

Cluster(row.Q, xy(row.X, row.Y), xy(row.Xrms**2, row.Yrms**2),
row.nsipm, row.Z, row.E,
Qc = getattr(row, 'Qc', -1)), # for backwards compatibility
Cluster(getattr(row,'Q', row.E), xy(row.X, row.Y),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is becoming a bit silly, we are missing half of the variables to construct our HitCollection type...
It is also not true in case of Xrms and Yrms squared - here you are setting them to 0.

I am not saying you should deal with all event model problems in this PR but maybe doing something a bit cleaner like

for i, row in df.iterrows():
    Q = getattr(row,'Q', -1)
    if skip_NN and Q == NN:
        continue
    if hasattr(row, 'Xrms):
         Xrms = row.Xrms
         Xrms2 = Xrms**2
    else:
        Xrms =  Xrms2 = -1
    if hasattr(row, 'Yrms):
    ....
    nsipm = getattr...
    Qc = ...

This is still very ugly but avoids calling getattr for parameters that are used more than once, and it also makes it a bit more clear where we are setting this default -1 value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll modify this to match your suggestion. It's completely true that this should be done just as a provisional solution: as you say, in the near future the event model must be revisited and changed in order to avoid these issues...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, I've changed the function trying to be a bit clearer.
Yes, I obviously agree with you, it has no sense using this HitCollection here, however I'd say that that issue is not related with this PR (since it's meant to adapt the tracking algorithm to the current code), so it should be mandatory to revisit the event_modelin the near future.

@ausonandres ausonandres force-pushed the new_tracking_city branch 3 times, most recently from 60c82f7 to 52a03e0 Compare July 16, 2021 07:25
@mmkekic mmkekic marked this pull request as ready for review July 16, 2021 13:49
@ausonandres ausonandres force-pushed the new_tracking_city branch 5 times, most recently from 86cd1f0 to dc00c6c Compare July 19, 2021 10:46
@@ -1002,3 +1037,267 @@ def calculate_and_save_buffers(buffer_length : float ,
# Filter out order_sensors if it is not set
buffer_definition = pipe(*filter(None, find_signal_and_write_buffers))
return buffer_definition

def copy_E_or_Ec_to_Ep(energy_type: HitEnergy):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @gonzaponte suggest, to keep our naming convention, this function can be called
Efield_copier and the inner one copy_Efield (this later name can also be used in the pipe elements namings)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds good, I've changed that.

@@ -429,37 +190,16 @@ def esmeralda(files_in, file_out, compression, event_range, print_mod,

threshold_and_correct_hits_high = fl.map(hits_threshold_and_corrector(threshold_charge=cor_hits_params['threshold_charge_high'], **cor_hits_params_),
args = 'hits',
out = 'cor_high_th_hits')
out = 'hits')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use
item = 'hits'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, I didn't know about that option.

@ausonandres ausonandres force-pushed the new_tracking_city branch 2 times, most recently from 00adf17 to 1052d36 Compare July 20, 2021 15:20
@ausonandres ausonandres force-pushed the new_tracking_city branch 3 times, most recently from 4785e15 to 3e8742e Compare July 20, 2021 15:57
Copy link
Collaborator

@mmkekic mmkekic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a new city, Isaura, that applies paolina tracking functions to the output of Beersheba. The functions that are shared between Esmeralda and Isaura are extracted into a single pipe, hence no code is copied. Esmeralda tests pass as before, and new tests are added for Isaura output.
This city is a temporary step until we decide on final reconstruction + analysis order (as discussed in Issue #749).

@MiryamMV MiryamMV merged commit d38e1e3 into next-exp:master Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants