New tracking city #755
Force-pushed from a4ae48d to 9a8c154
This PR is almost 5 months old and has received no attention whatsoever. It relates to an issue (#749) which solicited the wisdom of: @pnovella @msorel @paolafer @ausonandres @Aretno @jahernando @jjgomezcadenas @gonzaponte

7 tests fail, most of them for fairly basic reasons. Please fix. Then rebase, so that the tests run on our new CI infrastructure (GitHub Actions). Please don't leave PRs with red crosses (failing tests) in the queue: a PR queue full of red crosses encourages people to ignore PRs.
Hi, sorry, it's my fault... I was trying to fix the failing tests, but some other urgent work that I had to attend to came up at that moment. After that, I completely forgot to come back to this PR. I'll do it as soon as I have some time.
When it is ready for review, please explicitly ask someone to review it, using the buttons/links near the top-right of the PR page. If it is not yet ready for review, please mark it as a draft, using the link at the bottom of the reviewers section: 'Still in progress? Convert to draft'.
Force-pushed from 49bea46 to be15740
Force-pushed from be15740 to 5d98b10
event_info = load_dst(path, 'Run', 'events')
for evt in dhits_df.event.unique():
    timestamp = event_info[event_info.evt_number==evt].timestamp.values[0]
    dhits     = hits_from_df(dhits_df.loc[dhits_df.event==evt])
This is not correct, since the DECO hits for a single event can contain multiple peaks. Since no peak information is included in the final track information output, we cannot directly check whether a multiple-track event comes from a real splitting or from a multiple-S2 event. This is not a relevant issue as long as the commonly used 1S2 and 1-track cuts are applied, but it is biasing the efficiencies.

This PR also modifies the search for track extremes in order to make it deterministic, but it does not adapt the tests; this also affects the esmeralda tests and makes the PR more difficult to review. I think these modifications should be a separate PR. I will open a different PR implementing the same functionality of this city, essentially running the paolina algorithm over deconvoluted hits, without modifying the track-extreme search.
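The 1S2 cut mentioned here can be sketched with pandas; the column names `event` and `npeak` are assumptions about the DECO hit-table layout, not the actual IC schema:

```python
import pandas as pd

# toy DECO-hit table: event 1 contains two S2 peaks, event 2 only one
dhits = pd.DataFrame({"event": [1, 1, 1, 2, 2],
                      "npeak": [0, 0, 1, 0, 0],
                      "E"    : [10., 5., 7., 3., 4.]})

# number of distinct peaks per event
peaks_per_event = dhits.groupby("event").npeak.nunique()

# keep only single-S2 events, where track multiplicity is unambiguous
single_s2 = peaks_per_event[peaks_per_event == 1].index
dhits_1s2 = dhits[dhits.event.isin(single_s2)]
```

With this toy input, only event 2 survives the cut.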
Sorry, I don't completely understand your plan. If you want to open a new PR that exclusively adds the city to run paolina over deconvoluted hits, what would you do with this PR? Just ignore it? Would you use it only for solving the extremes issue? There is another thing to take into account: apart from doing both things, this PR also moves some functions from Esmeralda to components.py; would you also add that functionality in your new PR?

In my opinion, the best approach would be to remove everything extremes-related from this PR, and then open the new PR you mentioned to take care of that. The goal of this PR would then remain adding the tracking city.
Yes, as you may see in #798, I just opened another PR, since I didn't aim to undo your changes related to the extremes finder. I didn't move the functions out of esmeralda.py yet, to ease the review, but I intend to do it once the PR is reviewed. The main difference with this PR is that I left the paolina functions unmodified, which in my opinion should be done, if ever, in a different PR. It also changes the source to solve the issue with multipeak events, such that those events are filtered out.
Force-pushed from 08444ee to b70a7ba
@@ -589,6 +595,31 @@ def MC_hits_from_files(files_in : List[str], rate: float) -> Generator:
                          timestamp = timestamp(evt))

def dhits_from_files(paths: List[str]) -> Iterator[Dict[str,Union[HitCollection, pd.DataFrame, MCInfo, int, float]]]:
It seems to me that hits_and_kdst_from_files, cdst_from_files, and now the newly added dhits_from_files are all doing almost the same thing! They all read hits (from different tables) and optionally kdst and summary_df. But do we really need 3 different readers? Can't we just have one reader with the hits table and group name as parameters, which optionally reads kdst and summary_df? @gonzaponte, @Aretno
Yes, all that needs some refactoring, but I would put it in another PR. There is dst_from_files, which is almost what you request, but there are some subtleties in some of those readers that might not be easy to implement in a general reader.
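As a rough illustration of the single parameterized reader being discussed, here is a sketch over in-memory tables; the function name, the `files` structure, and the `DST/Events` key are hypothetical stand-ins for IC's HDF5 readers, not the actual API:

```python
import pandas as pd
from typing import Dict, Iterator, List

# hypothetical unified reader: one function parameterized by the hits
# group/table names, with optional tables read only on demand; real IC
# readers would take file paths and read HDF5 nodes instead of dicts
def generic_hits_reader(files      : List[Dict[str, pd.DataFrame]],
                        hits_group : str,
                        hits_table : str,
                        read_kdst  : bool = False) -> Iterator[Dict[str, pd.DataFrame]]:
    key = f"{hits_group}/{hits_table}"
    for tables in files:
        out = {"hits": tables[key]}
        if read_kdst:                     # optional table, read on request
            out["kdst"] = tables["DST/Events"]
        yield out

# usage over a fake in-memory "file"
fake_file = {"DECO/Events": pd.DataFrame({"event": [1]}),
             "DST/Events" : pd.DataFrame({"nS2"  : [1]})}
out = list(generic_hits_reader([fake_file], "DECO", "Events", read_kdst=True))
```

The subtleties mentioned above (per-reader post-processing, differing schemas) are exactly what this naive sketch glosses over.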
return copy_energy_to_Ep_hit_attribute_

types_dict_summary = OrderedDict({'event' : np.int32 , 'evt_energy' : np.float64, 'evt_charge' : np.float64,
This is a bit cumbersome; maybe it is better to place types_dict_... in ic_types.py?
ok, I agree
write_summary         = fl.sink( summary_writer     (h5out=h5out)                 , args="event_info" )
write_topology_filter = fl.sink( event_filter_writer(h5out, "topology_select"), args=("event_number", "topology_passed"))

if hits_type:
Isn't it better to have write_hits_paolina be None or fl.branch(fl.sink(...)), and then filter the fn_list for Nones, i.e. pipe(*filter(None, fn_list))?
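The suggestion above can be illustrated with a plain-Python sketch; `pipe` here is a minimal stand-in for IC's dataflow `pipe`, and the stage names are made up:

```python
# minimal stand-in for a dataflow pipe: compose functions left to right
def pipe(*fns):
    def run(x):
        for f in fns:
            x = f(x)
        return x
    return run

double             = lambda x: 2 * x
write_hits_paolina = None                  # optional stage, disabled here
increment          = lambda x: x + 1

fn_list = (double, write_hits_paolina, increment)
run     = pipe(*filter(None, fn_list))     # drop the disabled (None) stages
```

Disabled stages simply vanish from the pipeline, so the remaining code needs no special-casing.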
We could also put this writer definition in esmeralda and pass it as an argument to this function, if we are sure we won't use it in other cities?
Ok, I'll try that. Regarding your second comment, my intention here was to generalize this part of the function to allow writing the deconvoluted hits in case this tracking city is run before Esmeralda's hit correction (maybe in the future). But I'll change it if you want.
write_topology_filter = fl.sink( event_filter_writer(h5out, "topology_select"), args=("event_number", "topology_passed"))

if hits_type:
    group_table = hits_type.split("/")[0]
hits_type is a very misleading name; isn't it better to simply have 2 parameters directly? It would also be less error prone, since you won't depend on the user passing a string with this explicit pattern.
As we have discussed offline, the possibility of saving the deconvoluted hits doesn't make a lot of sense, so I have removed the option to change the name of the table. Now there is just a flag called save_paolina_hits, which is True if the function is executed inside Esmeralda (saving the hits in the table CHITS/highTh) and False if the function is run inside Isaura.
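The two alternatives discussed here (one packed "group/table" string versus explicit parameters) can be contrasted in a small sketch; the function names are hypothetical:

```python
# fragile: both names packed into one string, relying on a "group/table"
# convention the caller has to know about
def table_from_string(hits_type: str):
    group, table = hits_type.split("/")   # ValueError if the "/" is missing
    return group, table

# safer: the signature itself documents what is required
def table_from_params(group: str, table: str):
    return group, table
```

With explicit parameters there is no string convention to get wrong, and typos surface as ordinary argument errors.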
                                   args="paolina_hits" )
else: write_hits_paolina = fl.sink(lambda _: None)

fn_list = (create_extract_track_blob_info ,
You should add the E or Ec to Ep copying here; it would simply need a hit_type as input and could be used in both cities. You should also add the hits filter (and filter writer) here, since it is also shared between the cities.
invisible_cities/io/hits_io.py
Outdated
Cluster(row.Q, xy(row.X, row.Y), xy(row.Xrms**2, row.Yrms**2),
        row.nsipm, row.Z, row.E,
        Qc = getattr(row, 'Qc', -1)),  # for backwards compatibility
Cluster(getattr(row, 'Q', row.E), xy(row.X, row.Y),
This is becoming a bit silly; we are missing half of the variables needed to construct our HitCollection type...
It is also not true in the case of Xrms and Yrms squared: here you are setting them to 0.
I am not saying you should deal with all the event model problems in this PR, but maybe do something a bit cleaner, like

for i, row in df.iterrows():
    Q = getattr(row, 'Q', -1)
    if skip_NN and Q == NN:
        continue
    if hasattr(row, 'Xrms'):
        Xrms  = row.Xrms
        Xrms2 = Xrms**2
    else:
        Xrms = Xrms2 = -1
    if hasattr(row, 'Yrms'):
        ....
    nsipm = getattr...
    Qc    = ...

This is still very ugly, but it avoids calling getattr for parameters that are used more than once, and it also makes it a bit clearer where we are setting this default -1 value.
Ok, I'll modify this to match your suggestion. It's completely true that this should be just a provisional solution: as you say, in the near future the event model must be revisited and changed in order to avoid these issues...
Ok, thanks, I've changed the function, trying to make it a bit clearer. Yes, I obviously agree with you that it makes no sense to use this HitCollection here; however, I'd say that issue is not related to this PR (since it's meant to adapt the tracking algorithm to the current code), so it should be mandatory to revisit the event_model in the near future.
Force-pushed from 60c82f7 to 52a03e0
Force-pushed from 52a03e0 to 9ad4059
Force-pushed from 86cd1f0 to dc00c6c
@@ -1002,3 +1037,267 @@ def calculate_and_save_buffers(buffer_length : float ,
    # Filter out order_sensors if it is not set
    buffer_definition = pipe(*filter(None, find_signal_and_write_buffers))
    return buffer_definition

def copy_E_or_Ec_to_Ep(energy_type: HitEnergy):
As @gonzaponte suggests, to keep our naming convention this function can be called Efield_copier and the inner one copy_Efield (this latter name can also be used in the naming of the pipe elements).
It sounds good, I've changed that.
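For illustration, the closure pattern with the suggested names might look like the following sketch; the dict-based hit model is a simplifying assumption, not IC's event model:

```python
# an outer factory `Efield_copier` returns an inner `copy_Efield` that
# copies the chosen energy field ("E" or "Ec") into "Ep" on every hit;
# hits are modeled as plain dicts here for brevity
def Efield_copier(energy_type: str):
    def copy_Efield(hits):
        for hit in hits:
            hit["Ep"] = hit[energy_type]
        return hits
    return copy_Efield
```

The factory is configured once when the pipeline is assembled, and the inner function is what gets mapped over events.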
Force-pushed from dc00c6c to c269a95
invisible_cities/cities/esmeralda.py
Outdated
@@ -429,37 +190,16 @@ def esmeralda(files_in, file_out, compression, event_range, print_mod,

 threshold_and_correct_hits_high = fl.map(hits_threshold_and_corrector(threshold_charge=cor_hits_params['threshold_charge_high'], **cor_hits_params_),
                                          args = 'hits',
-                                         out  = 'cor_high_th_hits')
+                                         out  = 'hits')
You can use item = 'hits'
Good to know, I didn't know about that option.
Force-pushed from 00adf17 to 1052d36
Force-pushed from 1052d36 to 7421845
Force-pushed from 4785e15 to 3e8742e
This PR adds a new city, Isaura, that applies the paolina tracking functions to the output of Beersheba. The functions shared between Esmeralda and Isaura are extracted into a single pipe, so no code is duplicated. The Esmeralda tests pass as before, and new tests are added for the Isaura output.
This city is a temporary step until we decide on the final reconstruction + analysis order (as discussed in issue #749).
Force-pushed from 3e8742e to 7dd0a19
According to issue #749, the aim of this PR is to add a city (to be run just after Beersheba, for the moment) that computes all the tracking information. Since the tracking-related functions will be shared between Esmeralda and this new city (so-called Isaura, credits to @jjgomezcadenas), they will be moved to other places (components.py, in most cases).

There are two issues that I faced while writing the city. The first one was already pointed out in #749, and it is related to the kdst information. It would be interesting for the output of this tracking city to contain kdst information (the nS2 variable is used for the posterior analysis); however, the current Beersheba version doesn't save this table. Therefore, for the time being, we can access this information only by opening the cdst files, although I believe that in the near future we should add this table in Beersheba and then in Isaura.

The second issue concerns the tracking algorithm. As you may know, it seems that Paolina, and particularly the reco/paolina_functions/find_extrema() function, is not deterministic. This is because in some events there is more than one possible pair of extreme voxels for a given track, and it appears that in those cases they are chosen randomly. The solution that I propose (and add here, in order to check that the output is correct, even though maybe this is not the place to do it) consists in saving all the possibilities, and then choosing the extremes whose summed energy takes the maximum value (in order to avoid, if possible, the usage of the paolina_functions/drop_end_point_voxels() function). There would be other solutions, but the choice of one or another will be a bit arbitrary in the end, in my opinion.
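The proposed deterministic tie-break (among candidate extreme pairs, keep the one with the largest summed energy) can be sketched as follows; the (label, energy) voxel representation and the function name are illustrative, not paolina's actual API:

```python
# voxels modeled as (label, energy) tuples; `candidates` holds the pairs
# of extreme voxels that are equally far apart, among which the original
# find_extrema would otherwise pick arbitrarily
def pick_extremes(candidates):
    # deterministic tie-break: take the pair with the largest summed energy
    return max(candidates, key=lambda pair: pair[0][1] + pair[1][1])

candidates = [(("v1", 1.0), ("v2", 2.0)),   # summed energy 3.0
              (("v3", 2.5), ("v4", 1.0))]   # summed energy 3.5
```

`max` always returns the first maximal element, so the result is reproducible run to run even when two pairs tie in energy.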