Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethinking post-penthesilea production #749

Closed
mmkekic opened this issue Oct 27, 2020 · 12 comments
Closed

Rethinking post-penthesilea production #749

mmkekic opened this issue Oct 27, 2020 · 12 comments

Comments

@mmkekic
Copy link
Collaborator

mmkekic commented Oct 27, 2020

Since Beersheba hits are being used for track reconstructions, we need official code that runs paolina algorithm on Beersheba output. The code is already in Esmeralda and this is a matter of organizing different pipes.
As a reminder current algorithm is:

  • Esmeralda : input uncorrected Penthesilea hits; output corrected naive hits (2 charge thresholds); tracks-related variables
  • Beersheba : input corrected Esmeralda hits, output deconvoluted hits

The options that appeared at the meeting:

  1. Add tracking part to Beersheba; leave Esmeralda as is. This gives us both naive reconstruction and tracks as well as deconvoluted hits and tracks

  2. Add energy correction to Beersheba, remove it from Esmeralda and run Esmeralda after Beersheba. Beersheba may or may not output naive hits reconstruction. This is more modular since the tracking is separated from hits reconstruction.

  3. Have three independent cities : hits reconstruction (Beersheba), energy correction (??), tracking (Esmeralda)

@pnovella @msorel @paolafer @ausonandres @Aretno @jahernando @jjgomezcadenas @gonzaponte

@mmkekic
Copy link
Collaborator Author

mmkekic commented Oct 27, 2020

I removed the poll since it was confusing, so please comment

@msorel
Copy link
Collaborator

msorel commented Oct 27, 2020

Hi, as I said in the meeting, 1 probably sufficient in short term, as we are still applying xyz energy corrections using pre-deconvolution hits. Up to upcoming bb2nu first paper I guess we will do this.

For long term, beyond this paper, I think we should compute xyz corrections to energy after hit deconvolution. I think it can actually improve our energy resolution, as we correct the energy response using a much more faithful image of the event. This is just a feeling, and will have to be demonstrated with MC / data studies. In this scheme we will want to do options 2 or 3 in the long term, I think. If one does not want deconvolution (probably non-baseline case), option 2 could perhaps have a flag to do energy correction on pre-Beersheba hits, too, and not just on post-Beersheba ones.

So, to me it boils down if we want to formalise a solution for short term or for long term.

I have a slight preference for 2 (or 3, maybe), as option 1 may have a kind of short livetime, and we will want to change it in a couple of months...

@paolafer
Copy link
Collaborator

Given the experience with Esmeralda, I'm a bit reluctant to formalise code while still testing it; I'm afraid that we can end up doing a lot of work and then stopping using it shortly after. Maybe the best option would be to make the code as modular as possible, for instance, extracting from Esmeralda the hit correction and tracking functions, in such a way that they can be used anywhere else, and leave the city almost as a pipeline only. Beersheba could then add to its pipeline these functions and both cities would be usable. Does this approach make any sense in the city structure?

@Aretno
Copy link
Collaborator

Aretno commented Oct 28, 2020

Hi,

first, I think Pau's point about being able to have a fast reconstruction pipeline is really strong. As he pointed out, deconvolution is going to be considerably slower than the current scheme and, if we limit the energy corrections to that point, it will prevent us to check runs as instantaneously as before. This will be important not only for day-to-day operation but also in NEXT100 commissioning.

I'm also reluctant to just forget about the hits correction at all; deconvolution performance could change in different detectors configurations and I think having the corrected hits info could always serve as a guidance in case we see weird things at some point. But this could be solved by just doing both corrections, penthesilea hits and beersheba hits, and storing both tables.

In any case, I strongly support Paola in not formalizing code until we are completely sure about it thus, in any case, a throughout study of the energy resolution post-beersheba should be done before changing the corrections part.

Finally, I also dislike the idea of linking beersheba and paolina by default. As you know, I have a hard time believing that paolina approach is the best way to take advantage of the fine-grained tracks from deconvolution and would rather have them separate. Moreover, the lower we go in voxel size in paolina, the more time it takes to run and, whenever we need to optimize parameters of one or the other, we would be obligated to run both of them which, depending on the case, can be a waste of time.

So, in summary, I would go with Paola's suggestion, and have three different cities (corrections, beersheba, esmeralda). Corrections city should be able to read from either penthesilea or beersheba, beersheba should be able to read from penthesilea and corrections city, esmeralda should be able to read from corrections and beersheba.

@pnovella
Copy link

pnovella commented Oct 28, 2020 via email

@mmkekic
Copy link
Collaborator Author

mmkekic commented Oct 28, 2020

Ok, so the decision is that for now we leave Esmerala input and output as is, and make tracking city (urgent) and energy corrections pipe (not urgent).

As a first (and the simplest) step we only need tracking city since Beersheba is using Esmeralda output for now. I believe all readers already exist, and the part of tracking can simply be reused from Esmeralda. Hence, a volunteer (@ausonandres ?) should extract part of the pipe from Esmeralda that will be reused (together with appropriate sinks) and place it in components.py, and have a very simple tracking city flow. This part should be very easy and fast, is just placing the already existing code to a different position.

The hits energy corrections are more annoying since the current Esmeralda functions use event_model type (as do paolina functions), but Beersheba uses dataframes directly. In case we want to do energy correction -> deconvolution -> paolina, with the current functions, it would mean casting from pytables (DataFrame basically) to HitCollection, back to DataFrame to be used in deconvolution, and back to HitCollection to be used in paolina. The type casting is extremely time and memory consuming and this flow makes no sense. I would suggest to make hits energy correction (used in Esmeralda) accept both DataFrame or our custom HitCollection type (adding appropriate tests to make sure it is all consistent). Again, this should not be difficult, but it does require some code to be written.

In any case I think we need 3 independent pipes : deconvolution, energy correction and tracking, making sure that the data format passed in between them is consistent. I am not convinced that we need 3 independent cities, probably having a very-easy-to-combine pipes (really, is just calling 1-2 functions inside a city flow) is good enough till we decide on definite reconstruction scheme and what intermediate data format we want to store.

@jacg
Copy link
Collaborator

jacg commented Oct 28, 2020

Hence, a volunteer (@ausonandres ?) should extract part of the pipe from Esmeralda that will be reused (together with appropriate sinks) and place it in components.py, and have a very simple tracking city flow. This part should be very easy and fast, is just placing the already existing code to a different position.

Music to my ears.

Carry on.

@jjgomezcadenas
Copy link
Collaborator

jjgomezcadenas commented Oct 28, 2020 via email

@ausonandres
Copy link
Collaborator

Sorry I didn't say anything, but I already started doing this.
There is one issue that I'd like to comment: in my opinion this tracking city after Beersheba should output kdst information as well, since it's interesting to have access to some variables, particularly nS2 (it's used for the analysis), however Beersheba's output doesn't contain this table. For the time being, should we just leave it as it is and in the final part of the analysis -after running all cities-, joining the kdst information to the tracks table manually? And then in the future change Beersheba to also save the table?

@Aretno
Copy link
Collaborator

Aretno commented Oct 30, 2020

I actually agree, my bad for not putting it since the beginning. We should probably add it now as it would just be adding one line or two to the code (basically import the writer from esmeralda and add it to the pipeline). If agreed by everyone I'll proceed and do this on monday.

In case this is not done; don't you have that kind (or at least equivalent) info on the Summary table? That is currently being stored.

@ausonandres
Copy link
Collaborator

Not now, in summary table it is basically contained the position of the different events and a few more magnitudes event-related (energy, number of tracks, number of hits and charge), but not number of S2 signals.

MiryamMV pushed a commit that referenced this issue Jul 22, 2021
#755

[author: ausonandres]

According to issue #749, the aim of this PR is to add a city (to be
run just after Beersheba, for the moment), that computes all the
tracking information. Since the tracking-related functions will be
shared between Esmeralda and this new city (so-called Isaura (credits
to @jjgomezcadenas)), they will be moved to other
places (`components.py`, in most of the cases).

[reviewer: mmkekic ]

This PR adds a new city, Isaura, that applies paolina tracking
functions to the output of Beersheba. The functions that are shared
between Esmeralda and Isaura are extracted into a single pipe, hence
no code is copied. Esmeralda tests pass as before, and new tests are
added for Isaura output.
This city is a temporary step until we decide on final reconstruction
+ analysis order (as discussed in Issue #749).
@gonzaponte
Copy link
Collaborator

This is now IC v2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants