-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hypathia: City to bypass diomira #646
Conversation
0bc6f7c
to
916d729
Compare
invisible_cities/cities/hypathia.py
Outdated
from . components import WfType | ||
from . components import wf_from_files | ||
|
||
from . irene import check_empty_pmap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely components which are used by more than one city, belong in components
rather than any single city?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I have pushed a new commit moving those functions to components
.
invisible_cities/cities/hypathia.py
Outdated
from .. database import load_db | ||
|
||
from .. reco import tbl_functions as tbl | ||
from .. reco import peak_functions as pkf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor misalignment of peak_functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in the last commit with some other missalignments in the imports.
0c1c6e4
to
e0787e5
Compare
A naive question, or rather observation: The whole point of dataflow is to allow us to build up bigger components by composing and reusing smaller ones, in order to avoid code duplication. The title of this PR makes me uncomfortable, as bypassing some component shouldn't require a whole new city to be written. I have neither the time nor the domain expertise to analyse this in detail myself, but comparing this new city to Irene, I find the amount of copypasta quite disturbing. The (loooong) pipeline in Hypathia is identical to Irene, with the following exceptions
I would hope that if
I appreciate that there will be some details which make the last hope not entirely trivial to realize, but surely there must be a reasonable way of achieving this, that can a) reduce the copypasting, In short, there's too much copy'n'paste code reuse creeping into these cities! |
Actually, I see the components of the flow as the basic units and the "cities" as any arbitrary composition of those components. What the flow allows us is to reuse components indeed, but we can choose (and indeed this is what we are doing in this case) to link them in close-but-not equivalent applications. Indeed, I believe that the relations (or lack of) between the cities are somewhat irrelevant. What it is important is to understand what the city does, and those are clearly described by the components inserted into the flux. I agree that components that are boilerplate can be abstracted further, but this is a different story and a different PR. Hypathia runs always from pure MC, as specified by the source, while IRENE can run from MC or from data. The cities are fundamentally different and the naive algebraic relation quoted above is not a good description of their relation. Certainly I don't believe we can implement Irene (which runs in data or in MC) as a composition of Diomira and Hypathia, (both of tem MC cities). In fact, what Hypathia does is to provide a shortcut to produce high-level-reconstructed objects (PMAPS) using fast simulation (thus skipping DIOMIRA) and skipping part of the reconstruction in IRENE (which is not necessary because the fast simulation skips the convolution of waveforms, thus no need to deconvolute them). One could choose to write small cities, for example:
Then one could have DIOMIRA to write RWF and pass the 2 cities above to produce PMAPS But at the cost of writing intermediate structures to disk which is neither efficient not particularly eligthing, I think. And for sure, IRENE and DIOMIRA are working, tested and stable. I see no need to modify their fluxes at this point. |
Composition can and should happen on many different levels. It is a mistake to believe that there are just two levels: composable components and their compositions. Those compositions should themselves be composable, giving rise to a multi-layered hierarchy. This is kinda fundamental to good program design, and the functional style which you admire is especially conducive to making this sort of thing work well. Do look out for opportunities to take advantage of it.
I agree completely. Nonetheless, it is an important point that you should take advantage of the easy coposability that dataflow affords, and that you should aim to minimize copy'n'paste code reuse. Those concepts are fundamental to managing complexity and writing maintainable software.
Then don't write the intermediate structures to disk! Write larger components which can be plugged together to make even larger components. Please appreciate that pipes are themselves pipe components. You don't have to make a whole city of each and every composition, you can provide it as a reusable, higher-level component.
This is an important point, and I totally agree. But do note that the copy'n'paste duplication that this PR proposes is an increase in technical debt, and you should aim to pay it back sometime in the not too distant future. The fact that they are tested means that it will be easy to verify whether any code reuse efforts have been successul or not. One major point of having an extensive test suite is to allow us to refactor fearlessly.
That's fine, and I didn't really expect it to be possible in exactly those terms. But there are chains which are repeated in both Hypathia and Irene, and, rather than copy-pasting them without giving it further thought, you should think about why these sequences appear in both, and whether you can identify a higher-level concept in that commonality and whether/how to abstract it out and reuse it without copy-paste. (There is also a point to be made about giving your software components cute names to which you have personal emotional attachment: should good technical reasons come along for ripping Irene's guts out or claving her in twain, your emotional attachment to the cute name is likey to cloud your dispassionate technical judgement.) But, please, leave the copy-paste in for now ... that way maybe in a few months time I can, once again, have the immense pleasure of reducing the IC code base by a few thousand lines, while increasing its quality and clarity :-) |
Why, o why Hypathia??? Ipazia sounds much better to me ;-). |
I think that is an important point. I have refactored the code related to pmap building and writing, so now both |
empty_pmaps.filter, | ||
fl.branch(write_pmap)) | ||
|
||
# remove sipm_rwf_to_cal if it is not set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compute_pmaps = pipe(*filter(None, fn_list))
?
- Filter without a predicate uses the boolean value of the elements themselves. I find the signal to noise ratio in the genexpr version rather low.
this_name = fn(this_name)
is a style that you should (mildly) avoid, as the reader might see the first binding ofthis _name
and not notice the rebinding later on. It's not vitally important, but it does help a bit, for reasons similar to those for which we prefer to avoid mutation if sanely possible.- There is absolutely no point in creating an intermediate list out of the lazy object (regardless of whether that lazy object is a generator expression (as you had) or a filter (as I am suggesting): any iterable (that includes everything that is lazy) can be used with the arg-expanding
*
.
The comment you wrote is vitally important. If you switch to using filter
(and even if you don't), I suggest you change it to filter out simp_rwf_to_cal if it is not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grrrr! GitHub meta-question: is there a way of explicitly selecting the lines that will be shown above these comments? I never remember what GitHub is going to choose, and whatever it does't it clearly doesn't agree with my natural expectations, so, most of the time, the code it shows is not the code I'm talking about. I'm talking about the code on lines 571-571 plus a few.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compute_pmaps = pipe(*filter(None, fn_list))
?
- Filter without a predicate uses the boolean value of the elements themselves. I find the signal to noise ratio in the genexpr version rather low.
You are totally right, I have already updated the code with that.
this_name = fn(this_name)
is a style that you should (mildly) avoid, as the reader might see the first binding ofthis _name
and not notice the rebinding later on. It's not vitally important, but it does help a bit, for reasons similar to those for which we prefer to avoid mutation if sanely possible.
Make sense too, not a problem with the new version.
- There is absolutely no point in creating an intermediate list out of the lazy object (regardless of whether that lazy object is a generator expression (as you had) or a filter (as I am suggesting): any iterable (that includes everything that is lazy) can be used with the arg-expanding
*
.
You are right, I was using list
before because I had a line with print(fn_list)
to check the None
was being removed.
The comment you wrote is vitally important. If you switch to using
filter
(and even if you don't), I suggest you change it to filter out simp_rwf_to_cal if it is not set.
Updated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grrrr! GitHub meta-question: is there a way of explicitly selecting the lines that will be shown above these comments? I never remember what GitHub is going to choose, and whatever it does't it clearly doesn't agree with my natural expectations, so, most of the time, the code it shows is not the code I'm talking about. I'm talking about the code on lines 571-571 plus a few.
This I don't know, I'm not familiar either with that part of github's interface...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using
list
before because I had a line withprint(fn_list)
to check theNone
was being removed.
Yes, this single-use aspect of lazy things is a bit annoying, especially for interactivity and debugging.
b624797
to
4ac4859
Compare
fe6ddda
to
6cab7ed
Compare
c1bbf4a
to
1e9bd4d
Compare
I have just rebased and solved all the conflicts in this old branch. Tests should pass, travis is still running. Hypathia is the city I have used for my NEXT100 studies. It reads waveform files, add the effect of single pe PMT resolution, SiPM PDFs and compute the pmaps. If all tests pass this is ready for review and merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally good and certainly a great tool to have so we can advance with simulations of NEXT100 without entering into too much detail for the electronics for which we won't have definitive numbers for some time.
Just a couple of questions and a few cosmetics too. There are quite a few very long lines (in IC in general) that might be nice to tidy up a bit too.
simulate_pmt = fl.map(partial(sf.charge_fluctuation, single_pe_rms=pmt_pe_rms), | ||
args = "rwf", | ||
out = "ccwfs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the charge resolution is definitely the major thing to simulate here, I'm a bit worried that for the S1s that we maybe need a bit of electronic noise. Since they don't tend to be seen in all PMTs and the current search method sums all PMTs before looking there might be some possibility, at least at lower energies, to get fake ones or miss a real one. We don't read the Pedestal RMS from the mysql at the moment (and the values are a bit out of date anyway) but maybe it's worthwhile to at least simulate a nominal Gaussian noise? @jjgomezcadenas, @gonzaponte ? The more complex electronics simulation can be left for when we hace more info and want to run diomira and irene instead of hypathia
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I can not discuss that, maybe @jjgomezcadenas or @gonzaponte can. In any case, it might be added in a later version of Hypathia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a quick couple of tests on a couple of events from the simulations we've been using to test the new setup etc.
I just added a Gaussian noise with a sigma of ~1 ADC equivalent (1/20 pe) and another test with ~0.7 ADC equivalent and it does make a difference to the width and signal in the resultant S1s and S2s. The summed pmt signal in the S1 varies by up to ~30% and in the S2 by up to ~1.5%. Though I only ran a few events so it could be bad luck with the seeding.
It's maybe not hugely important but it's relatively easy to implement. We can maybe leave it for another PR though. I'll hold off on approving to see if there are any opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simulating a Gaussian noise seems reasonable and easy to do. But if you prefer it can be done in a separate PR, maybe even reading the pedestal rms from the db,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to read from the database we'd need changes in load_db and to add the info to the DB. Since we don't have info on what the noise will be in NEXT100 other than that it's likely to be very similar to what we have in NEW. Maybe it's easier just to decide a representative number and put it in the hypathia configuration. The last time we explicitly measured it the mean over the 11 active PMTs at the time was 0.73 ADC. Maybe we could just add something along those lines.
That being said, the current status is more or less what @jmbenlloch used for his study + cosmetics. Maybe there's merit in accepting the PR as is and then getting someone to do a PR adding in a simple noise simulation. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to add the electronic noise in a different PR. Anyway, we should discuss (maybe through the email I sent a couple of days ago?) if we want to include it in the current studies of NEXT-100 or leave it for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds a new city to the code which can be used to process MC files directly to selected signal (PMaps) with a simplified electronics simulation useful for analyses in the absence of detailed information about the PMT electronics.
One issue remains that we've decided to leave for a subsequent PR.
Approved for merging
This PR adds a new city that allows us to bypass diomira. It takes MCRD from detsim and directly output pmaps. To do that this are the task performed:
We are skipping all the electronics-related part. So far is a very simplistic simulation, in the future we may include too the sensors calibrations.
You can see a notebook with some plots here: https://github.com/jmbenlloch/KrCalibNB/blob/xe0nu/MC/hypathia_demo.ipynb