Merging MC files #715
Comments
How about adding a |
There may be cases (e.g. mixing of background events from different sources) in which not only the random seed but also other configuration parameters could be different. It may be simpler to copy all the configuration tables, labelling them with a subrun tag (or something similar). |
I think that this is something that needs to be solved in the medium term, but I'd propose a staged approach since we really need to get PR #693 completed so that the integration of the detector simulation code isn't delayed too much. I think we need to come up with a more general solution (please keep suggesting here), but I'd propose a minimal protection in PR #693 so that the configuration info isn't confusing or clashing (still not that simple, really), and that we deal with it in a more complete way in the PRs related to event splitting etc. I think that, in the short term, the favoured production paradigm needs to remain what we've used up until now: processing single files per job. A patch could be adding the file number/name to the param_keys somewhere in the configuration information and adding a check for overlap in the merging of other MC tables. Thoughts? |
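A minimal sketch of the proposed patch, assuming the configuration table can be treated as a list of (param_key, param_value) pairs. The function names, the `file_tag/key` naming scheme and the example values are hypothetical and are not taken from IC or PR #693; the overlap check for the other MC tables is not shown.

```python
def tag_config(rows, file_tag):
    """Prefix every param_key with the tag of the file it came from."""
    return [(f"{file_tag}/{key}", value) for key, value in rows]


def merge_configs(configs):
    """Merge per-file configuration rows into a single list.

    configs maps a file tag to a list of (param_key, param_value) pairs.
    Raises ValueError if two rows end up with the same tagged key.
    """
    merged = []
    seen = set()
    for file_tag, rows in configs.items():
        for key, value in tag_config(rows, file_tag):
            if key in seen:
                raise ValueError(f"duplicate configuration key: {key}")
            seen.add(key)
            merged.append((key, value))
    return merged


# Hypothetical per-file configuration (values are illustrative only).
configs = {
    "file_0": [("random_seed", "12345"), ("physics_list", "A")],
    "file_1": [("random_seed", "67890"), ("physics_list", "A")],
}
merged = merge_configs(configs)
```

With this scheme every parameter is kept per file, so the two random seeds no longer clash, at the cost of repeating parameters that are identical across files.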
This would handle well the event splitting in detsim, and events from the same MC production (which have by construction different event ids) could be processed with no issue as part of the same run. A possible problem would be the event mixing from different MC productions, e.g. what was being done for the mixing of different background sources (@msorel, @paolafer: are we going to continue doing this?). Possible solutions include:
|
We will continue mixing MC events, yes. In case it is relevant for this discussion: until now we have mixed events at the nexus level, and went through all other processing steps only for the mixed files and not for the single-source files. We want to change this, allowing for mixing files at different stages of processing. We have not decided yet whether to mix post-irene, post-esmeralda or somewhere else. Concerning your possible ways to tackle this, Justo. Renumbering events would be fine (and could be done outside IC, as is still the case right now) if information were not dropped from one processing step to another, but only added, so that effectively you would never need to go back to a previous processing step, as you would have all information available at the end. But this is not the case: we drop information. If we renumber events, we cannot relate events in different processing steps anymore. So at first thought I would vote against option 2. Option 1 should work: some sort of script that runs over nexus output directories, and raises a flag if an event_id is repeated? Option 3 too, I guess, but it sounds like it requires some more gymnastics? Beyond these three, perhaps there are more elegant ways to deal with this. |
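The "script that raises a flag if an event_id is repeated" could look roughly like the sketch below. It assumes the event ids of each nexus file have already been read into plain Python lists; reading them from the actual files is left out, and the function name and file names are hypothetical.

```python
from collections import defaultdict


def find_repeated_event_ids(event_ids_by_file):
    """Return {event_id: [file names]} for every id seen more than once.

    event_ids_by_file maps a file name to an iterable of event ids.
    An id repeated inside a single file is also reported.
    """
    files_by_id = defaultdict(list)
    for filename, event_ids in event_ids_by_file.items():
        for event_id in event_ids:
            files_by_id[event_id].append(filename)
    return {evt: files for evt, files in files_by_id.items() if len(files) > 1}


# Hypothetical production with one clashing event id (2).
production = {
    "bkg_source_a_0.h5": [0, 1, 2],
    "bkg_source_b_0.h5": [2, 3, 4],
}
repeated = find_repeated_event_ids(production)
if repeated:
    print("repeated event ids:", repeated)
```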
This is what we're doing now: we're doing the gymnastics of not having repeated event IDs across the full background production, is that right, @msorel ? About the |
I was thinking about the implementation necessary for the long event splitting and I kept hitting mental or physical blocks. The idea to put a sub-event number made sense but I hit a problem when I got to the output of irene where the pmaps are indexed according to event number. Without adding quite a lot of complexity I couldn't think of a way to get the output non-repeating. The only other, suboptimal, idea I had was to make the event number a float (either for pmaps and above or in general) and make the sub-event be the first decimal. |
What should Irene do: merge subevents into a single event or store each subevent separately?
If Irene merges subevents into a single one this complexity should go away, I think.
I also thought of that, it is not terrible, but I agree it is not optimal. I don't know if we can also encounter precision problems... |
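On the precision concern, here is a small sketch of the float idea, assuming the sub-event index is stored as the first decimal of the event number. The encoding functions are hypothetical, and the thread does not say which float width would be used; the example only illustrates that a 32-bit float cannot hold the decimal once the event number is in the millions, whereas a 64-bit float still can.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def encode(event, subevent):
    """Hypothetical encoding: sub-event index stored as the first decimal."""
    return event + subevent / 10


def decode(value):
    """Recover (event, subevent) from the encoded float."""
    event = int(value)
    subevent = round((value - event) * 10)
    return event, subevent


# Small event numbers survive a 32-bit round trip...
small = decode(to_float32(encode(1000, 1)))
# ...but around 3 million the float32 spacing is 0.25, so the 0.1 is lost.
large = decode(to_float32(encode(3_000_000, 1)))
# The same value stored as a 64-bit float still decodes correctly.
large64 = decode(encode(3_000_000, 1))
```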
No, nexus simulates all the activity coming from, for example, a muon and records the times that energy was deposited or sensors recorded photons. In detsim we want to be able to recognise events which would be two or more triggers in the detector and split accordingly into subevents. These subevents need to be treated as independent entities by the processing as in data that would be the case. We come into difficulties with indexing though. |
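As a rough illustration of the splitting described here, the sketch below groups time stamps into fixed-length windows, opening a new window whenever a time stamp falls outside the current one. This is not the detsim algorithm: the function name, the fixed `buffer_length` and the example times are assumptions made only to show why one nexus event can yield several independent sub-events.

```python
def split_into_subevents(times, buffer_length):
    """Group time stamps into consecutive windows of fixed length.

    A new window opens at the first time stamp that lies at least
    buffer_length after the start of the current window.
    """
    subevents = []
    window_start = None
    for t in sorted(times):
        if window_start is None or t - window_start >= buffer_length:
            subevents.append([])
            window_start = t
        subevents[-1].append(t)
    return subevents


# Hypothetical deposit times (arbitrary units) from one long nexus event.
times = [0.0, 5.0, 12.0, 2000.0, 2003.0, 9000.0]
subevents = split_into_subevents(times, buffer_length=1000.0)
```

Each inner list would then need its own index in the output, which is where the indexing difficulty comes from.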
I started to have a look at a version of IC that could read events with (evt_number, subevt_number). It's a bit fiddly but it might be an ok starting point. Have a look if you can: https://github.com/andLaing/IC/tree/new-run-table |
#722 [author: andLaing] Adds a new io option to `rwf_io` which uses the basic `rwf_writer` and other table writers to write all event info in one step. In this way long MC events can be split into multiple trigger-like events in the output file and the event numbers can be logged and mapped accordingly. Some issues remain for the logging portion that are under debate. Addresses point 2 of issue #691 [reviewer: jmalbos] This PR adds a new writer (and the corresponding test) to `rwf_io` that can handle the splitting of long (MC) events into several subevents. A new table (`MCEventMap`) is used to associate the new subevents with the original event. Nevertheless, this new writer has limited use until a decision is taken regarding #715.
Hi everyone. I recently came back to thinking about this issue as I'm starting to hit some walls (semi)related to this in the analysis of cosmogenic backgrounds. The attempt I made to solve the problem (in previous comment) involved a lot of changes to IC and was quite fiddly. I thought about some possible alternatives, it'd be good to have some comments on them or other suggestions which could be better. The two possible alternatives I came up with yesterday were:
I currently favour option 2 but that could just be that it's newer and it should cause less upstream issues. Please comment @mmkekic , @paolafer , @jmalbos , @gonzaponte , @jjgomezcadenas and all. |
Hi @andLaing , without having thought too hard on it, looks also to me that 2 is better, so that code downstream is untouched. Basically anything that falls outside the event time window defined in detsim/bufferization gets a new event id. I am not sure I understand what you mean by "structured as 000", though. Can you explain? |
Sorry, the example didn't render, I've fixed it in the original comment. |
#751 [author: andLaing] Adds functions to generate unique event numbers for MC to allow for safe processing of split nexus events and simplify MC event mixing. In need of more tests and a file number reader but ready for discussion. Discussion continues from issue #715 [reviewer: mmkekic] This PR adds an event-splitting functionality that keeps nexus event numbers unique across files by assuming a constant maximum number of splits per event. A new table that maps the IC event number to the original MC event number is added to the Run group, ensuring the code is compatible with the old formats of MC production. The code is documented and tested, good job!
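One way to read "a constant maximum number of splits per event" is sketched below: each sub-event gets the number `nexus_event * max_splits + subevent_index`, which is unique as long as the nexus event numbers are unique and no event is split more than `max_splits` times. The formula, the value of `MAX_SPLITS` and the function names are assumptions for illustration and may differ from what PR #751 implements.

```python
MAX_SPLITS = 100  # assumed constant; the real value is not given in the thread


def unique_event_number(nexus_event, subevent_index, max_splits=MAX_SPLITS):
    """Combine a nexus event number and a sub-event index into one integer."""
    if not 0 <= subevent_index < max_splits:
        raise ValueError("sub-event index outside the allowed range")
    return nexus_event * max_splits + subevent_index


def original_event(ic_event, max_splits=MAX_SPLITS):
    """Invert the numbering: return (nexus_event, subevent_index)."""
    return divmod(ic_event, max_splits)


def build_event_map(n_subevents_by_event, max_splits=MAX_SPLITS):
    """Return (ic_event, nexus_event) rows for a mapping table."""
    return [
        (unique_event_number(evt, sub, max_splits), evt)
        for evt, n_sub in n_subevents_by_event.items()
        for sub in range(n_sub)
    ]


# Hypothetical case: nexus event 7 splits into 3 sub-events, event 8 into 1.
event_map = build_event_map({7: 3, 8: 1})
```

Because the mapping is stored explicitly, downstream code only ever sees a plain integer event number and can look up the original nexus event when needed.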
While dealing with #693 we ran into a problem when merging the configuration tables of several nexus files. It seems to make sense to store configuration information that is common to all concatenated files (such as the Geometry/Physics used to generate them) only once. However, random_seed is per-file information that needs to be saved, and we are not sure what the best way to save it is. Maybe the best option is to have another table that matches event numbers to random seeds?
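A minimal sketch of such a table, assuming each input file provides its random seed and the list of event numbers it contains. The dictionary layout and the function name are hypothetical; the point is only that the seed becomes recoverable per event after concatenation.

```python
def build_seed_table(files):
    """Return (event_id, random_seed) rows for a set of concatenated files.

    files is a list of dicts with keys 'random_seed' and 'event_ids'.
    Raises ValueError if the same event id appears twice.
    """
    rows = []
    seen = set()
    for f in files:
        for event_id in f["event_ids"]:
            if event_id in seen:
                raise ValueError(f"repeated event id: {event_id}")
            seen.add(event_id)
            rows.append((event_id, f["random_seed"]))
    return rows


# Hypothetical input: two nexus files with different seeds.
files = [
    {"random_seed": 12345, "event_ids": [0, 1]},
    {"random_seed": 67890, "event_ids": [2, 3]},
]
seed_table = build_seed_table(files)
```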
This issue is somewhat related to:
A future version of detsim (https://github.com/nextic/detsim.git) is able to split long nexus events, which makes the nexus event_id repeat, meaning we need a mapping between detsim_event_id and nexus_event_id (maybe random_seed can go into this table?). There is also the important issue of assigning detsim_event_id in a way that ensures it is unique per run.
In general, how to deal with merging files with repeated event_id. This issue is certainly present when mixing several MC productions and we still don't have a good solution for it.