Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement digi mixing for tracker #1245
Implement digi mixing for tracker #1245
Changes from 2 commits
13d5837
d31627e
eb14a95
aa8d27f
9b062d4
470204b
d09ec46
b9a81da
766b4e8
3e51ca9
024d34f
923ae76
b1574d6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest renaming this enum 'Provenance', with states "sim", "mixed", "external"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use an EnumToString so that we have access to printed values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of the EnumToString is:
https://github.com/Mu2e/Offline/blob/main/Mu2eUtilities/inc/TrackPatRecType.hh
https://github.com/Mu2e/Offline/blob/main/Mu2eUtilities/src/TrackPatRecType.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brownd1978 why "external" and not "data"?
@edcallaghan There will be two ways to make pure sim digis. One is our present way and the other is to make background frames of digis from sim and then to overlay signal sims. Will both be of Provenance "sim" (in Dave's proposed language?). If so, should we distinguish them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the "most correct" answer depends on what the conflict resolution looks like for the MC truth data products, which I haven't through through in at a detailed level. If there are no caveats to "merging" the MC truth info for the two constituent sim digis, then labeling the single output digi as "Sim" is fine. But if there is some kind of catch, then we probably want to label is as "Sim/Sim" as opposed to e.g. "Sim" or "Sim/Data".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, but also, this is not limited to just two digis input. It won't occur at scale, but in principle arbitrarily many digis can participate. So I'd err on the side of Dave's suggestion of renaming the three situations (purely new MC, mixed new MC with something preexisting, and purely something preexisting). In the case of the second situation, the onus is then on the user to know what it is that they have mixed (preexisting simulation, or real data), which I think is a reasonable assumption if they are looking at MC truth info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should distinguish 3 different kinds of mixed: (sim+sim), (sim+data) and (data+data). Maybe this is not the right place to record that information - but we do need a plan for a robust audit trail so that an end user can determine what it is programatically and not by looking at a wiki page or a log book.
An example of data+data would be if we took at track from Offspill and embedded it in a random on-spill trigger as part of a study to understand reconstruction efficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that that information (i.e. the mixing constituents/identities) should, for official production, be stored somewhere. I do not have a strong opinion on whether it should be part of the data structure or not. But I do not think that this class is the correct place for that book-keeping. For one thing, if someone is looking at a
StrawDigiMC
, then they are either going down the wrong path to begin with, or they are aware that they have run a simulation, and just need to know if the MC truth available gives the full story of the associated digi.Also, as a breadcrumb for any future readers, because the actual "merging" of data products is done within the
StrawDigisFromStrawGasSteps
simulation module, this PR cannot create a single collection of digis from a "data + data" situation. Though, as Rob sas, we may want to accommodate that for future use cases --- in which case, the implementation will need to be developed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brownd1978 @kutschke The latest push exchanges
Validity::{Valid, PartiallyValid, Invalid}
forProvenance::{unknown, Simulation, Mixed, External}
, which is implemented as anEnumToStringSparse
specialization. The existence ofunknown
is forced by that interface. In the latter case, the onus is on the user to understand thatExternal
may either signify that real data or preexisting MC (with or without MC info available yet undetermined, as that will depend on how the collision resolution ends up being performed). But the use case is really just to flag that one should not expect the MC truth object to actually represent MC truth information, as the digi came from an alternate source.As to resolving the ambiguity of whether it was data or simulation which was provided externally, my current thinking is as follows: the
EventID
of the mixed event is required to be propagated as a data product so that the conditions can be matched. As long as we don't drop that data product from the output, this would satisfy the book-keeping (i.e., the run number of the mixed event tells us an analyst if it came from data or not).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ed. We have converged as well as we can for now. Your suggestion for documenting that data was the external might well work - we can work through the details when the time comes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this payload is deep-copy. Does the content come from the event? If so, (const) reference payload would avoid copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two reasons that the members are values instead of references. The first is that, to use references, the underlying types (specifically the waveforms, IIRC) would need non-trivial copy constructors implemented and some of the loops need to be restructured to deal with the limitations of constructing objects with reference members. Those are hurdles which are both completely surmountable and, IMO, annoying.
The second reason is that, while right now all of the digis do come "from an event," this will not be true in the future: when we implement "real" collision resolution, e.g. by redigitizing a reconstructed analog sum, then this collection will be responsible for transferring entirely new data products into the scope of the simulation module, and hence should be deep-copied. But as @rlcee pointed out above, the getters can at least return references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking ahead, will some of these data members be purely input and others output? If so, I suggest holding inputs by const& and then outputs must be by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They will all be used bidirectionally. That could change if, in the future, the collision resolution is factored out of
StrawDigisFromStrawGasSteps
and into some standalone module.