Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement digi mixing for tracker #1245

Merged
merged 13 commits into from
May 24, 2024
Merged

Conversation

edcallaghan
Copy link
Contributor

@edcallaghan edcallaghan commented Apr 26, 2024

This PR implements a first-pass at overlaying simulated detector signatures onto pileup taken from the real detector, i.e. "digi mixing," specifically for the tracker. This is achieved in three steps: 1) mix digital data products into art events, analogously to how physics-level data products are mixed in conventional event mixing; 2) concatenate these preexisting digis with new digis created by simulated particles; 3) resolve situations where preexisting and newly-created digis "collide" or are incompatible in some way. For the case of the tracker, this translates to reducing a set of digis such that the digitization windows, here defined by the channel dead time, pairwise overlap --- for example, a simulated electron produces a digi which cannot exist, because at the the time of the relevant energy deposition that straw had already been read out and was still dead. This generic logic can, in principle, be templated, but due to the details of how different subsystems may partition information across multiple data products, that kind of implementation can become pretty opaque, so we've opted to implement the latter two steps "manually," i.e. in the digital simulation of the tracker, which provides a reference implementation that can guide implementations for subsystems, without needing to decode a template interface.

New modules:
- ProtonBunchTimeMCFromProtonBunchTime: this foreshadows the need for a consistent "time-zero" in both data and simulation, and converts that present in data to its equivalent representation in simulation.
- MixDigis: an analogue to MixBackgroundFrames which performs the data product mixing. A new class is needed to accommodate differences in auxiliary behavior, e.g. the number of events to read from the secondary stream and the handling of EventIDs.

Changes to existing modules:
- Mu2eProductMixer: add explicit provisions for mixing digital data products, analogous to physics-level data products.
- StrawDigisFromStrawGasSteps: the structure/logic of digi creation remains unchanged. before digi creation, preexisting digital data products are optionally read into a higher-scope container. after digi creation, the new digis are placed into the same container, which is responsible for resolving collision situations as described above. after this process, the resulting digis are stored in the event. If preexising data products are read, the StrawElectronics conditions are queried using the EventIDs associated with those data products, to avoid situations where we'd simulate a detector response different from that present in the data.

Changes to existing data products:
- StrawDigiMC: a "validity" data member is defined, which is necessary to preserve the existing flow of StrawDigiMC in situations where the event contains digis which are simulated (and thus have an accompanying StrawDigiMC), come from data (and thus have no accompanying MC information), or are the result of resolving a collision between simulation and data (and hence the accompanying MC information may be partially correlated with the digital information). I think this acts as an ok flag of such a situation, but I welcome better ideas for actually handling the situation.

New axuiliary functionality:
- StrawDigiBundle class: this is a wrapper for the three tracker digi data products: StrawDigi, StrawDigiADCWaveform, StrawDigiMC. Its use 1) prevents triplicating loops, and 2) keeps the waveform information in sync with the scalar information, which (as described below) will be necessary for improving the resolution of collisions.
- StrawDigiBundleCollection class: stl-vector of the above, with an additional method for resolving collisions. The implementation in this PR is very crude, in that the new digi returned as representative of a set of overlapping digis is just the earliest in time. Future work should refine on this, e.g. by reconstructing analog signals from the waveforms, and resimulating the digitization of the summed signal.
- compare_tdcs function: helper function to allow stl-sorting StrawDigis by tdc value.
- TimeBasedBucket class: an stl-vector which is aware that its elements define a time-window, and thus knows whether a candidate element's time-window overlaps with those of its members. Right now, this is all implemented using doubles, the cast to which from unsigned longs used for tdc values makes me nervous, so input on that would be great. Maybe the numeric type can be templated. The item type is templated to facilitate the use of this same class in testing/diagnostic code, e.g. to benchmark the rate of these overlaps at the physical-simulation stage. I think the terminology "time-based bucket" isn't the best, and welcome input from those with a better vocabulary.
- TimeBasedBuckets class: an stl-vector of the above, which encapsulates the logic of "if an item doesn't belong in a preexisting bucket, it defines a new bucket."

For context, testing using TimeBasedBuckets as implemented here and a 400 ns deadtime window (roughly correct for StrawDigis) at the physical-simulation stage, the rate for a conversion electron to deposit energy in a digitization-window associated with 2BB-mixed MDC2020 pileup is roughly 1%.

  - extend Mu2eProductMixer to include digital data products
  - implement module for exclusively mixing in such products
  - propagate those products through to simulated StrawDigi creation
  - define structure for resolving situations where simulated digis overlap with preexisting digis in time (for now, just choosing the earliest)
@FNALbuild
Copy link
Collaborator

Hi @edcallaghan,
You have proposed changes to files in these packages:

  • MCDataProducts
  • CommonMC
  • EventMixing
  • TrackerMC

which require these tests: build.

@Mu2e/fnalbuild-users, @Mu2e/write have access to CI actions on main.

⌛ The following tests have been triggered for d31627e: build (Build queue has 2 jobs)

About FNALbuild. Code review on Mu2e/Offline.

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at d31627e.

Test Result Details
test with Command did not list any other PRs to include
merge Merged d31627e at 6efcc08
build (prof) Log file. Build time: 11 min 23 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
FIXME, TODO 🔶 TODO (0) FIXME (7) in 12 files
clang-tidy 🔶 2 errors 640 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at d31627e after being merged into the base branch at 6efcc08.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

Copy link
Collaborator

@rlcee rlcee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took so long for this review - I am chronically missing review requests

  • thank you for the thorough explanation in the PR - I think we could use this as an example for everyone!
  • thank you for using stl pointers
  • the proditions access looks OK, but you will be breaking ground, please double-check your results during testing
  • did you test that the new code can read old StrawDigiMCs - the schema evolution should be automatic

MCDataProducts/inc/StrawDigiMC.hh Outdated Show resolved Hide resolved
TrackerMC/inc/StrawDigiBundleCollection.hh Outdated Show resolved Hide resolved
TrackerMC/inc/StrawDigiBundleCollection.hh Outdated Show resolved Hide resolved
TrackerMC/inc/TimeBasedBucket.hh Outdated Show resolved Hide resolved
EventMixing/src/MixDigis_module.cc Outdated Show resolved Hide resolved
TrackerMC/src/StrawDigiBundle.cc Outdated Show resolved Hide resolved
TrackerMC/inc/StrawDigiBundle.hh Outdated Show resolved Hide resolved
@edcallaghan
Copy link
Contributor Author

Thanks for the feedback @rlcee. I am off for a week but will work through your comments; I'm also going to schedule a presentation explaining this PR at an upcoming Sim/Reco meeting (for the 8th or 22nd, TBD), so there is no rush if there are other aspects which you're thinking about in more detail.

@FNALbuild
Copy link
Collaborator

📝 The HEAD of main has changed to 64af1fc. Tests are now out of date.

Copy link
Collaborator

@brownd1978 brownd1978 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important development and the code and structure are very good. Since it is mostly additions unrelated to existing workflows I suggest merging it soon, and starting the parallel development for the other subsystems.

// interface for sorting into buckets of overlapping digitization windows
const double time() const;
protected:
const StrawDigi digi;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this payload is deep-copy. Does the content come from the event? If so, (const) reference payload would avoid copies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two reasons that the members are values instead of references. The first is that, to use references, the underlying types (specifically the waveforms, IIRC) would need non-trivial copy constructors implemented and some of the loops need to be restructured to deal with the limitations of constructing objects with reference members. Those are hurdles which are both completely surmountable and, IMO, annoying.

The second reason is that, while right now all of the digis do come "from an event," this will not be true in the future: when we implement "real" collision resolution, e.g. by redigitizing a reconstructed analog sum, then this collection will be responsible for transferring entirely new data products into the scope of the simulation module, and hence should be deep-copied. But as @rlcee pointed out above, the getters can at least return references.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking ahead, will some of these data members be purely input and others output? If so, I suggest holding inputs by const& and then outputs must be by value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will all be used bidirectionally. That could change if, in the future, the collision resolution is factored out of StrawDigisFromStrawGasSteps and into some standalone module.

@@ -32,16 +32,21 @@ namespace mu2e {
typedef std::array<SGSP,StrawEnd::nends> SGSPA;
typedef std::array<XYZVectorF,StrawEnd::nends> PA;
typedef std::array<float,StrawEnd::nends> FA;
enum Validity {Valid, PartiallyValid, Invalid};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming this enum 'Provenance', with states "sim", "mixed", "external"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use an EnumToString so that we have access to printed values?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@kutschke kutschke May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brownd1978 why "external" and not "data"?

@edcallaghan There will be two ways to make pure sim digis. One is our present way and the other is to make background frames of digis from sim and then to overlay signal sims. Will both be of Provenance "sim" (in Dave's proposed language?). If so, should we distinguish them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the "most correct" answer depends on what the conflict resolution looks like for the MC truth data products, which I haven't through through in at a detailed level. If there are no caveats to "merging" the MC truth info for the two constituent sim digis, then labeling the single output digi as "Sim" is fine. But if there is some kind of catch, then we probably want to label is as "Sim/Sim" as opposed to e.g. "Sim" or "Sim/Data".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but also, this is not limited to just two digis input. It won't occur at scale, but in principle arbitrarily many digis can participate. So I'd err on the side of Dave's suggestion of renaming the three situations (purely new MC, mixed new MC with something preexisting, and purely something preexisting). In the case of the second situation, the onus is then on the user to know what it is that they have mixed (preexisting simulation, or real data), which I think is a reasonable assumption if they are looking at MC truth info.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should distinguish 3 different kinds of mixed: (sim+sim), (sim+data) and (data+data). Maybe this is not the right place to record that information - but we do need a plan for a robust audit trail so that an end user can determine what it is programatically and not by looking at a wiki page or a log book.

An example of data+data would be if we took at track from Offspill and embedded it in a random on-spill trigger as part of a study to understand reconstruction efficiency.

Copy link
Contributor Author

@edcallaghan edcallaghan May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that that information (i.e. the mixing constituents/identities) should, for official production, be stored somewhere. I do not have a strong opinion on whether it should be part of the data structure or not. But I do not think that this class is the correct place for that book-keeping. For one thing, if someone is looking at a StrawDigiMC, then they are either going down the wrong path to begin with, or they are aware that they have run a simulation, and just need to know if the MC truth available gives the full story of the associated digi.

Also, as a breadcrumb for any future readers, because the actual "merging" of data products is done within the StrawDigisFromStrawGasSteps simulation module, this PR cannot create a single collection of digis from a "data + data" situation. Though, as Rob sas, we may want to accommodate that for future use cases --- in which case, the implementation will need to be developed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brownd1978 @kutschke The latest push exchanges Validity::{Valid, PartiallyValid, Invalid} for Provenance::{unknown, Simulation, Mixed, External}, which is implemented as an EnumToStringSparse specialization. The existence of unknown is forced by that interface. In the latter case, the onus is on the user to understand that External may either signify that real data or preexisting MC (with or without MC info available yet undetermined, as that will depend on how the collision resolution ends up being performed). But the use case is really just to flag that one should not expect the MC truth object to actually represent MC truth information, as the digi came from an alternate source.

As to resolving the ambiguity of whether it was data or simulation which was provided externally, my current thinking is as follows: the EventID of the mixed event is required to be propagated as a data product so that the conditions can be matched. As long as we don't drop that data product from the output, this would satisfy the book-keeping (i.e., the run number of the mixed event tells us an analyst if it came from data or not).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ed. We have converged as well as we can for now. Your suggestion for documenting that data was the external might well work - we can work through the details when the time comes.

@brownd1978
Copy link
Collaborator

brownd1978 commented May 22, 2024 via email

@edcallaghan
Copy link
Contributor Author

@rlcee and @kutschke, I have addressed each of your comments (thank you again). Right now the only remaining loose end is the nature of flagging the interpretability of MC truth info, i.e. whether a given digi was produced by simulation or originated externally. Once @brownd1978 and @kutschke are satisfied with that, I will test deserializing "old" StrawDigiMCs into the new schema as suggested by Ray.

@brownd1978 brownd1978 self-assigned this May 23, 2024
Copy link
Collaborator

@brownd1978 brownd1978 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with these updates, thanks Ed!

Copy link
Collaborator

@kutschke kutschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready to be merged. The follow on work can be done in future PRs.

@kutschke
Copy link
Collaborator

@FNALbuild run build test

@kutschke
Copy link
Collaborator

For future reference: I am 99% sure that if you had added to classes_def.xml (maybe it has to be before the line for StrawDigi), then it would have worked without the need to pull Provenance out of the scope of StrawDigi.

@FNALbuild
Copy link
Collaborator

☔ The build tests failed for 024d34f.

Test Result Details
test with Command did not list any other PRs to include
merge Merged 024d34f at a8f0771
build (prof) Log file. Build time: 10 min 58 sec
ceSimReco Log file. Return Code 1.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file. Return Code 1.
cosmicOffSpill Log file. Return Code 1.
ceSteps Log file.
ceDigi Log file. Return Code 1.
muDauSteps Log file.
ceMix Log file. Return Code 1.
rootOverlaps Log file.
g4surfaceCheck Log file.
FIXME, TODO 🔶 TODO (0) FIXME (7) in 12 files
clang-tidy 🔶 4 errors 644 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 024d34f after being merged into the base branch at a8f0771.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@edcallaghan
Copy link
Contributor Author

For future reference: I am 99% sure that if you had added to classes_def.xml (maybe it has to be before the line for StrawDigi), then it would have worked without the need to pull Provenance out of the scope of StrawDigi.

Yeah, you're probably right, but the only example I was able to grep out had the higher-level scope so I figured the odds would be in my favor to close out in a single cycle. :) Oh well, I would forgotten the underlying type anyway...

@edcallaghan
Copy link
Contributor Author

@FNALbuild run build test

@FNALbuild
Copy link
Collaborator

⌛ The following tests have been triggered for 923ae76: build (Build queue has 1 jobs)

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at 923ae76.

Test Result Details
test with Command did not list any other PRs to include
merge Merged 923ae76 at a8f0771
build (prof) Log file. Build time: 11 min 08 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
FIXME, TODO 🔶 TODO (0) FIXME (7) in 12 files
clang-tidy 🔶 4 errors 644 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 923ae76 after being merged into the base branch at a8f0771.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@edcallaghan
Copy link
Contributor Author

@rlcee I've verified that we can deserialize preexisting StrawDigiMCs, and their provenance (in the new language) is by default set as Simulation, consistent with there having been no other digis involved when they were created. Let me know if you'd like anything else addressed.

// interface for sorting into buckets of overlapping digitization windows
const double time() const;
protected:
const StrawDigi _digi;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused why the data members of StrawDigiBundle are const. It suggests that a deep copy isn't necessary. Could they be const&? That would avoid copying data.

The reason I think const& doesn't work is because you allow a null MC later on. Null objects are generally to be avoided.
For now this is OK, but I would like you to revisit the memory model at some point to see if these copies could be avoided, perhaps by having 2 classes, StrawDigiBundle and StrawDigiBundleMC

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important issue to resolve soon - but it can be in a future PR. I will merge now.

For future reference. There is an idea called the "Facade Pattern", discussed in the classic "Design Patterns" book, https://www.oreilly.com/library/view/design-patterns-elements/0201633612/ . Also discussed in the old art wiki: https://cdcvs.fnal.gov/redmine/projects/art/wiki/Data_Product_Design_Guide under "prefer facade classes to transient data members". This class does more than a simple facade but it does share some ideas. Have a look and see if any of that advice is useful here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind there are two conversations here, one being the presence / necessity of dummy StrawDigiMCs, and the other being the necessity of value members as opposed to references. If we go down a road that involves pointers, the two can intersect. I've opened issue #1272 so that this doesn't drop off radar, which contains the details of what constraints led to the current implementation. Further engagement would be appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants