Fix multithreading in Geant4 and HitManager #694
Conversation
Sorry, I shouldn't have requested the review yet because this is still in draft and includes changes from #693 that don't need review here.

No problem. Let me know when it is ready to review.
@whokion This is ready to review whenever you have a moment. Specifically, could you make sure that the way I'm using
```cpp
/*!
 * Ensure the local hit processor exists, and return it.
 */
HitProcessor& HitManager::get_local_hit_processor()
```
If HitManager is thread-local, HitProcessor can be private to HitManager (related to the earlier comment) instead of a `vector<HitProcessor*>`. Why is this the better approach?
All the major event loop components are "global". Only the incoming state objects are "thread-local"[^1], and normally their stream IDs should be used to access per-stream state data that (for now) has to live in the shared object. Here, instead of using G4Threading, I could (and probably should) add the `StreamId` to the `StepState` so we can access it directly...
[^1]: I put "global" and "thread-local" in quotes above because nothing in Celeritas inherently requires streams and threads to match. Streams could be shared within a thread pool, or you could have multiple streams per CPU thread, etc. Similarly, you could have two entirely different event loops running simultaneously within the same application.
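As a rough illustration of the stream-indexed lookup proposed above, here is a minimal sketch. `HitManager`, `HitProcessor`, and `StreamId` are simplified stand-ins for the Celeritas classes, not the actual API; the point is only that a shared manager can own lazily created per-stream processors and select them by stream ID rather than by querying `G4Threading`:

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for the classes discussed above.
struct StreamId { std::size_t value; };
struct HitProcessor { int num_hits = 0; };

// Shared across streams: owns one lazily created processor per stream.
class HitManager {
  public:
    explicit HitManager(std::size_t num_streams) : processors_(num_streams) {}

    // Look up (and lazily create) the processor for the given stream.
    // Locking is only needed because creation mutates shared storage;
    // each stream subsequently uses only its own slot.
    HitProcessor& get_local_hit_processor(StreamId sid) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto& slot = processors_.at(sid.value);
        if (!slot) slot = std::make_unique<HitProcessor>();
        return *slot;
    }

  private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<HitProcessor>> processors_;
};
```

Repeated calls with the same stream ID return the same processor, while different streams never share one.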
```cpp
 * action manager.
 * generating hit objects. It \b must be thread-local because the sensitive
 * detectors it stores are thread-local, and additionally Geant4 makes
 * assumptions about object allocations that cause crashes if the HitProcessor
```
A worker has its own event loop, so this is not an assumption but a consequence of Geant4 MT being event-level multithreading.
The "assumption" is that objects like G4Navigator, G4Step, and G4Track can only be allocated by a worker thread, and that they're deallocated by the same worker thread. Those objects aren't necessarily associated with a specific event (in the G4 fast simulation some of those are reused across multiple events), nor are they inherently associated with a specific thread (except by the invisible implementation of the thread allocator).
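The same-thread allocation rule described here can be made concrete with a toy wrapper. This is illustrative only (`ThreadOwned` is not a Geant4 class): it records the creating thread and checks ownership later, whereas in real Geant4 violating the rule crashes inside the invisible per-thread allocator rather than failing cleanly:

```cpp
#include <cassert>
#include <thread>

// Illustrative only: enforces the implicit Geant4 rule that
// worker-allocated objects (G4Step, G4Navigator, ...) must be
// deallocated by the same worker thread that created them.
class ThreadOwned {
  public:
    ThreadOwned() : owner_(std::this_thread::get_id()) {}

    ~ThreadOwned() {
        // In real Geant4 this violation is a crash, not an assertion,
        // because the per-thread allocator is hidden from the user.
        assert(owned_by_current_thread());
    }

    bool owned_by_current_thread() const {
        return std::this_thread::get_id() == owner_;
    }

  private:
    std::thread::id owner_;
};
```

A worker-created `ThreadOwned` reports ownership only on its creating thread, mirroring why the HitProcessor must live and die on its Geant4 worker thread.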
```diff
@@ -230,26 +273,21 @@ void HitProcessor::operator()(DetectorStepOutput const& out) const
     if (navi_)
     {
-        CELER_ASSERT(out.detector[i] < detector_volumes_.size());
+        CELER_ASSERT(out.detector[i] < detectors_.size());
         bool success = this->update_touchable(
```
This seems very expensive. Do we really need this check?
The touchable update is necessary for detectors like tilecal and hgcal that use the detailed navigation state to determine a subdetector identifier. Since their sensitive detectors require the navigator to be initialized and correct, we don't really have a choice if we're calling back to those routines. My hope is that for our initial implementation this won't completely kill the performance, and doing it this way will give us a "baseline" performance number to justify the effort of implementing the actual detector logic on GPU.
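To show what the touchable update buys us, here is a heavily simplified mock: a one-dimensional "navigator" that locates the segment containing a hit position and points the touchable at it, returning `false` when navigation fails (paralleling the `bool success = this->update_touchable(...)` check above). `Touchable` and `SegmentNavigator` are illustrative stand-ins, not the Geant4/Celeritas API:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy "touchable": just remembers which volume the hit is in.
struct Touchable {
    std::size_t volume_id = static_cast<std::size_t>(-1);
};

// Toy "navigator" over a 1D segmented detector.
class SegmentNavigator {
  public:
    explicit SegmentNavigator(std::vector<double> boundaries)
        : boundaries_(std::move(boundaries)) {}

    // Update the touchable to the volume containing `pos`; return
    // false if the position lies outside the world volume.
    bool update_touchable(double pos, Touchable& touch) const {
        if (pos < boundaries_.front() || pos >= boundaries_.back())
            return false;
        for (std::size_t i = 0; i + 1 < boundaries_.size(); ++i) {
            if (pos < boundaries_[i + 1]) {
                touch.volume_id = i;
                return true;
            }
        }
        return false;
    }

  private:
    std::vector<double> boundaries_;
};
```

An SD that derives a channel ID from the navigation state (as TileCal and HGCal do) only sees a consistent answer if this update runs before the hit callback.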
I still do not understand why HitProcessor is responsible for updating the navigation state (touchable) when a hit is processed at the end of stepping (only once per step). Isn't the subdetector a physical/logical volume (so it can be a tracking volume with a boundary)? Isn't it an independent readout channel? I guess there may be confusion between what simulation information needs to be known (for MC "study") and what needs to be recorded as a hit (to be used for digitization).
I might be misunderstanding what the confusion is too, so let me try to restate from a high level what's going on.
- At the end of an event on a local thread, the `LocalTransporter` steps through all the offloaded EM tracks until they die.
- Each step calls the (shared) explicit actions using the shared "params" state and the (thread-local) track states.
- The `celeritas::StepCollector` gathers the position and volume ID at the pre-step and the energy deposition at the post-step (though the exact selection of outputs can be changed by the user).
- At post-step, the (shared) HitManager takes the thread-local gathered step data and calls the thread-local HitProcessor.
- The HitProcessor loops through all the hits, translates them into thread-local `G4Step` objects, uses the VecGeom logical volume ID to look up the `G4VSensitiveDetector`, and calls `sd->Hit(step)`. If the SD needs to query `hit->GetPreStepPoint()->GetTouchableHandle()`, then before calling `Hit` we use a thread-local (i.e., owned by HitProcessor) navigator to update the thread-local touchable for the thread-local step.
When I said "subdetector" I meant "readout channel": I guess HGCal is the subdetector, and the code needs the navigation state and volume information to figure out what channel it's in.
Can you add a code snippet to demo-geant-integration showing how to access the collected hit information and retrieve some of it (for example, the total energy deposition or the size of the hit collection) at the end of EventAction (or after each Flush?)? For the CMS integration, we need to access the hit (StepInfo) collection from the GPU at least at the event level to merge hits from the CPU, and send them to the next stage of the simulation pipeline (i.e., digitization) or write them out to disk in event units.
@whokion For our initial CMS implementation, we're not going to merge the hits on the GPU; we're using the

I am working on implementing a GPU-based calorimeter which will actually do the accumulation on GPU. Is this what you're asking for?
No, the current implementation/workflow is good enough for a demonstration/integration. What I am asking is to pipeline the output from the GPU (to create a collection of equivalent G4Steps, which will be merged through CPU sensitive detectors into the final hit data) into the demo-geant-integration workflow once the transporter completes stepping for the set of EM tracks on the GPU (i.e., after Flush is called). At a minimum, an example of how to access the output hit information from the GPU in EventAction or TrackingAction would be good enough.
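The kind of end-of-event access being requested could look roughly like this. `HitInfo` and `EventAction` here are hypothetical stand-ins, not the demo-geant-integration API: the idea is just that once the SDs have been filled (during stepping and after the final Flush), an end-of-event hook can walk the accumulated collection and report per-event totals:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-hit record accumulated by the sensitive detectors.
struct HitInfo {
    double edep;
};

// Hypothetical end-of-event hook: by this point the transporter has
// flushed all offloaded tracks and the collection is complete.
struct EventAction {
    std::vector<HitInfo> collection;  // filled during stepping/Flush

    // e.g. report the total energy deposition for the event
    double end_of_event_total() const {
        double total = 0;
        for (auto const& hit : collection)
            total += hit.edep;
        return total;
    }

    // e.g. report the size of the hit collection
    std::size_t hit_count() const { return collection.size(); }
};
```

In a real integration the collection would be a Geant4 hits collection owned by the SDs, and this hook would hand the merged hits to digitization or persist them per event.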
But this is already happening automatically through the HitManager during the stepping loop. The current demo app will have Steps sent back from the GPU.
Then, I definitely missed something. So, are we calling
We are. There are a lot of moving pieces, so I'll draw up a diagram of who's constructing what and how the tracks and hits move between CPU and GPU 😄
@whokion Here's my attempt at a UML diagram, with background colors indicating the different libraries/layers of code, and a few classes that operate on device in green. The user tracking action (local) sends hits via G4Track to the

During the stepping loop (which uses the stream-local
Nice! This looks good to me @sethrj.
Technically, the tracking action actually flushes (through LocalTransporter::Push(G4Track), then ::Flush()), and the event action flushes the remaining tracks at the end of the event. Nevertheless, I am still not fully convinced why HitManager is globally shared while the Stepper/LocalTransporter are local, even though HitManager behaves like something similar to the Geant4 split-class mechanism; but I find nothing wrong either, so maybe it's a design choice. We may learn more about how to directly use the HitManager/HitProcessor for the CMS integration as necessary. Anyway, the diagram helps a lot and illuminates the interconnectivity and workflow with specific ownership relations. Thanks for the nice work.
OK @whokion, a minor update that I think improves things even further: for clarity I switched StepCollector and StepStateData and fixed the ownership relationship between HitProcessor and DetectorStepOutput. The dotted region contains the classes that are shared across threads/tasks/streams because they operate only on the problem setup/parameter data rather than on any state information. The HitManager is-a StepInterface derived class that is managed by the (shared) step gather action, so it has to be shared. The HitProcessor operates on the state data, so it has to be stream-local. Due to the way Geant4 currently manages memory, we have to make sure that our streams correspond to Geant4 threads. There's a hidden "references" link here between the LocalTransporter and the HitManager, which is necessary to ensure that the hit processor's temporary data is deallocated by the thread-local
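The shared-vs-stream-local split in the diagram can be condensed into a small sketch. `SharedParams` and `StreamState` here are illustrative stand-ins (not the actual Celeritas classes): the shared object holds only immutable setup data, while each stream owns its mutable state and keeps a read-only view of the params:

```cpp
#include <cstddef>
#include <vector>

// "Dotted region" of the diagram: setup-only data, immutable after
// construction, so it can be safely shared across all streams.
struct SharedParams {
    std::vector<double> detector_thresholds;
};

// One per stream (and, given current Geant4 memory management, one
// per Geant4 worker thread): mutable state, never shared.
struct StreamState {
    SharedParams const* params;      // read-only view of shared setup
    std::vector<double> local_edep;  // per-stream accumulation buffer

    explicit StreamState(SharedParams const& p)
        : params(&p), local_edep(p.detector_thresholds.size(), 0.0) {}
};
```

Two streams constructed from the same params share the setup data but never each other's accumulation buffers, which is the invariant the HitManager/HitProcessor split enforces.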
This adds thread safety to the HitCollector and fixes #613.
After this fix, here's a comparison of 18 GeV $\pi^+$ in the ATLAS TileCal model: