Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index tracks by TrackSlotId #678

Merged
merged 12 commits into from
Mar 21, 2023

Conversation

esseivaju
Copy link
Contributor

As mentioned in #675, this is an initial implementation of indexing tracks by TrackSlotId instead of ThreadId. Sorting should help adjacent threads working on similar tracks (alive, same particle type, energy,...).

I added a Collection in CoreStateData which does the thread --> track slot lookup. No sorting is done yet, only a 1-1 correspondence.

@esseivaju esseivaju self-assigned this Mar 10, 2023
@esseivaju esseivaju linked an issue Mar 10, 2023 that may be closed by this pull request
6 tasks
@sethrj sethrj changed the title Sorting and indexing tracks by TrackSlotId Indexing tracks by TrackSlotId Mar 16, 2023
@sethrj sethrj changed the title Indexing tracks by TrackSlotId Index tracks by TrackSlotId Mar 16, 2023
@sethrj
Copy link
Member

sethrj commented Mar 16, 2023

@esseivaju I wonder if a good "stopping point" for this PR is to simply allocate and use the (unsorted) track slot IDs? That might make it easier to tag and experiment with the performance differences between "track ID is implicitly track slot ID", "track ID requires a single (new) indirection", "track IDs are randomized", "track IDs are sorted by alive/not", etc. Or do you think we should do all of those experiments on a single branch and then only merge once we see a consistent performance boost?

@esseivaju
Copy link
Contributor Author

I think we can merge that once we use the unsorted track slot IDs as you suggest as I would say the TrackSlotId abstraction is good to have regardless of the outcome of performance gain.

@esseivaju esseivaju marked this pull request as ready for review March 16, 2023 21:35
@sethrj sethrj changed the base branch from develop to temp March 18, 2023 12:25
@sethrj sethrj changed the base branch from temp to develop March 18, 2023 12:25
//! Multiply an opaque ID by a factor
template<class V, class S>
inline CELER_FUNCTION OpaqueId<V, S>
operator*(OpaqueId<V, S> id, std::make_signed_t<S> factor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the semantics here make sense: the best analog to OpaqueId is probably a pointer, so it makes sense to subtract and add and increment and operator[], but not to multiply.

Copy link
Contributor Author

@esseivaju esseivaju Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that if the semantics are pointers for OpaqueId then these operations don't make sense. If the semantics are more specific - an index in a collection - then it could be justified, e.g. to divide work on a range into chunks for parallel processing, though still not mandatory.

The reason for having these is that the collection CoreStateData::track_slots is an array of OpaqueId and thrust::sequence is doing arithmetic with the values to fill the array as its expecting numbers:

// Compile error if we don't have the * and + operators
// /opt/cuda/11.8.0/include/thrust/system/detail/generic/sequence.inl(65): error: no operator "+" matches these operands
//            operand types are: const celeritas::TrackSlotId + celeritas::OpaqueId<celeritas::TrackSlot, celeritas::size_type>
// sequence.inl:61-66
  __thrust_exec_check_disable__
  __host__ __device__
  T operator()(std::size_t i) const
  {
    return init + step * i;
  }

So if we don't want operator*(OpaqueId, <signed_numeric>) and operator+(OpaqueId, OpaqueId) I suppose we can use 989007d

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. And I understand the problem now, and it looks like the thrust specification of sequence is incomplete.

https://github.com/NVIDIA/thrust/blob/949ac649afc37e9f92d946f267c2390c6ab3daa7/thrust/system/detail/generic/sequence.inl#L57-L69

implicitly requires addition of another of the instance and multiplication of the index by a "step".

I'll think about this.

Copy link
Contributor Author

@esseivaju esseivaju Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution could be to have CoreStateData::track_slots defined as Collection<TrackSlotId::size_type, W, M, ThreadId>, do the sorting/filling on numbers and we wrap it in a TrackSlotId when returning from CoreTrackView::track_slot_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation of this idea in 0fc32cc. We could also remove the pre/post-increment operators but I'd say it still make sense to keep them if we don't care about OpaqueId being immutable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I like your idea and implementation. It might even be necessary when we try to use more complex algorithms (sorting). Let's stick with that.

I was thinking as an alternative to just use a kernel to set TrackSlotId independently for each thread, but yours is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
I guess another way to see is that the sorting is now done via an iterator type which is distinct from the value type track_slots.

src/corecel/OpaqueId.hh Outdated Show resolved Hide resolved
src/corecel/OpaqueId.hh Show resolved Hide resolved
The collection of ThreadId to TrackSlotId maps ThreadId to TrackSlotId::size_type instead.
When returning a track slot id, the CoreTrackView will wrap the index into a TrackSlotId.

Advantage of this variant is that we no longer need operator+(OpaqueId, OpaqueId) and operator*(OpaqueId, int)
which don't make sense for the pointer semantics of OpaqueId
src/celeritas/global/CoreTrackData.hh Outdated Show resolved Hide resolved
src/corecel/OpaqueId.hh Show resolved Hide resolved
test/corecel/OpaqueId.test.cc Show resolved Hide resolved
Copy link
Member

@sethrj sethrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! With this in, we should be able to start experimenting 😄 Good job!

@esseivaju
Copy link
Contributor Author

I added some last-minute formatting. So should we close that PR and open a new one for starting with experimentation?

@sethrj sethrj added core Software engineering infrastructure minor Minor internal changes or fixes labels Mar 21, 2023
@sethrj
Copy link
Member

sethrj commented Mar 21, 2023

@esseivaju I've converted #675 into a checklist so we can track the multiple "sorting" aspects together. So let's just proceed from the existing issue :)

@sethrj sethrj merged commit 5e3c593 into celeritas-project:develop Mar 21, 2023
@esseivaju esseivaju deleted the thread-to-trackslot branch May 12, 2023 14:27
@sethrj sethrj added performance Changes for performance optimization and removed core Software engineering infrastructure labels Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor Minor internal changes or fixes performance Changes for performance optimization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants