-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index tracks by TrackSlotId
#678
Index tracks by TrackSlotId
#678
Conversation
…o `TrackSlotId`
TrackSlotId
TrackSlotId
@esseivaju I wonder if a good "stopping point" for this PR is to simply allocate and use the (unsorted) track slot IDs? That might make it easier to tag and experiment with the performance differences between "track ID is implicitly track slot ID", "track ID requires a single (new) indirection", "track IDs are randomized", "track IDs are sorted by alive/not", etc. Or do you think we should do all of those experiments on a single branch and then only merge once we see a consistent performance boost? |
I think we can merge that once we use the unsorted track slot IDs as you suggest as I would say the |
src/corecel/OpaqueId.hh
Outdated
//! Multiply an opaque ID by a factor | ||
template<class V, class S> | ||
inline CELER_FUNCTION OpaqueId<V, S> | ||
operator*(OpaqueId<V, S> id, std::make_signed_t<S> factor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the semantics here make sense: the best analog to OpaqueId
is probably a pointer, so it makes sense to subtract and add and increment and operator[]
, but not to multiply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that if the semantics are pointers for OpaqueId
then these operations don't make sense. If the semantics are more specific - an index in a collection - then it could be justified, e.g. to divide work on a range into chunks for parallel processing, though still not mandatory.
The reason for having these is that the collection CoreStateData::track_slots
is an array of OpaqueId
and thrust::sequence
is doing arithmetic with the values to fill the array as its expecting numbers:
// Compile error if we don't have the * and + operators
// /opt/cuda/11.8.0/include/thrust/system/detail/generic/sequence.inl(65): error: no operator "+" matches these operands
// operand types are: const celeritas::TrackSlotId + celeritas::OpaqueId<celeritas::TrackSlot, celeritas::size_type>
// sequence.inl:61-66
__thrust_exec_check_disable__
__host__ __device__
T operator()(std::size_t i) const
{
return init + step * i;
}
So if we don't want operator*(OpaqueId, <signed_numeric>)
and operator+(OpaqueId, OpaqueId)
I suppose we can use 989007d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see. And I understand the problem now, and it looks like the thrust specification of sequence is incomplete.
implicitly requires addition of another of the instance and multiplication of the index by a "step".
I'll think about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One solution could be to have CoreStateData::track_slots
defined as Collection<TrackSlotId::size_type, W, M, ThreadId>
, do the sorting/filling on numbers and we wrap it in a TrackSlotId
when returning from CoreTrackView::track_slot_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation of this idea in 0fc32cc. We could also remove the pre/post-increment operators but I'd say it still make sense to keep them if we don't care about OpaqueId being immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I like your idea and implementation. It might even be necessary when we try to use more complex algorithms (sorting). Let's stick with that.
I was thinking as an alternative to just use a kernel to set TrackSlotId
independently for each thread, but yours is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I guess another way to see is that the sorting is now done via an iterator
type which is distinct from the value type track_slots
.
The collection of ThreadId to TrackSlotId maps ThreadId to TrackSlotId::size_type instead. When returning a track slot id, the CoreTrackView will wrap the index into a TrackSlotId. Advantage of this variant is that we no longer need operator+(OpaqueId, OpaqueId) and operator*(OpaqueId, int) which don't make sense for the pointer semantics of OpaqueId
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! With this in, we should be able to start experimenting 😄 Good job!
I added some last-minute formatting. So should we close that PR and open a new one for starting with experimentation? |
@esseivaju I've converted #675 into a checklist so we can track the multiple "sorting" aspects together. So let's just proceed from the existing issue :) |
As mentioned in #675, this is an initial implementation of indexing tracks by
TrackSlotId
instead ofThreadId
. Sorting should help adjacent threads working on similar tracks (alive, same particle type, energy,...).I added a
Collection
inCoreStateData
which does the thread --> track slot lookup. No sorting is done yet, only a 1-1 correspondence.