[SYCLomatic] Bugfix in non-trivial run length encode's usage of oneDPL's reduce-by-segment #2596
The current non-trivial run length encode implementation depends on an internal implementation detail of oneDPL. The operator (named `op`) passed to the `reduce_by_segment` call assumes that each segment is processed serially by a single work-item. In particular, the `get<2>(lhs) += get<0>(rhs);` update assumes that `get<2>(rhs)` carries no run-length information that needs to be propagated through the reduction. When a segment is reduced serially, this is always the case.

oneDPL's new reduce-by-segment performance improvements do not process segments serially; instead, they distribute work evenly across work-items. Run-length information held in `get<2>(rhs)` is lost when oneDPL's new sub-group scan is performed. To resolve this issue, the flag element in the tuple is changed from a `bool` to an integral type and is used to compute the length of the run directly, instead of keeping the flag separate from the run-length count. As a result, partial run-length computations in `get<0>(rhs)` are propagated through the reduction. Additional logic is required when defining the mask to ensure that all elements of the run are flagged and that the padded end case is handled properly.

Please note that this PR depends on uxlfoundation/oneDPL#1987 to compile.
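For reviewers who want a concrete picture of the failure mode, here is a minimal host-only sketch in plain C++. It does not use oneDPL or the actual helper code: `Item`, `old_op`, `new_op`, `serial_reduce`, `tree_reduce`, and `make_run` are all hypothetical names, and the two-field struct is a simplified stand-in for the tuple used in the real `reduce_by_segment` call. It only illustrates why an operator that reads the right-hand flag but ignores the right-hand count is order-dependent, while accumulating an integral flag is not.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tuple passed through the reduction.
struct Item {
    int flag;   // 1 for every element that belongs to the run
    int count;  // run-length accumulator used by the old-style operator
};

// Old-style operator (simplified): adds only the right-hand flag.
// Any count already accumulated in rhs is silently dropped.
Item old_op(Item lhs, const Item& rhs) {
    lhs.count += rhs.flag;
    return lhs;
}

// New-style operator (simplified): the integral flag itself carries the
// partial run length, so partial results on the right are preserved.
Item new_op(Item lhs, const Item& rhs) {
    lhs.flag += rhs.flag;
    return lhs;
}

// Left-to-right fold: rhs is always a single, untouched element.
template <typename Op>
Item serial_reduce(const std::vector<Item>& v, Op op) {
    Item acc = v[0];
    for (std::size_t i = 1; i < v.size(); ++i) {
        acc = op(acc, v[i]);
    }
    return acc;
}

// Pairwise reduction over [lo, hi): rhs may itself be a partial result,
// loosely mimicking work being split across several work-items.
template <typename Op>
Item tree_reduce(const std::vector<Item>& v, std::size_t lo, std::size_t hi, Op op) {
    if (hi - lo == 1) {
        return v[lo];
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    return op(tree_reduce(v, lo, mid, op), tree_reduce(v, mid, hi, op));
}

// Builds one run of n elements, each flagged and counting itself once.
std::vector<Item> make_run(std::size_t n) {
    return std::vector<Item>(n, Item{1, 1});
}
```

For a run of four elements, `serial_reduce(run, old_op).count` is 4, but `tree_reduce(run, 0, 4, old_op).count` is 3, because the count accumulated in the right half is discarded. With `new_op`, both reduction orders give a `flag` value of 4.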