[xla:cpu] Optimize ThunkExecutor::Execute part #2 #15567
Merged
[xla:cpu] Optimize ThunkExecutor::Execute part #2
Use the std::aligned_storage_t trick to avoid default-initializing the Node struct on a hot path.
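For readers unfamiliar with the trick, here is a minimal sketch of the general idea, not the actual ThunkExecutor code. The `Node` fields, `NodeStorage` alias, and `InitAndSum` helper are hypothetical names chosen for illustration. The buffer is allocated as raw, suitably aligned storage, so no `Node` default constructor runs, and each `Node` is then constructed exactly once with placement new. Note that std::aligned_storage_t is deprecated as of C++23; an `alignas` byte array is the usual replacement.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical per-node state; the real Node in ThunkExecutor may differ.
struct Node {
  int64_t id = 0;
  int64_t pending_inputs = 0;
};

// The sketch never calls ~Node(), which is only safe for trivial destructors.
static_assert(std::is_trivially_destructible_v<Node>,
              "sketch skips destructor calls");

// Raw storage with the size and alignment of Node, but no Node constructor.
using NodeStorage = std::aligned_storage_t<sizeof(Node), alignof(Node)>;

// Constructs n nodes in uninitialized storage and returns a checksum.
int64_t InitAndSum(size_t n) {
  // new T[n] default-initializes its elements; NodeStorage is trivial, so
  // the memory is left untouched instead of being written with defaults.
  std::unique_ptr<NodeStorage[]> storage(new NodeStorage[n]);

  // Each Node is written exactly once, with its final values.
  for (size_t i = 0; i < n; ++i) {
    new (&storage[i]) Node{static_cast<int64_t>(i), 1};
  }

  int64_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    // std::launder yields a valid pointer to the Node created in place.
    const Node* node =
        std::launder(reinterpret_cast<const Node*>(&storage[i]));
    sum += node->id + node->pending_inputs;
  }
  return sum;
}
```

The saving comes from skipping the first, redundant write of default values: with a plain array of `Node`, every element would be initialized once by default and then overwritten.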
name                                     old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   791µs ± 4%   720µs ± 2%  -8.93%
BM_SelectAndScatterF32/256/process_time  3.20ms ± 4%  2.96ms ± 2%  -7.46%
BM_SelectAndScatterF32/512/process_time  13.7ms ± 5%  12.8ms ± 2%  -6.80%

name                                     old time/op  new time/op  delta
BM_SelectAndScatterF32/128/process_time   790µs ± 5%   719µs ± 1%  -9.00%
BM_SelectAndScatterF32/256/process_time  3.20ms ± 3%  2.96ms ± 1%  -7.58%
BM_SelectAndScatterF32/512/process_time  13.2ms ± 4%  12.3ms ± 1%  -6.82%