Commit
Looking at CPU traces for microbenchmarks, I noticed a hotspot in memset (the flavor that uses AVX2 instructions) on the instruction that clears the very last double quadword at the end of an allocation context. In addition, the buffer being cleared is not aligned on a 32-byte boundary.

Two tiny changes address this:
1. Adding padding at the start of regions aligns the allocation context for the microbenchmark cases.
2. Increasing CLR_SIZE slightly ensures the end of an allocation context doesn't consistently fall on a page boundary.

Change 1 makes sure we start with an aligned allocation context at the start of a region. Change 2 minimizes the number of movdqu instructions executed and makes sure we don't consistently hit a new page at the end of the memset range.
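The two conditions described above can be checked with a little address arithmetic. The following is only a sketch, not code from this commit: the helper names, the 32-byte vector width constant, and the 4096-byte page size are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants for illustration; not taken from the GC sources.
constexpr std::size_t kVectorAlign = 32;  // width of one AVX2 register in bytes
constexpr std::size_t kPageSize = 4096;   // assumed OS page size

// Bytes of padding needed so that `addr + padding` is a multiple of `align`.
// `align` must be a power of two.
inline std::size_t padding_for_alignment(std::uintptr_t addr, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return static_cast<std::size_t>((align - (addr & (align - 1))) & (align - 1));
}

// True when the range [start, start + size) ends exactly on a page boundary,
// which is the situation change 2 tries to avoid hitting consistently.
inline bool ends_on_page_boundary(std::uintptr_t start, std::size_t size) {
    return ((start + size) & (kPageSize - 1)) == 0;
}
```

For example, a region whose first usable byte sits 8 bytes past a 32-byte boundary would need 24 bytes of padding under these assumptions, and a page-aligned range whose size is a multiple of the page size ends on a page boundary until the size is nudged upward.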