Reduce runtime of Go Encode() by another 25% #649
Merged (+75 −41)
This is another performance optimization for the Go implementation of Encode(), reducing its runtime by about 25%. It is the additional performance improvement I mentioned in #566.
There are two main changes here, either of which may be controversial because they hurt the code's readability. I have tried to mitigate this through refactoring, and I verified that the extracted functions are inlined, so they add no function-call overhead.
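Here is a minimal sketch of the extracted-helper approach. The helper `writePair` and its signature are hypothetical and are not taken from the PR. Building with `go build -gcflags=-m` makes the Go compiler print its inlining decisions: `can inline writePair` at the definition and `inlining call to writePair` at each call site. Checking for those lines is one way to confirm that a helper adds no call overhead.

```go
package main

import "fmt"

// alphabet is the Open Location Code base-20 digit set.
const alphabet = "23456789CFGHJMPQRVWX"

// writePair is a hypothetical helper of the kind a refactor might extract:
// it writes one latitude digit and one longitude digit into dst at index i.
// It is small enough that the inliner is expected to accept it; run
// `go build -gcflags=-m` and look for "can inline writePair" and
// "inlining call to writePair" to confirm.
func writePair(dst []byte, i int, latDigit, lngDigit int64) {
	dst[i] = alphabet[latDigit]
	dst[i+1] = alphabet[lngDigit]
}

func main() {
	code := make([]byte, 4)
	writePair(code, 0, 8, 2)  // digits C and 4
	writePair(code, 2, 19, 0) // digits X and 2
	fmt.Println(string(code))
}
```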
The first change is that all loops are unrolled. Given that all loops are unconditionally executed a constant number of times, this can be done without introducing any additional branches to the code.
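The sketch below is simplified. It assumes a single coordinate that has already been scaled to an integer in [0, 20^5), which is then written as five base-20 digits. A real Open Location Code encoder interleaves latitude and longitude digits in the pair section. The function names are hypothetical. The looped and unrolled versions do the same arithmetic. Because the trip count is fixed, the unrolled version only drops the loop counter and its compare-and-branch, and it adds no new conditionals.

```go
package main

import "fmt"

// alphabet is the Open Location Code base-20 digit set.
const alphabet = "23456789CFGHJMPQRVWX"

// pairDigitsLoop writes five base-20 digits of v (most significant first)
// using a loop that always runs exactly five times. Assumes 0 <= v < 20^5.
func pairDigitsLoop(v int64) [5]byte {
	var out [5]byte
	for i := 4; i >= 0; i-- {
		out[i] = alphabet[v%20]
		v /= 20
	}
	return out
}

// pairDigitsUnrolled performs the same computation with the loop unrolled
// by hand: no loop counter, no per-iteration compare-and-branch.
func pairDigitsUnrolled(v int64) [5]byte {
	var out [5]byte
	out[4] = alphabet[v%20]
	v /= 20
	out[3] = alphabet[v%20]
	v /= 20
	out[2] = alphabet[v%20]
	v /= 20
	out[1] = alphabet[v%20]
	v /= 20
	out[0] = alphabet[v%20]
	return out
}

func main() {
	for _, v := range []int64{0, 1, 19, 20, 3199999} {
		a, b := pairDigitsLoop(v), pairDigitsUnrolled(v)
		fmt.Println(v, string(a[:]), string(b[:]), a == b)
	}
}
```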
The second change is that the precision rounding logic now requires no divisions. This new method is significantly faster, but it may change the final digits of codes that require sub-centimeter precision (I don't believe this library supports that use case?). No case in the test suite produces a different result, but if desired I can likely find such an edge case.
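The description does not show the rounding code, so what follows is a hypothetical sketch of the trade-off and not the PR's implementation. In the first version, snapping the scaled value to a 1e-6 grid with `math.Round(x*1e6) / 1e6` costs a floating-point division. In the second version, truncating the scaled value directly needs no division. The following are all assumptions:

* `latMax`
* `finalLatPrecision` (taken here as 8000 × 3125 = 2.5e7 units per degree, a few millimetres of latitude per unit)
* the 1e-6 snapping factor
* the function names

```go
package main

import (
	"fmt"
	"math"
)

// Assumed constants for illustration only; not taken from the PR.
const (
	latMax            = 90.0        // latitude offset so scaled values are non-negative
	finalLatPrecision = 8000 * 3125 // assumed integer units per degree (2.5e7)
)

// latToIntWithDivision snaps the scaled latitude to a 1e-6 grid before
// truncating, which costs one floating-point division.
// Assumes lat is in [-90, 90].
func latToIntWithDivision(lat float64) int64 {
	return int64(math.Round((lat+latMax)*finalLatPrecision*1e6) / 1e6)
}

// latToIntNoDivision truncates the scaled latitude directly: one add,
// one multiply, one conversion, and no division.
// Assumes lat is in [-90, 90].
func latToIntNoDivision(lat float64) int64 {
	return int64((lat + latMax) * finalLatPrecision)
}

func main() {
	for _, lat := range []float64{-90, 0, 45, 47.36559} {
		fmt.Println(lat, latToIntWithDivision(lat), latToIntNoDivision(lat))
	}
}
```

In this sketch the two functions can disagree only when the scaled value lies within about 5e-7 units below an integer boundary. In that case the division-free version returns one unit less, which places the point in the adjacent finest-resolution cell.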
Before and after: [results not reproduced]
CPU profiles of before and after: [not reproduced]