-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small stack optimization #265
Conversation
Codecov Report
@@ Coverage Diff @@
## master #265 +/- ##
=======================================
Coverage 98.84% 98.85%
=======================================
Files 42 42
Lines 12133 12184 +51
=======================================
+ Hits 11993 12044 +51
Misses 140 140 |
89341e8
to
461f6d6
Compare
Rebased (and hopefully correctly solved conflicts). |
461f6d6
to
ceb71dc
Compare
uint64_t m_small_storage[small_storage_size]; | ||
|
||
/// The unbounded storage for items. | ||
std::unique_ptr<uint64_t[]> m_large_storage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be std::array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unique_ptr<std::array>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I put comment on the wrong line, sorry, I meant it for the stack storage above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guessed that later.
We only take the pointer of m_small_storage
once in constructor. So I don't think it is even worth to include <array>
header as we are not using any of std::array
features.
The mul benchmark starts being annoying... |
ceb71dc
to
ef515a5
Compare
Though in absolute numbers it is a change from 25 to 26, I'd risk saying it is within the measuring error range. It does give quite a huge benefit to the large benchmarks (ecpairing). |
They are statistically significant (pvalue is 0, the lowest possible). |
I have run
|
ef515a5
to
6780f69
Compare
Where do you see the pvalue? |
@@ -58,32 +58,54 @@ class Stack : public std::vector<T> | |||
|
|||
class OperandStack | |||
{ | |||
/// The size of the pre-allocated internal storage: 128 bytes. | |||
static constexpr auto small_storage_size = 128 / sizeof(uint64_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to fine-tune this, e.g. benchmark it with 64, 256, 512? Or should we postpone that until we have a better set of inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tuned it to current benchmark set - the minimal value that fits all benchmarks (and unit tests by coincidence) in small storage.
std::unique_ptr<uint64_t[]> m_storage; | ||
/// The bottom of the stack. Set in the constructor and never modified. | ||
/// | ||
/// TODO: This pointer is rarely used and may be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just added it. There is some experimentation needed for it, with probably no performance implications. But it is worth just for sanity and "simpler" design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but would be nice to have answers to those two questions.
Requires #247
This implements "small vector" / "small string" optimization to
OperandStack
. When the max stack height is smaller than static limit we omit dynamic allocation and use static storage withinOperandStack
.The max stack height is usually very small. E.g. for LLVM generated code it is usually 3 except calls (as they naturally requires the stack to have all call args).