-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Baseline interpreter by using padded code #315
Conversation
1085606
to
915e7b5
Compare
Codecov Report
@@ Coverage Diff @@
## master #315 +/- ##
=======================================
Coverage 99.78% 99.78%
=======================================
Files 29 29
Lines 4108 4112 +4
=======================================
+ Hits 4099 4103 +4
Misses 9 9
Flags with carried forward coverage won't be shown. Click here to find out more.
|
915e7b5
to
fbf2d80
Compare
Haswell 4.4 GHz, clang-12
|
AMD Zen3, GCC-9
|
AMD Zen3, clang-12
|
lib/evmone/baseline.cpp
Outdated
uint8_t buffer[Len]; | ||
// This cannot overflow code buffer because code ends with valid STOP instruction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here it's not relevant that it's a STOP?
// This cannot overflow code buffer because code ends with valid STOP instruction. | |
// This cannot overflow code buffer because code is padded with 0s. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original comment is more correct, as the loop is looking for STOP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the comment here is supposed to explain why memcpy
cannot overflow.
lib/evmone/baseline.cpp
Outdated
auto pc = code; | ||
while (pc != code_end) | ||
while (true) // Guaranteed to terminate because code must end with STOP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while (true) // Guaranteed to terminate because code must end with STOP. | |
while (true) // Guaranteed to terminate because padded code ends with STOP. |
ea36730
to
76a3fb9
Compare
lib/evmone/baseline.cpp
Outdated
return CodeAnalysis{std::move(map)}; | ||
|
||
// i is the needed code size including the last push data (can be bigger than code_size). | ||
std::unique_ptr<uint8_t[]> padded_code{new uint8_t[i + 1]}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_unique
would zero-initialize, as opposed to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe add a comment that this leaves it unitialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a random side-thought: but with a static +33 bytes always allocated it potentially could be optimized to do a single allocation in analyze
for both JumpdestMap
and padded code together.
Guaranteed to terminate because padded code ends with STOP.
This cannot overflow code buffer because code ends with valid STOP instruction.
76a3fb9
to
6538e6a
Compare
Yes, this was suppose to be TODO, so I have added it now. This was my original plan, but there are additional complications:
|
During code analysis the code is copied and padded with 33 zero bytes. This guarantees that push data is always available in the code buffer and the code ends with STOP. This allows optimizing the interpreter loop and PUSH instructions.
Performance gains are up to ~5%.
This is also needed to enable other optimization: implementing interpreter with "computed goto" or "tail calls".