
Put instruction immediate values in the instruction table #154

Open
wants to merge 7 commits into master from instr_data2
Conversation

chfast
Member

@chfast chfast commented Sep 9, 2019

Requires #144, #153.

@axic
Member

axic commented Sep 10, 2019

Needs a rebase?

@codecov-io

codecov-io commented Sep 10, 2019

Codecov Report

Merging #154 into master will increase coverage by 4%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master     #154   +/-   ##
=======================================
+ Coverage   85.31%   89.31%   +4%     
=======================================
  Files          22       22           
  Lines        2261     2256    -5     
  Branches      219      219           
=======================================
+ Hits         1929     2015   +86     
+ Misses        305      214   -91     
  Partials       27       27

@gumb0
Member

gumb0 commented Sep 11, 2019

Needs rebase

@chfast chfast force-pushed the instr_data2 branch 5 times, most recently from 842f8a5 to c66003a Compare September 12, 2019 14:22
@chfast
Member Author

chfast commented Sep 12, 2019

Rebased. No performance gains. And vector allocation is super messy and annoying - a slight change to the preallocation has a huge effect on analysis performance.

Member

@gumb0 gumb0 left a comment


Looks fine overall; just more documentation would be great, and I'm not sure if dump is correct.
Also, why bother with these complications if it doesn't give performance gains?

@@ -37,16 +37,16 @@ code_analysis analyze(evmc_revision rev, const uint8_t* code, size_t code_size)

code_analysis analysis;

-    const auto max_instrs_size = code_size + 1;
-    analysis.instrs.reserve(max_instrs_size);
+    analysis.instrs.reserve(2 * (code_size + 1));

comment why 2x?


struct block_info
{
/// The total base gas cost of all instructions in the block.
/// This cannot overflow, see the static_assert() below.
-    int32_t gas_cost = 0;
+    int32_t gas_cost;
why remove the initializer?

union instr_argument
{
int number;
const uint8_t* data;
was this data member never used?

@@ -98,7 +96,8 @@ code_analysis analyze(evmc_revision rev, const uint8_t* code, size_t code_size)
// TODO: Consider the same endianness-specific loop as in ANY_LARGE_PUSH case.
while (code_pos < push_end && code_pos < code_end)
*insert_pos++ = *code_pos++;
-    instr.arg.small_push_value = load64be(value_bytes);
+    analysis.instrs.emplace_back().small_push_value = load64be(value_bytes);
Maybe it would be clearer to emplace_back a placeholder right after emplacing fn on line 74, and here just assign, so that emplace_back calls always come in pairs. It would also mean less repetition.

@@ -180,13 +170,15 @@ struct op_table_entry

using op_table = std::array<op_table_entry, 256>;

struct instr_info
union instr_info
Maybe some comment for this union?

@@ -546,7 +546,7 @@ const instr_info* op_jumpi(const instr_info* instr, execution_state& state) noex

const instr_info* op_pc(const instr_info* instr, execution_state& state) noexcept
{
-    state.stack.push(instr->arg.number);
+    state.stack.push((++instr)->number);
just a thought, maybe helpers like these could improve clarity:

```cpp
inline const instr_info& arg(const instr_info* instr) { return *(instr + 1); }

inline const instr_info* next_instr(const instr_info* instr) { return instr + 2; }
```

@@ -28,7 +28,7 @@ void dump(const evmone::code_analysis& analysis)

if (c == OPX_BEGINBLOCK)
{
-        block = &instr.arg.block;
+        block = &instr.block;
I didn't get how this works without changing the way you iterate over analysis.instrs
(instr points here to the union containing fn, not block_info, right?)

Also, I'm not sure what c actually is here. The least significant byte of the function pointer?

Put the immediate value of small pushes just after the instruction pointer in the program table.
Put the immediate value of the pointer to the large push data just after the instruction pointer in the program table.
Put the immediate value with block info just after the instruction pointer in the program table.
This makes the program table 2x smaller.
@chfast
Member Author

chfast commented Sep 17, 2019

Also why bother with these complications, if it doesn't give performance gains?

You have to do the work to be able to benchmark it later. This way is more memory-efficient - we allocate space for instruction arguments (immediate values) only when needed. This may be an important factor when we decide to cache evmone's loaded programs (so the analysis is not repeated for the same contracts).

I still don't see big improvements, only ~1-2%. I believe my CPU is fast enough at fetching memory quickly - the old version has the same memory layout, it just wastes some space.

Old:

 Performance counter stats for 'bin/evmone-bench-master ../../test/benchmarks':

        21 502 116      cache-references                                            
         1 119 073      cache-misses              #    5,204 % of all cache refs    
   127 269 289 930      cycles                                                      
   352 107 262 713      instructions              #    2,77  insn per cycle         

      28,971433332 seconds time elapsed

New:

 Performance counter stats for 'bin/evmone-bench ../../test/benchmarks':

        17 453 822      cache-references                                            
         1 046 808      cache-misses              #    5,998 % of all cache refs    
   127 385 133 793      cycles                                                      
   358 617 530 718      instructions              #    2,82  insn per cycle         

      28,987604886 seconds time elapsed

The above shows that the new version has a similar number of cache misses; it just uses less memory in general.

@chfast
Member Author

chfast commented Sep 17, 2019

I'm leaving this for next release.

jwasinger pushed a commit to jwasinger/evmone that referenced this pull request Apr 27, 2021
Simplify signatures of Host methods