core/vm: use fixed uint256 library instead of big #20601

holiman · 2020-01-27T21:42:35Z

This PR is another experiment in integrating the fixed 256-bit math library in place of Go's native arbitrary-precision big.Int.

Theoretically, it could make a difference, but historically we haven't seen any big differences. We have some more advanced metrics now, and both @gballet and @chfast have been curious about giving it another go, so let's run a benchmark when we have a couple of machines available.

See #18191 for earlier numbers.

chfast · 2020-01-27T22:01:15Z

Eliminating so many memory allocations is a big win nevertheless.
Does uint256 require any memory allocations?
Are you able to measure total memory footprint of the node?
In the final version the intPool should be eliminated, is that correct?

holiman · 2020-01-27T22:19:52Z

> Eliminating so many memory allocations is a big win nevertheless.

Yep, I agree

> Does uint256 require any memory allocations?

Well, the struct consists of 4 x uint64 internally, and for normal operations it does not allocate. There are exceptions; see https://github.com/holiman/uint256/#benchmarks for some examples:

Benchmark_Exp/large/uint256-6            	  100000	     17744 ns/op	      32 B/op	       1 allocs/op
Benchmark_Div/large/uint256-6            	 5000000	       241 ns/op	     128 B/op	       3 allocs/op
Benchmark_MulMod/large/uint256-6         	 1000000	      1245 ns/op	     608 B/op	      11 allocs/op
Benchmark_SDiv/large/uint256-6           	 5000000	       255 ns/op	     128 B/op	       3 allocs/op

(MulMod is the worst; it uses big.Int behind the scenes.)
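
To illustrate why a value made of four uint64 limbs needs no heap memory for simple operations, here is a rough sketch of a 256-bit addition with carry propagation. The type `Word256` and its `Add` method are hypothetical stand-ins written for this example; they are not the uint256 library's actual code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Word256 is a hypothetical fixed 256-bit integer:
// four 64-bit limbs, least significant limb first.
type Word256 [4]uint64

// Add sets z = (x + y) mod 2^256 and returns z.
// The carry out of the top limb is dropped, which gives the wraparound.
func (z *Word256) Add(x, y *Word256) *Word256 {
	var carry uint64
	z[0], carry = bits.Add64(x[0], y[0], 0)
	z[1], carry = bits.Add64(x[1], y[1], carry)
	z[2], carry = bits.Add64(x[2], y[2], carry)
	z[3], _ = bits.Add64(x[3], y[3], carry)
	return z
}

func main() {
	maxWord := Word256{^uint64(0), ^uint64(0), ^uint64(0), ^uint64(0)} // 2^256 - 1
	one := Word256{1}
	var sum Word256
	sum.Add(&maxWord, &one)
	fmt.Println(sum) // prints [0 0 0 0]
}
```

All operands live inline in the array, so in this sketch there is nothing for the operation to allocate.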

> Are you able to measure total memory footprint of the node?

Yep, we'll see that in the charts, and also the number of allocs and the gc workload.
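
Independently of node-level charts, allocations per operation can also be counted directly. A small sketch using `testing.AllocsPerRun` from the standard library, comparing a big.Int addition against an addition on a hypothetical four-limb type (`limbs`, `allocsBig` and `allocsFixed` are names invented for this example):

```go
package main

import (
	"fmt"
	"math/big"
	"math/bits"
	"testing"
)

// limbs is a hypothetical fixed 256-bit value (four 64-bit limbs, low limb first).
type limbs [4]uint64

// add sets z = (x + y) mod 2^256 using only the limbs themselves.
func (z *limbs) add(x, y *limbs) {
	var c uint64
	z[0], c = bits.Add64(x[0], y[0], 0)
	z[1], c = bits.Add64(x[1], y[1], c)
	z[2], c = bits.Add64(x[2], y[2], c)
	z[3], _ = bits.Add64(x[3], y[3], c)
}

// allocsBig reports the average number of heap allocations for one
// big.Int addition that writes into a fresh result.
func allocsBig() float64 {
	x := new(big.Int).Lsh(big.NewInt(1), 200)
	y := new(big.Int).Lsh(big.NewInt(1), 199)
	return testing.AllocsPerRun(1000, func() {
		_ = new(big.Int).Add(x, y)
	})
}

// allocsFixed reports the same figure for the fixed-width addition.
func allocsFixed() float64 {
	x, y := limbs{1, 2, 3, 4}, limbs{5, 6, 7, 8}
	var z limbs
	return testing.AllocsPerRun(1000, func() {
		z.add(&x, &y)
	})
}

func main() {
	fmt.Println("big.Int allocs/op:", allocsBig())
	fmt.Println("fixed   allocs/op:", allocsFixed())
}
```

The exact big.Int figure depends on the Go version and operand sizes, so the sketch prints it rather than assuming a value.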

> In the final version the intPool should be eliminated, is that correct?

Yes, I would assume so, but we'll have to benchmark that change. So first big.Int vs uint256, then (possibly) uint256 versus uint256-unpooled.

holiman · 2020-01-28T09:40:02Z

In my repo, I added a branch, fixedbig_v3, which removes the use of the int pools. Some preliminary benchmarks (this PR vs master):

[user@work vm]$ benchstat before_uint.txt after_uint.txt 
name            old time/op  new time/op  delta
OpAdd64-6        218ns ± 1%    92ns ± 1%   -57.80%  (p=0.000 n=5+4)
OpAdd128-6       251ns ± 2%   119ns ±14%   -52.59%  (p=0.016 n=4+5)
OpAdd256-6       255ns ±23%   180ns ± 6%   -29.28%  (p=0.008 n=5+5)
OpSub64-6        183ns ±14%   102ns ± 3%   -44.21%  (p=0.008 n=5+5)
OpSub128-6       216ns ± 0%   128ns ± 5%   -40.42%  (p=0.016 n=4+5)
OpSub256-6       232ns ± 4%   186ns ±16%   -19.83%  (p=0.016 n=4+5)
OpMul-6          336ns ±26%   234ns ± 4%   -30.42%  (p=0.008 n=5+5)
OpDiv256-6       516ns ±21%   415ns ±14%   -19.58%  (p=0.008 n=5+5)
OpDiv128-6       284ns ± 3%   326ns ±20%   +14.70%  (p=0.008 n=5+5)
OpDiv64-6        205ns ± 4%   141ns ±50%      ~     (p=0.071 n=5+5)
OpSdiv-6         701ns ± 1%   542ns ±36%      ~     (p=0.190 n=4+5)
OpMod-6          398ns ± 3%   195ns ± 5%   -50.97%  (p=0.016 n=4+5)
OpSmod-6         664ns ± 1%   274ns ±42%   -58.73%  (p=0.016 n=4+5)
OpExp-6         27.4µs ± 1%  18.8µs ± 1%   -31.17%  (p=0.016 n=5+4)
OpSignExtend-6   242ns ± 4%   194ns ±28%      ~     (p=0.087 n=5+5)
OpLt-6           211ns ± 4%   179ns ± 2%   -15.26%  (p=0.016 n=4+5)
OpGt-6           206ns ± 1%   175ns ± 1%   -14.93%  (p=0.029 n=4+4)
OpSlt-6          228ns ± 2%   186ns ± 1%   -18.17%  (p=0.000 n=5+4)
OpSgt-6          226ns ± 5%   188ns ± 1%   -16.90%  (p=0.008 n=5+5)
OpEq-6           208ns ± 2%   184ns ±14%      ~     (p=0.111 n=4+5)
OpEq2-6          206ns ± 1%   184ns ±17%      ~     (p=0.119 n=5+5)
OpAnd-6          211ns ± 3%   195ns ±16%      ~     (p=0.151 n=5+5)
OpOr-6           216ns ± 2%   187ns ± 5%   -13.32%  (p=0.016 n=4+5)
OpXor-6          225ns ±15%   180ns ± 3%   -19.98%  (p=0.008 n=5+5)
OpByte-6         204ns ± 2%   176ns ± 2%   -13.56%  (p=0.016 n=5+4)
OpAddmod-6       588ns ± 2%   566ns ± 0%    -3.86%  (p=0.016 n=5+4)
OpMulmod-6       824ns ± 2%  1254ns ± 1%   +52.11%  (p=0.016 n=4+5)
OpSHL-6          303ns ± 3%   131ns ± 5%   -56.77%  (p=0.008 n=5+5)
OpSHR-6          286ns ± 1%   134ns ±18%   -53.13%  (p=0.016 n=4+5)
OpSAR-6          462ns ± 2%   127ns ± 1%   -72.60%  (p=0.008 n=5+5)
OpIsZero-6       104ns ± 1%    92ns ±18%      ~     (p=0.175 n=4+5)
OpMstore-6      35.6ns ± 7%  87.2ns ± 5%  +145.33%  (p=0.008 n=5+5)
OpSHA3-6        1.11µs ±83%  0.74µs ± 2%   -33.06%  (p=0.008 n=5+5)

Interestingly, removing the pools does not improve these benchmarks:

[user@work vm]$ benchstat after_uint.txt  after_nopools.txt 
name            old time/op  new time/op   delta
OpAdd64-6       91.8ns ± 1%   98.9ns ± 1%   +7.73%  (p=0.029 n=4+4)
OpAdd128-6       119ns ±14%    128ns ±11%     ~     (p=0.079 n=5+5)
OpAdd256-6       180ns ± 6%    225ns ± 2%  +24.64%  (p=0.008 n=5+5)
OpSub64-6        102ns ± 3%    120ns ±19%  +17.03%  (p=0.008 n=5+5)
OpSub128-6       128ns ± 5%    156ns ± 2%  +21.30%  (p=0.016 n=5+4)
OpSub256-6       186ns ±16%    226ns ± 1%  +21.66%  (p=0.008 n=5+5)
OpMul-6          234ns ± 4%    282ns ± 1%  +20.40%  (p=0.016 n=5+4)
OpDiv256-6       415ns ±14%    439ns ± 3%     ~     (p=0.151 n=5+5)
OpDiv128-6       326ns ±20%    330ns ±16%     ~     (p=0.516 n=5+5)
OpDiv64-6        141ns ±50%    120ns ± 9%     ~     (p=0.786 n=5+5)
OpSdiv-6         542ns ±36%    464ns ± 2%     ~     (p=0.841 n=5+5)
OpMod-6          195ns ± 5%    233ns ± 1%  +19.62%  (p=0.016 n=5+4)
OpSmod-6         274ns ±42%    269ns ±17%     ~     (p=1.000 n=5+5)
OpExp-6         18.8µs ± 1%   18.7µs ± 3%     ~     (p=0.905 n=4+5)
OpSignExtend-6   194ns ±28%    223ns ± 2%     ~     (p=0.175 n=5+4)
OpLt-6           179ns ± 2%    225ns ± 1%  +25.98%  (p=0.016 n=5+4)
OpGt-6           175ns ± 1%    223ns ± 0%  +27.25%  (p=0.016 n=4+5)
OpSlt-6          186ns ± 1%    285ns ±52%  +53.23%  (p=0.016 n=4+5)
OpSgt-6          188ns ± 1%    262ns ±17%  +39.62%  (p=0.008 n=5+5)
OpEq-6           184ns ±14%    247ns ±26%  +34.16%  (p=0.008 n=5+5)
OpEq2-6          184ns ±17%    224ns ± 0%  +21.87%  (p=0.008 n=5+5)
OpAnd-6          195ns ±16%    237ns ±22%  +21.33%  (p=0.048 n=5+5)
OpOr-6           187ns ± 5%    224ns ± 1%  +19.65%  (p=0.016 n=5+4)
OpXor-6          180ns ± 3%    222ns ± 1%  +23.20%  (p=0.008 n=5+5)
OpByte-6         176ns ± 2%    222ns ± 1%  +25.85%  (p=0.029 n=4+4)
OpAddmod-6       566ns ± 0%    632ns ± 2%  +11.72%  (p=0.029 n=4+4)
OpMulmod-6      1.25µs ± 1%   1.47µs ±22%  +17.21%  (p=0.008 n=5+5)
OpSHL-6          131ns ± 5%    151ns ± 2%  +15.75%  (p=0.008 n=5+5)
OpSHR-6          134ns ±18%    154ns ± 4%     ~     (p=0.079 n=5+5)
OpSAR-6          127ns ± 1%    156ns ± 1%  +22.83%  (p=0.016 n=5+4)
OpIsZero-6      92.2ns ±18%  114.4ns ± 2%  +24.13%  (p=0.008 n=5+5)
OpMstore-6      87.2ns ± 5%  103.2ns ± 2%  +18.29%  (p=0.008 n=5+5)
OpSHA3-6         745ns ± 2%    797ns ± 9%   +7.04%  (p=0.016 n=5+5)

However, I think that's just some weirdness in the tests; there's no reason that e.g. ADD would take longer, since it doesn't even use anything from the pool.

holiman · 2020-01-28T17:46:25Z

I swapped this in during the execution of an earlier full sync; the down-spike is where I swapped the code. No huge gains.

holiman · 2020-03-19T11:24:10Z

Updated with the latest improvements from @chfast

[user@work vm]$ benchstat before_pawel.txt after_pawel.txt 
name            old time/op  new time/op   delta
OpAdd64-6       96.0ns ± 3%  100.1ns ±22%     ~     (p=0.841 n=5+5)
OpAdd128-6       117ns ± 3%    113ns ± 4%     ~     (p=0.087 n=5+5)
OpAdd256-6       193ns ±19%    178ns ± 2%   -8.09%  (p=0.000 n=5+4)
OpSub64-6        113ns ±15%    102ns ± 1%   -9.57%  (p=0.016 n=5+4)
OpSub128-6       131ns ± 3%    129ns ± 3%     ~     (p=0.127 n=5+5)
OpSub256-6       183ns ± 2%    184ns ± 3%     ~     (p=0.794 n=4+5)
OpMul-6          251ns ±19%    205ns ±16%     ~     (p=0.056 n=5+5)
OpDiv256-6       427ns ± 5%    289ns ± 2%  -32.41%  (p=0.008 n=5+5)
OpDiv128-6       317ns ± 2%    175ns ± 1%  -44.65%  (p=0.016 n=5+4)
OpDiv64-6        119ns ±22%    118ns ±25%     ~     (p=0.460 n=5+5)
OpSdiv-6         444ns ± 1%    306ns ± 2%  -31.06%  (p=0.029 n=4+4)
OpMod-6          211ns ±23%    189ns ± 2%  -10.16%  (p=0.024 n=5+5)
OpSmod-6         217ns ± 3%    212ns ± 1%   -2.24%  (p=0.016 n=5+4)
OpExp-6         19.6µs ± 1%    9.4µs ± 1%  -52.22%  (p=0.016 n=4+5)
OpSignExtend-6   196ns ±20%    187ns ±15%     ~     (p=0.198 n=5+5)
OpLt-6           184ns ± 2%    192ns ±15%     ~     (p=0.738 n=5+5)
OpGt-6           197ns ±21%    191ns ±13%     ~     (p=0.690 n=5+5)
OpSlt-6          196ns ± 3%    199ns ± 3%     ~     (p=0.286 n=4+5)
OpSgt-6          198ns ± 3%    197ns ± 0%     ~     (p=0.635 n=5+4)
OpEq-6           186ns ± 4%    189ns ± 9%     ~     (p=1.000 n=5+5)
OpEq2-6          193ns ±19%    186ns ± 9%     ~     (p=0.389 n=5+5)
OpAnd-6          187ns ± 2%    184ns ± 1%     ~     (p=0.079 n=5+4)
OpOr-6           182ns ± 1%    181ns ± 1%     ~     (p=0.476 n=5+4)
OpXor-6          183ns ± 1%    188ns ± 5%     ~     (p=0.603 n=4+5)
OpByte-6         197ns ±23%    196ns ±22%     ~     (p=0.730 n=5+5)
OpAddmod-6       651ns ±29%    421ns ±15%  -35.23%  (p=0.008 n=5+5)
OpMulmod-6      1.32µs ± 4%   0.54µs ± 1%  -58.89%  (p=0.016 n=5+4)
OpSHL-6          135ns ± 1%    145ns ± 8%   +7.21%  (p=0.016 n=4+5)
OpSHR-6          136ns ± 0%    144ns ± 7%   +5.98%  (p=0.016 n=4+5)
OpSAR-6          143ns ± 7%    149ns ±26%     ~     (p=1.000 n=5+5)
OpIsZero-6      93.4ns ± 2%   93.6ns ± 5%     ~     (p=0.841 n=4+5)
OpMstore-6      95.5ns ± 7%   92.8ns ± 6%     ~     (p=0.222 n=5+5)
OpSHA3-6         798ns ± 2%    784ns ± 1%     ~     (p=0.127 n=5+4)

holiman · 2020-03-23T14:04:26Z

Closing in favour of #20787

holiman added the status:work-in-progress label Jan 27, 2020

holiman requested review from karalabe and rjl493456442 as code owners January 27, 2020 21:42

core/vm: use fixed uint256 library instead of big

fc287d7

holiman force-pushed the fixedbig_v2 branch from e14e9e4 to fc287d7 Compare January 27, 2020 21:53

go.mod: update uint256

45f2942

chfast mentioned this pull request Mar 19, 2020

core/vm: use uint256 in EVM implementation #20787

Merged

holiman closed this Mar 23, 2020
