core/vm: use fixed uint256 library instead of big #20601

Closed

holiman
Contributor

holiman commented Jan 27, 2020

This PR is another experiment in integrating the fixed 256-bit math library in place of Go's native arbitrary-precision big.Int library.

Theoretically, it could make a difference, but historically we haven't seen any big differences. We have some more advanced metrics now, and both @gballet and @chfast have been curious about giving it another go, so let's run a benchmark when we have a couple of machines available.

See #18191 for earlier numbers.

@chfast
Member

chfast commented Jan 27, 2020

  • Eliminating so many memory allocations is a big win nevertheless.
  • Does uint256 require any memory allocations?
  • Are you able to measure total memory footprint of the node?
  • In the final version the intPool should be eliminated, is that correct?

@holiman
Contributor Author

holiman commented Jan 27, 2020

Eliminating so many memory allocations is a big win nevertheless.

Yep, I agree

Does uint256 require any memory allocations?

Well, the struct consists of 4 x uint64 internally, but for normal operations it does not allocate. There are exceptions; see https://github.com/holiman/uint256/#benchmarks for some examples:

Benchmark_Exp/large/uint256-6            	  100000	     17744 ns/op	      32 B/op	       1 allocs/op
Benchmark_Div/large/uint256-6            	 5000000	       241 ns/op	     128 B/op	       3 allocs/op
Benchmark_MulMod/large/uint256-6         	 1000000	      1245 ns/op	     608 B/op	      11 allocs/op
Benchmark_SDiv/large/uint256-6           	 5000000	       255 ns/op	     128 B/op	       3 allocs/op

(MulMod is the worst, it uses big.Int behind the scenes).
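
To illustrate why a value made of four uint64 limbs can do ordinary arithmetic without allocating, here is a minimal sketch of a 256-bit addition. The `Word256` type and `Add` function are hypothetical names chosen for this example and are not the uint256 library's API; only `bits.Add64` from the Go standard library is assumed.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Word256 is a hypothetical stand-in for a fixed 256-bit integer:
// four 64-bit limbs, least significant limb first.
type Word256 [4]uint64

// Add returns x+y modulo 2^256. All operands and the result are plain
// values, so the operation itself needs no heap allocation.
func Add(x, y Word256) Word256 {
	var z Word256
	var carry uint64
	z[0], carry = bits.Add64(x[0], y[0], 0)
	z[1], carry = bits.Add64(x[1], y[1], carry)
	z[2], carry = bits.Add64(x[2], y[2], carry)
	z[3], _ = bits.Add64(x[3], y[3], carry) // final carry is dropped (wraparound)
	return z
}

func main() {
	allOnes := Word256{^uint64(0), ^uint64(0), ^uint64(0), ^uint64(0)}
	one := Word256{1, 0, 0, 0}

	// Adding one to the largest value wraps around to zero.
	fmt.Println(Add(allOnes, one))

	// A carry out of the lowest limb moves into the second limb.
	fmt.Println(Add(Word256{^uint64(0), 0, 0, 0}, one))
}
```

By contrast, a big.Int holds a slice of words that may have to grow, which is one place where allocations can come from.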

Are you able to measure total memory footprint of the node?

Yep, we'll see that in the charts, and also the number of allocs and the gc workload.

In the final version the intPool should be eliminated, is that correct?

Yes, I would assume so, but we'll have to benchmark that change. So first big.Int vs uint256, then (possibly) uint256 versus uint256-unpooled.
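
For readers unfamiliar with the idea, an integer pool can be sketched roughly as below. This is a simplified, hypothetical illustration of pooling *big.Int values, not the actual go-ethereum intPool code; the type and method names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math/big"
)

// intPool is a hypothetical, simplified pool of reusable *big.Int values.
type intPool struct {
	free []*big.Int
}

// get returns a pooled integer if one is available, otherwise allocates.
// A reused integer still holds its previous value, so callers must
// overwrite it before reading.
func (p *intPool) get() *big.Int {
	if n := len(p.free); n > 0 {
		v := p.free[n-1]
		p.free = p.free[:n-1]
		return v
	}
	return new(big.Int)
}

// put hands an integer back to the pool for later reuse.
func (p *intPool) put(v *big.Int) {
	p.free = append(p.free, v)
}

func main() {
	pool := &intPool{}

	a := pool.get().SetUint64(40)
	b := pool.get().SetUint64(2)
	sum := pool.get().Add(a, b)
	fmt.Println(sum)

	// Returning the operands lets the next get() skip an allocation.
	pool.put(a)
	pool.put(b)
	fmt.Println(pool.get() == b) // the most recently returned value is reused
}
```

With a fixed-size value type, temporaries can live on the stack, which is why the pool may become unnecessary; whether dropping it actually helps is what the planned benchmark would have to show.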

@holiman
Contributor Author

holiman commented Jan 28, 2020

In my repo, I added a branch, fixedbig_v3, which removes the use of intpools. Some preliminary benchmarks (this PR vs master):

[user@work vm]$ benchstat before_uint.txt after_uint.txt 
name            old time/op  new time/op  delta
OpAdd64-6        218ns ± 1%    92ns ± 1%   -57.80%  (p=0.000 n=5+4)
OpAdd128-6       251ns ± 2%   119ns ±14%   -52.59%  (p=0.016 n=4+5)
OpAdd256-6       255ns ±23%   180ns ± 6%   -29.28%  (p=0.008 n=5+5)
OpSub64-6        183ns ±14%   102ns ± 3%   -44.21%  (p=0.008 n=5+5)
OpSub128-6       216ns ± 0%   128ns ± 5%   -40.42%  (p=0.016 n=4+5)
OpSub256-6       232ns ± 4%   186ns ±16%   -19.83%  (p=0.016 n=4+5)
OpMul-6          336ns ±26%   234ns ± 4%   -30.42%  (p=0.008 n=5+5)
OpDiv256-6       516ns ±21%   415ns ±14%   -19.58%  (p=0.008 n=5+5)
OpDiv128-6       284ns ± 3%   326ns ±20%   +14.70%  (p=0.008 n=5+5)
OpDiv64-6        205ns ± 4%   141ns ±50%      ~     (p=0.071 n=5+5)
OpSdiv-6         701ns ± 1%   542ns ±36%      ~     (p=0.190 n=4+5)
OpMod-6          398ns ± 3%   195ns ± 5%   -50.97%  (p=0.016 n=4+5)
OpSmod-6         664ns ± 1%   274ns ±42%   -58.73%  (p=0.016 n=4+5)
OpExp-6         27.4µs ± 1%  18.8µs ± 1%   -31.17%  (p=0.016 n=5+4)
OpSignExtend-6   242ns ± 4%   194ns ±28%      ~     (p=0.087 n=5+5)
OpLt-6           211ns ± 4%   179ns ± 2%   -15.26%  (p=0.016 n=4+5)
OpGt-6           206ns ± 1%   175ns ± 1%   -14.93%  (p=0.029 n=4+4)
OpSlt-6          228ns ± 2%   186ns ± 1%   -18.17%  (p=0.000 n=5+4)
OpSgt-6          226ns ± 5%   188ns ± 1%   -16.90%  (p=0.008 n=5+5)
OpEq-6           208ns ± 2%   184ns ±14%      ~     (p=0.111 n=4+5)
OpEq2-6          206ns ± 1%   184ns ±17%      ~     (p=0.119 n=5+5)
OpAnd-6          211ns ± 3%   195ns ±16%      ~     (p=0.151 n=5+5)
OpOr-6           216ns ± 2%   187ns ± 5%   -13.32%  (p=0.016 n=4+5)
OpXor-6          225ns ±15%   180ns ± 3%   -19.98%  (p=0.008 n=5+5)
OpByte-6         204ns ± 2%   176ns ± 2%   -13.56%  (p=0.016 n=5+4)
OpAddmod-6       588ns ± 2%   566ns ± 0%    -3.86%  (p=0.016 n=5+4)
OpMulmod-6       824ns ± 2%  1254ns ± 1%   +52.11%  (p=0.016 n=4+5)
OpSHL-6          303ns ± 3%   131ns ± 5%   -56.77%  (p=0.008 n=5+5)
OpSHR-6          286ns ± 1%   134ns ±18%   -53.13%  (p=0.016 n=4+5)
OpSAR-6          462ns ± 2%   127ns ± 1%   -72.60%  (p=0.008 n=5+5)
OpIsZero-6       104ns ± 1%    92ns ±18%      ~     (p=0.175 n=4+5)
OpMstore-6      35.6ns ± 7%  87.2ns ± 5%  +145.33%  (p=0.008 n=5+5)
OpSHA3-6        1.11µs ±83%  0.74µs ± 2%   -33.06%  (p=0.008 n=5+5)
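
As a reading aid, the delta column is the relative change from the old to the new time. The sketch below recomputes it from the rounded values printed in the table; the `delta` helper is a hypothetical name for this example. As far as I know, benchstat works from the unrounded means, so results recomputed this way can differ slightly from the printed column.

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs, in percent.
// Negative values mean the new measurement is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// Rounded means taken from the OpAdd64 and OpMstore rows above.
	fmt.Printf("OpAdd64:  %+.2f%%\n", delta(218, 92))
	fmt.Printf("OpMstore: %+.2f%%\n", delta(35.6, 87.2))
}
```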

Interestingly, removing the pools does not improve these benchmarks:

[user@work vm]$ benchstat after_uint.txt  after_nopools.txt 
name            old time/op  new time/op   delta
OpAdd64-6       91.8ns ± 1%   98.9ns ± 1%   +7.73%  (p=0.029 n=4+4)
OpAdd128-6       119ns ±14%    128ns ±11%     ~     (p=0.079 n=5+5)
OpAdd256-6       180ns ± 6%    225ns ± 2%  +24.64%  (p=0.008 n=5+5)
OpSub64-6        102ns ± 3%    120ns ±19%  +17.03%  (p=0.008 n=5+5)
OpSub128-6       128ns ± 5%    156ns ± 2%  +21.30%  (p=0.016 n=5+4)
OpSub256-6       186ns ±16%    226ns ± 1%  +21.66%  (p=0.008 n=5+5)
OpMul-6          234ns ± 4%    282ns ± 1%  +20.40%  (p=0.016 n=5+4)
OpDiv256-6       415ns ±14%    439ns ± 3%     ~     (p=0.151 n=5+5)
OpDiv128-6       326ns ±20%    330ns ±16%     ~     (p=0.516 n=5+5)
OpDiv64-6        141ns ±50%    120ns ± 9%     ~     (p=0.786 n=5+5)
OpSdiv-6         542ns ±36%    464ns ± 2%     ~     (p=0.841 n=5+5)
OpMod-6          195ns ± 5%    233ns ± 1%  +19.62%  (p=0.016 n=5+4)
OpSmod-6         274ns ±42%    269ns ±17%     ~     (p=1.000 n=5+5)
OpExp-6         18.8µs ± 1%   18.7µs ± 3%     ~     (p=0.905 n=4+5)
OpSignExtend-6   194ns ±28%    223ns ± 2%     ~     (p=0.175 n=5+4)
OpLt-6           179ns ± 2%    225ns ± 1%  +25.98%  (p=0.016 n=5+4)
OpGt-6           175ns ± 1%    223ns ± 0%  +27.25%  (p=0.016 n=4+5)
OpSlt-6          186ns ± 1%    285ns ±52%  +53.23%  (p=0.016 n=4+5)
OpSgt-6          188ns ± 1%    262ns ±17%  +39.62%  (p=0.008 n=5+5)
OpEq-6           184ns ±14%    247ns ±26%  +34.16%  (p=0.008 n=5+5)
OpEq2-6          184ns ±17%    224ns ± 0%  +21.87%  (p=0.008 n=5+5)
OpAnd-6          195ns ±16%    237ns ±22%  +21.33%  (p=0.048 n=5+5)
OpOr-6           187ns ± 5%    224ns ± 1%  +19.65%  (p=0.016 n=5+4)
OpXor-6          180ns ± 3%    222ns ± 1%  +23.20%  (p=0.008 n=5+5)
OpByte-6         176ns ± 2%    222ns ± 1%  +25.85%  (p=0.029 n=4+4)
OpAddmod-6       566ns ± 0%    632ns ± 2%  +11.72%  (p=0.029 n=4+4)
OpMulmod-6      1.25µs ± 1%   1.47µs ±22%  +17.21%  (p=0.008 n=5+5)
OpSHL-6          131ns ± 5%    151ns ± 2%  +15.75%  (p=0.008 n=5+5)
OpSHR-6          134ns ±18%    154ns ± 4%     ~     (p=0.079 n=5+5)
OpSAR-6          127ns ± 1%    156ns ± 1%  +22.83%  (p=0.016 n=5+4)
OpIsZero-6      92.2ns ±18%  114.4ns ± 2%  +24.13%  (p=0.008 n=5+5)
OpMstore-6      87.2ns ± 5%  103.2ns ± 2%  +18.29%  (p=0.008 n=5+5)
OpSHA3-6         745ns ± 2%    797ns ± 9%   +7.04%  (p=0.016 n=5+5)

However, I think that's just some weirdness of the tests; there's no reason that e.g. ADD would take longer (since it doesn't even use anything from the pool).

@holiman
Contributor Author

holiman commented Jan 28, 2020

I swapped this in during the execution of an earlier full sync; the down-spike is where I swapped the code. No huge gains.
[Screenshot, 2020-01-28: Dual Geth - Grafana]

@holiman
Contributor Author

holiman commented Mar 19, 2020

Updated with the latest improvements from @chfast:

[user@work vm]$ benchstat before_pawel.txt after_pawel.txt 
name            old time/op  new time/op   delta
OpAdd64-6       96.0ns ± 3%  100.1ns ±22%     ~     (p=0.841 n=5+5)
OpAdd128-6       117ns ± 3%    113ns ± 4%     ~     (p=0.087 n=5+5)
OpAdd256-6       193ns ±19%    178ns ± 2%   -8.09%  (p=0.000 n=5+4)
OpSub64-6        113ns ±15%    102ns ± 1%   -9.57%  (p=0.016 n=5+4)
OpSub128-6       131ns ± 3%    129ns ± 3%     ~     (p=0.127 n=5+5)
OpSub256-6       183ns ± 2%    184ns ± 3%     ~     (p=0.794 n=4+5)
OpMul-6          251ns ±19%    205ns ±16%     ~     (p=0.056 n=5+5)
OpDiv256-6       427ns ± 5%    289ns ± 2%  -32.41%  (p=0.008 n=5+5)
OpDiv128-6       317ns ± 2%    175ns ± 1%  -44.65%  (p=0.016 n=5+4)
OpDiv64-6        119ns ±22%    118ns ±25%     ~     (p=0.460 n=5+5)
OpSdiv-6         444ns ± 1%    306ns ± 2%  -31.06%  (p=0.029 n=4+4)
OpMod-6          211ns ±23%    189ns ± 2%  -10.16%  (p=0.024 n=5+5)
OpSmod-6         217ns ± 3%    212ns ± 1%   -2.24%  (p=0.016 n=5+4)
OpExp-6         19.6µs ± 1%    9.4µs ± 1%  -52.22%  (p=0.016 n=4+5)
OpSignExtend-6   196ns ±20%    187ns ±15%     ~     (p=0.198 n=5+5)
OpLt-6           184ns ± 2%    192ns ±15%     ~     (p=0.738 n=5+5)
OpGt-6           197ns ±21%    191ns ±13%     ~     (p=0.690 n=5+5)
OpSlt-6          196ns ± 3%    199ns ± 3%     ~     (p=0.286 n=4+5)
OpSgt-6          198ns ± 3%    197ns ± 0%     ~     (p=0.635 n=5+4)
OpEq-6           186ns ± 4%    189ns ± 9%     ~     (p=1.000 n=5+5)
OpEq2-6          193ns ±19%    186ns ± 9%     ~     (p=0.389 n=5+5)
OpAnd-6          187ns ± 2%    184ns ± 1%     ~     (p=0.079 n=5+4)
OpOr-6           182ns ± 1%    181ns ± 1%     ~     (p=0.476 n=5+4)
OpXor-6          183ns ± 1%    188ns ± 5%     ~     (p=0.603 n=4+5)
OpByte-6         197ns ±23%    196ns ±22%     ~     (p=0.730 n=5+5)
OpAddmod-6       651ns ±29%    421ns ±15%  -35.23%  (p=0.008 n=5+5)
OpMulmod-6      1.32µs ± 4%   0.54µs ± 1%  -58.89%  (p=0.016 n=5+4)
OpSHL-6          135ns ± 1%    145ns ± 8%   +7.21%  (p=0.016 n=4+5)
OpSHR-6          136ns ± 0%    144ns ± 7%   +5.98%  (p=0.016 n=4+5)
OpSAR-6          143ns ± 7%    149ns ±26%     ~     (p=1.000 n=5+5)
OpIsZero-6      93.4ns ± 2%   93.6ns ± 5%     ~     (p=0.841 n=4+5)
OpMstore-6      95.5ns ± 7%   92.8ns ± 6%     ~     (p=0.222 n=5+5)
OpSHA3-6         798ns ± 2%    784ns ± 1%     ~     (p=0.127 n=5+4)

@holiman
Contributor Author

holiman commented Mar 23, 2020

Closing in favour of #20787

holiman closed this Mar 23, 2020