Stacktrie allocs 2 #30746

holiman · 2024-11-12T08:36:30Z

PR #30743 improves derivesha:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.1 │             derivesha.2              │
                          │   sec/op    │    sec/op     vs base                │
DeriveSha200/stack_trie-8   477.8µ ± 2%   430.0µ ± 12%  -10.00% (p=0.000 n=10)

                          │ derivesha.1  │             derivesha.2              │
                          │     B/op     │     B/op      vs base                │
DeriveSha200/stack_trie-8   45.17Ki ± 0%   25.65Ki ± 0%  -43.21% (p=0.000 n=10)

                          │ derivesha.1 │            derivesha.2             │
                          │  allocs/op  │ allocs/op   vs base                │
DeriveSha200/stack_trie-8   1259.0 ± 0%   232.0 ± 0%  -81.57% (p=0.000 n=10)

This PR takes it one step further: it allows the derivesha method to hand a bytepool to the hasher. The stacktrie can then use that pool to feed value buffers back to the caller, which avoids a lot of common.CopyBytes calls.
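A minimal sketch of the buffer hand-off described above, not the code in this PR. The names `bytePool`, `hasher`, and `deriveSum` are hypothetical stand-ins, the "hash" is a toy XOR, and the pool is assumed to be used from a single goroutine:

```go
package main

import "fmt"

// bytePool is a hypothetical, non-thread-safe free list of byte slices.
type bytePool struct {
	free  [][]byte
	width int // minimum capacity of the slices handed out
}

func newBytePool(width int) *bytePool { return &bytePool{width: width} }

// Get returns a zero-length slice, reusing a returned buffer when one exists.
func (p *bytePool) Get() []byte {
	if n := len(p.free); n > 0 {
		b := p.free[n-1]
		p.free = p.free[:n-1]
		return b[:0]
	}
	return make([]byte, 0, p.width)
}

// Put hands a buffer back to the pool; undersized buffers are dropped.
func (p *bytePool) Put(b []byte) {
	if cap(b) < p.width {
		return
	}
	p.free = append(p.free, b)
}

// hasher stands in for the stacktrie: it takes ownership of the value passed
// to Update and returns the buffer to the shared pool once it is consumed.
type hasher struct {
	pool *bytePool
	sum  byte // toy stand-in for a real hash
}

func (h *hasher) Update(value []byte) {
	for _, c := range value {
		h.sum ^= c
	}
	h.pool.Put(value) // feed the buffer back to the caller's pool
}

// deriveSum stands in for the caller: it encodes every item into a pooled
// buffer instead of making a fresh copy per item.
func deriveSum(items []string, pool *bytePool) byte {
	h := &hasher{pool: pool}
	for _, it := range items {
		buf := append(pool.Get(), it...)
		h.Update(buf)
	}
	return h.sum
}

func main() {
	pool := newBytePool(32)
	fmt.Println(deriveSum([]string{"a", "b", "c"}, pool), len(pool.free))
}
```

In this sketch a single buffer circulates between caller and hasher, so the per-item copy disappears. The cost is that both sides have to agree on who owns a buffer at any given time.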

The approach is not very elegant, but it does reduce the allocs even further:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.2  │             derivesha.3             │
                          │    sec/op    │    sec/op     vs base               │
DeriveSha200/stack_trie-8   430.0µ ± 12%   467.1µ ± 19%  +8.64% (p=0.023 n=10)

                          │  derivesha.2  │             derivesha.3              │
                          │     B/op      │     B/op      vs base                │
DeriveSha200/stack_trie-8   25.654Ki ± 0%   5.494Ki ± 0%  -78.59% (p=0.000 n=10)

                          │ derivesha.2 │            derivesha.3             │
                          │  allocs/op  │ allocs/op   vs base                │
DeriveSha200/stack_trie-8   232.00 ± 0%   37.00 ± 0%  -84.05% (p=0.000 n=10)

Definitely not mergeable as is; putting it up for discussion.

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/trie
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
             │ stacktrie.3  │          stacktrie.4           │
             │    sec/op    │    sec/op     vs base          │
Insert100K-8   69.50m ± 12%   74.59m ± 14%  ~ (p=0.128 n=7)

             │ stacktrie.3  │             stacktrie.4             │
             │     B/op     │     B/op      vs base               │
Insert100K-8   4.640Mi ± 0%   3.112Mi ± 0%  -32.93% (p=0.001 n=7)

             │ stacktrie.3 │            stacktrie.4             │
             │  allocs/op  │ allocs/op   vs base                │
Insert100K-8   226.7k ± 0%   126.7k ± 0%  -44.11% (p=0.001 n=7)

holiman · 2024-11-12T08:41:38Z

> It allows the derivesha method to hand a bytepool to the hasher. The stacktrie can thus use this to feed value-buffers back to the caller,

A cleaner way to approach it would instead be to:

- On Update, always copy the value, so that it is owned by the stacktrie.
- Whenever a value is released, put it into an internal pool.
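The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the stacktrie implementation: `copyingTrie`, `acquire`, and `release` are hypothetical names, and "releasing" a value is simplified to happen whenever the next value arrives.

```go
package main

import "fmt"

// copyingTrie is a hypothetical stand-in for a stacktrie that owns its values.
type copyingTrie struct {
	free  [][]byte // internal pool of released value buffers
	held  []byte   // the value currently owned by the trie
	fresh int      // number of buffers that had to be newly allocated
}

// acquire returns an empty buffer with room for n bytes, preferring the pool.
func (t *copyingTrie) acquire(n int) []byte {
	if last := len(t.free) - 1; last >= 0 && cap(t.free[last]) >= n {
		b := t.free[last]
		t.free = t.free[:last]
		return b[:0]
	}
	t.fresh++
	return make([]byte, 0, n)
}

// release puts a buffer the trie no longer needs into the internal pool.
func (t *copyingTrie) release(b []byte) {
	t.free = append(t.free, b)
}

// Update always copies the value, so the caller may reuse its own slice
// immediately. The previously held value is released into the pool.
func (t *copyingTrie) Update(value []byte) {
	buf := append(t.acquire(len(value)), value...)
	if t.held != nil {
		t.release(t.held)
	}
	t.held = buf
}

func main() {
	st := &copyingTrie{}
	v := []byte("hello")
	st.Update(v)
	v[0] = 'X' // the caller reuses its buffer; the trie's copy is unaffected
	for i := 0; i < 10; i++ {
		st.Update([]byte("world"))
	}
	fmt.Println(string(st.held), st.fresh)
}
```

Because the pool is internal, the caller never has to know about it. The copy on Update remains, but its destination buffer is recycled instead of freshly allocated.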

holiman added 9 commits November 11, 2024 11:22

trie: add stacktrie test

f544582

trie: new node-encoding types

21595b3

trie: make stacktrie use less alloc:y encoders

99a3b53

trie: stacktrie allocation reduction via key scratchspace

f0ccc37

trie: stacktrie pool hashing slices

96cd30f

trie: implement bytepool

1b00b6e

trie: nits and polishes

aacb8e6

trie, core: rough idea to improve derivesha

0455e7a

holiman closed this Nov 12, 2024

holiman mentioned this pull request Nov 12, 2024

trie: [wip] reduce allocations in derivesha #30747

Draft
