util/buffer_pool, db: Reduce memory allocation #367

Merged (2 commits) on Aug 19, 2021

kcalvinalvin (Contributor)

This PR addresses excessive memory allocation by goleveldb in the project I'm working on: https://github.com/mit-dci/utcd.

I've run two benchmarks to check that this PR reduces memory allocation:
1: pprof results from the project I'm working on.
2: Benchstat results from the existing benchmark tests in goleveldb.

pprof results

The project I'm working on is a Bitcoin node written in Go. It uses goleveldb to keep track of the bitcoins that are available to be spent. The load on goleveldb is very heavy, especially at startup, when there are many reads and writes. I used the following command to run a controlled test for profiling memory usage.

# only the --memprofile flag differs for the old goleveldb run
timeout -s SIGINT 20m ./btcd --logdir=. --datadir=. --memprofile=memprof-new-goleveldb --connect=127.0.0.1

The binary used was compiled from branch utreexo on commit e1ab5c048f37e714b46e94782c4e2a66319962f2

pprof-top results

Before this PR, BufferPool.Get() was the largest allocator: 45.19% of all allocated memory was allocated by it.
After this PR, BufferPool.Get() accounted for only 9.12%.

old:
[screenshot: pprof top output]

new:
[screenshot: pprof top output]

pprof-flame results

The difference in allocated memory between the two runs is also visible in the flame graphs.

old:
[screenshot: pprof flame graph]

new:
[screenshot: pprof flame graph]

I've attached the HTML files for the pprof views above, as well as the HTML files for the pprof graph views:
pprof goleveldb.zip

Benchstat results

The benchmarks were separated into the following:
DBRead, DBWrite, DBSeek, DBOverwrite, DBPut, and DBGet.

The benchmarks were run with the following commands:

# only the output file names differ for the old goleveldb runs
go test -run=XXX -test.timeout=25m -benchmem -bench=BenchmarkDBRead -count=10 >       new-dbread.txt
go test -run=XXX -test.timeout=10m -benchmem -bench=BenchmarkDBWrite -count=10 >      new-dbwrite.txt
go test -run=XXX -test.timeout=10m -benchmem -bench=BenchmarkDBSeek -count=10 >       new-dbseek.txt
go test -run=XXX -test.timeout=10m -benchmem -bench=BenchmarkDBOverwrite -count=10 >  new-dboverwrite.txt
go test -run=XXX -test.timeout=10m -benchmem -bench=BenchmarkDBPut -count=10 >        new-dbput.txt
go test -run=XXX -test.timeout=10m -benchmem -bench=BenchmarkDBGet -count=10 >        new-dbget.txt

Some benchmarks showed a lot of variation, so their results may not mean much.
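To make the alloc/op (B/op) and allocs/op columns concrete: with -benchmem, the testing package reports the change in the runtime.MemStats counters TotalAlloc and Mallocs divided by the number of iterations. The sketch below computes the same two quantities by hand; measureAllocs is a hypothetical helper, and its numbers will be close to, but not necessarily identical with, what go test prints.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink forces each buffer to escape to the heap, so the compiler cannot
// place it on the stack and hide the allocation.
var sink []byte

// measureAllocs runs f n times and returns the average number of heap
// allocations and bytes allocated per call, derived from the same
// runtime.MemStats counters (Mallocs, TotalAlloc) that -benchmem uses.
func measureAllocs(n int, f func()) (allocsPerOp, bytesPerOp uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		f()
	}
	runtime.ReadMemStats(&after)
	return (after.Mallocs - before.Mallocs) / uint64(n),
		(after.TotalAlloc - before.TotalAlloc) / uint64(n)
}

func main() {
	// A fresh 4 KiB buffer per call, as allocating with make() does.
	allocs, bytes := measureAllocs(1000, func() {
		sink = make([]byte, 4096)
	})
	fmt.Printf("%d allocs/op, %d B/op\n", allocs, bytes)
}
```

Measuring a function that takes its buffer from a pool and returns it would typically report close to 0 B/op once the pool is warm, which is the effect this PR targets in BufferPool.Get().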

Biggest improvements

While most of the benchmarks showed no real difference, the biggest wins were from the following:

  • alloc/op in DBRead
name                   old alloc/op   new alloc/op   delta
DBRead-12                 17.0B ± 0%     17.0B ± 0%     ~     (all equal)
DBReadGC-12               17.0B ± 0%     17.0B ± 0%     ~     (all equal)
DBReadUncompressed-12     20.0B ± 0%     16.0B ± 0%  -20.00%  (p=0.000 n=10+10)
DBReadTable-12            17.0B ± 0%     17.0B ± 0%     ~     (all equal)
DBReadReverse-12          66.0B ± 0%     66.0B ± 0%     ~     (all equal)
DBReadReverseTable-12     66.0B ± 0%     67.0B ± 0%   +1.52%  (p=0.000 n=10+9)
DBReadConcurrent-12       24.8B ±57%      9.4B ±17%  -62.10%  (p=0.000 n=10+10)
DBReadConcurrent2-12      49.1B ±36%     31.7B ± 4%  -35.51%  (p=0.000 n=10+9)
  • time/op and speed in DBWrite
name                         old time/op    new time/op    delta
DBWrite-12                     2.05µs ± 1%    2.01µs ± 0%    -1.80%  (p=0.000 n=10+9)
DBWriteBatch-12                 535ns ± 2%     589ns ±19%      ~     (p=0.353 n=10+10)
DBWriteUncompressed-12         2.04µs ± 1%    2.01µs ± 1%    -1.48%  (p=0.000 n=10+10)
DBWriteBatchUncompressed-12    1.16µs ±35%    0.53µs ± 2%   -54.80%  (p=0.000 n=10+10)
DBWriteRandom-12               2.51µs ± 1%    2.49µs ± 2%      ~     (p=0.247 n=10+10)
DBWriteRandomSync-12            376µs ± 4%   1617µs ±204%      ~     (p=0.089 n=10+10)

name                         old speed      new speed      delta
DBWrite-12                   56.6MB/s ± 1%  57.7MB/s ± 0%    +1.84%  (p=0.000 n=10+9)
DBWriteBatch-12               217MB/s ± 2%   200MB/s ±17%      ~     (p=0.353 n=10+10)
DBWriteUncompressed-12       56.7MB/s ± 1%  57.6MB/s ± 1%    +1.50%  (p=0.000 n=10+10)
DBWriteBatchUncompressed-12   104MB/s ±48%   221MB/s ± 2%  +111.50%  (p=0.000 n=10+10)
DBWriteRandom-12             46.3MB/s ± 1%  46.5MB/s ± 2%      ~     (p=0.271 n=10+10)
DBWriteRandomSync-12          308kB/s ± 4%   221kB/s ±91%      ~     (p=0.121 n=10+10)
  • time/op and speed in DBOverwrite
name                  old time/op    new time/op    delta
DBOverwrite-12          2.54µs ±25%    2.16µs ± 1%  -14.96%  (p=0.000 n=10+9)
DBOverwriteRandom-12    3.63µs ±26%    2.69µs ± 2%  -25.88%  (p=0.009 n=10+10)

name                  old speed      new speed      delta
DBOverwrite-12        46.8MB/s ±22%  53.7MB/s ± 1%  +14.64%  (p=0.000 n=10+9)
DBOverwriteRandom-12  33.1MB/s ±31%  43.1MB/s ± 2%  +30.03%  (p=0.009 n=10+10)
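The delta column is the relative change between the old and new means; benchstat prints ~ instead when the difference is not statistically significant (by default it applies a Mann-Whitney U test at alpha = 0.05). The hypothetical helper formatDelta below approximates that reading rule and is checked against two rows from the tables above.

```go
package main

import "fmt"

// formatDelta approximates benchstat's delta column: the percentage change
// from the old mean to the new mean, or "~" when the p-value does not fall
// below alpha (i.e. the difference is not statistically significant).
func formatDelta(oldMean, newMean, p, alpha float64) string {
	if p >= alpha {
		return "~"
	}
	return fmt.Sprintf("%+.2f%%", (newMean-oldMean)/oldMean*100)
}

func main() {
	// DBReadUncompressed alloc/op: 20.0B -> 16.0B (p=0.000).
	fmt.Println(formatDelta(20, 16, 0.000, 0.05)) // -20.00%
	// DBWriteBatch time/op: 535ns -> 589ns (p=0.353).
	fmt.Println(formatDelta(535, 589, 0.353, 0.05)) // ~
}
```

Benchstat computes deltas from unrounded means, so recomputing from the displayed values can drift slightly: DBOverwrite alloc/op, 102B to 135B, gives +32.35% here versus the reported +32.58%.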

Extra memory allocation in some benchmarks

Some benchmarks allocated more memory. These are:

  • DBOverwrite (note that speed and time/op improved)
name                  old alloc/op   new alloc/op   delta
DBOverwrite-12            102B ± 6%      135B ± 4%  +32.58%  (p=0.000 n=10+10)
DBOverwriteRandom-12      119B ± 3%      160B ± 6%  +35.30%  (p=0.000 n=9+10)
  • DBWriteRandom (no difference in speed and time/op)
name                         old alloc/op   new alloc/op   delta
DBWriteRandom-12                 142B ± 2%      165B ± 3%   +16.20%  (p=0.000 n=10+9)
  • DBGetRandom (time/op improved)
name            old alloc/op   new alloc/op   delta
DBGetRandom-12    1.00kB ± 0%    1.05kB ± 0%   +4.29%  (p=0.000 n=10+10)
name            old allocs/op  new allocs/op  delta
DBGetRandom-12      16.0 ± 0%      18.0 ± 0%  +12.50%  (p=0.000 n=10+10)

Full Benchstat results

name                   old time/op    new time/op    delta
DBRead-12                 236ns ± 1%     239ns ± 1%   +1.50%  (p=0.000 n=9+10)
DBReadGC-12               251ns ±10%     255ns ± 1%     ~     (p=0.382 n=10+6)
DBReadUncompressed-12     221ns ± 1%     223ns ± 1%   +1.04%  (p=0.000 n=9+8)
DBReadTable-12            222ns ±14%     227ns ± 1%     ~     (p=0.059 n=9+8)
DBReadReverse-12          339ns ± 2%     331ns ± 1%   -2.44%  (p=0.000 n=9+10)
DBReadReverseTable-12     335ns ± 1%     340ns ± 6%     ~     (p=0.483 n=9+10)
DBReadConcurrent-12      50.7ns ±12%    54.3ns ±28%     ~     (p=0.897 n=8+10)
DBReadConcurrent2-12     61.9ns ±22%    62.5ns ±21%     ~     (p=0.912 n=10+10)

name                   old speed      new speed      delta
DBRead-12               492MB/s ± 1%   485MB/s ± 1%   -1.48%  (p=0.000 n=9+10)
DBReadGC-12             464MB/s ± 9%   455MB/s ± 1%     ~     (p=0.428 n=10+6)
DBReadUncompressed-12   524MB/s ± 1%   519MB/s ± 1%   -1.04%  (p=0.000 n=9+8)
DBReadTable-12          524MB/s ±13%   511MB/s ± 1%     ~     (p=0.059 n=9+8)
DBReadReverse-12        342MB/s ± 2%   351MB/s ± 1%   +2.49%  (p=0.000 n=9+10)
DBReadReverseTable-12   346MB/s ± 1%   342MB/s ± 5%     ~     (p=0.483 n=9+10)
DBReadConcurrent-12    2.23GB/s ±25%  2.19GB/s ±24%     ~     (p=0.905 n=9+10)
DBReadConcurrent2-12   1.91GB/s ±20%  1.89GB/s ±19%     ~     (p=0.912 n=10+10)

name                   old alloc/op   new alloc/op   delta
DBRead-12                 17.0B ± 0%     17.0B ± 0%     ~     (all equal)
DBReadGC-12               17.0B ± 0%     17.0B ± 0%     ~     (all equal)
DBReadUncompressed-12     20.0B ± 0%     16.0B ± 0%  -20.00%  (p=0.000 n=10+10)
DBReadTable-12            17.0B ± 0%     17.0B ± 0%     ~     (all equal)
DBReadReverse-12          66.0B ± 0%     66.0B ± 0%     ~     (all equal)
DBReadReverseTable-12     66.0B ± 0%     67.0B ± 0%   +1.52%  (p=0.000 n=10+9)
DBReadConcurrent-12       24.8B ±57%      9.4B ±17%  -62.10%  (p=0.000 n=10+10)
DBReadConcurrent2-12      49.1B ±36%     31.7B ± 4%  -35.51%  (p=0.000 n=10+9)

name                   old allocs/op  new allocs/op  delta
DBRead-12                  0.00           0.00          ~     (all equal)
DBReadGC-12                0.00           0.00          ~     (all equal)
DBReadUncompressed-12      0.00           0.00          ~     (all equal)
DBReadTable-12             0.00           0.00          ~     (all equal)
DBReadReverse-12           0.00           0.00          ~     (all equal)
DBReadReverseTable-12      0.00           0.00          ~     (all equal)
DBReadConcurrent-12        0.00           0.00          ~     (all equal)
DBReadConcurrent2-12       0.00           0.00          ~     (all equal)
---------------------------------------------------------------
name                         old time/op    new time/op    delta
DBWrite-12                     2.05µs ± 1%    2.01µs ± 0%    -1.80%  (p=0.000 n=10+9)
DBWriteBatch-12                 535ns ± 2%     589ns ±19%      ~     (p=0.353 n=10+10)
DBWriteUncompressed-12         2.04µs ± 1%    2.01µs ± 1%    -1.48%  (p=0.000 n=10+10)
DBWriteBatchUncompressed-12    1.16µs ±35%    0.53µs ± 2%   -54.80%  (p=0.000 n=10+10)
DBWriteRandom-12               2.51µs ± 1%    2.49µs ± 2%      ~     (p=0.247 n=10+10)
DBWriteRandomSync-12            376µs ± 4%   1617µs ±204%      ~     (p=0.089 n=10+10)

name                         old speed      new speed      delta
DBWrite-12                   56.6MB/s ± 1%  57.7MB/s ± 0%    +1.84%  (p=0.000 n=10+9)
DBWriteBatch-12               217MB/s ± 2%   200MB/s ±17%      ~     (p=0.353 n=10+10)
DBWriteUncompressed-12       56.7MB/s ± 1%  57.6MB/s ± 1%    +1.50%  (p=0.000 n=10+10)
DBWriteBatchUncompressed-12   104MB/s ±48%   221MB/s ± 2%  +111.50%  (p=0.000 n=10+10)
DBWriteRandom-12             46.3MB/s ± 1%  46.5MB/s ± 2%      ~     (p=0.271 n=10+10)
DBWriteRandomSync-12          308kB/s ± 4%   221kB/s ±91%      ~     (p=0.121 n=10+10)

name                         old alloc/op   new alloc/op   delta
DBWrite-12                       110B ± 1%      110B ± 0%      ~     (p=0.103 n=10+7)
DBWriteBatch-12                 18.0B ± 0%     19.5B ±13%      ~     (p=0.108 n=9+10)
DBWriteUncompressed-12           109B ± 1%      109B ± 1%    -0.64%  (p=0.044 n=10+10)
DBWriteBatchUncompressed-12     30.0B ±43%     17.0B ± 0%   -43.33%  (p=0.000 n=10+9)
DBWriteRandom-12                 142B ± 2%      165B ± 3%   +16.20%  (p=0.000 n=10+9)
DBWriteRandomSync-12             259B ± 9%      258B ±11%      ~     (p=0.948 n=10+8)

name                         old allocs/op  new allocs/op  delta
DBWrite-12                       3.00 ± 0%      3.00 ± 0%      ~     (all equal)
DBWriteBatch-12                  0.00           0.00           ~     (all equal)
DBWriteUncompressed-12           3.00 ± 0%      3.00 ± 0%      ~     (all equal)
DBWriteBatchUncompressed-12      0.00           0.00           ~     (all equal)
DBWriteRandom-12                 3.00 ± 0%      3.00 ± 0%      ~     (all equal)
DBWriteRandomSync-12             3.00 ± 0%      3.00 ± 0%      ~     (all equal)
---------------------------------------------------------------
name             old time/op    new time/op    delta
DBSeek-12          6.17µs ± 1%    6.13µs ± 0%  -0.69%  (p=0.000 n=10+10)
DBSeekRandom-12    8.55µs ± 1%    8.49µs ± 0%  -0.67%  (p=0.001 n=10+10)

name             old alloc/op   new alloc/op   delta
DBSeek-12          1.51kB ± 0%    1.51kB ± 0%    ~     (all equal)
DBSeekRandom-12    1.75kB ± 0%    1.75kB ± 0%    ~     (all equal)

name             old allocs/op  new allocs/op  delta
DBSeek-12            18.0 ± 0%      18.0 ± 0%    ~     (all equal)
DBSeekRandom-12      23.0 ± 0%      23.0 ± 0%    ~     (all equal)
---------------------------------------------------------------
name                  old time/op    new time/op    delta
DBOverwrite-12          2.54µs ±25%    2.16µs ± 1%  -14.96%  (p=0.000 n=10+9)
DBOverwriteRandom-12    3.63µs ±26%    2.69µs ± 2%  -25.88%  (p=0.009 n=10+10)

name                  old speed      new speed      delta
DBOverwrite-12        46.8MB/s ±22%  53.7MB/s ± 1%  +14.64%  (p=0.000 n=10+9)
DBOverwriteRandom-12  33.1MB/s ±31%  43.1MB/s ± 2%  +30.03%  (p=0.009 n=10+10)

name                  old alloc/op   new alloc/op   delta
DBOverwrite-12            102B ± 6%      135B ± 4%  +32.58%  (p=0.000 n=10+10)
DBOverwriteRandom-12      119B ± 3%      160B ± 6%  +35.30%  (p=0.000 n=9+10)

name                  old allocs/op  new allocs/op  delta
DBOverwrite-12            3.00 ± 0%      3.00 ± 0%     ~     (all equal)
DBOverwriteRandom-12      3.00 ± 0%      3.00 ± 0%     ~     (all equal)
---------------------------------------------------------------
name      old time/op    new time/op    delta
DBPut-12    2.10µs ± 2%    2.07µs ± 1%  -1.56%  (p=0.003 n=10+10)

name      old speed      new speed      delta
DBPut-12  55.2MB/s ± 2%  56.1MB/s ± 1%  +1.59%  (p=0.003 n=10+10)

name      old alloc/op   new alloc/op   delta
DBPut-12      112B ± 1%      111B ± 1%  -0.89%  (p=0.005 n=10+10)

name      old allocs/op  new allocs/op  delta
DBPut-12      3.00 ± 0%      3.00 ± 0%    ~     (all equal)
---------------------------------------------------------------
name            old time/op    new time/op    delta
DBGet-12          2.43µs ± 1%    2.35µs ± 1%   -3.12%  (p=0.000 n=9+8)
DBGetRandom-12    4.97µs ± 0%    4.86µs ± 1%   -2.14%  (p=0.000 n=8+10)

name            old alloc/op   new alloc/op   delta
DBGet-12            788B ± 0%      788B ± 0%     ~     (p=0.889 n=10+8)
DBGetRandom-12    1.00kB ± 0%    1.05kB ± 0%   +4.29%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
DBGet-12            12.0 ± 0%      12.0 ± 0%     ~     (all equal)
DBGetRandom-12      16.0 ± 0%      18.0 ± 0%  +12.50%  (p=0.000 n=10+10)

For my project's workload, this PR was a large improvement. In the benchstat results, most benchmarks showed no significant change, while a few (notably DBWriteBatchUncompressed, DBOverwrite, and allocation in DBReadConcurrent) improved clearly, so other projects may benefit as well.

The sync.Pool implementation in the Go standard library has improved drastically
since the buffer pool used here was first implemented. Using
sync.Pool drastically reduces memory allocation and also simplifies the
buffer_pool code.
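The idea in this commit can be sketched as a small pool of byte slices built on sync.Pool. This is a hypothetical illustration, not goleveldb's actual util.BufferPool code: the type bufferPool and its Get/Put methods are assumptions chosen to mirror the BufferPool.Get() call discussed in the pprof results.

```go
package main

import (
	"fmt"
	"sync"
)

// bufferPool is a hypothetical sync.Pool-backed byte-slice pool, used here
// only to illustrate the approach.
type bufferPool struct {
	pool sync.Pool
}

// Get returns a slice of length n, reusing a pooled buffer when one with
// enough capacity is available and allocating a new one otherwise.
func (p *bufferPool) Get(n int) []byte {
	if bp, ok := p.pool.Get().(*[]byte); ok && cap(*bp) >= n {
		return (*bp)[:n]
	}
	return make([]byte, n)
}

// Put makes b available for reuse; the caller must not touch b afterwards.
// Storing *[]byte rather than []byte follows the usual sync.Pool idiom
// (a slice stored directly in an interface is boxed on every Put); the
// small slice header allocated for &b is cheap next to the buffer itself.
func (p *bufferPool) Put(b []byte) {
	b = b[:0]
	p.pool.Put(&b)
}

func main() {
	var p bufferPool
	b := p.Get(4096)
	fmt.Println(len(b)) // 4096
	p.Put(b)
	// Either reuses the 4096-byte buffer or allocates; length is n either way.
	c := p.Get(1024)
	fmt.Println(len(c), cap(c) >= 1024) // 1024 true
}
```

Two caveats of this design: a reused buffer is not zeroed the way make() zeroes new memory, so callers must overwrite whatever they read, and sync.Pool may drop pooled buffers during garbage collection, so Get must always be able to fall back to make().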
Instead of allocating new memory with make() for db_compaction writing,
fetch from the buffer pool and return it when finished, reducing memory
allocation.
syndtr merged commit 079c29c into syndtr:master on Aug 19, 2021

syndtr (Owner) commented Aug 19, 2021

Merged. Thank You!

kcalvinalvin added a commit to kcalvinalvin/btcd that referenced this pull request Sep 4, 2021
Goleveldb recently merged a PR that drastically reduced memory
allocation (github.com/syndtr/goleveldb/pull/367). Update goleveldb
to include that PR.
kcalvinalvin added a commit to kcalvinalvin/btcd that referenced this pull request Nov 17, 2021
kcalvinalvin added a commit to kcalvinalvin/btcd that referenced this pull request Nov 17, 2021
jcvernaleo pushed a commit to btcsuite/btcd that referenced this pull request Nov 30, 2021
roylee17 pushed a commit to lbryio/lbcd that referenced this pull request May 24, 2022