Compaction time can be quite big #226

Closed
melekes opened this issue Jul 3, 2018 · 15 comments

@melekes

melekes commented Jul 3, 2018

Hey! I am investigating the performance decline in our software (tendermint/tendermint#1835). We're using goleveldb as the primary DB for storing blocks plus some other data.

Looks like saving a batch of data (~1MB) can take ~15 sec. due to the compaction process or something:

tendermint/tendermint#1835 (comment)
(the time seems to be increasing with every spike)

The write speed is approximately 1MB per second.
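
For reference, a rough sketch of how we measure the per-batch write time (the DB path and keys below are made up for illustration, not our actual code):

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	// Hypothetical path; default options.
	db, err := leveldb.OpenFile("/tmp/bench-db", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	value := make([]byte, 1024) // ~1 KiB per entry, ~1 MiB per batch
	for i := 0; i < 100; i++ {
		batch := new(leveldb.Batch)
		for j := 0; j < 1024; j++ {
			batch.Put([]byte(fmt.Sprintf("block/%d/%d", i, j)), value)
		}
		start := time.Now()
		if err := db.Write(batch, nil); err != nil {
			log.Fatal(err)
		}
		// Occasional multi-second outliers here are the spikes described above.
		fmt.Printf("batch %d: %v\n", i, time.Since(start))
	}
}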

Is this expected (and we need to fine-tune the compaction options), or does this look like a bug to you?

Thank you!

@ackratos

Do we have any findings or suggestions on tuning options here?

@melekes
Author

melekes commented Mar 28, 2019

@ackratos have you tried cleveldb? compaction times are much lower there

@ackratos

@ackratos have you tried cleveldb? compaction times are much lower there

We have tried cleveldb. We don't do much block indexing (we only index on the default tx hash to the txindex db...). We tested 100 million accounts in application.db and hit a memory leak (we built the latest goleveldb and levigo releases but didn't apply this fix yet: cosmos/cosmos-sdk#3842), and it was quite severe (26G/32G used...). So we haven't had a chance to hit the compaction issue yet.

We decided to stop trying cleveldb in the short term and focus on tuning goleveldb. After tracing how compaction is triggered, I have a naive idea to delay compaction with the parameter CompactionTotalSizeMultiplier: 15:

var defaultOptions = &opt.Options{
	OpenFilesCacheCapacity:        1024,
	BlockCacheCapacity:            768 / 2 * opt.MiB,
	WriteBuffer:                   768 / 4 * opt.MiB, // Two of these are used internally
	Filter:                        filter.NewBloomFilter(10),
	CompactionTotalSizeMultiplier: 15,
}
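
For completeness, a minimal sketch of how these options get plugged in when opening the DB (the path and key/value are made up; this is illustrative, not our production code):

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/filter"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Same knobs as above; CompactionTotalSizeMultiplier above the default (10)
	// lets each level grow bigger before a compaction is triggered.
	o := &opt.Options{
		OpenFilesCacheCapacity:        1024,
		BlockCacheCapacity:            768 / 2 * opt.MiB,
		WriteBuffer:                   768 / 4 * opt.MiB,
		Filter:                        filter.NewBloomFilter(10),
		CompactionTotalSizeMultiplier: 15,
	}

	db, err := leveldb.OpenFile("/tmp/application.db", o) // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Put([]byte("key"), []byte("value"), nil); err != nil {
		log.Fatal(err)
	}
}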

Should have some result tomorrow.

@syndtr
Owner

syndtr commented Mar 28, 2019

You might want to try the latest master branch; there's a series of PRs by @rjl493456442 that were merged recently and improve compaction performance. See #264, #267, #268 and #270.

@nullne

nullne commented Apr 24, 2019

@syndtr I met the same problem: the put latency spikes roughly every 10 minutes. I am building the project with the latest goleveldb. Hope this works.

[screenshot: put latency over time, with spikes roughly every 10 minutes]

@rjl493456442
Contributor

@nullne 10 milliseconds?

@nullne

nullne commented Apr 24, 2019

@rjl493456442 Yes, with ~40-byte (not fixed) keys and ~200-byte (not fixed) values. Except for the spikes, the response time is quite low.

@rjl493456442
Contributor

@nullne From the go-ethereum side, we monitor the archive syncing procedure, and it shows that with the latest master branch the write_delay is gone. That means no write operation gets stuck anymore.

The average disk writing speed of geth node is 100 MB/s.

But if your write load is very heavy, the compaction speed can be slower than the write speed, and then part of the write operations will be delayed. That can be the cause of the spikes.
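
If you want to check on your side whether writes are actually being throttled, something like this rough sketch should work (assuming a reasonably recent goleveldb; the path is just an example):

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

// dumpStats prints goleveldb's internal write-delay and compaction stats.
// The property names are what I believe recent goleveldb versions expose.
func dumpStats(db *leveldb.DB) {
	if s, err := db.GetProperty("leveldb.writedelay"); err == nil {
		log.Println("write delay:", s) // how often/how long writes were slowed or paused
	}
	if s, err := db.GetProperty("leveldb.stats"); err == nil {
		log.Println(s) // per-level table counts, sizes and compaction I/O
	}
}

func main() {
	db, err := leveldb.OpenFile("/tmp/testdb", nil) // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	dumpStats(db)
}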

@nullne

nullne commented Apr 24, 2019

@rjl493456442 Thanks for your time.
Do you think our write load is super heavy? Is there anything to configure to reduce the spikes? (The knobs I'm poking at are sketched at the end of this comment.) I have built the project and am testing the performance now; it takes time to see a difference, if there is one.

BTW, the spikes are not that obvious until we start reading from LevelDB. I have no idea why. I started 100 workers to write and later 100 workers to read.
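
For reference, these are roughly the knobs I'm poking at; the values are just my guesses for our workload, not recommendations (field names are from goleveldb's opt package, to the best of my knowledge):

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Push the write-stall thresholds out and use larger memtables/SSTables,
	// so compaction and throttling kick in less often; numbers are experimental.
	o := &opt.Options{
		WriteBuffer:            64 * opt.MiB, // larger memtable -> fewer, bigger flushes (default 4 MiB)
		CompactionTableSize:    8 * opt.MiB,  // bigger SSTables -> fewer files per level (default 2 MiB)
		CompactionL0Trigger:    8,            // start L0 compaction later (default 4)
		WriteL0SlowdownTrigger: 16,           // throttle writes later (default 8)
		WriteL0PauseTrigger:    24,           // hard-pause writes later (default 12)
	}

	db, err := leveldb.OpenFile("/tmp/testdb", o) // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}

My understanding is that pushing the L0 triggers out mostly trades read amplification for fewer write stalls, so it may just move the spikes rather than remove them.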

@rjl493456442
Contributor

@nullne You mean that if there are no reads, the write spikes are not obvious?

@nullne

nullne commented Apr 24, 2019

@rjl493456442 yes
[screenshot: latency graph]

@nullne

nullne commented May 7, 2019

I tried rocksdb and cleveldb. Finally, we decided to use rocksdb via cgo.

smira added a commit to smira/aptly-fork that referenced this issue Sep 17, 2019
There are a number of changes which went in recently that should improve
performance: syndtr/goleveldb#226 (comment)
sliverc pushed a commit to aptly-dev/aptly that referenced this issue Sep 18, 2019
There are a number of changes which went in recently that should improve
performance: syndtr/goleveldb#226 (comment)
smira added a commit to smira/aptly-fork that referenced this issue Sep 27, 2019
PR aptly-dev#876 actually upgraded goleveldb to 1.0.0, not to the latest master.

Recent changes to goleveldb should improve performance
syndtr/goleveldb#226 (comment)
smira added a commit to aptly-dev/aptly that referenced this issue Sep 27, 2019
PR #876 actually upgraded goleveldb to 1.0.0, not to the latest master.

Recent changes to goleveldb should improve performance
syndtr/goleveldb#226 (comment)
@melekes
Author

melekes commented Aug 27, 2020

Tested Tendermint with the latest master (5c35d60). Compaction times are much lower now (~2-3s). Thanks! I think this can be closed now.

@melekes melekes closed this as completed Aug 27, 2020
@0zAND1z

0zAND1z commented Aug 28, 2020

@melekes, Does this mean that Tendermint can handle heavy load reasonably well?

The paragraph under the Database section of the Tendermint Core docs mentions this PR. Should someone update that paragraph?

@melekes
Author

melekes commented Aug 28, 2020

@melekes, Does this mean that Tendermint can handle heavy load reasonably well?

I'm currently investigating long pauses, which happen every now and then: tendermint/tendermint#3905 (comment). As I said, compaction is no longer an issue.

The paragraph under the Database section of the Tendermint Core docs mentions this PR. Should someone update that paragraph?

good catch 👍 will do

melekes added a commit to tendermint/tendermint that referenced this issue Aug 31, 2020
melekes added a commit to tendermint/tendermint that referenced this issue Sep 2, 2020
melekes added a commit to tendermint/tendermint that referenced this issue Sep 4, 2020
* docs: goleveldb is much more stable now

Refs syndtr/goleveldb#226 (comment)

* rpc/core/events: make sure WS client receives every event

previously, if the write buffer was full, the response would've been
lost without any trace (log msg, etc.)

* rpc/jsonrpc/server: set defaultWSWriteChanCapacity to 1

Refs #3905
Closes #3829

setting the write buffer capacity to 1 makes the transaction count per block
more stable and also reduces the pause length by 20s.

before: #3905 (comment) net.Read - 20s
after: net.Read - 0.66s

* rpc/jsonrpc/server: buffer writes and avoid io.ReadAll during read