resource usage during sync #262
Comments
Interesting; the disk write pattern shows there is still room for improvement.
Another run of the full dataset, at one second per pixel. Five seconds per pixel, to better see the memory leak: except for the short RocksDB spikes every 6-7 minutes or so, most of the time is spent waiting for data from the network or maxing out a CPU core while processing that data. It all looks very serialised, which means it would benefit from parallelisation. The average download speed is extremely low, at 7.83 kB/s. Disk I/O is a non-issue right now, on the SSD I'm using.
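The serialised download-then-process pattern described above can be sketched like this (a hedged illustration, not Nimbus code; fetch_block and process_block are hypothetical stand-ins): overlap network downloads with block processing using a bounded queue, so the CPU-heavy processing no longer waits on the network.

```python
# Sketch: overlap downloading and processing with a producer thread and a
# bounded queue. The names fetch_block/process_block are placeholders.
import threading
import queue

def pipeline(fetch_block, process_block, n_blocks, depth=64):
    q = queue.Queue(maxsize=depth)   # bounded: caps memory used by buffered blocks

    def downloader():
        for i in range(n_blocks):
            q.put(fetch_block(i))    # blocks when processing lags behind
        q.put(None)                  # sentinel: no more blocks

    t = threading.Thread(target=downloader)
    t.start()
    processed = 0
    while True:
        block = q.get()
        if block is None:
            break
        process_block(block)
        processed += 1
    t.join()
    return processed
```

The bounded queue is the important design choice: it lets the downloader run ahead of processing without letting memory grow without limit.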
Is the red RSS line that keeps climbing an indicator of a memory leak? If yes, that is very bad.
Yep: https://en.wikipedia.org/wiki/Resident_set_size What's weird is that even the stack keeps growing, albeit much slower.
Which region of blocks did you sync? I mean the block numbers. I noticed that during blocks 600K to 700K, memory consumption is very high, then it becomes stable at blocks 800K to 900K. @stefantalpalaru: can you share some script with me? How did you produce that SVG?
I started with an empty db and let it run until it crashed due to an assert in transaction rollback (vendor/nim-eth/eth/trie/db.nim:145 - "doAssert t.db.mostInnerTransaction == t and t.state == Pending"). I don't see block numbers in the output log, because those are logged at the TRACE level which is not included by default.
Freshly published: https://github.com/status-im/process-stats |
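The core of what a sampler like process-stats has to do can be sketched as follows (this is an assumption about the approach, not the tool's actual code): read a process's resident set size from /proc at a fixed interval and collect (timestamp, KiB) pairs that can later be plotted.

```python
# Sketch: sample a process's RSS over time on Linux via /proc/<pid>/status.
import time

def read_rss_kib(pid):
    """VmRSS of `pid` in KiB, as reported by /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # the kernel reports the value in kB
    return 0

def sample_rss(pid, interval=1.0, samples=5):
    """Collect `samples` (unix_time, rss_kib) pairs, one every `interval` seconds."""
    out = []
    for _ in range(samples):
        out.append((time.time(), read_rss_kib(pid)))
        time.sleep(interval)
    return out
```

Plotting such samples over a long sync run is what makes a slow RSS climb, like the one in the graphs above, visible.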
Thank you very much.
The backend database contributes significantly to block syncing speed: at 50GB+ (900K blocks), it becomes very slow. My current solution: I created separate databases on separate physical drives. Every time I have synced around 20GB, I move the database to drive A and open it as a read-only database. A database opened read-only on drive A does compaction faster, because it does not have to compete with the regular read/write operations on drive B. Without this poor man's sharding, drive activity is always at 100%; with this simple sharding, disk activity on both drive A and drive B stays below 30%. For comparison, using a single db, syncing 1.4M blocks takes many hours.
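The read path of this poor man's sharding can be sketched with plain dicts standing in for RocksDB instances (the class and method names here are illustrative assumptions, not Nimbus code): older shards are frozen read-only, and only the newest shard accepts writes.

```python
# Sketch: a sharded key-value store where only the newest shard is writable.
class ShardedDB:
    def __init__(self):
        self.frozen = []   # read-only shards (e.g. moved to drive A), oldest first
        self.active = {}   # the writable shard (e.g. on drive B)

    def put(self, key, value):
        self.active[key] = value

    def get(self, key):
        # Check the writable shard first, then frozen shards newest-first,
        # so more recent values shadow older ones.
        if key in self.active:
            return self.active[key]
        for shard in reversed(self.frozen):
            if key in shard:
                return shard[key]
        return None

    def freeze(self):
        # Once the active shard grows past a size threshold (~20GB in the
        # scheme above), move it aside and reopen it read-only; here we
        # just rotate the dict.
        self.frozen.append(self.active)
        self.active = {}
```

The win described above comes from compaction on the frozen shards no longer competing with live read/write traffic, which this sketch cannot show; it only illustrates the lookup order.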
Thanks for sharing this, @jangko. BTW, how does the lmdb performance compare to rocksdb?
I stopped using it because it is slower than rocksdb when syncing below 100K blocks; I don't know how it performs once it contains more data.
Would it be possible to actually use this "poor man's sharding" approach as a solution? Maybe divide the data into 10GB snapshots, where each snapshot is one shard, i.e. one rocksdb database, and then use those same snapshots to retrieve data across the network for faster sync among Nimbus clients?
https://www.zeroknowledge.fm/9 - interview with one of the parity devs about how they're tuning rocksdb |
A look at allocated RAM (RSS) versus heap usage according to the GC: to get these heap stats, I added the following at the end of persistBlocks(), in nimbus/p2p/chain.nim:

dumpNumberOfInstances()
echo "===", getTime().toUnix()

I processed "heap.txt" using this quick and dirty script: https://gist.github.com/stefantalpalaru/0b502def452591aaca289ec8fc119e8b This looks like memory fragmentation to me, with the RSS growing from 47 to 219 MiB in 37 minutes. The memory leak is extremely small in comparison, with the used-heap minimum going from about 5 to about 10 MiB.
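A script processing such a "heap.txt" might look like the sketch below (an assumption about the format, not the linked gist): each dump of per-type instance counts is terminated by a line like ===1568716800, matching the echo above, and each stats line is taken to be "TypeName: count".

```python
# Sketch: split a heap log into (timestamp, {type_name: count}) snapshots.
def parse_heap_log(text):
    snapshots, current = [], {}
    for line in text.splitlines():
        if line.startswith("==="):
            # The "===<unix time>" line closes the current snapshot.
            snapshots.append((int(line[3:]), current))
            current = {}
        elif ":" in line:
            name, _, count = line.rpartition(":")
            try:
                current[name.strip()] = int(count)
            except ValueError:
                pass  # skip lines that are not "TypeName: count"
    return snapshots
```

From such snapshots, plotting the per-timestamp sum of counts against the sampled RSS is what separates a true leak (heap minimum creeping up) from fragmentation (RSS growing while the heap stays flat).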
Currently, our rocksdb is using the default configuration:
If we change some of the configurations:
@jangko, can we use Premix's regress tool as a benchmarking utility when deciding whether to go for these RocksDB tweaks? It would be nice if we could create a database of blocks that can be distributed efficiently to multiple machines with various hardware configurations; then we'd be able to use regress on each of them.
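Comparing RocksDB tweaks across machines needs a repeatable timing harness; a minimal sketch (the function names are hypothetical, not part of Premix or regress) just times a fixed workload several times and reports wall-clock statistics.

```python
# Sketch: run a workload repeatedly and report wall-clock timing stats,
# so two database configurations can be compared on the same block range.
import time
from statistics import mean, stdev

def bench(run, repeats=3):
    """Call `run()` `repeats` times; return mean/stdev/min in seconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run()
        samples.append(time.perf_counter() - t0)
    return {
        "mean": mean(samples),
        "stdev": stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
    }
```

Reporting the minimum as well as the mean matters on shared machines, where background activity inflates individual runs.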
Here's what I have done: we can use the hexary-trie to tweak and benchmark the database. Both the hexary-trie and the database need more optimization.
Apparently this is still an issue: syncing a fresh Nimbus instance on a high-performance machine results in mediocre sync performance (less than 10 blocks/s), with one thread blocking at 100% and all the other cores using less than 10% CPU.
Obsoleted by aristo - will need to be re-run |
SVG graph with per-process statistics provided by pidstat (missing the network activity, for now, but still interesting): that CPU usage over 100% must come from the multi-threaded rocksdb library.