Improvement in bbolt/walletdb speed #111
Conversation
Rationale for passing the opts is that I plan to use additional bbolt options for tuning. Because the BBoltDB bdb backend is used for more than just the walletdb, different options are appropriate for different use cases (wallet vs. NeutrinoDB, etc.), so a one-size-fits-all solution setting static options in just one place would likely be counter-productive.
Also, now I have a real benchmark for you for bboltdb recovery time: the added start-up overhead of rebuilding the freelist on a GNC is approximately 40 seconds with a 1GB walletdb file. The same test on my i7 desktop takes about 3 seconds.
Unfortunately, the configuration that works for the wallet database is not going to be appropriate for the neutrino database, and you won't be able to easily support 32-bit platforms. To be clear, 32-bit support is broken right now, without this patch: Bolt/BBolt defaults to an unclamped direct memory mapping, and with the default expansion configuration the database will fail to mmap once either the walletdb or the neutrinodb grows to 4GB (or both grow to 2GB) on a 32-bit platform. These sizes are already common in production for many Gridnodes, where the median database size is about 2GB. This is explicitly noted as a limitation in the project's README at https://github.com/etcd-io/bbolt#caveats--limitations and in boltdb/bolt#308 (comment).

Unfortunately, using DB compaction to keep these size factors under control would impose massive I/O requirements from constantly recopying and re-mapping the database whenever the compacted version is used, greatly increasing the chances of database corruption, so using different options is the only solution - or using a different database.

So, unless we decide to throw out BBoltDB/BoltDB completely and change to a database that doesn't use direct memory mapping, or we switch to different databases specifically tuned to each use case (which seems like an even larger chance for bugs), using different options is absolutely a necessity - even more so if we plan to actually support Gridnodes taking the hard shutdown path in the future, because the options needed for expansion and initial mmap size change dynamically depending on the current size of the on-disk database. This is the way etcd (also bbolt's upstream) handles it as well.

I'm actually totally with you here - I want one configuration for everything too, but it's just not going to work if we keep the databases we're currently using. I'll mention, as I have before, that this is the same route currently being taken by lnd and upstream btcd. Perhaps we could find one set of options that mostly works for most people in the regular distribution, and I'd maintain a separate fork for GF use? I'm not sure that's a situation you'd be happy with either, but it could be an option.
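For illustration, here is a minimal sketch of the kind of size-dependent tuning described above: derive the initial mmap size from the current on-disk database size and clamp it on 32-bit platforms. The 2.5x factor and the 1GB clamp are illustrative values, not taken verbatim from the patch:

```go
package bdb

import (
	"os"
	"strconv"
)

// initialMmapSize derives a starting mmap size from the current on-disk
// database size and clamps it on 32-bit platforms, where an unbounded
// mapping will eventually exhaust the 4GB address space.
func initialMmapSize(dbPath string) int {
	var size int64
	if fi, err := os.Stat(dbPath); err == nil {
		size = fi.Size()
	}
	mmapSize := int64(float64(size) * 2.5)
	if strconv.IntSize == 32 {
		const clamp32 = 1 << 30 // 1GB: leaves headroom in a 4GB address space
		if mmapSize > clamp32 {
			mmapSize = clamp32
		}
	}
	return int(mmapSize)
}
```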
From everything I've looked at, that seems to be the normal way of doing it, and the pattern is essentially universal - see https://golang.org/pkg/database/sql/, etc. It's also done this way upstream, as well as in all the related projects. Even Decred has the same interface - amazing they haven't rewritten it yet! While I suppose we could do a rewrite here, we'd be doing things completely differently from everyone else, and it would be a pretty large divergence from upstream btcd and all related projects.
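For reference, this is the driver-registration pattern from the standard library that the walletdb interface mirrors: backends self-register under a type name, and callers open a database by that name plus driver-specific arguments. A minimal sketch using database/sql (the sqlite3 driver import is just an illustration):

```go
package main

import (
	"database/sql"
	"log"

	// A driver registers itself under a name via its init() function.
	_ "github.com/mattn/go-sqlite3"
)

func main() {
	// Open by driver name plus driver-specific arguments; walletdb's
	// Open("bdb", args...) follows the same shape.
	db, err := sql.Open("sqlite3", "example.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```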
Also, I've already investigated storing data the same way as Core, which uses Sleepycat BerkeleyDB as its on-disk format for wallet data (and actually for a few other databases as well), but it just isn't going to happen with Go, unless we either do a lot of work (and Oracle will probably sue us, even without any grounds to win, because that's what they do) or put ourselves at the mercy of Oracle's licensing and giant C++ codebase.

Core is using v4.8 of BDB with some simple modifications - the main reason they don't upgrade to a newer BDB is that the database was purchased by Oracle, who changed the licensing terms (and if they changed them once, they might change them again). Going back through the mailing lists, you can see there was (often heated) discussion of upgrading the on-disk DB (and thus BDB) for quite a while - the main concerns were new bugs and wallet file compatibility between systems - but the Oracle purchase essentially eliminated that option completely.

You can certainly build the Satoshi code with any current Oracle BDB, but upon opening the database, its on-disk format is automatically upgraded by the Oracle BDB code and becomes incompatible with any previous versions, so you can never downgrade - there is no path back to the more freely licensed releases. Oracle becomes the upstream and only path for future bugfixes. Also, Oracle changes the on-disk format regularly (for a while, upon every release). I'd link the changelogs for you, but you need an Oracle account to even look at them.

Oracle does make Go bindings for BDB available, but adding BDB would require adding a large non-Go (C++) codebase to pktd, it would massively complicate the build process, and the current Go bindings are not "freely available" in the OSI sense - you need to register an account with Oracle and agree to their license agreement to get them, which of course states they can change the licensing at any time. They also apparently bullied the developer of the alternative open Go bindings (https://github.com/jsimonetti/berkeleydb) out of the game, so those have been unmaintained for nearly five years now.
I will accept this if you change the interface so that it is a compile-time error if arguments are passed incorrectly.
I'll look into what's required to do so.
OK, it seems Go does have (a few!) ways to do compile-time assertions, so I'll get this fixed up.
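For context, a sketch of the two relevant idioms (names and signatures are illustrative, not the actual patch): replacing an `args ...interface{}` signature with typed parameters, so a wrong argument fails the build instead of panicking at runtime, plus the classic compile-time interface-satisfaction assertion:

```go
package walletdbexample

import "time"

// Before: arguments passed as empty interfaces are only validated when
// the driver type-asserts them at runtime:
//
//	func Open(dbType string, args ...interface{}) (DB, error)
//
// After: a typed signature makes a wrong argument a compile-time error.
type DB interface {
	Close() error
}

func OpenBDB(dbPath string, noFreelistSync bool, timeout time.Duration) (DB, error) {
	// ... open the bbolt-backed database with these options ...
	return &bdb{}, nil
}

type bdb struct{}

func (b *bdb) Close() error { return nil }

// Compile-time assertion that *bdb satisfies DB: the blank-identifier
// conversion costs nothing at runtime, but the build breaks if the
// interface and implementation ever drift apart.
var _ DB = (*bdb)(nil)
```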
@cjdelisle Other than the requested change (and an obligatory rebase), would you like me to add a clamp on the initial mmap size for 32-bit systems? While this will not actually "solve" the issue of the databases eventually becoming too big for 32-bit systems, it will greatly delay the inevitable OOM condition.

Edit: Besides a clamp on the initial mmap size, I can import my database compaction code and have it run before the databases open on 32-bit systems. This would hamper performance (both start-up time, due to the wait while the database is compacted, and run time, as bbolt would then need to copy the database over and over to grow it, along with the subsequent re-mappings), but would likely allow a 32-bit system to run for quite some time before any OOM condition. This can be done in a second, dependent PR, if you'd like, since it would add a lot of code. Having the compaction code included might be useful in general, because it could be triggered by a command-line argument and thus be available to everyone, not just 32-bit systems. Also, if I were to take this approach, some of the performance issues could be mitigated by not compacting databases smaller than a certain size (say, 512MB or 1GB) and writing out a last-compaction time, so compaction would not occur on every start-up, but only after some duration, such as 24 hours, has elapsed - see the sketch below.

Edit 2: I will also update the documentation to advise of the limitations of 32-bit systems, along with some suggestions, such as not running pktd and pktwallet on the same (32-bit) system. In the end, though, all of this is a temporary measure, and 32-bit systems will eventually become unsupportable, but the above changes will buy significant time.
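A minimal sketch of the compaction gating just described, with illustrative thresholds and a hypothetical stamp file standing in for wherever the last-compaction time would actually be persisted:

```go
package bdb

import (
	"os"
	"time"
)

// Illustrative thresholds, not values taken from the patch.
const (
	minCompactSize  = 512 << 20      // skip databases smaller than 512MB
	compactInterval = 24 * time.Hour // compact at most once per day
)

// shouldCompact gates start-up compaction: small databases are skipped,
// and a persisted last-compaction timestamp prevents compacting on
// every start-up.
func shouldCompact(dbPath, stampPath string) bool {
	fi, err := os.Stat(dbPath)
	if err != nil || fi.Size() < minCompactSize {
		return false
	}
	stamp, err := os.Stat(stampPath)
	if err != nil {
		return true // no stamp yet, so the database has never been compacted
	}
	return time.Since(stamp.ModTime()) > compactInterval
}
```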
@cjdelisle Will do, even though I was learning how to do Go compile-time assertions! 🥇
Ah, see, this all sort of stems from me assuming pktwallet would be the same as the pktd code, which does let the user change the database out from under us:
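(The snippet that followed isn't preserved in this thread. As a rough illustration of the kind of API in question - names and signatures here are hypothetical, not pktd's actual code - a global driver-registration hook like this lets any caller swap the backing database implementation:)

```go
package database

import "fmt"

// Driver describes a pluggable backend (shape is illustrative).
type Driver struct {
	DbType string
	Create func(args ...interface{}) (interface{}, error)
}

var drivers = make(map[string]*Driver)

// RegisterDriver installs a backend under a type name. Any caller can
// use this to change the database out from under the rest of the code,
// which is the hazard being discussed.
func RegisterDriver(driver *Driver) error {
	if _, exists := drivers[driver.DbType]; exists {
		return fmt.Errorf("driver %q is already registered", driver.DbType)
	}
	drivers[driver.DbType] = driver
	return nil
}
```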
The wallet, thankfully, has no equivalent function. We should probably remove that from pktd; it seems like a landmine waiting to be stepped on.
Yes, drop it
* Bypass most abstractions
* Use standard function definitions
* Update GoDocs, package READMEs, etc.
Also, to pre-emptively answer a few questions that might come up:
* Remove the option and documentation for the currently unimplemented memdb.
* Update documentation to reflect the actual API.
* Comment and document some TODO items.
* Update the primary README.md (in PR #220).
@cjdelisle Updated with corrections for most items noted at #111 (comment)
First, we take the size of the walletdb and set our initial mmap size to (current size * 2.5). This reduces the amount of re-mapping that needs to be done as the database doubles in size and also reduces memory fragmentation; the net result is lower memory usage thanks to reduced fragmentation, though initial usage is higher.
Also, enable the NoFreelistSync option so the freelist is not synced to disk, which massively improves write performance.
Combined, tests on a Gridnode show a ~250% gain.
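A minimal sketch of how the two settings combine, assuming the go.etcd.io/bbolt import path and a hypothetical helper name (InitialMmapSize and NoFreelistSync are real fields in upstream bbolt's Options):

```go
package bdb

import (
	"os"

	bolt "go.etcd.io/bbolt"
)

// openTuned applies both changes described above.
func openTuned(dbPath string) (*bolt.DB, error) {
	var size int64
	if fi, err := os.Stat(dbPath); err == nil {
		size = fi.Size()
	}
	return bolt.Open(dbPath, 0600, &bolt.Options{
		// Map 2.5x the current file size up front to cut re-mapping at
		// each doubling and reduce memory fragmentation.
		InitialMmapSize: int(float64(size) * 2.5),
		// Keep the freelist in memory only: a big write-speed win, at
		// the cost of rebuilding the freelist on the next open.
		NoFreelistSync: true,
	})
}
```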
Updated Neutrino and wallettool for the new API, as well as relevant documentation.