Badger datastore performance with English snapshot #85
@lidel: can/should we add a regression test for this so we don't do releases that fail for this use case?
@BigLep ideally, yes, but I'd say it is not feasible at this stage: …
For the record, I switched to …
Another badger issue: ipfs/go-ds-badger#111 (panic: slice bounds out of range)
The panic turned out to be caused by a broken datastore (it went away after re-generating). New or old issue, though, I will also retry with MaxTableSize set to 64MB (go-ipfs uses 16MB to allocate less memory up front).
Tried finishing pinning English with go-ds-badger patched with:

```diff
--- a/datastore.go
+++ b/datastore.go
@@ -107,13 +107,13 @@ func init() {
 	DefaultOptions.Options.ValueLogLoadingMode = options.FileIO

 	// Explicitly set this to mmap. This doesn't use much memory anyways.
-	DefaultOptions.Options.TableLoadingMode = options.MemoryMap
+	DefaultOptions.Options.TableLoadingMode = options.FileIO

 	// Reduce this from 64MiB to 16MiB. That means badger will hold on to
 	// 20MiB by default instead of 80MiB.
 	//
 	// This does not appear to have a significant performance hit.
-	DefaultOptions.Options.MaxTableSize = 16 << 20
+	DefaultOptions.Options.MaxTableSize = 64 << 20
 }
```

Helped a bit, but it crashed again after 9h (memory limited to 20GB). This could most likely be solved by throwing enough RAM at the nodes pinning the data, but pinning an existing Wikipedia snapshot should work on a consumer-grade PC.
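For anyone who prefers not to patch the package, the same overrides can be applied at runtime before opening the datastore. A minimal sketch, assuming the go-ds-badger v1 API shown in the diff above (a package-level `DefaultOptions` that embeds badger's `Options`, and `NewDatastore`); the datastore path is a placeholder:

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger/options"
	badgerds "github.com/ipfs/go-ds-badger"
)

func main() {
	// Start from the package defaults so other settings stay untouched;
	// Options is a value type, so this copy does not mutate the globals.
	opts := badgerds.DefaultOptions

	// Load SSTables with plain file I/O instead of mmap.
	opts.Options.TableLoadingMode = options.FileIO

	// Restore badger's stock 64 MiB table size (go-ipfs ships 16 MiB
	// to allocate less memory up front).
	opts.Options.MaxTableSize = 64 << 20

	ds, err := badgerds.NewDatastore("/path/to/badgerds", &opts) // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer ds.Close()

	// ... use ds as a github.com/ipfs/go-datastore Datastore ...
}
```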
No luck. Next step is …

Ended up using flatfs with …
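For reference, opening a flatfs datastore directly with go-ds-flatfs looks roughly like the sketch below. The path, the NextToLast(2) shard function, and the sync flag are illustrative assumptions, not settings recovered from the truncated comment above:

```go
package main

import (
	"log"

	flatfs "github.com/ipfs/go-ds-flatfs"
)

func main() {
	// NextToLast(2) is the sharding scheme go-ipfs uses for its blockstore.
	shard := flatfs.NextToLast(2)

	// sync=false trades crash safety for much faster writes; whether the
	// original comment disabled it is an assumption here.
	ds, err := flatfs.CreateOrOpen("/path/to/flatfs", shard, false) // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer ds.Close()
}
```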
Is this project using the Zim dumps rather than Wikimedia's internal MWDumper.pl and related XML-to-MySQL import processes? I successfully got a working database from that process, and would prefer not to have another middleman in the dump-and-sync pipeline. Ideally we would speed up the daily dump sync and automate it; that's what Wikipedia is asking for. We should have this thing "live and usable" and always current. 9546678083 if you can help or need help.
@alzinging We were doing it like this 15 years ago... I wish you good luck with this approach ;) In any case, your comment is rather off-topic in this ticket.
Badger shows consistent issues in go-ipfs 0.8.0 with the 300GB wikipedia_en_all_maxi_2021-02 snapshot.
Issues
How to reproduce
Reproduction of relevant import steps:
- `zimdump` from https://download.openzim.org/nightly/2021-02-12/zim-tools_linux-x86_64-2021-02-12.tar.gz

It would be useful if someone reproduced the memory issue so we know it's not specific to my box.
Things to try