Slow loading of the Wikidata .bz2 dump #105
Complementary info:
The good news is that the growth in Wikidata volume does not affect runtime, only the storage size.
Hi Patrice,
When filling the statement db, if I detect a concept matching a constraint ("instance of" "scholarly article", for example), then I discard this concept and do not store its statements.
I think we can considerably reduce the size of the statement db this way. I can even propose a PR for such a mechanism.
Best regards
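A minimal sketch of the filtering idea above, assuming the dump is the standard Wikidata JSON export (one entity per line inside a large JSON array). The QID for "scholarly article" (Q13442814), the `should_skip` helper, and the dict-based `store` are illustrative choices, not the project's actual API:

```python
import json

# Hypothetical exclusion set: entities that are an "instance of" (P31)
# one of these classes will not have their statements stored.
EXCLUDED_CLASSES = {"Q13442814"}  # scholarly article

def instance_of_ids(entity):
    """Yield the QIDs the entity is an instance of (property P31)."""
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        qid = value.get("id")
        if qid:
            yield qid

def should_skip(entity, excluded=EXCLUDED_CLASSES):
    """True if the entity matches an excluded class constraint."""
    return any(qid in excluded for qid in instance_of_ids(entity))

def fill_statement_db(dump_lines, store):
    """Store statements only for entities that pass the filter."""
    for line in dump_lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue  # the JSON dump wraps entities in one big array
        entity = json.loads(line)
        if should_skip(entity):
            continue  # discard the concept: its statements are never stored
        store[entity["id"]] = entity.get("claims", {})
```

Filtering at load time this way means the excluded statements never reach lmdb at all, which is where the size reduction comes from.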
The Wikidata dump has become very big, with 1.2 billion statements, which makes the initial loading of the .bz2 dump into lmdb particularly slow.
To speed up this step, we could try:
- instead of making two passes over the dump (one to get the properties, one to get the statements), do both in a single pass and resolve the properties afterwards against the db
- instead of reading line by line, read larger buffer blocks
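Both suggestions above can be sketched together. This is an illustrative outline, not the project's code: `one_pass_load` collects property labels (P-items) and entity statements (Q-items) in a single sweep, so no second pass is needed, and `read_blocks` decompresses with a large read buffer instead of per-line reads:

```python
import bz2
import json

def one_pass_load(lines):
    """Single pass over the dump: record property labels and entity
    statements together; property resolution happens after the pass
    using the collected labels instead of a second read of the dump."""
    property_labels = {}   # e.g. "P31" -> "instance of"
    statements = {}        # e.g. "Q42" -> raw claims dict
    for raw in lines:
        raw = raw.strip().rstrip(",")
        if not raw or raw in ("[", "]"):
            continue
        entity = json.loads(raw)
        eid = entity["id"]
        if eid.startswith("P"):
            label = entity.get("labels", {}).get("en", {}).get("value", eid)
            property_labels[eid] = label
        else:
            statements[eid] = entity.get("claims", {})
    return property_labels, statements

def read_blocks(path, block_size=4 << 20):
    """Read the .bz2 dump in large blocks (4 MB here, a tunable
    assumption) and yield complete lines, carrying partial lines over
    between blocks."""
    tail = ""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            block = tail + block
            pieces = block.split("\n")
            tail = pieces.pop()  # last piece may be a partial line
            yield from pieces
    if tail:
        yield tail
```

Usage would be `one_pass_load(read_blocks("wikidata.json.bz2"))`; the block size is the knob to tune against I/O and decompression throughput.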