-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-3416 LMDB-based SAIL. #3413
GH-3416 LMDB-based SAIL. #3413
Conversation
|
6fcb342
to
ae041b4
Compare
That query performance is pretty impressive! Edit: Any chance it's too good to be true? It's miles ahead of the MemoryStore. |
I'm also not sure if the numbers are correct. I've re-run the benchmarks several times with the same results. BTW: I've used MDB_NOSYNC and MDB_NOMETASYNC for the benchmarks. Maybe most of the data is cached in (off-heap) memory. I will dig into it. |
* byte 12 - A : the UTF-8 encoded the encoded context identifer | ||
* </pre> | ||
* | ||
* @author Jeen Broekstra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this Author field should be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some files like ContextStore.java are just (almost) verbatim copies from NativeStore. Therefore I didn't touch the author fields.
Should I change or extend the field even for small changes?
I think this is fantastic. I would love to have it with longs as identifiers. Making it usable for large datasets like UniProt or Wikidata. I put some small issues in already from a brief first look. But to me it looks like good quality code. |
I also think that longs are the way to go. To keep storage space efficiency also VarInts can be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very impressvie @kenwenzel , thanks! I'll try and find time soon to do a more comprehensive review as soon as possible.
I've noticed you have copy-pasted a lot of existing code into this new module to be able to reuse it (can tell by the use of old copyright headers and author tags). Two things:
- I'd prefer treating these as "new" (so with a new copyright header and author) even if they are derived from existing code.
- I haven't looked in detail but do we need code duplication in this fashion, or is there some way we can organize this to make code reuse a little easier?
core/sail/lmdb/pom.xml
Outdated
<dependency> | ||
<groupId>org.lwjgl</groupId> | ||
<artifactId>lwjgl-lmdb</artifactId> | ||
<version>${lwjgl.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.lwjgl</groupId> | ||
<artifactId>lwjgl-lmdb</artifactId> | ||
<classifier>natives-linux</classifier> | ||
<version>${lwjgl.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.lwjgl</groupId> | ||
<artifactId>lwjgl-lmdb</artifactId> | ||
<classifier>natives-macos</classifier> | ||
<version>${lwjgl.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.lwjgl</groupId> | ||
<artifactId>lwjgl-lmdb</artifactId> | ||
<classifier>natives-windows</classifier> | ||
<version>${lwjgl.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.lwjgl</groupId> | ||
<artifactId>lwjgl</artifactId> | ||
<classifier>natives-linux</classifier> | ||
<version>${lwjgl.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.lwjgl</groupId> | ||
<artifactId>lwjgl</artifactId> | ||
<classifier>natives-macos</classifier> | ||
<version>${lwjgl.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.lwjgl</groupId> | ||
<artifactId>lwjgl</artifactId> | ||
<classifier>natives-windows</classifier> | ||
<version>${lwjgl.version}</version> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to to check the IP status of these and if necessary file CQs for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to file CQs against this: they are not sufficiently documented in ClearlyDefined, and not previously registered for an IP check with Eclipse Foundation either. I'll start filing some CQs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gotten CQs for the two main libraries (lwjgl and lwjgl-lmdb) approved. If we include the native extensions as optional runtimes only, we won't need to file additional CQs for those (especially since they're just different compilations of essentially the same code).
@@ -0,0 +1,41 @@ | |||
/******************************************************************************* | |||
* Copyright (c) 2015 Eclipse RDF4J contributors, Aduna, and others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This copyright header needs to be changed - even if you copied it from an existing file, it's essentially a new contribution I'd say.
@kenwenzel can you please create an issue in our issue tracker that describes the proposed improvement, and link this PR to it? And if you have them, I'd appreciate some pointers to the source of the LMDB libraries that you're using, and any licensing information for them - I'll need that info to log them for IP review. |
I'm on OS X with the new ARM based M1 processor. Seems that support is coming in the next release (3.3.0). I'm currently not able to run the tests or benchmarks. We could consider trying out the 3.3.0 snapshot version. |
Short update from my side: I'm going to switch to varint encoded long ids and I'm planning to push this change within the next few days. |
@jeenbroekstra You mean besides #3416? So I should create an LMDB-specific issue with some explanation? |
Ah, no, you can use that issue, but it was not clear to me that was what you were using. Can you link them up by mentioning the issue number in your commit messages and the PR description please? |
@hmottestad I've pushed a commit with an update to LWJGL 3.3.0. |
A short update: I have some open issues with this.
|
I got it running on my laptop :) Thanks! I took a look at the benchmarks and changed the complex query one to this:
It prints |
Thank you! I will dig into this. |
Minor remark - you're using isse number 3415 in several commits, but the actual related issue is #3416. |
@jeenbroekstra The SPARQL compliance tests for LMDB fail now due to the optional dependencies to LWJGL native libraries. |
@kenwenzel Ah, yes, adding them as runtime dependence in the compliance test module should fix that |
… hashed values.
Findings:
|
…ove runtime of test.
…and triple store.
This is now feature complete in comparison to the native store. Two open points are:
Point 1 is really necessary while point 2 would be nice to have. How should we proceed? Do you want to merge this PR first and then we add GC and maybe other features? |
Previously we've marked features as experimental or for internal use only. That way we can merge early while still allowing for large changes later on. |
As the work is stand alone as a new sail, I agree with @hmottestad to merge it with an experimental tag. |
…Store as experimental.
Great work @kenwenzel , thanks! |
GitHub issue resolved: #3416
Briefly describe the changes proposed in this PR:
This change adds a full LMDB-based SAIL that stores values and triples in two LMDB databases.
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources
to format from the command line)