LuceneSailConnection breaks the semantics of a query with COUNT #1229
Comments
Thanks for reporting this @KMax, at first glance your analysis looks spot-on, and I can't immediately think of a reason why we wouldn't do an in-place swap of the search pattern with its result.
I had a look at a quick fix, but this will require some significant refactoring unfortunately.
I coded (commit) a solution for QuerySpec (the rest of the SearchQueryEvaluator subclasses should be refactored too) and wrote a test. Are there any tests for the other subclasses (DistanceQuerySpec and GeoRelationQuerySpec), so I can refactor them as well? Maybe I looked in the wrong places.
@KMax I had a look at your commit, looks legit. Would it be a lot of extra work to generalize your override in QuerySpec to all SearchQueryEvaluator implementations? The logic at first glance appears to be transferable. As for tests: unfortunately unit test coverage for the Lucene sail components is low. By all means introduce more unit tests, we'd be grateful :) We do have a test suite that tests the overall functionality and compliance of the Lucene sail. But I just had a look and for some reason that abstract test suite is only executed for the Solr and ElasticSearch indexers, not for Lucene itself. Not your problem: I will do a quick fix around that and publish to the master branch.
Yea, I'm working on it. Currently, trying to write unit tests based on the docs, so far without success.
Cool, I'll take a look.
…ance eclipse-rdf4j/rdf4j#1229 compliance tests for Lucene itself
* develop: eclipse-rdf4j/rdf4j#1384 bug fix eclipse-rdf4j/rdf4j#1384 optimize formatter faster uniqueLang on empty base sail eclipse-rdf4j/rdf4j#1384 test and fix don't activate quick profile on skipTests shacl tests use logback explicitly instead of slf4j-simple eclipse-rdf4j/rdf4j#1290 allow to read and delete shapes eclipse-rdf4j/rdf4j#1236 clean install succesful eclipse-rdf4j/rdf4j#1236 moved test suites for geosparql, serql, shacl, and sparql eclipse-rdf4j/rdf4j#1236 moved tests for shacl and spin eclipse-rdf4j/rdf4j#1236 moved store and lucene test suites eclipse-rdf4j/rdf4j#1236 cleanup eclipse-rdf4j/rdf4j#1236 migrating sailmodel compliance [wip] eclipse-rdf4j/rdf4j#1236 sail compliance tests to respective modules fixed formatting eclipse-rdf4j/rdf4j#1229 compliance tests for Lucene itself Signed-off-by: Håvard Ottestad <hmottestad@gmail.com> # Conflicts: # shacl/src/main/java/org/eclipse/rdf4j/sail/shacl/ShaclSailConnection.java
Looks like LuceneSailConnection.addBindingSets is smarter than it should be. Let me explain with an example of the problem it causes.
LuceneSailConnection changes the semantics
Query:
Query plan before LuceneSailConnection:
Query plan after LuceneSailConnection:
As you can see, LuceneSailConnection places the bindings resulting from the full-text search right after the projection, instead of just replacing the search pattern.
This changes the semantics of the query. The projection produces more than a single solution if the full-text search produces any result.
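To make the effect concrete, here is a minimal toy model in plain Java. It does not use any RDF4J classes; the data, the class name CountPlacementSketch and both method names are hypothetical, and it assumes the aggregate is computed without the search constraint once the search results are moved above it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CountPlacementSketch {

    // Toy statements (subject -> label); hypothetical data, not taken from the issue.
    static final Map<String, String> LABELS = Map.of(
            "ex:a", "apple pie",
            "ex:b", "banana bread",
            "ex:c", "apple cake");

    // Toy full-text search result: the subjects that matched the search.
    static final Set<String> HITS = Set.of("ex:a", "ex:c");

    // In-place substitution: the hits take the place of the search pattern,
    // so the aggregate only sees matching subjects. Exactly one solution.
    static List<Long> countInPlace() {
        long count = LABELS.keySet().stream().filter(HITS::contains).count();
        return List.of(count);
    }

    // Hits joined above the aggregate (assumed model of the broken plan):
    // the count ignores the search, and its single row is paired with every hit.
    static List<Long> countWithHitsAboveAggregate() {
        long unconstrained = LABELS.size();
        List<Long> solutions = new ArrayList<>();
        for (String hit : HITS) {
            solutions.add(unconstrained);
        }
        return solutions;
    }
}
```

With this toy data, countInPlace() yields the single solution [2], while countWithHitsAboveAggregate() yields [3, 3]: one row per search hit, each carrying a count that was never restricted by the search.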
QueryJoinOptimizer fixes the issue, but the query becomes suboptimal
The QueryJoinOptimizer saves the situation by moving the binding set to the right side of the join. The query plan after the join optimizer:
However, the plan becomes suboptimal, since the results of the full-text search are no longer used to constrain the joins, as they should be.
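The cost difference can be sketched with another toy model in plain Java. It assumes a nested-loop style join in which the left argument is evaluated first and its bindings are fed into the right argument; the class JoinOrderSketch, its data and its method names are hypothetical and only count how many statements each ordering has to touch.

```java
import java.util.Map;
import java.util.Set;

class JoinOrderSketch {

    // Toy statements (subject -> type); hypothetical data, not taken from the issue.
    static final Map<String, String> TYPES = Map.of(
            "ex:a", "ex:Recipe",
            "ex:b", "ex:Recipe",
            "ex:c", "ex:Recipe",
            "ex:d", "ex:Article",
            "ex:e", "ex:Article");

    // Toy full-text search result: a single matching subject.
    static final Set<String> HITS = Set.of("ex:a");

    // Hits on the left side: each hit is bound first and then looked up directly.
    // Returns {matches, statements touched}.
    static int[] hitsDriveJoin() {
        int matches = 0;
        int touched = 0;
        for (String hit : HITS) {
            touched++;
            if (TYPES.containsKey(hit)) {
                matches++;
            }
        }
        return new int[] { matches, touched };
    }

    // Hits on the right side: every statement is enumerated first and only
    // afterwards checked against the hits. Returns {matches, statements touched}.
    static int[] hitsFilterLast() {
        int matches = 0;
        int touched = 0;
        for (String subject : TYPES.keySet()) {
            touched++;
            if (HITS.contains(subject)) {
                matches++;
            }
        }
        return new int[] { matches, touched };
    }
}
```

Both orderings find the same single match, but in this toy setup the first touches one statement and the second touches all five. The gap grows with the size of the store when the search is selective.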
What is the fix?
Is there any reason for LuceneSailConnection.addBindingSets being so "smart"? Can it just put the resulting binding sets in the place where the search pattern was?