Execution of SPARQL query (as part of a reasoning task) does not finish on version 3.1.3 (rdf4j) #2140
Comments
Rewriting all your FILTER NOT EXISTS together and changing your initial filter into a bind seems to make it fast:
Does this return the same results or am I missing something? |
Hi, |
@jeenbroekstra I am wondering if there is an edge case in your code from #1405 that isn't handled correctly. The original query has no unions in it, but the query plan introduces unions. The code in […] I guess we are introducing unions for the alternate paths. When I remove […] I've created a simplified version of this query that exhibits the same issue:
Query plan
You're correct that path alternatives are internally treated as unions. I don't think it's so much an edge case that we overlooked, but more that #1405 (or rather, its follow-up #1642) is itself an edge case. The fix we introduced to make sure we are correct according to the spec is really only relevant in very specific corner cases, and unfortunately it means evaluation of unions is generally a lot less efficient (as this issue shows). Currently, we decide to use a merge join iteration based purely on the fact that something is a union. I'm wondering if we can do slightly smarter evaluation by inspecting the contents of the union clauses. I haven't thought this through fully, but I wonder if we really only need to revert to merging afterwards if the union contains BIND operations in one of its arguments, or if it filters on variables that are not (also) bound inside the union itself.
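The check floated above could look roughly like the following. This is only a sketch over a hypothetical toy model: `UnionScopeCheck`, `UnionArg` and `needsScopedEvaluation` are made-up names, not part of RDF4J's query algebra, and it assumes "bound inside the union" is judged per union argument.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of the heuristic; not RDF4J code. */
final class UnionScopeCheck {

    /** Summary of one union argument (assumed to be computed elsewhere). */
    static final class UnionArg {
        final Set<String> boundVars;   // variables bound by patterns in this argument
        final boolean containsBind;    // true if the argument contains a BIND
        final Set<String> filterVars;  // variables referenced by filters in this argument

        UnionArg(Set<String> boundVars, boolean containsBind, Set<String> filterVars) {
            this.boundVars = boundVars;
            this.containsBind = containsBind;
            this.filterVars = filterVars;
        }
    }

    /**
     * Returns true if the union should keep the slower, scope-preserving
     * evaluation: some argument contains a BIND, or filters on a variable
     * that the same argument does not bind itself.
     */
    static boolean needsScopedEvaluation(List<UnionArg> args) {
        for (UnionArg arg : args) {
            if (arg.containsBind) {
                return true;
            }
            if (!arg.boundVars.containsAll(arg.filterVars)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch, a union that only stands in for path alternatives (no BIND, no filters) would not need the scope-preserving evaluation.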
I'm also looking at why we have this logic in the JoinIterator itself in the first place, instead of in StrictEvaluationStrategy, and why/how this case is different from the decision to use a HashJoinIterator instead of a normal JoinIterator (which we do for subqueries).
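For readers unfamiliar with the difference, here is a minimal sketch of a hash join over binding sets, as opposed to re-evaluating the right-hand side for every left-hand binding. It is not RDF4J's HashJoinIteration: `BindingHashJoin` and its `join` method are hypothetical, bindings are modelled as plain `Map<String, String>`, and every join variable is assumed to be bound on both sides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a hash join; not RDF4J code. */
final class BindingHashJoin {

    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right,
                                          List<String> joinVars) {
        // Build phase: index the right-hand bindings by their join-variable values.
        Map<List<String>, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> r : right) {
            table.computeIfAbsent(key(r, joinVars), k -> new ArrayList<>()).add(r);
        }

        // Probe phase: each left-hand binding is looked up once in the index.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : table.getOrDefault(key(l, joinVars), List.of())) {
                Map<String, String> merged = new HashMap<>(l);
                merged.putAll(r);
                result.add(merged);
            }
        }
        return result;
    }

    /** Values of the join variables, in a fixed order, used as the hash key. */
    private static List<String> key(Map<String, String> binding, List<String> joinVars) {
        List<String> values = new ArrayList<>();
        for (String v : joinVars) {
            values.add(binding.get(v));
        }
        return values;
    }
}
```

Each side is scanned once, so the cost grows with the sum of the two input sizes plus the number of matches, rather than with their product as in a nested loop join.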
- more efficient - less clutter as the decision on join strategy is now in the EvaluationStrategy, instead of in the joiniterator itself
I was thinking of maybe adding a “new scope” flag to union, defaulting to true. If we introduce a union because of a path query, we can set it to false. We could then introduce an optimizer that could optimize other unions by checking their subtree for potential scoping issues.
We can use the existing is |
…dled more efficiently. Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
…oping - fixed issue in cloning that dropped attribute - fixed issue in querymodelnormalizer that dropped attribute - handled corner cases (filters, left-joins)
…ives - not strictly necessary as it's the default, but perhaps clearer
* GH-2140 use HashJoinIteration instead of doing loop join
  - more efficient
  - less clutter as the decision on join strategy is now in the EvaluationStrategy, instead of in the joiniterator itself
* GH-2140 use isGraphPatternGroup consistently to determine variable scoping
  - fixed issue in cloning that dropped attribute
  - fixed issue in querymodelnormalizer that dropped attribute
  - handled corner cases (filters, left-joins)
* GH-2140 explicitly set isGraphPatternGroup to false for path alternatives
  - not strictly necessary as it's the default, but perhaps clearer
* GH-2140 disable failing test in deprecated federation sail
Hi, just curious to know: is the bug going to be fixed in the next version (3.1.5)?
Yes. @jeenbroekstra fixed the underlying bug so it should have the same performance now. Would you test that it works for you by doing:
Then use this tag in your project: |
Yes, I pulled it and ran clean/install, but my project still requires a onejar file, and I cannot currently build one using the command "mvn -Passembly package". It throws errors.
I'll try to create one for you |
Thank you very much |
Hi again, |
There will be some documentation with the 3.2.0 release on how to use the new query .explain() feature. Hopefully that will make diagnosing this kind of issue easier. We are also working on two improvements: one to the way we handle scoping of variables when parsing SPARQL queries, and another that exposes this scoping in the query plan.
Alright, thank you. I tested it again on a Linux server, and the query execution was even faster. Is there going to be an official release of version 3.1.5? Knowing this would give us a clear picture of how to continue.
All the fixes are included in the new 3.2.0 release. |
Hello,
We updated the rdf4j library in our system from version 2.3.3 to version 3.1.3. Now we are facing an issue running this query (which is part of a reasoning process):
The execution of the task does not finish on the newer version (3.1.3), but it completes (in about 40 seconds) on the older rdf4j version (2.3.3). Could you give us some insight into what changed between the two versions that might affect the execution of the query?
P.S. I have attached the data:
pp_project_mesh2016_sqagpsabclocalmemory (1).zip