Execution of SPARQL query (as part of a reasoning task) does not finish on version 3.1.3 (rdf4j) #2140

swc-kdzekov · 2020-04-27T12:13:56Z

Hello,
We updated the rdf4j library in our system from version 2.3.3 to version 3.1.3. Since then, we have been facing an issue running this query (which is part of a reasoning process):

PREFIX h:<urn:pp:internalHelper/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl:<http://www.w3.org/2008/05/skos-xl#>
PREFIX swcs:<http://schema.semantic-web.at/ppt/>
CONSTRUCT {?c a ?t. ?c a ?t2. } 
FROM <https://performance.sqa.semantic-web.at/MeSH2016-SQAGPS_ABC-localMemory/thesaurus> 
FROM <https://performance.sqa.semantic-web.at/MeSH2016-SQAGPS_ABC-localMemory/thesaurus/reasoning#7a9e2803-4ad7-4346-a9aa-e9fb4192b71b> 
WHERE {
    ?c a ?t.
    FILTER (?t IN (<https://pp-remotestore-test.semantic-web.at/SQA-GPO/GenericClass>))
    FILTER NOT EXISTS {
        ?c (swcs:appliedType|swcs:propagateType)/rdfs:subClassOf* ?t
    }
    FILTER NOT EXISTS {
        ?c ^skos:member ?root. ?root swcs:appliedType/rdfs:subClassOf* ?t
    }
    FILTER NOT EXISTS {
        ?root a skos:OrderedCollection; 
                skos:memberList ?member; 
                <http://schema.semantic-web.at/ppt/appliedType>/rdfs:subClassOf* ?t. 
        ?member rdf:rest*/rdf:first ?c.     }
    FILTER NOT EXISTS {
        ?c (skos:broader|^skos:narrower)+ ?root. ?root swcs:propagateType/rdfs:subClassOf* ?t
    }
    FILTER NOT EXISTS {
        ?c (skos:broader|^skos:narrower)*/(skos:topConceptOf|^skos:hasTopConcept) ?root. ?root swcs:appliedType/rdfs:subClassOf* ?t
    }
    OPTIONAL {?t rdfs:subClassOf* ?t2.} 
}

On the newer version (3.1.3) the task never finishes, whereas on the older version (2.3.3) it completes in about 40 seconds. Could you give us some insight into what changed between the two versions that might affect the execution of this query?
P.S. I have attached the data:
pp_project_mesh2016_sqagpsabclocalmemory (1).zip

hmottestad · 2020-04-27T13:38:31Z

Combining all your FILTER NOT EXISTS clauses into one and changing your initial FILTER into a BIND seems to make it fast:

PREFIX h:<urn:pp:internalHelper/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl:<http://www.w3.org/2008/05/skos-xl#>
PREFIX swcs:<http://schema.semantic-web.at/ppt/>
CONSTRUCT {?c a ?t. ?c a ?t2. } 
FROM <https://performance.sqa.semantic-web.at/MeSH2016-SQAGPS_ABC-localMemory/thesaurus> 
FROM <https://performance.sqa.semantic-web.at/MeSH2016-SQAGPS_ABC-localMemory/thesaurus/reasoning#7a9e2803-4ad7-4346-a9aa-e9fb4192b71b> 
WHERE {
    BIND (<https://pp-remotestore-test.semantic-web.at/SQA-GPO/GenericClass> as ?t)
    ?c a ?t.
    FILTER NOT EXISTS {
        ?c (swcs:appliedType|swcs:propagateType)/rdfs:subClassOf* ?t .
        ?c ^skos:member ?root. ?root swcs:appliedType/rdfs:subClassOf* ?t .
        ?root a skos:OrderedCollection; 
                skos:memberList ?member; 
                <http://schema.semantic-web.at/ppt/appliedType>/rdfs:subClassOf* ?t. 
        ?member rdf:rest*/rdf:first ?c.     
        ?c (skos:broader|^skos:narrower)+ ?root. ?root swcs:propagateType/rdfs:subClassOf* ?t.
        ?c (skos:broader|^skos:narrower)*/(skos:topConceptOf|^skos:hasTopConcept) ?root. ?root swcs:appliedType/rdfs:subClassOf* ?t.
    }
    OPTIONAL {?t rdfs:subClassOf* ?t2.} 
}

Does this return the same results or am I missing something?

swc-kdzekov · 2020-04-27T14:01:32Z

Hi,
Thanks for the feedback. It does run fast, but it returns different results.
I'll try to fix the query using your approach.
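The different results follow from the semantics of NOT EXISTS: with separate filters, a ?c is dropped as soon as any one of the patterns matches it, whereas a single combined FILTER NOT EXISTS { A . B . ... } drops ?c only when all the patterns match together. A minimal plain-Java sketch of that logic, assuming each pattern can be modeled as the set of ?c values it matches (the class `NotExistsSketch` and the data are illustrative, not taken from the attached dataset); shared variables such as ?root are ignored here, and binding them consistently across the combined patterns would only make a joint match rarer:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: each "pattern" is modeled as the set of ?c values it matches.
final class NotExistsSketch {

    // Separate FILTER NOT EXISTS clauses: keep ?c only if no pattern matches it.
    static Set<String> separateFilters(Set<String> candidates, List<Set<String>> patterns) {
        Set<String> kept = new TreeSet<>();
        for (String c : candidates) {
            boolean anyMatches = patterns.stream().anyMatch(p -> p.contains(c));
            if (!anyMatches) {
                kept.add(c);
            }
        }
        return kept;
    }

    // One combined FILTER NOT EXISTS { A . B . ... }: drop ?c only if every pattern matches it.
    static Set<String> combinedFilter(Set<String> candidates, List<Set<String>> patterns) {
        Set<String> kept = new TreeSet<>();
        for (String c : candidates) {
            boolean allMatch = patterns.stream().allMatch(p -> p.contains(c));
            if (!allMatch) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

For example, with candidates c1, c2, c3, a pattern A matching {c1, c2} and a pattern B matching {c2}, the separate filters keep only c3, while the combined filter keeps c1 and c3. The combined form therefore returns a superset of the original results.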

hmottestad · 2020-04-27T14:32:25Z

@jeenbroekstra I am wondering if there is an edge case in your code from #1405 that isn't handled correctly.

The original query contains no unions, but the query plan introduces them. The isOutOfScopeForLeftArgBindings check in JoinIterator sees such a union and falls back to a MergeIteration (which I think is effectively a loop join).

I guess we introduce these unions for the path alternatives (|).

When I remove isOutOfScopeForLeftArgBindings the query runs at the same speed as it used to.
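To make the cost difference concrete, here is a minimal plain-Java sketch (not rdf4j code) contrasting a loop join that re-evaluates the right-hand argument, without the left bindings, for every left binding set, with a hash join that evaluates it once and indexes it on the join variables. Binding sets are modeled as `Map<String, String>`; `JoinSketch`, `loopJoin` and `hashJoin` are hypothetical names, and the sketch assumes every join variable is bound on both sides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch (not rdf4j code): a binding set is a map from variable name to value.
final class JoinSketch {

    // Two binding sets are compatible when every variable they share has the same value.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> out = new HashMap<>(a);
        out.putAll(b);
        return out;
    }

    // Loop join: the right-hand argument is re-evaluated once per left binding set
    // and every result is checked for compatibility.
    static List<Map<String, String>> loopJoin(List<Map<String, String>> left,
                                              Supplier<List<Map<String, String>>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right.get()) {
                if (compatible(l, r)) {
                    out.add(merge(l, r));
                }
            }
        }
        return out;
    }

    // Hash join: evaluate the right-hand argument once, index it on the join variables,
    // then probe the index with each left binding set.
    static List<Map<String, String>> hashJoin(List<Map<String, String>> left,
                                              Supplier<List<Map<String, String>>> right,
                                              List<String> joinVars) {
        Map<List<String>, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right.get()) {
            index.computeIfAbsent(key(r, joinVars), k -> new ArrayList<>()).add(r);
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : index.getOrDefault(key(l, joinVars), List.of())) {
                if (compatible(l, r)) { // any other shared variables must still agree
                    out.add(merge(l, r));
                }
            }
        }
        return out;
    }

    // Assumes every join variable is bound on both sides.
    private static List<String> key(Map<String, String> row, List<String> joinVars) {
        List<String> k = new ArrayList<>();
        for (String v : joinVars) {
            k.add(row.get(v));
        }
        return k;
    }
}
```

With two left binding sets and a right-hand side yielding three rows, the loop join evaluates the right-hand side twice and the hash join once, and both return the same joined rows. When the right-hand side is an expensive ArbitraryLengthPath, that repeated evaluation is what dominates.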

I've created a simplified version of this query that exhibits the same issue:

PREFIX h:<urn:pp:internalHelper/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl:<http://www.w3.org/2008/05/skos-xl#>
PREFIX swcs:<http://schema.semantic-web.at/ppt/>

SELECT * WHERE {
	?c (skos:broader|^skos:narrower)*/(skos:topConceptOf|^skos:hasTopConcept) ?root. 
	?root swcs:appliedType <https://pp-remotestore-test.semantic-web.at/SQA-GPO/GenericClass>  
}

Query plan

Projection
   ProjectionElemList
      ProjectionElem "c"
      ProjectionElem "root"
   Join
      StatementPattern (costEstimate=2, resultSizeEstimate=3)
         Var (name=root)
         Var (name=_const_901ac966_uri, value=http://schema.semantic-web.at/ppt/appliedType, anonymous)
         Var (name=_const_89eb3ebb_uri, value=https://pp-remotestore-test.semantic-web.at/SQA-GPO/GenericClass, anonymous)
-->   Union   <--- this union here is the problem
         Join
            StatementPattern (costEstimate=2, resultSizeEstimate=52)
               Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)
               Var (name=_const_e8cbb1af_uri, value=http://www.w3.org/2004/02/skos/core#topConceptOf, anonymous)
               Var (name=root)
            ArbitraryLengthPath (costEstimate=710, resultSizeEstimate=504.3K)
               Var (name=c)
               Union
                  StatementPattern (resultSizeEstimate=13.5K)
                     Var (name=c)
                     Var (name=_const_7123f66a_uri, value=http://www.w3.org/2004/02/skos/core#broader, anonymous)
                     Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)
                  StatementPattern (resultSizeEstimate=13.5K)
                     Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)
                     Var (name=_const_14b25ded_uri, value=http://www.w3.org/2004/02/skos/core#narrower, anonymous)
                     Var (name=c)
               Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)
         Join
            StatementPattern (costEstimate=2, resultSizeEstimate=52)
               Var (name=root)
               Var (name=_const_d2459908_uri, value=http://www.w3.org/2004/02/skos/core#hasTopConcept, anonymous)
               Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)
            ArbitraryLengthPath (costEstimate=710, resultSizeEstimate=504.3K)
               Var (name=c)
               Union
                  StatementPattern (resultSizeEstimate=13.5K)
                     Var (name=c)
                     Var (name=_const_7123f66a_uri, value=http://www.w3.org/2004/02/skos/core#broader, anonymous)
                     Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)
                  StatementPattern (resultSizeEstimate=13.5K)
                     Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)
                     Var (name=_const_14b25ded_uri, value=http://www.w3.org/2004/02/skos/core#narrower, anonymous)
                     Var (name=c)
               Var (name=_anon_134cb562_26a0_4608_8b1e_3c0a080ca76c, anonymous)

abrokenjester · 2020-04-28T00:07:16Z

You're correct that path alternatives are internally treated as unions.

I don't think it's so much an edge case that we overlooked; it's more that #1405 (or rather, its follow-up #1642) is itself an edge case. The fix we introduced to comply with the spec is only relevant in very specific corner cases, and unfortunately it makes evaluation of unions in general a lot less efficient (as this shows).

Currently, we decide to use a merge join iteration based purely on the fact that something is a union. I'm wondering if we can be slightly smarter by inspecting the contents of the union clauses. I haven't thought this through fully, but I wonder if we only need to fall back to merging afterwards when the union contains BIND operations in one of its arguments, or when it filters on variables that are not (also) bound inside the union itself.
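The heuristic suggested above can be sketched over a toy query algebra. This is a plain-Java illustration under stated assumptions: the classes `QNode`, `QPattern`, `QBind`, `QFilter`, `QUnion` and `ScopeCheck` are hypothetical stand-ins, not rdf4j's TupleExpr classes, and the check is done per union branch, as an approximation of SPARQL variable scoping.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy algebra (not rdf4j's TupleExpr classes).
abstract class QNode {
    final List<QNode> children;
    QNode(QNode... children) { this.children = List.of(children); }
}

// A triple pattern; it binds the listed variables.
final class QPattern extends QNode {
    final Set<String> vars;
    QPattern(String... vars) { this.vars = Set.of(vars); }
}

// BIND(... AS var)
final class QBind extends QNode {
    final String var;
    QBind(String var) { this.var = var; }
}

// FILTER over some variables, applied to a child expression.
final class QFilter extends QNode {
    final Set<String> usedVars;
    QFilter(QNode child, String... usedVars) { super(child); this.usedVars = Set.of(usedVars); }
}

final class QUnion extends QNode {
    QUnion(QNode left, QNode right) { super(left, right); }
}

final class ScopeCheck {
    // A union only needs scope-preserving (merge) evaluation if one of its branches contains
    // a BIND, or a FILTER on a variable that the same branch does not bind.
    static boolean needsScopedEvaluation(QUnion union) {
        for (QNode branch : union.children) {
            Set<String> bound = new HashSet<>();
            collectBound(branch, bound);
            if (containsBindOrOuterFilter(branch, bound)) {
                return true;
            }
        }
        return false;
    }

    private static void collectBound(QNode n, Set<String> bound) {
        if (n instanceof QPattern) {
            bound.addAll(((QPattern) n).vars);
        } else if (n instanceof QBind) {
            bound.add(((QBind) n).var);
        }
        for (QNode c : n.children) {
            collectBound(c, bound);
        }
    }

    private static boolean containsBindOrOuterFilter(QNode n, Set<String> bound) {
        if (n instanceof QBind) {
            return true;
        }
        if (n instanceof QFilter && !bound.containsAll(((QFilter) n).usedVars)) {
            return true;
        }
        for (QNode c : n.children) {
            if (containsBindOrOuterFilter(c, bound)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this heuristic, a union generated from a path alternative such as (skos:broader|^skos:narrower), whose branches are plain triple patterns, would not need the merge fallback.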

abrokenjester · 2020-04-28T00:13:19Z

I'm also looking at why we have this logic in the JoinIterator itself in the first place, instead of in StrictEvaluationStrategy, and why/how this case is different from the decision to use a HashJoinIteration instead of a normal JoinIterator (which we do for subqueries).

hmottestad · 2020-04-28T06:12:02Z

I was thinking of adding a “new scope” flag to Union, defaulting to true. If we introduce a union because of a property path, we can set it to false. We could then add an optimizer that handles other unions by checking their subtrees for potential scoping issues.

abrokenjester · 2020-04-28T07:33:55Z

I think we can use the existing isGraphPatternGroup property for that.
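The flag-based approach can be illustrated with a toy model in plain Java. The names `ToyUnion`, `JoinPlanner` and `JoinStrategy` are hypothetical, and the mapping below (a union that opens its own scope is evaluated independently and hash-joined; a path-alternative union can receive the left-hand bindings) is a simplified reading of this discussion, not rdf4j's actual evaluation code.

```java
// Hypothetical toy model, not rdf4j's classes.
enum JoinStrategy { BIND_JOIN, HASH_JOIN }

final class ToyUnion {
    // true for a union written as { ... } UNION { ... } in the query (it opens a new variable scope),
    // false for a union generated internally from a path alternative such as (p|^q).
    final boolean graphPatternGroup;

    ToyUnion(boolean graphPatternGroup) { this.graphPatternGroup = graphPatternGroup; }

    static ToyUnion fromQueryUnion() { return new ToyUnion(true); }

    static ToyUnion fromPathAlternative() { return new ToyUnion(false); }
}

final class JoinPlanner {
    // A right-hand argument with its own scope must not see the left-hand bindings, so it is
    // evaluated once on its own and hash-joined; otherwise the left-hand bindings can be pushed
    // into the right-hand argument (a bind join).
    static JoinStrategy choose(ToyUnion rightArg) {
        return rightArg.graphPatternGroup ? JoinStrategy.HASH_JOIN : JoinStrategy.BIND_JOIN;
    }
}
```

The design point is that the decision is made once, when the plan is evaluated, from a property of the right-hand argument, rather than inside the join iterator for every left-hand binding set.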

* GH-2140 use HashJoinIteration instead of doing loop join
  - more efficient
  - less clutter as the decision on join strategy is now in the EvaluationStrategy, instead of in the joiniterator itself
* GH-2140 use isGraphPatternGroup consistently to determine variable scoping
  - fixed issue in cloning that dropped attribute
  - fixed issue in querymodelnormalizer that dropped attribute
  - handled corner cases (filters, left-joins)
* GH-2140 explicitly set isGraphPatternGroup to false for path alternatives
  - not strictly necessary as it's the default, but perhaps clearer
* GH-2140 disable failing test in deprecated federation sail

swc-kdzekov · 2020-05-01T08:17:36Z

Hi, just curious: is the bug going to be fixed in the next version (3.1.5)?
Put another way, can we expect to be able to run the posted query with the next version of rdf4j?

hmottestad · 2020-05-01T08:43:55Z

Yes. @jeenbroekstra fixed the underlying bug, so performance should now be the same as before. Could you test that it works for you by running:

git clone https://github.com/eclipse/rdf4j.git
cd rdf4j
mvn clean install -DskipTests

Then use this version in your project: 3.1.5-SNAPSHOT

swc-kdzekov · 2020-05-01T11:53:31Z

Yes, I pulled it and ran a clean install, but my project requires the onejar file, and I currently cannot build it with the command "mvn -Passembly package": it throws errors.

hmottestad · 2020-05-01T11:57:19Z

I'll try to create one for you

hmottestad · 2020-05-01T12:19:42Z

eclipse-rdf4j-3.2.0-SNAPSHOT-onejar.jar.zip

swc-kdzekov · 2020-05-01T12:30:48Z

Thank you very much

swc-kdzekov · 2020-05-05T19:29:52Z

Hi again,
Using the git repository you shared, I built the necessary artifacts and updated the system to 3.1.5-SNAPSHOT.
I can confirm that the posted query now executes; that is the good part.
However, the execution time is much longer than with the older version (2.3.3).
On the same data corpus, it takes over 3 minutes on 3.1.5-SNAPSHOT, compared to 46 seconds on 2.3.3.

hmottestad · 2020-05-05T20:04:33Z

There will be some documentation with the 3.2.0 release on how to use the new query .explain() feature. Hopefully that will make diagnosing this issue easier. We are also working on two improvements: one to the way we handle the scoping of variables when parsing SPARQL queries, and another that exposes this scoping in the query plan.

swc-kdzekov · 2020-05-08T06:33:39Z

Alright, thank you. I tested it again on a Linux server, and the query execution was even faster.
I have one more question regarding future releases, if I may.

Is there going to be an official 3.1.5 release?

Knowing this would give us a clear picture of how to proceed.
Thank you in advance.

hmottestad · 2020-05-08T06:56:54Z

All the fixes are included in the new 3.2.0 release.

swc-kdzekov changed the title ~~Execution of SPARQL query (as part of a reasoning task) does not finishes on version 3.1.3 (rdf4j)~~ Execution of SPARQL query (as part of a reasoning task) does not finish on version 3.1.3 (rdf4j) Apr 27, 2020

abrokenjester added the 🐞 bug and ⏩ performance labels Apr 28, 2020

abrokenjester added this to the 3.1.5 milestone Apr 28, 2020

abrokenjester added a commit that referenced this issue Apr 28, 2020

GH-2140 use HashJoinIteration instead of doing loop join

5ca589d

- more efficient - less clutter as the decision on join strategy is now in the EvaluationStrategy, instead of in the joiniterator itself

This was referenced Apr 28, 2020

GH-2140 use HashJoinIteration instead of doing loop join #2141

Merged

Expose if "rightArg is out of scope for leftArg" as part of the query plan #2119

Closed

hmottestad added a commit that referenced this issue Apr 28, 2020

GH-2140 Unions can now be created without a new scope and will be han…

7104576

…dled more efficiently. Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

hmottestad mentioned this issue Apr 28, 2020

GH-2140 Unions can have specified new scope #2147

Closed

abrokenjester self-assigned this Apr 29, 2020

abrokenjester added a commit that referenced this issue Apr 29, 2020

GH-2140 explicitly set isGraphPatternGroup to false for path alternat…

6123e3b

…ives - not strictly necessary as it's the default, but perhaps clearer

abrokenjester added a commit that referenced this issue May 1, 2020

GH-2140 disable failing test in deprecated federation sail

4a5f12b

abrokenjester closed this as completed in #2141 May 1, 2020

abrokenjester modified the milestones: 3.1.5, 3.2.0 May 6, 2020
