incorrect empty path determination? #1628

Open
pfps opened this issue Nov 18, 2024 · 4 comments

Comments

@pfps

pfps commented Nov 18, 2024

The query

https://qlever.cs.uni-freiburg.de/wikidata-test/SBRpDD

SELECT DISTINCT ?c ?cLabel WHERE {
{ ?c wdt:P31/wdt:P279* ?c }
OPTIONAL { ?c rdfs:label ?cLabel . FILTER ( lang(?cLabel) = 'en' ) }
}

results in the message

This query might have to evaluate the empty path, which is currently not supported. In file "/home/local/qlever/qlever-code/src/engine/TransitivePathImpl.h " at line 158

But it doesn't appear that there is an empty path in this query.
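For context, SPARQL 1.1 defines `wdt:P279*` as a zero-or-more path. It matches every chain of one or more `wdt:P279` edges, and it also matches each node to itself through a path of length zero. That zero-length match is the "empty path" the error message refers to. It is part of `?c wdt:P31/wdt:P279* ?c` even though the query never writes an empty path explicitly. The Python sketch below models these semantics on a small hypothetical graph. The node names `X`, `Y`, `Z`, `W` are invented and are not Wikidata items. The sketch evaluates the pattern by brute force and makes no claim about how QLever does it.

```python
# Toy model of SPARQL 1.1 property-path semantics (hypothetical data, not Wikidata).
# A relation is a set of (subject, object) pairs.
P31 = {("X", "X"), ("Y", "Z"), ("W", "Z")}   # hypothetical "instance of" edges
P279 = {("Z", "Y")}                          # hypothetical "subclass of" edges


def nodes(*relations):
    """All subjects and objects that occur in the given relations."""
    return {node for rel in relations for pair in rel for node in pair}


def compose(r, s):
    """Sequence path r/s: all (x, z) with (x, y) in r and (y, z) in s."""
    successors = {}
    for y, z in s:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in successors.get(y, ())}


def plus(r):
    """One-or-more path r+: the transitive closure of r."""
    closure = set(r)
    while True:
        new_pairs = compose(closure, r) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def star(r, universe):
    """Zero-or-more path r*: r+ plus the zero-length ("empty") path (x, x)."""
    return plus(r) | {(x, x) for x in universe}


universe = nodes(P31, P279)
path = compose(P31, star(P279, universe))      # wdt:P31/wdt:P279*
loops = sorted(c for c, o in path if c == o)   # ?c wdt:P31/wdt:P279* ?c
print(loops)  # ['X', 'Y']
```

In this model, `X` is found only through the zero-length part of `wdt:P279*`, because it is directly an instance of itself. `Y` is found through a real `wdt:P279` edge. The naive `star` needs the set of all nodes (`universe`) to produce the zero-length pairs, and on a graph the size of Wikidata that set is very large.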

@hannahbast
Member

hannahbast commented Nov 19, 2024

@pfps Thanks for another monster query! This variant of the query does the job. It takes around 10 minutes (because it has to produce the complete wdt:P31/wdt:P279*, which has 3.7 billion results and is expensive to compute), but it works, and with little RAM at that:

SELECT * WHERE {
  { SELECT DISTINCT ?s WHERE { ?s wdt:P31/wdt:P279* ?o . FILTER (?s = ?o) } }
  OPTIONAL { ?s rdfs:label ?s_label FILTER (LANG(?s_label) = "en") }
}

https://qlever.cs.uni-freiburg.de/wikidata/saPuxK

Your ?c wdt:P31/wdt:P279* ?c should actually be treated just like ?c wdt:P31/wdt:P279* ?tmp . FILTER (?c = ?tmp). We will investigate why QLever does not do that.
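Both the original pattern and the rewrite ask for the same thing: the nodes `c` for which `(c, c)` is in the `wdt:P31/wdt:P279*` relation. The sketch below compares two ways of computing that on a hypothetical toy graph. The node names are invented, and this is an illustration of the equivalence, not of QLever's query planner. Strategy (a) materializes the whole path relation and then applies `FILTER (?c = ?tmp)`, as the rewritten query does. Strategy (b) checks each `wdt:P31` edge `(c, m)` directly by asking whether `c` is reachable from `m` via zero or more `wdt:P279` edges. In both strategies the `wdt:P279*` part always starts at a node that `wdt:P31` has already bound. The zero-length path is therefore just "the start node reaches itself", and the set of all nodes is never needed.

```python
from collections import deque

# Hypothetical toy data (not Wikidata): (subject, object) pairs.
P31 = {("X", "X"), ("Y", "Z"), ("W", "Z")}   # "instance of"
P279 = {("Z", "Y")}                          # "subclass of"


def successors(rel):
    """Adjacency map: subject -> set of objects."""
    adj = {}
    for s, o in rel:
        adj.setdefault(s, set()).add(o)
    return adj


def reachable_star(start, adj):
    """All nodes reachable from start via zero or more edges (BFS)."""
    seen = {start}  # zero-length path: start reaches itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


adj279 = successors(P279)

# (a) Materialize ?c wdt:P31/wdt:P279* ?tmp, then FILTER (?c = ?tmp).
full_path = {(c, t) for c, m in P31 for t in reachable_star(m, adj279)}
via_filter = {c for c, t in full_path if c == t}

# (b) For each P31 edge (c, m): is c reachable from m via wdt:P279*?
via_search = {c for c, m in P31 if c in reachable_star(m, adj279)}

print(sorted(via_filter), sorted(via_search))  # ['X', 'Y'] ['X', 'Y']
```

Both strategies return the same items. Strategy (a) builds every pair of the path relation before filtering, while (b) keeps only the items it needs.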

@pfps
Author

pfps commented Nov 19, 2024

I have a sequence of queries that try to do similar things, which I ran over the summer when looking for issues in the Wikidata ontology. These queries look for instance loops, i.e., classes that are instances of themselves once the intended meaning of P279 (instances of a subclass are also instances of its superclasses) is taken into account.

The desired query is:

SELECT DISTINCT ?c ?cLabel WHERE {
?c wdt:P31/wdt:P279* ?c .
OPTIONAL { ?c rdfs:label ?cLabel . FILTER ( lang(?cLabel) = 'en' ) }
}

but that triggers the empty path error.

A closely related query that sidesteps the empty path message is:

SELECT DISTINCT ?c ?cLabel WHERE {
?c wdt:P31/wdt:P279+ ?c .
OPTIONAL { ?c rdfs:label ?cLabel . FILTER ( lang(?cLabel) = 'en' ) }
}

This runs in 22 seconds but might not include all the results I want because it excludes the empty subclass path.

The query that I created to include the empty subclass path is:

EDIT: The query I originally posted here was a slow one; the fast query is shown now.

SELECT DISTINCT ?c ?cLabel WHERE {
{ ?c wdt:P31/wdt:P279+ ?c } UNION { ?c wdt:P31 ?c }
OPTIONAL { ?c rdfs:label ?cLabel . FILTER ( lang(?cLabel) = 'en' ) }
}

This query runs in 24 seconds, much faster than your version.

I'm going to try various variants of this query and see how fast they run.
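The `UNION` variant is exact rather than an approximation. `wdt:P279*` is the zero-length path together with `wdt:P279+`, and sequence composition distributes over union. So `wdt:P31/wdt:P279*` equals `wdt:P31` together with `wdt:P31/wdt:P279+`, and the self-loops of the former are the self-loops of `wdt:P31` plus those of `wdt:P31/wdt:P279+`. The Python sketch below checks this on a hypothetical toy graph with invented node names. It also shows the case that the `wdt:P279+`-only query misses: an item that is directly an instance of itself.

```python
# Hypothetical toy data (not Wikidata): (subject, object) pairs.
P31 = {("X", "X"), ("Y", "Z"), ("W", "Z")}   # "instance of"
P279 = {("Z", "Y")}                          # "subclass of"


def compose(r, s):
    """Sequence path r/s."""
    succ = {}
    for y, z in s:
        succ.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in succ.get(y, ())}


def plus(r):
    """Transitive closure r+."""
    closure = set(r)
    while True:
        new_pairs = compose(closure, r) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def self_loops(rel):
    """Values of ?c for the pattern ?c rel ?c."""
    return {s for s, o in rel if s == o}


universe = {n for rel in (P31, P279) for pair in rel for n in pair}
star279 = plus(P279) | {(n, n) for n in universe}   # wdt:P279*

star_query = self_loops(compose(P31, star279))                       # desired query
plus_query = self_loops(compose(P31, plus(P279)))                    # wdt:P279+ only
union_query = self_loops(compose(P31, plus(P279))) | self_loops(P31)  # UNION variant

print(sorted(star_query), sorted(plus_query), sorted(union_query))
# ['X', 'Y'] ['Y'] ['X', 'Y']
```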

@pfps
Author

pfps commented Nov 19, 2024

There is a very large speed difference between

SELECT DISTINCT ?c WHERE { 
  ?c wdt:P31/wdt:P279+ ?c . 
}

https://qlever.cs.uni-freiburg.de/wikidata/64XbsS

and

SELECT DISTINCT ?c WHERE { 
  ?c wdt:P31/wdt:P279+ ?s . FILTER ( ?c = ?s )
}

https://qlever.cs.uni-freiburg.de/wikidata/69cPqa

I seem to remember that the second query used to be reasonably fast, but I may be remembering wrong.

@hannahbast
Member

@pfps If you execute the queries and click on "Analysis", you see the difference:

  1. The first query uses a multi-column join, which is reasonably fast here because both wdt:P31 and wdt:P279+ are of manageable size, around 120 million rows each.

  2. The second query lazily produces the complete wdt:P31/wdt:P279+ result, which has over 3 billion rows, and then filters it.
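The two plans can be mimicked on a toy scale. The sketch below uses hypothetical data (a chain of invented classes `C1` to `C4`) and does not model Wikidata's row counts or QLever's actual join implementation. It computes the `wdt:P31/wdt:P279+` self-loops in two ways. Plan 1 joins `wdt:P31` rows `(c, m)` with `wdt:P279+` rows `(m, c)` on both columns, using a hash-set lookup as a stand-in for the join, so only the two inputs are materialized. Plan 2 materializes the full composition and then filters for `?c = ?s`. Both return the same items, but plan 2 builds a larger intermediate result, because every instance is paired with every transitive superclass of its class.

```python
# Hypothetical toy data (not Wikidata): a chain of classes C1 -> C2 -> C3 -> C4.
P279 = {("C1", "C2"), ("C2", "C3"), ("C3", "C4")}               # "subclass of"
P31 = {(f"item{k}", "C1") for k in range(5)} | {("C3", "C1")}   # "instance of"


def compose(r, s):
    """Sequence path r/s."""
    succ = {}
    for y, z in s:
        succ.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in succ.get(y, ())}


def plus(r):
    """Transitive closure r+."""
    closure = set(r)
    while True:
        new_pairs = compose(closure, r) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


p279_plus = plus(P279)

# Plan 1: join P31 (c, m) with P279+ (m, c) on both columns.
# The set of P279+ pairs acts as a hash index for the two-column lookup.
plan1 = {c for c, m in P31 if (m, c) in p279_plus}

# Plan 2: materialize all of P31/P279+, then FILTER (?c = ?s).
full = compose(P31, p279_plus)
plan2 = {c for c, s in full if c == s}

print(sorted(plan1), sorted(plan2))          # ['C3'] ['C3']
print(len(P31), len(p279_plus), len(full))   # 6 6 18
```

Even in this tiny example the composed relation (18 rows) is three times the size of either input (6 rows each). The same effect, at a much larger scale, separates the two plans described above.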

Of course, QLever could recognize that these two queries are equivalent and pick the query plan that is faster to execute. This would be easy to fix for this particular kind of query, but it is an extremely hard problem in general.

It is important to note that these are very particular queries.


2 participants