standardize behavior of nested brackets and UNION clauses #103

Closed
VladimirAlexiev opened this issue Oct 29, 2019 · 15 comments

Comments

@VladimirAlexiev
Contributor

VladimirAlexiev commented Oct 29, 2019

Why?

I'm getting data from a root object, and a bunch of branches stemming from it that may include multiple values at any level.

I want to use UNION to avoid a Cartesian product (returning all combinations of values from the different branches). This applies not only to SELECT, but also to CONSTRUCT and INSERT, which afaik work over the result-set of a corresponding SELECT and would be slowed down by the Cartesian explosion.
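To make the row-count motivation concrete, here is a minimal Python sketch that models solution sequences as lists of dicts. The helper names (`compatible`, `join`, `union`) and the sample data are assumptions made for illustration; they are not part of any SPARQL engine's API.

```python
from itertools import product

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    """Algebra-style join: merge every compatible pair of solutions."""
    return [{**a, **b} for a, b in product(left, right) if compatible(a, b)]

def union(left, right):
    """Algebra-style union: concatenate the two solution sequences."""
    return left + right

# Hypothetical data: one root with 3 values on branch "a" and 4 on branch "b".
root = [{"root": "r"}]
branch_a = [{"root": "r", "a": i} for i in range(3)]
branch_b = [{"root": "r", "b": i} for i in range(4)]

joined = join(join(root, branch_a), branch_b)    # all combinations of a and b
unioned = join(root, union(branch_a, branch_b))  # one row per branch value

print(len(joined))   # 12
print(len(unioned))  # 7
```

With the branches joined, the row count multiplies (3 × 4); with UNION it adds (3 + 4), and each row carries a value from only one branch.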

Query

Consider this simple query that accesses no data

prefix wd: <http://www.wikidata.org/entity/> 

select * {
  bind(wd:Q156578 as ?co)
  bind(strafter(str(?co),str(wd:)) as ?WD)
  {
    bind(?WD as ?WD1)
  } union {
    bind(?WD as ?WD2)
  }
}

And even its simpler variant that doesn't use UNION

prefix wd: <http://www.wikidata.org/entity/> 

select * {
  bind(wd:Q156578 as ?co)
  bind(strafter(str(?co),str(wd:)) as ?WD)
  {
    bind(?WD as ?WD1)
  } 
}

Differences

  • GraphDB (and presumably rdf4j) return WD1, WD2 cc @jeenbroekstra
  • Virtuoso returns 37000 Error SP031: SPARQL compiler: The list of return values contains '*' but the pattern does not contain variables cc @kidehen
  • Jena does not return WD1, WD2 even though WD is returned in the same row! cc @afs
  • Blazegraph at https://query.wikidata.org does not return WD1, WD2
  • a Python in-memory repo does not return WD1, WD2
  • perl's RDF::Query returns these errors @kasei
Odd number of elements in hash assignment at C:/Strawberry/perl/site/lib/RDF/Query.pm line 210, <> line 13.
Bad IRI character: ' ' (0x20)

IMHO GraphDB and rdf4j are doing the right thing

@TallTed
Member

TallTed commented Oct 29, 2019

@VladimirAlexiev -

First thing, GraphDB and RDF4J are doing the wrong thing, because SPARQL evaluates nested queries from inside-out (a/k/a bottom-up). ?WD1 and ?WD2 must be unbound, because ?WD has not yet been bound, because the BINDs of ?co and ?WD in the outer query are not available to the inner queries.

Then -- what version of Virtuoso (output of virtuoso-t -? or virtuoso-iodbc-t -?) did you test with? And did it return the same quoted error for both of your test queries?

I ask because I'm not seeing your results from Enterprise/Commercial Edition v08.03.3315 (9c4e63d226) built 2019-10-25 (behind URIBurner.com) --

  • q1 -- returns two rows with values for ?co and ?WD, and empty columns for ?WD1 and ?WD2
  • q2 -- returns Virtuoso 37000 Error SP031: SPARQL compiler: Internal error: sparp_equiv_tighten_by_equiv(): can't change deprecated eq (/cc @kidehen @IvanMikhailov)

-- nor Enterprise/Commercial Edition v07.20.3232 (2afcc45d7c) built 2019-08-09 (behind DBpedia.org) --

  • q1 -- returns two rows with values for ?co and ?WD, and empty columns for ?WD1 and ?WD2
  • q2 -- returns one row with values for ?co and ?WD, and an empty column for ?WD1

-- nor Open Source Edition (VOS) v07.20.3229 (17c4ba1) built 2018-08-17 (behind UniProt.org) --

  • q1 -- returns two rows with values for ?co and ?WD, and empty columns for ?WD1 and ?WD2
  • q2 -- returns one row with values for ?co and ?WD, and an empty column for ?WD1

With the obvious exception, these results are what should be delivered.

@abrokenjester
Collaborator

@VladimirAlexiev, @TallTed is correct that the GraphDB/RDF4J behavior is incorrect here. In fact we have recently done several fixes in RDF4J's SPARQL engine to correct for these kinds of scoping corner cases (see for example eclipse-rdf4j/rdf4j#1405). If you try your second query in the latest version of RDF4J (3.0.2), you'll see that it in fact returns no binding for ?WD1. Unfortunately it looks as if the first query (with the union) is still not quite treated right; we'll investigate that.

The reason these kinds of scoping gotchas are hard for the RDF4J engine, by the way, is that its engine was originally designed as an iterative "top-down" evaluation mechanism. For most SPARQL queries, this is not a real issue as evaluation order has no real influence on the eventual result, and we have made corrections to cater for the more obvious cases where evaluation order does make a difference in scoping (there are several such cases in the DAWG query test suite, all of which RDF4J correctly evaluates). Clearly though, we haven't quite caught them all yet.

@VladimirAlexiev
Contributor Author

@TallTed I tested on the DBpedia endpoint.

SPARQL evaluates nested queries from inside-out

But these are not queries, they are just brackets (q2) or UNION clauses (q1).

About the inside-out strategy: SPARQL also says that implementations are free to optimize, implying the situation where triple pattern results will be joined between the two levels. The problem in this case is that BIND doesn't perform a join but an assignment...

So: outer bindings are not visible in a set of brackets. This semantics may be correct, but is very non-intuitive and IMHO totally useless. @afs @ericprud, could you please comment?

@namedgraph

namedgraph commented Oct 30, 2019

semantics may be correct, but is very non-intuitive and IMHO totally useless

@VladimirAlexiev what are you suggesting here? Changing the specification more to your liking? Intuitiveness is subjective.

You should really be looking at an algebra representation (e.g. on http://sparql.org):

(base <http://example/base/>
  (prefix ((wd: <http://www.wikidata.org/entity/>))
    (join
      (extend ((?WD (strafter (str ?co) (str wd:))))
        (extend ((?co wd:Q156578))
          (table unit)))
      (union
        (extend ((?WD1 ?WD))
          (table unit))
        (extend ((?WD2 ?WD))
          (table unit))))))
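The algebra tree above can be walked by hand. The following is a rough Python sketch of that evaluation, assuming solutions are plain dicts, IRIs are plain strings, and `strafter` is a simplified stand-in for the SPARQL function. All helper names are hypothetical.

```python
from itertools import product

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    return [{**a, **b} for a, b in product(left, right) if compatible(a, b)]

def union(left, right):
    return left + right

def extend(solutions, var, expr):
    """Extend: bind var to expr(solution); if expr errors, var stays unbound."""
    out = []
    for mu in solutions:
        try:
            out.append({**mu, var: expr(mu)})
        except KeyError:  # expr referenced a variable that is not bound in mu
            out.append(dict(mu))
    return out

def strafter(s, prefix):
    """Simplified STRAFTER: text after the first occurrence of prefix, else ''."""
    i = s.find(prefix)
    return s[i + len(prefix):] if i >= 0 else ""

UNIT = [{}]  # (table unit): a single empty solution
WD_NS = "http://www.wikidata.org/entity/"

# Left operand of the join: the two outer BINDs, applied to the unit table.
left = extend(
    extend(UNIT, "co", lambda mu: WD_NS + "Q156578"),
    "WD", lambda mu: strafter(mu["co"], WD_NS))

# Right operand: each union arm starts from its own unit table, so ?WD is unbound.
right = union(
    extend(UNIT, "WD1", lambda mu: mu["WD"]),
    extend(UNIT, "WD2", lambda mu: mu["WD"]))

rows = join(left, right)
for row in rows:
    print(row)
```

Under these assumptions the sketch produces two rows with `co` and `WD` bound and neither `WD1` nor `WD2` present, because each arm of the union is extended from an empty solution in which `?WD` does not exist.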

@afs
Collaborator

afs commented Oct 30, 2019

@VladimirAlexiev -- There are already channels for bug reporting. When there are differences in implementations, the first step is to discuss with the implementers or use public-rdf-dawg-comments@w3.org.

w3c/rdf-tests is the place to propose tests but bind/bind07.rq looks like it covers the matter.

"inside-out" is in fact what you are used to : (1+3)*4 = 16

Thank you to @jeenbroekstra for explaining RDF4J.

Jena works by starting with a join tree and then tries to find better ways to evaluate it. One important optimization is finding whether a query can be executed iteratively (less memory, usually faster -- a form of index join with globally scoped variables). This area has been the scene of several bugs. Optimizations aren't always easy.

SPARQL defines the correct answers. Implementations can do what they like but the correct results are well defined.

@TallTed
Member

TallTed commented Oct 30, 2019

@VladimirAlexiev

But these are not queries, they are just brackets (q2) or UNION clauses (q1).

Optional elements of the syntax certainly make it appear that these are "just" clauses, but they are in fact subqueries. Braces {} (when found within the main braces of a SPARQL query) surround subqueries. (Brackets [] surround blank nodes.) A UNION clause combines the results of 2 such subqueries.

I cannot immediately explain how you got your reported results on the DBpedia endpoint. You can see my results from that endpoint above. Can you provide live links that get your reported results?

@VladimirAlexiev
Contributor Author

VladimirAlexiev commented Oct 30, 2019

@TallTed https://www.w3.org/TR/sparql11-query/#subqueries are different from https://www.w3.org/TR/sparql11-query/#GroupPatterns.

(Note: you can navigate the grammar easily here: http://rawgit2.com/VladimirAlexiev/grammar-diagrams/master/sparql11-grammar.xhtml#GroupGraphPattern)

The translation of the two is also different https://www.w3.org/TR/sparql11-query/#sparqlTranslateGraphPatterns.

I get the pattern does not contain variables from this dbpedia query

@afs bind07.rq looks like it covers the matter
{ BIND(?o+1 AS ?z) } UNION { BIND(?o+2 AS ?z) }

I don't think it does. It's clear that the bindings between two UNION clauses are independent. My objection is why bindings from the outer scope are not used in the inner.

But anyway, I see I'm in the wrong, though I still think this is a horrible mis-feature of SPARQL. Closing the issue.

@afs
Collaborator

afs commented Oct 30, 2019

The query is:

{ 
  ?s ?p ?o .
  { BIND(?o+1 AS ?z) } UNION { BIND(?o+2 AS ?z) }
}

and ?z does not get bound in either arm of the union because ?o is not available in the union. Whether BIND or pattern matching is on the left side of the join makes no difference.

@lisp
Contributor

lisp commented Jan 24, 2020

this issue is referenced recently in a discussion about the expected behaviour of rdf4j.
(https://groups.google.com/forum/#!topic/rdf4j-users/cV2SzgLb7EE)
the course of that discussion shows that one thing which would make it easier to understand this issue would be to drop the concept of "bottom-up" or "inside-out" evaluation and rephrase the issue just in terms of variable scope.
evaluation order does not matter.

consider this variant of the query which introduced this issue:

select * {
  bind(2 as ?WD)
 # bind(1 as ?WD) # there was intended to be this '#' here to indicate alternatives
  {
    bind(1 as ?WD)
    bind('a' as ?union)
  } union {
    bind(2 as ?WD)
    bind('b' as ?union)
  }
}

the evaluation order does not affect the result - the scoping rules do.
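A small Python sketch of that point, assuming solutions are modelled as dicts and the two union arms are written out directly as the rows their BINDs would produce. The helper names are hypothetical.

```python
from itertools import product

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    return [{**a, **b} for a, b in product(left, right) if compatible(a, b)]

def union(left, right):
    return left + right

# Outer bind(2 as ?WD), applied to the empty solution.
outer = [{"WD": 2}]

# The two union arms, each evaluated in its own scope.
arms = union([{"WD": 1, "union": "a"}],
             [{"WD": 2, "union": "b"}])

forward = join(outer, arms)   # outer operand first
backward = join(arms, outer)  # union operand first

print(forward)
print(backward)
```

Both orders keep only the row where `?WD` agrees (`WD` = 2, `union` = 'b'): the join is symmetric, so the result depends on which variables each operand can see, not on which operand is evaluated first.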

@afs
Collaborator

afs commented Jan 26, 2020

The scoping rules are a syntax issue and may be useful to explain evaluation as well.

In the above example, the query isn't legal syntax - the example tries to bind ?WD twice (at the top).

https://www.w3.org/TR/sparql11-query/#variableScope

Note: the example has been edited since. It was:

select * {
    bind(2 as ?WD)
    bind(1 as ?WD)
    ...

@lisp
Contributor

lisp commented Jan 26, 2020

independent of the lexicographic error in the initial version, above, is the following query also illegal "syntax"?

select * {
  # bind(2 as ?WD)
  # bind(1 as ?WD)
  {
    bind(1 as ?WD)
  } union {
    bind(2 as ?WD)
  }
}

@afs
Collaborator

afs commented Jan 26, 2020

That's OK. It is two attempts to bind a variable in the same row that is not allowed (not via a join).

The union is two different rows. It can be joined with a bind in another block.

@lisp
Contributor

lisp commented Jan 26, 2020

please rephrase this:

Its two attempts to bind a variable in the same row that is not allowed (not via a join).

as it reads, neither does it refer to a "syntax issue" nor is the meaning of "not via a join" clear.
to more clearly express my confusion, is the following legal syntax?
is there some aspect of sparql semantics which precludes it?

select * {
  {  bind(1 as ?WD) } 
  {  bind(2 as ?WD) }
}

@afs
Collaborator

afs commented Jan 26, 2020

That is legal; there are two parts, each adding a binding to an empty solution.

The ?WD are combined by a join (try it at sparql.org).
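As a rough illustration of what that join does, here is a minimal Python sketch. It assumes solutions are dicts and uses hypothetical helper names; it models only the join step, not the syntax check.

```python
from itertools import product

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    return [{**a, **b} for a, b in product(left, right) if compatible(a, b)]

one = [{"WD": 1}]  # { bind(1 as ?WD) }
two = [{"WD": 2}]  # { bind(2 as ?WD) }

mismatched = join(one, two)         # the two groups disagree on ?WD
matched = join(join(one, one), one) # every group binds ?WD to the same value

print(mismatched)  # []
print(matched)     # [{'WD': 1}]
```

Groups that bind the same variable to different values are legal but join to an empty result; groups that agree on the value collapse into a single row.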

@lisp
Contributor

lisp commented Jan 26, 2020

i agree that it is legal. i tried it here.
there is another variant to be found there as well,

select * {
  bind(1 as ?WD)
  {  bind(1 as ?WD) } 
  {  bind(1 as ?WD) }
  # bind(1 as ?WD)
}

where i understand that implementation to be correct in that the first variant yields a result under sparql binding semantics, but the second does not.
