Regression/Change in 4.0 M3: query parser fails with "variable 'o1' in projection not present in GROUP BY" (ORDER BY clause) #3782

Closed
aschwarte10 opened this issue Apr 8, 2022 · 8 comments · Fixed by #3788
Labels
🐞 bug issue is a bug 📦 sparql affects the SPARQL parser / engine specification issues related to compliance to standards and external specs

@aschwarte10
Contributor

Current Behavior

We upgraded our application to the 4.0 M3 milestone, and one of our query parsing/rendering unit tests now fails (it passed with 3.7.x and 4.0-M2).

It looks like a regression (or at least a change) was introduced in the recent milestone.

I have reduced the query to the minimal example that still fails at parsing time:

SELECT DISTINCT ?s (?o AS ?o1) 
WHERE {
	?s ?p ?o 
} GROUP BY ?s ?o 
ORDER BY ?s ?o1 

Fails with: variable 'o1' in projection not present in GROUP BY

Expected Behavior

As this is a change compared to previous versions, we need a clear understanding of what is correct w.r.t. the SPARQL spec. I do not know whether ORDER BY is evaluated on the final projected result or on an intermediate result.

If the change is not intended and not required by the spec, the expectation is that the query parser behaves as in previous versions of RDF4J.

Steps To Reproduce

Can be reproduced with the following unit test snippet

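// imports needed by the snippet
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.parser.QueryParserUtil;

String query = "SELECT DISTINCT ?s (?o AS ?o1) \n"
                + "WHERE {\n"
                + "    ?s ?p ?o \n"
                + "} GROUP BY ?s ?o \n"
                + "ORDER BY ?s ?o1 ";
// baseURI is null; on 4.0.0-M3 this call throws MalformedQueryException
QueryParserUtil.parseOperation(QueryLanguage.SPARQL, query, null);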

which fails with

org.eclipse.rdf4j.query.MalformedQueryException: variable 'o1' in projection not present in GROUP BY.
	at org.eclipse.rdf4j.query.parser.sparql.SPARQLParser.buildQueryModel(SPARQLParser.java:220)
	at org.eclipse.rdf4j.query.parser.sparql.SPARQLParser.parseQuery(SPARQLParser.java:178)
	at org.eclipse.rdf4j.query.parser.QueryParserUtil.parseOperation(QueryParserUtil.java:49)
	...
Caused by: org.eclipse.rdf4j.query.parser.sparql.ast.VisitorException: variable 'o1' in projection not present in GROUP BY.
	at org.eclipse.rdf4j.query.parser.sparql.TupleExprBuilder.visit(TupleExprBuilder.java:659)
	at org.eclipse.rdf4j.query.parser.sparql.TupleExprBuilder.visit(TupleExprBuilder.java:235)
	at org.eclipse.rdf4j.query.parser.sparql.ast.ASTSelect.jjtAccept(ASTSelect.java:32)
	at org.eclipse.rdf4j.query.parser.sparql.TupleExprBuilder.visit(TupleExprBuilder.java:398)
	at org.eclipse.rdf4j.query.parser.sparql.TupleExprBuilder.visit(TupleExprBuilder.java:235)
	at org.eclipse.rdf4j.query.parser.sparql.ast.ASTSelectQuery.jjtAccept(ASTSelectQuery.java:24)
	at org.eclipse.rdf4j.query.parser.sparql.TupleExprBuilder.visit(TupleExprBuilder.java:347)
	at org.eclipse.rdf4j.query.parser.sparql.TupleExprBuilder.visit(TupleExprBuilder.java:235)
	at org.eclipse.rdf4j.query.parser.sparql.ast.ASTQueryContainer.jjtAccept(ASTQueryContainer.java:26)
	at org.eclipse.rdf4j.query.parser.sparql.SPARQLParser.buildQueryModel(SPARQLParser.java:218)
	... 29 more

Version

4.0.0-M3

Are you interested in contributing a solution yourself?

No response

Anything else?

I would like to see a good assessment (and potentially a discussion) on this (cc @jeenbroekstra , @hmottestad )

@aschwarte10 aschwarte10 added the 🐞 bug issue is a bug label Apr 8, 2022
@hmottestad hmottestad added the specification issues related to compliance to standards and external specs label Apr 8, 2022
@hmottestad
Contributor

We should test this against other implementations to see how they handle it.

@abrokenjester
Contributor

I would also like to isolate the change that causes this difference.

@abrokenjester
Contributor

abrokenjester commented Apr 8, 2022

By the way, given that this query does not even involve an aggregate, I don't believe it should fail on this. I'm taking a closer look at the test cases and what changes we recently made that could have introduced this regression. Most likely candidate for this is GH-2990.

@abrokenjester
Contributor

I've confirmed that this regression is a side effect of GH-2990.

@abrokenjester
Contributor

Relevant section in the query spec is https://www.w3.org/TR/sparql11-query/#aggregateRestrictions :

In a query level which uses aggregates, only expressions consisting of aggregates and constants may be projected, with one exception. When GROUP BY is given with one or more simple expressions consisting of just a variable, those variables may be projected from the level.

The above section is why we do these variable presence checks when parsing a GROUP BY clause. However, this really should only apply if the query involves aggregates. Using a GROUP BY without an aggregate is kind of pointless, but it certainly should not result in an error.

If I read the restrictions correctly, if this query had involved an aggregate, it would have been correct to throw an error here, as (?o AS ?o1) is not a "simple expression consisting of just a variable".

For example:

SELECT (COUNT(?s) as ?count) (?o AS ?o1) 
WHERE {
	?s ?p ?o 
} GROUP BY ?s ?o 
ORDER BY ?s ?o1 

is not strictly legal. You'd have to rewrite it like this:

SELECT (COUNT(?s) as ?count) ?o1
WHERE {
	?s ?p ?o 
} GROUP BY ?s (?o as ?o1) 
ORDER BY ?s ?o1 
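
To make the strict reading concrete, here is a minimal Java sketch of the check as described in this comment. It is not RDF4J's parser code: the class `StrictGroupByCheck`, the `Kind` enum, and the `Element` type are hypothetical names invented for illustration. It assumes that a GROUP BY alias such as (?o as ?o1) makes ?o1 a grouping variable that may then be projected.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; not part of RDF4J.
public class StrictGroupByCheck {

    // The kinds of SELECT elements this sketch distinguishes.
    enum Kind { VARIABLE, ALIAS, AGGREGATE, CONSTANT }

    static final class Element {
        final Kind kind;
        final String var; // variable the element exposes in the result

        Element(Kind kind, String var) {
            this.kind = kind;
            this.var = var;
        }
    }

    // Strict reading: only aggregates, constants, and bare variables
    // that appear in the GROUP BY may be projected. A projection alias
    // such as (?o AS ?o1) is rejected.
    static boolean isLegal(List<Element> projection, Set<String> groupVars) {
        for (Element e : projection) {
            if (e.kind == Kind.AGGREGATE || e.kind == Kind.CONSTANT) {
                continue;
            }
            if (e.kind == Kind.VARIABLE && groupVars.contains(e.var)) {
                continue;
            }
            return false;
        }
        return true;
    }

    // Convenience helper to build the set of GROUP BY variables.
    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }
}
```

Under these assumptions, the first query (an aggregate plus the alias (?o AS ?o1), grouped by ?s ?o) is rejected, while the rewritten query (an aggregate plus ?o1, grouped by ?s and ?o1) is accepted.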

@abrokenjester abrokenjester self-assigned this Apr 9, 2022
@abrokenjester abrokenjester added this to the 4.0.0 milestone Apr 9, 2022
abrokenjester added a commit that referenced this issue Apr 9, 2022
…gregate

- covers SPARQL Negative parser test cases :group06 and :group07
abrokenjester added a commit that referenced this issue Apr 10, 2022
…gregate

- covers SPARQL Negative parser test cases :group06 and :group07
@abrokenjester
Contributor

Hm, there are two syntax test cases in the W3C SPARQL test suite that seem to contradict what I said about it only applying when aggregates are involved.

I am seeking clarification from the wider community (via the sparql 1.2 group), but right now, my reading of the standard is as follows:

When a query has a GROUP BY clause, the projection may only contain:

  1. aggregate expressions
  2. constants
  3. simple expressions consisting of just a variable (if and only if that var is in the GROUP BY)

What it hinges on is what is covered by "simple expressions consisting of just a variable" - I previously read that strictly as only being a single variable, and not an aliasing expression (such as (?o AS ?o1)). But arguably an aliasing expression is also simple, and its source is just a variable (the alias target is just that: an alias, not a variable in its own right).

If we apply this to your example query:

SELECT DISTINCT ?s (?o AS ?o1) 
WHERE {
	?s ?p ?o 
} GROUP BY ?s ?o 
ORDER BY ?s ?o1 

It should be syntactically legal, because:

  1. ?s is a single var (and present in the GROUP BY)
  2. (?o AS ?o1) is a simple expression consisting of just a variable (?o), which is in the GROUP BY.
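
The reading above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual fix: `RelaxedGroupByCheck` and `Projected` are hypothetical names, and aggregates and constants are left out of the model because they are always allowed. The key point is that the check looks at the source variable of an alias, not at the alias target.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model for illustration only; not part of RDF4J.
public class RelaxedGroupByCheck {

    // A projected variable, optionally renamed:
    // "?s" is (s, s) and "(?o AS ?o1)" is (o, o1).
    static final class Projected {
        final String source; // variable read from the group
        final String name;   // variable exposed in the result

        Projected(String source, String name) {
            this.source = source;
            this.name = name;
        }
    }

    // Returns the first source variable that is missing from the
    // GROUP BY, or an empty Optional if the projection is legal.
    static Optional<String> firstViolation(List<Projected> projection,
                                           Set<String> groupVars) {
        for (Projected p : projection) {
            if (!groupVars.contains(p.source)) {
                return Optional.of(p.source);
            }
        }
        return Optional.empty();
    }

    // Convenience helper to build the set of GROUP BY variables.
    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }
}
```

For the example query, the projection is (s, s) and (o, o1) with GROUP BY variables s and o, so no violation is reported. Projecting (?p AS ?p1) with the same GROUP BY would report p.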

@abrokenjester
Contributor

The above seems to be borne out by responses on the sparql 1.2 mailing list. So for now I'm going with this interpretation, as it is consistent and easy to implement.

@aschwarte10
Contributor Author

Thanks a lot @jeenbroekstra for following up on this 👍

I think the above interpretation of considering "renamings" as simple expressions makes total sense and is in line with what I would expect as a user.

This morning I did some further testing across several different databases. It looks like the interpretation and evaluation there is very much the same for my example query:

SELECT DISTINCT (?t AS ?type) (COUNT(?s) AS ?cnt)
WHERE {
	?s a ?t
} GROUP BY ?t
ORDER BY ?type 

Note that RDF4J 4.0 M3 fails on this query.

As a second note: the original query in our unit tests has some more expressions, and also contains an aggregation (COUNT). In my issue description I just tried to reduce it to the minimum.

As a next step I'll look at your PR, and I will also put the snapshot jar build of that branch into our codebase for cross-validation.

patrickwyler pushed a commit to patrickwyler/rdf4j that referenced this issue Jun 20, 2022
…even if no aggregate

- covers SPARQL Negative parser test cases :group06 and :group07